Sample records for extracting statistical information

  1. A Probability-Based Statistical Method to Extract Water Body of TM Images with Missing Information

    NASA Astrophysics Data System (ADS)

    Lian, Shizhong; Chen, Jiangping; Luo, Minghai

    2016-06-01

    Water information cannot be accurately extracted from TM images in which true information is lost to blocking clouds and missing data stripes. Because water is continuously distributed under natural conditions, this paper proposes a new method of water body extraction based on probability statistics to improve the accuracy of water information extraction from TM images with missing information. Different kinds of disturbance from clouds and missing data stripes are simulated, and water information is extracted from the simulated images using global histogram matching, local histogram matching, and the probability-based statistical method. Experiments show that a smaller Areal Error and a higher Boundary Recall are obtained with this method than with the conventional methods.
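
    As a point of reference for the histogram-matching baselines named above, here is a minimal sketch of global histogram matching followed by a water threshold, assuming single-band TM arrays; the threshold value and synthetic data are illustrative, and this is not the paper's probability-based method itself.

```python
# Sketch: global histogram matching of a degraded TM band to a cloud-free
# reference, followed by a simple dark-pixel water threshold.
import numpy as np
from skimage.exposure import match_histograms

def water_mask_global_matching(degraded_band, reference_band, water_threshold):
    """Match the degraded band's histogram to the reference, then threshold.

    Water is dark in the near/short-wave infrared, so pixels below the
    (illustrative) threshold are labeled water.
    """
    matched = match_histograms(degraded_band, reference_band)
    return matched < water_threshold  # boolean water mask

# Toy usage with synthetic data standing in for a TM infrared band.
rng = np.random.default_rng(0)
reference = rng.normal(0.3, 0.1, (100, 100))
degraded = reference * 0.6 + 0.05            # simulated radiometric distortion
mask = water_mask_global_matching(degraded, reference, water_threshold=0.15)
print("water fraction:", mask.mean())
```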

  2. A Statistical Texture Feature for Building Collapse Information Extraction of SAR Image

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, H.; Chen, Q.; Liu, X.

    2018-04-01

    Synthetic Aperture Radar (SAR) has become one of the most important means of extracting post-disaster collapsed-building information, owing to its versatility and its almost all-weather, day-and-night operating capability. Given that the inherent statistical distribution of speckle in SAR images is not normally used to extract collapsed-building information, this paper proposes a novel texture feature based on statistical models of SAR images to extract collapsed buildings. In the proposed feature, the texture parameter of the G0 distribution is used to reflect the uniformity of the target. This feature not only accounts for the statistical distribution of SAR images, providing a more accurate description of object texture, but can also be applied to extract collapsed-building information from single-, dual- or full-polarization SAR data. RADARSAT-2 data of the Yushu earthquake, acquired on April 21, 2010, are used to present and analyze the performance of the proposed method. In addition, the applicability of this feature to SAR data with different polarizations is analysed, which provides decision support for data selection in collapsed-building information extraction.
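
    The paper's feature is built on the fitted G0 texture parameter; as a rough illustration of the same heterogeneity cue, the sketch below maps the local coefficient of variation of a SAR intensity image, a standard moments-based measure. The window size and the synthetic speckle model are assumptions, not the authors' estimator.

```python
# Sketch: a moments-based texture map for SAR intensity images. The local
# coefficient of variation rises with scene heterogeneity, the cue that the
# G0 texture parameter captures more formally.
import numpy as np
from scipy.ndimage import uniform_filter

def local_cv(intensity, window=9):
    """Per-pixel coefficient of variation over a sliding window."""
    mean = uniform_filter(intensity, window)
    mean_sq = uniform_filter(intensity ** 2, window)
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    return np.sqrt(var) / np.maximum(mean, 1e-12)

rng = np.random.default_rng(1)
homogeneous = rng.exponential(1.0, (64, 64))                  # pure speckle
heterogeneous = homogeneous * rng.gamma(0.5, 2.0, (64, 64))   # added texture
print(local_cv(homogeneous).mean(), local_cv(heterogeneous).mean())
```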

  3. The extraction and integration framework: a two-process account of statistical learning.

    PubMed

    Thiessen, Erik D; Kronstein, Alexandra T; Hufnagle, Daniel G

    2013-07-01

    The term statistical learning in infancy research originally referred to sensitivity to transitional probabilities. Subsequent research has demonstrated that statistical learning contributes to infant development in a wide array of domains. The range of statistical learning phenomena necessitates a broader view of the processes underlying statistical learning. Learners are sensitive to a much wider range of statistical information than the conditional relations indexed by transitional probabilities, including distributional and cue-based statistics. We propose a novel framework that unifies learning about all of these kinds of statistical structure. From our perspective, learning about conditional relations outputs discrete representations (such as words). Integration across these discrete representations yields sensitivity to cues and distributional information. To achieve sensitivity to all of these kinds of statistical structure, our framework combines processes that extract segments of the input with processes that compare across these extracted items. In this framework, the items extracted from the input serve as exemplars in long-term memory. The similarity structure of those exemplars in long-term memory leads to the discovery of cues and categorical structure, which guides subsequent extraction. The extraction and integration framework provides a way to explain sensitivity to both conditional statistical structure (such as transitional probabilities) and distributional statistical structure (such as item frequency and variability), and also a framework for thinking about how these different aspects of statistical learning influence each other. 2013 APA, all rights reserved
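
    As a concrete anchor for the conditional statistics discussed above, this sketch computes transitional probabilities between adjacent syllables in a toy stream built from two "words"; within-word transitions come out at 1.0 and word-boundary transitions lower.

```python
# Sketch: transitional probabilities P(next | current) between adjacent
# syllables, the conditional statistic tracked by the extraction process.
from collections import Counter

def transitional_probabilities(stream):
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Stream of the "words" ba-bi-bu and go-la-tu in the order A B A A B.
stream = "ba bi bu go la tu ba bi bu ba bi bu go la tu".split()
for pair, tp in sorted(transitional_probabilities(stream).items()):
    print(pair, round(tp, 2))   # e.g. ('ba','bi') 1.0, ('bu','go') 0.67
```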

  4. ECG Identification System Using Neural Network with Global and Local Features

    ERIC Educational Resources Information Center

    Tseng, Kuo-Kun; Lee, Dachao; Chen, Charles

    2016-01-01

    This paper proposes a human identification system based on extracted electrocardiogram (ECG) signals. Two hierarchical classification structures based on a global shape feature and a local statistical feature are used to process the ECG signals. The global shape feature represents the outline information of ECG signals and the local statistical feature extracts the…

  5. Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

    ERIC Educational Resources Information Center

    Finch, Dezon Kile

    2012-01-01

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…

  6. [Road Extraction in Remote Sensing Images Based on Spectral and Edge Analysis].

    PubMed

    Zhao, Wen-zhi; Luo, Li-qun; Guo, Zhou; Yue, Jun; Yu, Xue-ying; Liu, Hui; Wei, Jing

    2015-10-01

    Roads are typical man-made objects in urban areas, and road extraction from high-resolution images has important applications in urban planning and transportation development. However, because of confusion among spectral characteristics, it is difficult to distinguish roads from other objects using traditional classification methods that depend mainly on spectral information. Edges are an important feature for identifying linear objects (e.g., roads), and the distribution patterns of edges vary greatly among different objects, so it is crucial to merge edge statistical information with spectral information. In this study, a new method that combines spectral information with edge statistical features is proposed. First, edge detection is conducted using a self-adaptive mean-shift algorithm on the panchromatic band, which greatly reduces pseudo-edges and noise effects. Then, edge statistical features are obtained from an edge statistical model that measures the length and angle distribution of edges. Finally, by integrating the spectral and edge statistical features, an SVM algorithm is used to classify the image and roads are extracted. A series of experiments shows that the overall accuracy of the proposed method is 93%, compared with only 78% for the traditional method. The results demonstrate that the proposed method is efficient and valuable for road extraction, especially from high-resolution images.
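
    A hedged sketch of the final fusion-and-classification step: per-pixel spectral values concatenated with edge statistics and fed to an SVM. The synthetic features stand in for the paper's actual spectral bands and edge length/angle statistics.

```python
# Sketch: feature-level fusion of spectral and edge-statistics features,
# classified with an RBF-kernel SVM, in the spirit of the method above.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n = 400
spectral = rng.normal(0.0, 1.0, (n, 4))     # stand-in multispectral bands
edge_stats = rng.normal(0.0, 1.0, (n, 2))   # stand-in edge length/angle stats
y = (spectral[:, 0] + 0.8 * edge_stats[:, 0] > 0).astype(int)  # toy road label

X = np.hstack([spectral, edge_stats])       # feature-level fusion
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:300], y[:300])
print("toy accuracy:", clf.score(X[300:], y[300:]))
```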

  7. Table Extraction from Web Pages Using Conditional Random Fields to Extract Toponym Related Data

    NASA Astrophysics Data System (ADS)

    Luthfi Hanifah, Hayyu'; Akbar, Saiful

    2017-01-01

    Tables are one of the ways to present information on web pages. The abundance of web pages composing the World Wide Web has motivated research in information extraction and information retrieval, including research on table extraction. There is also a need for systems designed specifically to handle location-related information. Against this background, this research provides a way to extract location-related data from web tables so that it can be used in the development of a Geographic Information Retrieval (GIR) system. The location-related data are identified by the toponym (location name). In this research, a rule-based approach with a gazetteer is used to recognize toponyms in web tables, while a combination of rule-based and statistical approaches is used to extract data from a table. In the statistical approach, a Conditional Random Fields (CRF) model is used to infer the schema of the table. The result of table extraction is presented in JSON format; if a web table contains a toponym, a field is added to the JSON document to store the toponym values. This field can be used to index the table data according to the toponym, which can then be used in the development of the GIR system.
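
    A minimal sketch of the statistical component, assuming the third-party sklearn-crfsuite package; the cell features and label set below are invented stand-ins for whatever schema cues the authors actually used.

```python
# Sketch: labeling the cells of a table row with a linear-chain CRF so the
# table schema (header vs. toponym vs. data cells) can be inferred.
import sklearn_crfsuite

def cell_features(cell):
    return {
        "lower": cell.lower(),
        "is_numeric": cell.replace(".", "", 1).isdigit(),
        "is_capitalized": cell[:1].isupper(),
    }

# Each training sequence is one table row (features per cell, label per cell).
X_train = [[cell_features(c) for c in ["City", "Population"]],
           [cell_features(c) for c in ["Bandung", "2500000"]]]
y_train = [["HEADER", "HEADER"], ["TOPONYM", "DATA"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([[cell_features(c) for c in ["Jakarta", "10500000"]]]))
```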

  8. Words and possible words in early language acquisition.

    PubMed

    Marchetto, Erika; Bonatti, Luca L

    2013-11-01

    In order to acquire language, infants must extract its building blocks (words) and master the rules governing their legal combinations from speech. These two problems are not independent, however: words also have internal structure. Thus, infants must extract two kinds of information from the same speech input. They must find the actual words of their language. Furthermore, they must identify its possible words, that is, the sequences of sounds that, being morphologically well formed, could be words. Here, we show that infants' sensitivity to possible words appears to be more primitive and fundamental than their ability to find actual words. We expose 12- and 18-month-old infants to an artificial language containing a conflict between statistically coherent and structurally coherent items. We show that 18-month-olds can extract possible words when the familiarization stream contains marks of segmentation, but cannot do so when the stream is continuous. Yet, they can find actual words from a continuous stream by computing statistical relationships among syllables. By contrast, 12-month-olds can find possible words when familiarized with a segmented stream, but seem unable to extract statistically coherent items from a continuous stream that contains minimal conflicts between statistical and structural information. These results suggest that sensitivity to word structure is in place earlier than the ability to analyze distributional information. The ability to compute nontrivial statistical relationships becomes fully effective relatively late in development, when infants have already acquired a considerable amount of linguistic knowledge. Thus, mechanisms for structure extraction that do not rely on extensive sampling of the input are likely to have a much larger role in language acquisition than general-purpose statistical abilities. Copyright © 2013. Published by Elsevier Inc.

  9. Statistical learning and language acquisition

    PubMed Central

    Romberg, Alexa R.; Saffran, Jenny R.

    2011-01-01

    Human learners, including infants, are highly sensitive to structure in their environment. Statistical learning refers to the process of extracting this structure. A major question in language acquisition in the past few decades has been the extent to which infants use statistical learning mechanisms to acquire their native language. There have been many demonstrations showing infants’ ability to extract structures in linguistic input, such as the transitional probability between adjacent elements. This paper reviews current research on how statistical learning contributes to language acquisition. Current research is extending the initial findings of infants’ sensitivity to basic statistical information in many different directions, including investigating how infants represent regularities, learn about different levels of language, and integrate information across situations. These current directions emphasize studying statistical language learning in context: within language, within the infant learner, and within the environment as a whole. PMID:21666883

  10. Information retrieval and terminology extraction in online resources for patients with diabetes.

    PubMed

    Seljan, Sanja; Baretić, Maja; Kucis, Vlasta

    2014-06-01

    Terminology use, as a means of information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes, need access to various online resources (in a foreign and/or their native language) when searching for information on self-education in basic diabetic knowledge, on self-care activities such as the importance of dietetic food, medications and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers can help in building terminology lists or lists of "browsing phrases" useful in information retrieval or document indexing. Specific terminology lists represent an intermediate step between free-text search and a controlled vocabulary, between users' demands and existing online resources in the native and foreign language. The research, aimed at detecting the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and is divided into three interrelated parts: i) comparison of professional and popular terminology use, ii) evaluation of automatic statistically-based terminology extraction on English and Croatian texts, and iii) comparison and evaluation of terminology extracted from the English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a medical professional, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, made as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods on the English text. Evaluation of automatic and semi-automatic terminology extraction methods is performed by recall, precision and f-measure.
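
    The closing metrics are standard set-overlap measures; a minimal sketch of recall, precision and F-measure of extracted terminology candidates against a reference list (the term lists below are invented examples):

```python
# Sketch: precision / recall / F-measure of extracted terms vs. a reference.
def prf(extracted, reference):
    extracted, reference = set(extracted), set(reference)
    tp = len(extracted & reference)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

candidates = ["insulin pump", "blood glucose", "physical exercise", "web page"]
gold = ["insulin pump", "blood glucose", "dietetic food"]
print(prf(candidates, gold))   # -> (0.5, 0.667, 0.571) approximately
```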

  11. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: specimens, analytes, units of measure and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measure), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  12. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions about how summary statistics, such as an average, are computed remain unanswered. This study investigated the sampling properties of the visual information human observers use to extract two types of summary statistics of item sets: the average and the variance. We presented three models of ideal observers for extracting the summary statistics: a global sampling model without sampling noise, a global sampling model with sampling noise, and a limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. The results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not require the representation of individual objects with focused attention when the set contains more than four items.

  13. Multi-scale statistical analysis of coronal solar activity

    DOE PAGES

    Gamborino, Diana; del-Castillo-Negrete, Diego; Martinell, Julio J.

    2016-07-08

    Multi-filter images from the solar corona are used to obtain temperature maps that are analyzed using techniques based on proper orthogonal decomposition (POD) in order to extract dynamical and structural information at various scales. Exploring active regions before and after a solar flare and comparing them with quiet regions, we show that the multi-scale behavior presents distinct statistical properties for each case that can be used to characterize the level of activity in a region. Information about the nature of heat transport can also be extracted from the analysis.
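
    POD of snapshot data is commonly computed through the singular value decomposition; the sketch below follows that standard route on synthetic "temperature maps" and is only an analogue of the paper's multi-scale analysis.

```python
# Sketch: proper orthogonal decomposition of a stack of maps via the SVD;
# each flattened map is one snapshot (one column).
import numpy as np

def pod(snapshots):
    """snapshots: (n_pixels, n_times); returns spatial modes, singular values,
    temporal coefficients, and the energy fraction carried by each mode."""
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    energy = s ** 2 / np.sum(s ** 2)
    return U, s, Vt, energy

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 50)
mode = rng.normal(size=(1024, 1))    # one dominant synthetic spatial mode
data = mode @ np.sin(2 * np.pi * 5 * t)[None, :] + 0.1 * rng.normal(size=(1024, 50))
U, s, Vt, energy = pod(data)
print("energy captured by first mode:", round(float(energy[0]), 3))
```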

  14. Traffic crash statistics report, 2005

    DOT National Transportation Integrated Search

    2006-01-01

    The information contained in this Traffic Crash Statistics booklet is extracted from law enforcement : agency long-form reports of traffic crashes. A law enforcement officer must submit a long-form : crash report when investigating: : Motor vehic...

  15. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  16. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein are obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected outside the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method: first, signatures in the frequency domain are used to identify outliers, and second, short-duration events of interest are extracted from among these outliers by evaluating the residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis in a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than is available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis in which uncertainty is added to the system through noise in the measurements.
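
    A hedged sketch of the heart of that second stage: fit an autoregressive model and flag samples whose residuals are outliers. The plain AR form (rather than the report's autoregressive exogenous model), the order, and the robust threshold are all assumptions.

```python
# Sketch: short-duration event detection from the residuals of a least-squares
# autoregressive fit; residual spikes mark departures from normal background.
import numpy as np

def ar_residual_events(x, order=10, k=4.0):
    """Indices where the AR(order) residual exceeds k robust sigmas."""
    X = np.column_stack([x[i:len(x) - order + i] for i in range(order)])
    y = x[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust scale
    return order + np.flatnonzero(np.abs(resid) > k * sigma)

rng = np.random.default_rng(7)
x = rng.normal(0, 1, 2000)
x[1200:1205] += 12.0            # injected short transient "event"
print(ar_residual_events(x))    # indices clustered near 1200
```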

  17. Data Flow Analysis and Visualization for Spatiotemporal Statistical Data without Trajectory Information.

    PubMed

    Kim, Seokyeon; Jeong, Seongmin; Woo, Insoo; Jang, Yun; Maciejewski, Ross; Ebert, David S

    2018-03-01

    Geographic visualization research has focused on a variety of techniques to represent and explore spatiotemporal data. The goal of those techniques is to enable users to explore events and interactions over space and time in order to facilitate the discovery of patterns, anomalies and relationships within the data. However, it is difficult to extract and visualize data flow patterns over time for non-directional statistical data without trajectory information. In this work, we develop a novel flow analysis technique to extract, represent, and analyze flow maps of non-directional spatiotemporal data unaccompanied by trajectory information. We estimate a continuous distribution of these events over space and time, and extract flow fields for spatial and temporal changes utilizing a gravity model. Then, we visualize the spatiotemporal patterns in the data by employing flow visualization techniques. The user is presented with temporal trends of geo-referenced discrete events on a map. As such, overall spatiotemporal data flow patterns help users analyze geo-referenced temporal events, such as disease outbreaks, crime patterns, etc. To validate our model, we discard the trajectory information in an origin-destination dataset, apply our technique to the data, and compare the derived trajectories with the originals. Finally, we present spatiotemporal trend analysis for statistical datasets including Twitter data, maritime search and rescue events, and syndromic surveillance.
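
    As a minimal analogue of one ingredient, the sketch below estimates a continuous event density with a kernel density estimator and reads flow vectors off its spatial gradient; the paper's gravity model is richer, and the KDE choice here is an assumption.

```python
# Sketch: continuous density of geo-referenced events plus a gradient "flow"
# field pointing toward event concentrations.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
events = rng.normal([0.0, 0.0], [1.0, 0.5], size=(500, 2))  # (x, y) events

kde = gaussian_kde(events.T)
xs, ys = np.meshgrid(np.linspace(-3, 3, 60), np.linspace(-3, 3, 60))
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

dy, dx = np.gradient(density)   # flow vectors follow the density gradient
print("max density:", density.max(), "mean |flow|:", np.hypot(dx, dy).mean())
```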

  18. Road Damage Extraction from Post-Earthquake UAV Images Assisted by Vector Data

    NASA Astrophysics Data System (ADS)

    Chen, Z.; Dou, A.

    2018-04-01

    Extraction of road damage information after an earthquake is an urgent task. To collect information about stricken areas, Unmanned Aerial Vehicles can be used to obtain images rapidly. This paper puts forward a novel method to detect road damage and proposes a coefficient to assess road accessibility. With the assistance of vector road data, image data from the Jiuzhaigou Ms 7.0 earthquake are tested. First, the image is clipped according to a vector buffer. Then a large-scale segmentation is applied to remove irrelevant objects. Third, statistics of road features are analysed and damage information is extracted. Combined with the on-field investigation, the extraction results prove effective.

  19. Ontology-Based Information Extraction for Business Intelligence

    NASA Astrophysics Data System (ADS)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  20. Data and Statistics on New York's Mining Resources - NYS Dept. of

    Science.gov Websites

    Data and Statistics on New York's Mining Resources - review information about the regulated sites. Materials Mined in New York - this site provides information on the various materials mined in New York and the locations where they are extracted.

  21. Four types of ensemble coding in data visualizations.

    PubMed

    Szafir, Danielle Albers; Haroz, Steve; Gleicher, Michael; Franconeri, Steven

    2016-01-01

    Ensemble coding supports rapid extraction of visual statistics about distributed visual information. Researchers typically study this ability with the goal of drawing conclusions about how such coding extracts information from natural scenes. Here we argue that a second domain can serve as another strong inspiration for understanding ensemble coding: graphs, maps, and other visual presentations of data. Data visualizations allow observers to leverage their ability to perform visual ensemble statistics on distributions of spatial or featural visual information to estimate actual statistics on data. We survey the types of visual statistical tasks that occur within data visualizations across everyday examples, such as scatterplots, and more specialized images, such as weather maps or depictions of patterns in text. We divide these tasks into four categories: identification of sets of values, summarization across those values, segmentation of collections, and estimation of structure. We point to unanswered questions for each category and give examples of such cross-pollination in the current literature. Increased collaboration between the data visualization and perceptual psychology research communities can inspire new solutions to challenges in visualization while simultaneously exposing unsolved problems in perception research.

  22. Extracting chemical information from high-resolution Kβ X-ray emission spectroscopy

    NASA Astrophysics Data System (ADS)

    Limandri, S.; Robledo, J.; Tirao, G.

    2018-06-01

    High-resolution X-ray emission spectroscopy allows the chemical environment of a wide variety of materials to be studied. Chemical information can be obtained by fitting the X-ray spectra and observing the behavior of certain spectral features. Spectral changes can also be quantified by means of statistical parameters calculated by considering the spectrum as a probability distribution. Another possibility is to perform multivariate statistical analysis, such as principal component analysis. In this work, the performance of these procedures for extracting chemical information from X-ray emission spectra of mixtures of Mn2+ and Mn4+ oxides is studied. A detailed analysis of the parameters obtained, as well as the associated uncertainties, is presented. The methodologies are also applied to Mn oxidation state characterization of the double perovskite oxides Ba1+xLa1-xMnSbO6 (with 0 ≤ x ≤ 0.7). The results show that statistical parameters and multivariate analysis are the most suitable for the analysis of this kind of spectra.
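
    A minimal sketch of the multivariate route: PCA applied to synthetic two-component spectra standing in for the Mn2+/Mn4+ mixtures, where the first principal component should track the mixing fraction (up to sign).

```python
# Sketch: principal component analysis of a set of emission spectra built as
# mixtures of two reference line shapes plus noise.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(11)
energy = np.linspace(0, 1, 300)
ref_a = np.exp(-((energy - 0.45) / 0.02) ** 2)   # stand-in Mn2+ line shape
ref_b = np.exp(-((energy - 0.50) / 0.02) ** 2)   # stand-in Mn4+ line shape

fractions = rng.uniform(0, 1, 40)
spectra = np.outer(fractions, ref_a) + np.outer(1 - fractions, ref_b)
spectra += rng.normal(0, 0.01, spectra.shape)    # counting noise

scores = PCA(n_components=2).fit_transform(spectra)
corr = np.corrcoef(scores[:, 0], fractions)[0, 1]
print("|corr(PC1, mixing fraction)|:", round(float(abs(corr)), 3))
```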

  23. Histogram of gradient and binarized statistical image features of wavelet subband-based palmprint features extraction

    NASA Astrophysics Data System (ADS)

    Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab

    2017-11-01

    Palmprint recognition systems are dependent on feature extraction. A method of feature extraction using higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image, and their outputs are fused. The two techniques used in the fusion are the histogram of gradient and the binarized statistical image features. They are then evaluated using an extreme learning machine classifier before selecting a feature based on principal component analysis. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.

  24. Detection of reflecting surfaces by a statistical model

    NASA Astrophysics Data System (ADS)

    He, Qiang; Chu, Chee-Hung H.

    2009-02-01

    Remote sensing is widely used to assess the destruction from natural disasters and to plan relief and recovery operations. How to automatically extract useful features and segment interesting objects from digital images, including remote sensing imagery, is a critical task for image understanding. Unfortunately, current research on automated feature extraction largely ignores contextual information. As a result, the fidelity of the attributes populated for features and objects of interest cannot be guaranteed. In this paper, we present an exploration of meaningful object extraction that integrates reflecting surfaces. Detection of specular reflecting surfaces can be useful in target identification and can be applied to environmental monitoring, disaster prediction and analysis, military applications, and counter-terrorism. Our method is based on a statistical model that captures the statistical properties of specular reflecting surfaces; the reflecting surfaces are then detected through cluster analysis.

  25. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense, is a versatile tool of information theory for detailed bioinformatics analysis. Relative-entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the degree of separation between these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) from deep learning to extract deep features for promoter recognition, and we then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
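
    A hedged sketch of the statistical-divergence idea alone: rank n-mers by a symmetrized relative entropy between their frequency distributions in promoter and background training sequences. The choice n = 3, the add-one smoothing, and the toy sequences are assumptions; the SD-MSAEs pipeline adds sparse auto-encoders and SVMs on top.

```python
# Sketch: selecting informative n-mers by symmetrized Kullback-Leibler
# divergence between promoter and non-promoter n-mer distributions.
import numpy as np
from collections import Counter
from itertools import product

def nmer_distribution(seqs, n=3):
    counts = Counter(s[i:i + n] for s in seqs for i in range(len(s) - n + 1))
    keys = ["".join(p) for p in product("ACGT", repeat=n)]
    freq = np.array([counts[k] for k in keys], dtype=float) + 1.0  # smoothing
    return keys, freq / freq.sum()

promoters = ["TATAAAGGC", "CTATAAAAG", "GGTATAAAT"]    # toy training data
background = ["GCGCGCGCA", "ACGTGCGCG", "CGCGATCGC"]
keys, p = nmer_distribution(promoters)
_, q = nmer_distribution(background)

divergence = p * np.log(p / q) + q * np.log(q / p)     # per-n-mer contribution
top = np.argsort(divergence)[::-1][:5]
print([keys[i] for i in top])                          # TATA-box-like n-mers
```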

  26. Built-up Areas Extraction in High Resolution SAR Imagery Based on the Method of Multiple Feature Weighted Fusion

    NASA Astrophysics Data System (ADS)

    Liu, X.; Zhang, J. X.; Zhao, Z.; Ma, A. D.

    2015-06-01

    Synthetic aperture radar is used ever more widely in remote sensing because of its all-weather, day-and-night operation, and feature extraction research in high-resolution SAR images has become a topic of wide concern. In particular, with the continuous improvement of airborne SAR image resolution, image texture information has become more abundant, which is of great significance for classification and extraction. In this paper, a novel method for built-up area extraction using both statistical and structural features is proposed, based on the texture characteristics of built-up areas. First, statistical texture features and structural features are extracted by the classical gray-level co-occurrence matrix method and the variogram function method, respectively, with direction information considered in the process. Next, feature weights are calculated according to the Bhattacharyya distance, and all features are fused by weighting. Finally, the fused image is classified with the K-means method and the built-up areas are extracted after a post-classification process. The proposed method has been tested on domestic airborne P-band polarimetric SAR images, and two comparison experiments, one based on statistical texture alone and one based on structural texture alone, were carried out. In addition to qualitative analysis, quantitative analysis based on manually selected built-up areas was performed: in the relatively simple test area the detection rate is over 90%, and in the relatively complex test area the detection rate is also higher than that of the other two methods. The results show that this method can effectively and accurately extract built-up areas from high-resolution airborne SAR imagery.
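
    A sketch of the statistical-texture half together with the Bhattacharyya weighting, assuming scikit-image's GLCM utilities and a Gaussian form of the Bhattacharyya distance; the patch data are synthetic stand-ins, and the variogram half is omitted.

```python
# Sketch: GLCM texture features per patch, then per-feature weights from the
# Bhattacharyya distance between the two class distributions.
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greyco...' in skimage < 0.19

def glcm_features(patch):
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=64, symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p).mean()
                     for p in ("contrast", "homogeneity", "energy")])

def bhattacharyya(a, b):
    """Distance between two 1-D samples under a Gaussian assumption."""
    ma, mb = a.mean(), b.mean()
    va, vb = a.var() + 1e-12, b.var() + 1e-12
    return (0.25 * (ma - mb) ** 2 / (va + vb)
            + 0.5 * np.log((va + vb) / (2.0 * np.sqrt(va * vb))))

rng = np.random.default_rng(9)
built_up = [rng.integers(0, 64, (32, 32), dtype=np.uint8) for _ in range(20)]
smooth = [rng.integers(30, 36, (32, 32), dtype=np.uint8) for _ in range(20)]

f_built = np.array([glcm_features(p) for p in built_up])
f_smooth = np.array([glcm_features(p) for p in smooth])
weights = np.array([bhattacharyya(f_built[:, i], f_smooth[:, i]) for i in range(3)])
print("normalized feature weights:", weights / weights.sum())
```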

  27. Extracting Information about the Rotator Cuff from Magnetic Resonance Images Using Deterministic and Random Techniques

    PubMed Central

    De Los Ríos, F. A.; Paluszny, M.

    2015-01-01

    We consider some methods to extract information about the rotator cuff from magnetic resonance images; the study aims to define an alternative method of display that might facilitate the detection of partial tears in the supraspinatus tendon. Specifically, we use families of ellipsoidal triangular patches to cover the humerus head near the affected area. These patches are textured and displayed with the information from the magnetic resonance images using trilinear interpolation. For generating the points used to texture each patch, we propose a new random statistical method that guarantees a uniform distribution of the points. Its computational cost, defined as the average computing time to generate a fixed number of points, is significantly lower than that of deterministic and other standard statistical techniques. PMID:25650281

  28. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the model's implementation performance as a result of using semantic relationships is encouraging.

  29. A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.

    PubMed Central

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-01-01

    OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performance as a result of using semantic relationships is encouraging. PMID:18693792

  30. Statistical Design Model (SDM) of satellite thermal control subsystem

    NASA Astrophysics Data System (ADS)

    Mirshams, Mehran; Zabihian, Ehsan; Aarabi Chamalishahi, Mahdi

    2016-07-01

    Thermal control is the satellite subsystem whose main task is to keep satellite components within their survival and operating temperature ranges. The capability of satellite thermal control plays a key role in satisfying a satellite's operational requirements, and designing this subsystem is part of satellite design. On the other hand, owing to the lack of information provided by companies, designers still do not have a specific design process for it, although it is one of the fundamental subsystems. The aim of this paper is to identify and extract statistical design models of the spacecraft thermal control subsystem using the SDM design method, which analyses statistical data with a particular procedure. To implement the SDM method, a complete database is required; therefore, we first collect spacecraft data and create a database, then extract statistical graphs using Microsoft Excel, from which we further extract mathematical models. Input parameters of the method are the mass, mission, and lifetime of the satellite. To this end, the thermal control subsystem is first introduced, and the hardware used in this subsystem and its variants is surveyed. Next, different statistical models are presented and briefly compared. Finally, a particular statistical model is extracted from the collected statistical data. The accuracy of the method is tested and verified with a case study: comparison between the specifications of the thermal control subsystem of a fabricated satellite and the analysis results shows the methodology to be effective. Keywords: thermal control subsystem design, statistical design model (SDM), satellite conceptual design, thermal hardware.

  31. Summary of water body extraction methods based on ZY-3 satellite

    NASA Astrophysics Data System (ADS)

    Zhu, Yu; Sun, Li Jian; Zhang, Chuan Yin

    2017-12-01

    Extraction from remote sensing images is one of the main means of obtaining water body information. Because of the satellite's spectral characteristics, many methods cannot be applied to ZY-3 images. To address this problem, we summarize the extraction methods applicable to ZY-3 and analyze the results of existing methods. Based on the characteristics of those results, the method of water index (WI) combined with a single-band threshold and the method of texture filtering based on probability statistics are explored. In addition, the advantages and disadvantages of all methods are compared, which provides a reference for research on water extraction from images. The conclusions are as follows. 1) The NIR band has higher water sensitivity; consequently, when the surface reflectance in the study area differs sufficiently from that of water, the single-band threshold method or multi-band operations can achieve the desired effect. 2) Compared with the water index and HIS optimal-index methods, rule-based object extraction, which takes into account not only the spectral information of water but also spatial and texture constraints, can obtain better results, yet the image segmentation process is time consuming and defining the rules requires domain knowledge. 3) Combining spectral relationships with a water index can eliminate the interference of shadows to a certain extent. When small water bodies are few or are not considered in further study, texture filtering based on probability statistics can effectively reduce noise in the result and avoid confusing shadows or paddy fields with water to a certain extent.
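
    For reference, the water-index-plus-threshold family mentioned above reduces to a few lines; this sketch uses the standard NDWI = (Green - NIR) / (Green + NIR) with an illustrative zero threshold, not a ZY-3-calibrated value.

```python
# Sketch: NDWI water masking; water is bright in green and dark in NIR, so
# its NDWI is high while vegetation and soil come out negative.
import numpy as np

def ndwi_water_mask(green, nir, threshold=0.0):
    ndwi = (green - nir) / np.maximum(green + nir, 1e-12)
    return ndwi > threshold

rng = np.random.default_rng(13)
green = rng.uniform(0.05, 0.15, (50, 50))
nir = rng.uniform(0.20, 0.35, (50, 50))   # land: bright in NIR
nir[10:20, 10:20] = 0.02                  # simulated water: very low NIR
print("water fraction:", ndwi_water_mask(green, nir).mean())
```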

  32. Independent component analysis for automatic note extraction from musical trills

    NASA Astrophysics Data System (ADS)

    Brown, Judith C.; Smaragdis, Paris

    2004-05-01

    The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
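
    A minimal sketch of the separation step, using scikit-learn's FastICA on two synthetic sinusoidal "notes" mixed into two channels; real trill recordings and the original study's pipeline are beyond this toy.

```python
# Sketch: blind separation of two mixed tones with independent component
# analysis; each recovered component should match one source note.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
notes = np.c_[np.sin(2 * np.pi * 440 * t),       # A4
              np.sin(2 * np.pi * 494 * t)]       # B4
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
observed = notes @ mixing.T + 0.01 * rng.normal(size=notes.shape)

recovered = FastICA(n_components=2, random_state=0).fit_transform(observed)
corr = np.abs(np.corrcoef(recovered.T, notes.T))[:2, 2:]
print(np.round(corr, 2))    # near-1 entries pair components with notes
```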

  33. A method for automatic feature points extraction of human vertebrae three-dimensional model

    NASA Astrophysics Data System (ADS)

    Wu, Zhen; Wu, Junsheng

    2017-05-01

    A method for automatic extraction of the feature points of the human vertebrae three-dimensional model is presented. Firstly, the statistical model of vertebrae feature points is established based on the results of manual vertebrae feature points extraction. Then anatomical axial analysis of the vertebrae model is performed according to the physiological and morphological characteristics of the vertebrae. Using the axial information obtained from the analysis, a projection relationship between the statistical model and the vertebrae model to be extracted is established. According to the projection relationship, the statistical model is matched with the vertebrae model to get the estimated position of the feature point. Finally, by analyzing the curvature in the spherical neighborhood with the estimated position of feature points, the final position of the feature points is obtained. According to the benchmark result on multiple test models, the mean relative errors of feature point positions are less than 5.98%. At more than half of the positions, the error rate is less than 3% and the minimum mean relative error is 0.19%, which verifies the effectiveness of the method.

  34. An Overview of Biomolecular Event Extraction from Scientific Documents

    PubMed Central

    Vanegas, Jorge A.; Matos, Sérgio; González, Fabio; Oliveira, José L.

    2015-01-01

    This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed. PMID:26587051

  35. Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱

    PubMed Central

    Marcum, Christopher Steven; Butts, Carter T.

    2015-01-01

    The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488

  36. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    PubMed

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to changes in the environment's statistics. We provide evidence for an alternate route for learning complex temporal statistics: extracting the most probable outcome in a given context is implemented by interactions between executive and motor corticostriatal mechanisms compared with visual corticostriatal circuits (including hippocampal cortex) that support learning of the exact temporal statistics. Copyright © 2017 Wang et al.

  37. Comparison of methods of extracting information for meta-analysis of observational studies in nutritional epidemiology.

    PubMed

    Bae, Jong-Myon

    2016-01-01

    A common method for conducting a quantitative systematic review (QSR) for observational studies related to nutritional epidemiology is the "highest versus lowest intake" method (HLM), in which only the information concerning the effect size (ES) of the highest category of a food item is collected on the basis of its lowest category. However, in the interval collapsing method (ICM), a method suggested to enable a maximum utilization of all available information, the ES information is collected by collapsing all categories into a single category. This study aimed to compare the ES and summary effect size (SES) between the HLM and ICM. A QSR for evaluating the citrus fruit intake and risk of pancreatic cancer and calculating the SES by using the HLM was selected. The ES and SES were estimated by performing a meta-analysis using the fixed-effect model. The directionality and statistical significance of the ES and SES were used as criteria for determining the concordance between the HLM and ICM outcomes. No significant differences were observed in the directionality of SES extracted by using the HLM or ICM. The application of the ICM, which uses a broader information base, yielded more-consistent ES and SES, and narrower confidence intervals than the HLM. The ICM is advantageous over the HLM owing to its higher statistical accuracy in extracting information for QSR on nutritional epidemiology. The application of the ICM should hence be recommended for future studies.
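
    The SES computation under the fixed-effect model is standard inverse-variance pooling of per-study effect sizes; a minimal sketch with illustrative numbers (not the citrus-intake data):

```python
# Sketch: fixed-effect (inverse-variance) meta-analysis of log relative risks.
import numpy as np

def fixed_effect_pool(log_es, se):
    """Pooled effect and 95% CI on the log scale."""
    w = 1.0 / np.asarray(se) ** 2
    pooled = np.sum(w * np.asarray(log_es)) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

log_rr = np.log([0.85, 0.70, 1.05, 0.90])   # illustrative per-study ESs
se = np.array([0.10, 0.15, 0.20, 0.12])     # their standard errors
pooled, (lo, hi) = fixed_effect_pool(log_rr, se)
print("SES (RR): %.2f [%.2f, %.2f]" % (np.exp(pooled), np.exp(lo), np.exp(hi)))
```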

  38. Beyond Transitional Probability Computations: Extracting Word-Like Units when Only Statistical Information Is Available

    ERIC Educational Resources Information Center

    Perruchet, Pierre; Poulin-Charronnat, Benedicte

    2012-01-01

    Endress and Mehler (2009) reported that when adult subjects are exposed to an unsegmented artificial language composed from trisyllabic words such as ABX, YBC, and AZC, they are unable to distinguish between these words and what they coined as the "phantom-word" ABC in a subsequent test. This suggests that statistical learning generates knowledge…

  39. The Contribution of Language-Specific Knowledge in the Selection of Statistically-Coherent Word Candidates

    ERIC Educational Resources Information Center

    Toro, Juan M.; Pons, Ferran; Bion, Ricardo A. H.; Sebastian-Galles, Nuria

    2011-01-01

    Much research has explored the extent to which statistical computations account for the extraction of linguistic information. However, it remains to be studied how language-specific constraints are imposed over these computations. In the present study we investigated if the violation of a word-forming rule in Catalan (the presence of more than one…

  40. Traffic crash statistics report, 1998

    DOT National Transportation Integrated Search

    1999-01-01

    The information contained in this Traffic Crash Facts booklet is extracted from law : enforcement agency long-form reports of traffic crashes. A law enforcement officer must submit a : long-form crash report when investigating: : Motor vehicle crashe...

  41. Traffic crash statistics report, 1995

    DOT National Transportation Integrated Search

    1996-01-01

    The information contained in this Traffic Crash Facts booklet is extracted from law enforcement : agency long-form reports of traffic crashes. A law enforcement officer must submit a long-form crash report : when investigating: : Motor vehicle crashe...

  42. Traffic crash statistics report, 1996

    DOT National Transportation Integrated Search

    1997-11-01

    The information contained in this Traffic Crash Facts booklet is extracted from law enforcement : agency long-form reports of traffic crashes. A law enforcement officer must submit a long-form crash report : when investigating: : Motor vehicle crashe...

  43. Traffic crash statistics report, 1994

    DOT National Transportation Integrated Search

    1995-01-01

    The information contained in this Traffic Crash Data booklet is extracted from law enforcement : agency long-form reports of traffic crashes. A law enforcement officer must submit a long-form crash report : when investigating: : Motor vehicle crashes...

  44. Traffic crash statistics report, 2004

    DOT National Transportation Integrated Search

    2005-01-01

    The information contained in this Traffic Crash Data booklet is extracted from law enforcement : agency long-form reports of traffic crashes. A law enforcement officer must submit a long-form crash report : when investigating: : Motor vehicle crashes...

  45. Traffic crash statistics report, 1997

    DOT National Transportation Integrated Search

    1998-01-01

    The information contained in this Traffic Crash Facts booklet is extracted from law enforcement agency : long-form reports of traffic crashes. A law enforcement officer must submit a long-form crash report when : investigating: : Motor vehicle crashe...

  46. Application of wavelet techniques for cancer diagnosis using ultrasound images: A Review.

    PubMed

    Sudarshan, Vidya K; Mookiah, Muthu Rama Krishnan; Acharya, U Rajendra; Chandran, Vinod; Molinari, Filippo; Fujita, Hamido; Ng, Kwan Hoong

    2016-02-01

    Ultrasound is an important and low cost imaging modality used to study the internal organs of the human body and blood flow through blood vessels. It uses high frequency sound waves to acquire images of internal organs. It is used to screen normal, benign and malignant tissues of various organs. Healthy and malignant tissues generate different echoes for ultrasound. Hence, it provides useful information about the potential tumor tissues that can be analyzed for diagnostic purposes before therapeutic procedures. Ultrasound images are affected by speckle noise due to an air gap between the transducer probe and the body. The challenge is to design and develop robust image preprocessing, segmentation and feature extraction algorithms to locate the tumor region and to extract subtle information from the isolated tumor region for diagnosis. This information can be revealed using a scale space technique such as the Discrete Wavelet Transform (DWT). It decomposes an image into images at different scales using low pass and high pass filters. These filters help to identify the detail or sudden changes in intensity in the image. These changes are reflected in the wavelet coefficients. Various texture, statistical and image based features can be extracted from these coefficients. The extracted features are subjected to statistical analysis to identify the significant features to discriminate normal and malignant ultrasound images using supervised classifiers. This paper presents a review of wavelet techniques used for preprocessing, segmentation and feature extraction of breast, thyroid, ovarian and prostate cancer using ultrasound images. Copyright © 2015 Elsevier Ltd. All rights reserved.
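
    A minimal sketch of DWT-subband statistical feature extraction of the kind reviewed here, assuming the PyWavelets package; the wavelet, decomposition level, and per-subband statistics are common defaults rather than any specific pipeline from the review.

```python
# Sketch: statistical texture features from the detail subbands of a 2-D
# discrete wavelet transform of a region of interest.
import numpy as np
import pywt

def dwt_statistical_features(image, wavelet="db4", level=2):
    feats = []
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    for detail in coeffs[1:]:                  # (cH, cV, cD) per level
        for band in detail:
            feats += [band.mean(), band.std(), np.mean(np.abs(band))]
    return np.array(feats)

rng = np.random.default_rng(21)
roi = rng.normal(0.5, 0.1, (128, 128))         # stand-in tumor ROI
print(dwt_statistical_features(roi).shape)     # 2 levels x 3 subbands x 3 stats
```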

  7. Text mining by Tsallis entropy

    NASA Astrophysics Data System (ADS)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks the terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word-ranking metric in order to extract the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word-ranking performance, at the same level as the best previous ranking methods.
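
    A toy sketch of the general idea, not the authors' exact metric (which exploits the terms' spatial correlation length): score each word by the Tsallis entropy of its occurrence distribution across equal-sized blocks of a document. The block count and entropic index q are arbitrary illustration choices.

      from collections import Counter, defaultdict

      def tsallis_entropy(probs, q=1.5):
          """Nonextensive entropy S_q = (1 - sum p_i^q) / (q - 1)."""
          return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

      def rank_words(text, n_blocks=10, q=1.5):
          words = text.lower().split()
          block = max(1, len(words) // n_blocks)
          counts = defaultdict(Counter)   # word -> block index -> count
          for i, w in enumerate(words):
              counts[w][min(i // block, n_blocks - 1)] += 1
          scores = {}
          for w, c in counts.items():
              total = sum(c.values())
              scores[w] = tsallis_entropy([n / total for n in c.values()], q)
          return sorted(scores, key=scores.get)   # low-entropy words first

      print(rank_words("the cat sat on the mat while the dog slept " * 5)[:5])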

  8. Truly work-like work extraction via a single-shot analysis.

    PubMed

    Aberg, Johan

    2013-01-01

    The work content of non-equilibrium systems in relation to a heat bath is often analysed in terms of expectation values of an underlying random work variable. However, when optimizing the expectation value of the extracted work, the resulting extraction process is subject to intrinsic fluctuations, uniquely determined by the Hamiltonian and the initial distribution of the system. These fluctuations can be of the same order as the expected work content per se, in which case the extracted energy is unpredictable, thus intuitively more heat-like than work-like. This raises the question of the 'truly' work-like energy that can be extracted. Here we consider an alternative that corresponds to an essentially fluctuation-free extraction. We show that this quantity can be expressed in terms of a one-shot relative entropy measure introduced in information theory. This suggests that the relations between information theory and statistical mechanics, as illustrated by concepts like Maxwell's demon, Szilard engines and Landauer's principle, extends to the single-shot regime.

  9. Duration of surgical-orthodontic treatment.

    PubMed

    Häll, Birgitta; Jämsä, Tapio; Soukka, Tero; Peltomäki, Timo

    2008-10-01

    To study the duration of surgical-orthodontic treatment with special reference to patients' age and the type of tooth movements, i.e. extraction vs. non-extraction and intrusion before or extrusion after surgery to level the curve of Spee. The material consisted of the files of 37 consecutive surgical-orthodontic patients. The files were reviewed, and gender, diagnosis, type of malocclusion, age at the initiation of treatment, duration of treatment, type of tooth movements (extraction vs. non-extraction and levelling of the curve of Spee before or after operation) and type of operation were retrieved. For the statistical analyses, two-sample t-tests, Kruskal-Wallis tests and Spearman rank correlation tests were used. The mean treatment duration of the sample was 26.8 months, of which pre-surgical orthodontics took on average 17.5 months. Patients with extractions as part of the treatment had a statistically and clinically significantly longer treatment duration, on average 8 months longer, than those without extractions. No other studied variable seemed to have an impact on the treatment time. The present small sample size prevents reliable conclusions from being drawn. However, the findings suggest, and patients should be informed, that extractions included in the treatment plan increase the chances of a longer duration of surgical-orthodontic treatment.

  10. Knowledge representation and management: transforming textual information into useful knowledge.

    PubMed

    Rassinoux, A-M

    2010-01-01

    To summarize current outstanding research in the field of knowledge representation and management. Synopsis of the articles selected for the IMIA Yearbook 2010. Four interesting papers, dealing with structured knowledge, have been selected for the section knowledge representation and management. Combining the newest techniques in computational linguistics and natural language processing with the latest methods in statistical data analysis, machine learning and text mining has proved to be efficient for turning unstructured textual information into meaningful knowledge. Three of the four selected papers for the section knowledge representation and management corroborate this approach and depict various experiments conducted to extract meaningful knowledge from unstructured free texts such as extracting cancer disease characteristics from pathology reports, or extracting protein-protein interactions from biomedical papers, as well as extracting knowledge for the support of hypothesis generation in molecular biology from the Medline literature. Finally, the last paper addresses the level of formally representing and structuring information within clinical terminologies in order to render such information easily available and shareable among the health informatics community. Delivering common powerful tools able to automatically extract meaningful information from the huge amount of electronically unstructured free texts is an essential step towards promoting sharing and reusability across applications, domains, and institutions, thus contributing to building capacities worldwide.

  11. The Communicability of Graphical Alternatives to Tabular Displays of Statistical Simulation Studies

    PubMed Central

    Cook, Alex R.; Teo, Shanice W. L.

    2011-01-01

    Simulation studies are often used to assess the frequency properties and optimality of statistical methods. They are typically reported in tables, which may contain hundreds of figures to be contrasted over multiple dimensions. To assess the degree to which these tables are fit for purpose, we performed a randomised cross-over experiment in which statisticians were asked to extract information from (i) such a table sourced from the literature and (ii) a graphical adaptation designed by the authors, and were timed and assessed for accuracy. We developed hierarchical models accounting for differences between individuals of different experience levels (under- and post-graduate), within experience levels, and between different table-graph pairs. In our experiment, information could be extracted quicker and, for less experienced participants, more accurately from graphical presentations than tabular displays. We also performed a literature review to assess the prevalence of hard-to-interpret design features in tables of simulation studies in three popular statistics journals, finding that many are presented innumerately. We recommend simulation studies be presented in graphical form. PMID:22132184

  12. The communicability of graphical alternatives to tabular displays of statistical simulation studies.

    PubMed

    Cook, Alex R; Teo, Shanice W L

    2011-01-01

    Simulation studies are often used to assess the frequency properties and optimality of statistical methods. They are typically reported in tables, which may contain hundreds of figures to be contrasted over multiple dimensions. To assess the degree to which these tables are fit for purpose, we performed a randomised cross-over experiment in which statisticians were asked to extract information from (i) such a table sourced from the literature and (ii) a graphical adaptation designed by the authors, and were timed and assessed for accuracy. We developed hierarchical models accounting for differences between individuals of different experience levels (under- and post-graduate), within experience levels, and between different table-graph pairs. In our experiment, information could be extracted quicker and, for less experienced participants, more accurately from graphical presentations than tabular displays. We also performed a literature review to assess the prevalence of hard-to-interpret design features in tables of simulation studies in three popular statistics journals, finding that many are presented innumerately. We recommend simulation studies be presented in graphical form.

  13. EDGE COMPUTING AND CONTEXTUAL INFORMATION FOR THE INTERNET OF THINGS SENSORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klein, Levente

    Interpreting sensor data requires knowledge about sensor placement and the surrounding environment. For a single sensor measurement, it is easy to document the context by visual observation; however, for millions of sensors reporting data back to a server, the contextual information needs to be extracted automatically, either from data analysis or by leveraging complementary data sources. Data layers that overlap spatially or temporally with sensor locations can be used to extract the context and to validate the measurement. To minimize the amount of data transmitted through the internet while preserving signal information content, two methods are explored: computation at the edge and compressed sensing. We validate the above methods on wind and chemical sensor data to (1) eliminate redundant measurements from wind sensors and (2) extract the peak value of a chemical sensor measuring a methane plume. We present a general cloud-based framework to validate sensor data based on statistical and physical modeling and contextual data extracted from geospatial data.

  14. Adaptive Statistical Language Modeling; A Maximum Entropy Approach

    DTIC Science & Technology

    1994-04-19

    models exploit the immediate past only. To extract information from further back in the document's history, I use trigger pairs as the basic information... (contents fragments: Context-Free Estimation (Unigram); Short-Term History (Conventional N-gram); Short-Term Class History (Class-Based N-gram); Intermediate Distance)

  15. Smart Extraction and Analysis System for Clinical Research.

    PubMed

    Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung

    2017-05-01

    With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issue of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for a descriptive and predictive analysis to compile the survival statistics and predict the future chance of survivability. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS is evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system for semistructured text is recorded at 99%, while that for unstructured text is 97%. Furthermore, the automated, unstructured information extraction has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients are found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one of the lifestyle risk factors was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic extraction of information and survival prediction methods. SEAS has reduced the time and energy of human resources spent unnecessarily on manual tasks.

  16. Analysis of soil moisture extraction algorithm using data from aircraft experiments

    NASA Technical Reports Server (NTRS)

    Burke, H. H. K.; Ho, J. H.

    1981-01-01

    A soil moisture extraction algorithm is developed using a statistical parameter inversion method. Data sets from two aircraft experiments are utilized for the test. Multifrequency microwave radiometric data, surface temperature, and soil moisture information are contained in the data sets. The surface and near-surface (≤5 cm) soil moisture content can be extracted with an accuracy of approximately 5% to 6% for bare fields and fields with grass cover by using L, C, and X band radiometer data. This technique is used for handling large amounts of remote sensing data from space.

  17. Permanent first molar extraction in adolescents and young adults and its effect on the development of third molar.

    PubMed

    Halicioglu, Koray; Toptas, Orcun; Akkas, Ismail; Celikoglu, Mevlut

    2014-01-01

    The aim of the present study was to determine the prevalence of permanent first molar (P1M) extraction among a Turkish adolescent and young adult subpopulation, and to investigate the effects of P1M extraction on the development of the third molars (3Ms) in the same quadrant. A retrospective study was performed including 2,925 panoramic radiographs (PRs) taken from patients (aged 13-20 years) who were examined to identify cases with at least one maxillary or mandibular P1M extracted. Additionally, 294 PRs with unilateral loss of a maxillary or mandibular P1M were used to assess the developmental grades of the 3Ms. Statistical analyses were performed by means of parametric tests after applying a Shapiro-Wilk normality test to the data. A total of 945 patients (32.3 %) presented with at least one P1M extraction, with no gender difference (P = 0.297). There were more cases of mandibular P1Ms extracted (784 patients, 1,066 teeth) than maxillary P1Ms extracted (441 patients, 549 teeth) (P < 0.001). The development of the 3Ms on the extraction side, in both the maxilla and mandible, was significantly accelerated when compared with the contralateral teeth (P = 0.000, P = 0.000, respectively). No statistically significant differences were found in the development of the 3Ms between the maxilla and mandible (P = 0.718). The high prevalence of P1M extraction among Turkish adolescents and young adults shows a need for targeted dental actions, including prevention and treatment. The development of the 3Ms on the extraction side, in both the maxilla and mandible, was significantly accelerated. To date, no information about the prevalence of P1M extraction among Turkish adolescents and young adults has been documented. In addition, the present study has a larger population and more complementary information about 3M development than previous studies.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Richen; Guo, Hanqi; Yuan, Xiaoru

    Most existing approaches to visualizing vector field ensembles reveal the uncertainty of individual variables, for example, statistics, variability, etc. However, user-defined derived features like vortices or air masses are also quite significant, since they make more sense to domain scientists. In this paper, we present a new framework to extract user-defined derived features from different simulation runs. Specifically, we use a detail-to-overview searching scheme to help extract vortices with a user-defined shape. We further compute geometry information, including the size and the geo-spatial location of the extracted vortices. We also design some linked views to compare them between different runs. At last, temporal information such as the occurrence time of the feature is further estimated and compared. Results show that our method is capable of extracting the features across different runs and comparing them spatially and temporally.

  19. Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

    DOE PAGES

    Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

    2018-01-01

    Tensor-rank decomposition methods have been applied to variable contact time ²⁹Si{¹H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.

  1. Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

    Tensor-rank decomposition methods have been applied to variable contact time ²⁹Si{¹H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.
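
    A minimal sketch, assuming the TensorLy library, of a CP (tensor-rank) decomposition of a three-way array; the random cube and the rank are placeholders standing in for a contact-time-resolved NMR data set, not the authors' processing.

      import numpy as np
      import tensorly as tl
      from tensorly.decomposition import parafac

      # Placeholder data cube, e.g. contact time x echo x spectral point.
      data = tl.tensor(np.random.rand(16, 32, 64))
      weights, factors = parafac(data, rank=2)
      for mode, f in enumerate(factors):
          print("mode", mode, "factor shape:", f.shape)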

  2. Long-Term Marine Traffic Monitoring for Environmental Safety in the Aegean Sea

    NASA Astrophysics Data System (ADS)

    Giannakopoulos, T.; Gyftakis, S.; Charou, E.; Perantonis, S.; Nivolianitou, Z.; Koromila, I.; Makrygiorgos, A.

    2015-04-01

    The Aegean Sea is characterized by an extremely high marine safety risk, mainly due to the significant increase of the traffic of tankers from and to the Black Sea that pass through narrow straits formed by the 1600 Greek islands. Reducing the risk of a ship accident is therefore vital to all socio-economic and environmental sectors. This paper presents an online long-term marine traffic monitoring work-flow that focuses on extracting aggregated vessel risks using spatiotemporal analysis of multilayer information: vessel trajectories, vessel data, meteorological data, bathymetric / hydrographic data as well as information regarding environmentally important areas (e.g. protected high-risk areas, etc.). A web interface that enables user-friendly spatiotemporal queries is implemented at the frontend, while a series of data mining functionalities extracts aggregated statistics regarding: (a) marine risks and accident probabilities for particular areas (b) trajectories clustering information (c) general marine statistics (cargo types, etc.) and (d) correlation between spatial environmental importance and marine traffic risk. Towards this end, a set of data clustering and probabilistic graphical modelling techniques has been adopted.

  3. Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection

    PubMed Central

    Lossio-Ventura, Juan Antonio; Hogan, William; Modave, François; Hicks, Amanda; Hanna, Josh; Guo, Yi; He, Zhe; Bian, Jiang

    2017-01-01

    Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves their health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, which surpasses the state-of-the-art results. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993. PMID:28503356

  4. HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records

    PubMed Central

    Aggarwal, Anshul; Garhwal, Sunita

    2018-01-01

    Objectives One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of them are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data are unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. Methods A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. Results The HEDEA system works across a large set of formats to extract and analyse health information. Conclusions This tool can be used to generate analysis reports and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes. PMID:29770248

  5. HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records.

    PubMed

    Aggarwal, Anshul; Garhwal, Sunita; Kumar, Ajay

    2018-04-01

    One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of them are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data are unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. The HEDEA system works across a large set of formats to extract and analyse health information. This tool can be used to generate analysis reports and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes.
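
    A hypothetical sketch of the regular-expression approach the abstract describes; the record layout, field names and patterns below are invented for illustration and are not HEDEA's actual rules.

      import re

      RECORD = """Patient Name: A. Sharma
      Age: 54 years
      Haemoglobin: 11.2 g/dL
      Blood Pressure: 130/85 mmHg"""

      PATTERNS = {
          "name": re.compile(r"Patient Name:\s*(.+)"),
          "age": re.compile(r"Age:\s*(\d+)\s*years"),
          "haemoglobin": re.compile(r"Haemoglobin:\s*([\d.]+)\s*g/dL"),
          "bp": re.compile(r"Blood Pressure:\s*(\d+/\d+)\s*mmHg"),
      }

      def extract(record):
          """Return the first match for each field, or None."""
          out = {}
          for field, pattern in PATTERNS.items():
              m = pattern.search(record)
              out[field] = m.group(1) if m else None
          return out

      print(extract(RECORD))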

  6. Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

    PubMed

    Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong

    2016-01-01

    Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning is to choose the most informative documents for the supervised learning in order to reduce the amount of required manual annotations. Previous works of active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. They also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition. We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.

  7. aCGH-MAS: Analysis of aCGH by means of Multiagent System

    PubMed Central

    Benito, Rocío; Bajo, Javier; Rodríguez, Ana Eugenia; Abáigar, María

    2015-01-01

    There are currently different techniques, such as CGH arrays, to study genetic variations in patients. CGH arrays analyze gains and losses in different regions of the chromosomes. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information corresponding to mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases, and it would be of interest to integrate information from these different sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations, visualization techniques for the interpretation of the final results, and a CBR system to extract relevant information from different sources of information. PMID:25874203

  8. Statistical Analysis Techniques for Small Sample Sizes

    NASA Technical Reports Server (NTRS)

    Navard, S. E.

    1984-01-01

    The small-sample-size problem encountered in the analysis of space-flight data is examined. Because only a small amount of data is available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined, and the various tests which can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on the considerations needed to choose the most appropriate test for a given type of analysis.
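
    A minimal illustration, assuming SciPy, of one of the tests discussed above: a two-sample t-test on small samples. The data values are invented; Welch's variant is used since equal variances are often untenable for small samples.

      from scipy import stats

      group_a = [5.1, 4.9, 5.4, 5.0, 5.2]   # invented measurements
      group_b = [5.6, 5.8, 5.5, 5.9, 5.7]

      t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
      print("t = %.2f, p = %.4f" % (t_stat, p_value))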

  9. ExaCT: automatic extraction of clinical trial characteristics from journal publications

    PubMed Central

    2010-01-01

    Background Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). Methods ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. Results We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. Conclusions Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols). PMID:20920176
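
    A minimal sketch of the two-stage design described above: a statistical sentence classifier followed by a simple extraction rule. The training sentences, classifier choice and regular expression are invented for illustration and are not ExaCT's own models or rules; scikit-learn is assumed.

      import re
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression

      # Toy sentences labeled 1 if they describe a trial characteristic.
      train_sents = ["We enrolled 120 patients in the trial.",
                     "Patients received 50 mg of the drug daily.",
                     "The study sites were located in three cities.",
                     "Ethics approval was obtained from the board."]
      labels = [1, 1, 0, 0]

      vec = TfidfVectorizer()
      clf = LogisticRegression().fit(vec.fit_transform(train_sents), labels)

      def extract_sample_size(sentences):
          # Stage 1: rank candidate sentences by classifier probability.
          probs = clf.predict_proba(vec.transform(sentences))[:, 1]
          best = sentences[int(probs.argmax())]
          # Stage 2: a simple rule pulls the target fragment.
          m = re.search(r"(\d+)\s+patients", best)
          return m.group(1) if m else None

      print(extract_sample_size(["We enrolled 240 patients in this study.",
                                 "Funding sources are listed below."]))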

  10. Accessible Information Without Disturbing Partially Known Quantum States on a von Neumann Algebra

    NASA Astrophysics Data System (ADS)

    Kuramochi, Yui

    2018-04-01

    This paper addresses the problem of how much information we can extract without disturbing a statistical experiment, which is a family of partially known normal states on a von Neumann algebra. We define the classical part of a statistical experiment as the restriction of the equivalent minimal sufficient statistical experiment to the center of the outcome space, which, in the case of density operators on a Hilbert space, corresponds to the classical probability distributions appearing in the maximal decomposition by Koashi and Imoto (Phys. Rev. A 66, 022318, 2002). We show that we can access by a Schwarz or completely positive channel at most the classical part of a statistical experiment if we do not disturb the states. We apply this result to the broadcasting problem of a statistical experiment. We also show that the classical part of the direct product of statistical experiments is the direct product of the classical parts of the statistical experiments. The proof of the latter result is based on the theorem that the direct product of minimal sufficient statistical experiments is also minimal sufficient.

  11. SU-F-I-10: Spatially Local Statistics for Adaptive Image Filtering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Iliopoulos, AS; Sun, X; Floros, D

    Purpose: To facilitate adaptive image filtering operations, addressing spatial variations in both noise and signal. Such issues are prevalent in cone-beam projections, where physical effects such as X-ray scattering result in spatially variant noise, violating common assumptions of homogeneous noise and challenging conventional filtering approaches to signal extraction and noise suppression. Methods: We present a computational mechanism for probing into and quantifying the spatial variance of noise throughout an image. The mechanism builds a pyramid of local statistics at multiple spatial scales; local statistical information at each scale includes (weighted) mean, median, standard deviation, median absolute deviation, as well as histogram or dynamic range after local mean/median shifting. Based on inter-scale differences of local statistics, the spatial scope of distinguishable noise variation is detected in a semi- or un-supervised manner. Additionally, we propose and demonstrate the incorporation of such information in globally parametrized (i.e., non-adaptive) filters, effectively transforming the latter into spatially adaptive filters. The multi-scale mechanism is materialized by efficient algorithms and implemented in parallel CPU/GPU architectures. Results: We demonstrate the impact of local statistics for adaptive image processing and analysis using cone-beam projections of a Catphan phantom, fitted within an annulus to increase X-ray scattering. The effective spatial scope of local statistics calculations is shown to vary throughout the image domain, necessitating multi-scale noise and signal structure analysis. Filtering results with and without spatial filter adaptation are compared visually, illustrating improvements in imaging signal extraction and noise suppression, and in preserving information in low-contrast regions. Conclusion: Local image statistics can be incorporated in filtering operations to equip them with spatial adaptivity to spatial signal/noise variations. An efficient multi-scale computational mechanism is developed to curtail processing latency. Spatially adaptive filtering may impact subsequent processing tasks such as reconstruction and numerical gradient computations for deformable registration. NIH Grant No. R01-184173.
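
    A minimal sketch, assuming SciPy and NumPy, of a pyramid of local statistics like the one the abstract describes: local mean and standard deviation at several window sizes, whose inter-scale differences can flag spatially varying noise. Window sizes and the test image are placeholders.

      import numpy as np
      from scipy import ndimage

      def local_stats_pyramid(image, scales=(3, 7, 15)):
          """Local mean and standard deviation at several window sizes."""
          pyramid = {}
          for s in scales:
              mean = ndimage.uniform_filter(image, size=s)
              sq_mean = ndimage.uniform_filter(image ** 2, size=s)
              std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
              pyramid[s] = (mean, std)
          return pyramid

      image = np.random.rand(64, 64)   # stand-in for a projection
      for s, (m, sd) in local_stats_pyramid(image).items():
          print("window", s, "mean local std: %.3f" % sd.mean())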

  12. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    NASA Astrophysics Data System (ADS)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-04-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and WPD energy spectrum is extracted to comprehensively characterize the properties of the machinery running states. Then, the mixed-domain feature set is inputted into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract the intrinsic structure information of both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated within the dimension-reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are inputted into a pattern recognition algorithm to achieve the running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification.

  13. Information Extraction from Large-Multi-Layer Social Networks

    DTIC Science & Technology

    2015-08-06

    mization [4]. Methods that fall into this category include spectral algorithms, modularity methods, and methods that rely on statistical inference...

  14. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2015-09-30

    part from data analyzed in this project. This work involved informal collaborations with Chris Wilson of NOAA Alaska Fisheries, whose team collected... characteristics of animal groups such as schools, swarms and flocks arise from individuals' immediate responses to the relative positions and velocities of... infrastructure to extract cognitive behavior and other parameters from the NOAA Alaska Fisheries Science Center acoustic/trawl walleye pollock survey data

  15. Investigating the feasibility of using partial least squares as a method of extracting salient information for the evaluation of digital breast tomosynthesis

    NASA Astrophysics Data System (ADS)

    Zhang, George Z.; Myers, Kyle J.; Park, Subok

    2013-03-01

    Digital breast tomosynthesis (DBT) has shown promise for improving the detection of breast cancer, but it has not yet been fully optimized due to a large space of system parameters to explore. A task-based statistical approach [1] is a rigorous method for evaluating and optimizing this promising imaging technique with the use of optimal observers such as the Hotelling observer (HO). However, the high data dimensionality found in DBT has been the bottleneck for the use of a task-based approach in DBT evaluation. To reduce data dimensionality while extracting salient information for performing a given task, efficient channels have to be used for the HO. In the past few years, 2D Laguerre-Gauss (LG) channels, which are a complete basis for stationary backgrounds and rotationally symmetric signals, have been utilized for DBT evaluation [2,3]. But since background and signal statistics from DBT data are neither stationary nor rotationally symmetric, LG channels may not be efficient in providing reliable performance trends as a function of system parameters. Recently, partial least squares (PLS) has been shown to generate efficient channels for the Hotelling observer in detection tasks involving random backgrounds and signals [4]. In this study, we investigate the use of PLS as a method for extracting salient information from DBT in order to better evaluate such systems.
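
    A minimal sketch, assuming scikit-learn, of deriving channels with PLS from labeled image data, in the spirit of the approach the abstract investigates. The random arrays are stand-ins, not DBT images, and the component count is arbitrary.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(0)
      images = rng.normal(size=(200, 32 * 32))    # flattened images
      labels = rng.integers(0, 2, size=200)       # signal present/absent

      pls = PLSRegression(n_components=5)
      pls.fit(images, labels.astype(float))

      # The learned x-weights act as channels: projecting an image onto
      # them yields a low-dimensional channelized representation.
      channels = pls.x_weights_                   # shape (1024, 5)
      channelized = images @ channels             # shape (200, 5)
      print(channelized.shape)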

  16. Data Treatment for LC-MS Untargeted Analysis.

    PubMed

    Riccadonna, Samantha; Franceschi, Pietro

    2018-01-01

    Liquid chromatography-mass spectrometry (LC-MS) untargeted experiments require complex chemometrics strategies to extract information from the experimental data. Here we discuss "data preprocessing", the set of procedures performed on the raw data to produce a data matrix which will be the starting point for the subsequent statistical analysis. Data preprocessing is a crucial step on the path to knowledge extraction, which should be carefully controlled and optimized in order to maximize the output of any untargeted metabolomics investigation.

  17. The Research of Feature Extraction Method of Liver Pathological Image Based on Multispatial Mapping and Statistical Properties

    PubMed Central

    Liu, Huiling; Xia, Bingbing; Yi, Dehui

    2016-01-01

    We propose a new feature extraction method for liver pathological images based on multispatial mapping and statistical properties. For liver pathological images with Hematein-Eosin staining, the R and B channels can reflect the sensitivity of liver pathological images better, while the entropy space and Local Binary Pattern (LBP) space can reflect the texture features of the image better. To obtain more comprehensive information, we map liver pathological images to the entropy space, LBP space, R space, and B space. The traditional Higher Order Local Autocorrelation Coefficients (HLAC) cannot reflect the overall information of the image, so we propose an average-correction HLAC feature. We calculate the statistical properties and the average gray value of pathological images and then update the current pixel value as the absolute value of the difference between the current pixel gray value and the average gray value, which is more sensitive to the gray-value changes of pathological images. Lastly, the HLAC template is used to calculate the features of the updated image. The experimental results show that the improved features of the multispatial mapping have better classification performance for liver cancer. PMID:27022407
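
    A simplified sketch of the average-correction step plus a few low-order autocorrelation masks; the full HLAC template set is much larger than the handful shown, and the mask choice here is illustrative only.

      import numpy as np

      def average_corrected(image):
          """Replace each pixel with |pixel - global mean|, as described."""
          return np.abs(image - image.mean())

      def hlac_features(image):
          g = average_corrected(image)
          c = g[1:-1, 1:-1]                 # centre pixels
          feats = [c.sum()]                 # 0th-order mask
          # A few 1st-order masks: centre times one displaced neighbour.
          for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
              shifted = g[1 + dy:g.shape[0] - 1 + dy,
                          1 + dx:g.shape[1] - 1 + dx]
              feats.append((c * shifted).sum())
          return np.array(feats)

      print(hlac_features(np.random.rand(32, 32)))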

  18. Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

    PubMed

    Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre

    2012-07-01

    Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several, existing and novel, procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve between false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source code of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
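
    A minimal sketch of one such procedure (not the paper's exact estimators): attaching empirical p-values to random-forest importance scores by recomputing them under label permutations. scikit-learn is assumed; the data are synthetic, with only the first two features informative.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(1)
      X = rng.normal(size=(100, 20))
      y = (X[:, 0] + X[:, 1] > 0).astype(int)

      def importances(X, y):
          rf = RandomForestClassifier(n_estimators=100, random_state=0)
          return rf.fit(X, y).feature_importances_

      observed = importances(X, y)
      null = np.array([importances(X, rng.permutation(y))
                       for _ in range(50)])       # null distribution
      p_values = (null >= observed).mean(axis=0)  # per-feature p-value
      print(np.round(p_values, 2))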

  19. Maximal use of kinematic information for the extraction of the mass of the top quark in single-lepton tt bar events at DO

    NASA Astrophysics Data System (ADS)

    Estrada Vigil, Juan Cruz

    The mass of the top (t) quark has been measured in the lepton+jets channel of tt̄ final states studied by the DØ and CDF experiments at Fermilab using data from Run I of the Tevatron pp̄ collider. The result published by DØ is 173.3 ± 5.6 (stat) ± 5.5 (syst) GeV. We present a different method to perform this measurement using the existing data. The new technique uses all available kinematic information in an event, and provides a significantly smaller statistical uncertainty than achieved in previous analyses. The preliminary results presented in this thesis indicate a statistical uncertainty for the extracted mass of the top quark of 3.5 GeV, which represents a significant improvement over the previous value of 5.6 GeV. The method of analysis is very general, and may be particularly useful in situations where there is a small signal and a large background.

  20. Built-Up Area Detection from High-Resolution Satellite Images Using Multi-Scale Wavelet Transform and Local Spatial Statistics

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Zhang, Y.; Gao, J.; Yuan, Y.; Lv, Z.

    2018-04-01

    Recently, built-up area detection from high-resolution satellite images (HRSI) has attracted increasing attention because HRSI can provide more detailed object information. In this paper, multi-resolution wavelet transform and a local spatial autocorrelation statistic are introduced to model the spatial patterns of built-up areas. First, the input image is decomposed into high- and low-frequency subbands by wavelet transform at three levels. Then the high-frequency detail information in three directions (horizontal, vertical and diagonal) is extracted, followed by a maximization operation to integrate the information in all directions. Afterward, a cross-scale operation is implemented to fuse different levels of information. Finally, the local spatial autocorrelation statistic is introduced to enhance the saliency of built-up features, and an adaptive threshold algorithm is used to achieve the detection of built-up areas. Experiments are conducted on ZY-3 and Quickbird panchromatic satellite images, and the results show that the proposed method is very effective for built-up area detection.
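
    A minimal sketch, assuming PyWavelets and NumPy, of the detail-extraction and fusion steps described above: multi-level decomposition, direction-wise maximization, and a simple cross-scale fusion. The spatial-autocorrelation enhancement and adaptive thresholding are omitted, and the nearest-neighbour upsampling assumes power-of-two image sizes.

      import numpy as np
      import pywt

      def fused_detail_map(image, wavelet="haar", levels=3):
          coeffs = pywt.wavedec2(image, wavelet, level=levels)
          fused = None
          for cH, cV, cD in coeffs[1:]:     # detail bands, coarse to fine
              detail = np.maximum(np.abs(cH),
                                  np.maximum(np.abs(cV), np.abs(cD)))
              # Nearest-neighbour upsample to the input size before fusing.
              fy = int(round(image.shape[0] / detail.shape[0]))
              fx = int(round(image.shape[1] / detail.shape[1]))
              up = np.kron(detail, np.ones((fy, fx)))
              up = up[:image.shape[0], :image.shape[1]]
              fused = up if fused is None else np.maximum(fused, up)
          return fused

      print(fused_detail_map(np.random.rand(64, 64)).shape)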

  1. A sentence sliding window approach to extract protein annotations from biomedical articles

    PubMed Central

    Krallinger, Martin; Padron, Maria; Valencia, Alfonso

    2005-01-01

    Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need for comparative assessment of the performance of the proposed methods and for the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831

  2. Stacked sparse autoencoder in hyperspectral data classification using spectral-spatial, higher order statistics and multifractal spectrum features

    NASA Astrophysics Data System (ADS)

    Wan, Xiaoqing; Zhao, Chunhui; Wang, Yanchun; Liu, Wu

    2017-11-01

    This paper proposes a novel classification paradigm for hyperspectral images (HSI) using feature-level fusion and deep learning-based methodologies. Operation is carried out in three main steps. First, during a pre-processing stage, wave atoms are introduced into a bilateral filter to smooth the HSI, and this strategy can effectively attenuate noise and restore texture information. Meanwhile, high-quality spectral-spatial features can be extracted from the HSI by taking geometric closeness and photometric similarity among pixels into consideration simultaneously. Second, higher-order statistics techniques are introduced into hyperspectral data classification for the first time to characterize the phase correlations of spectral curves. Third, multifractal spectrum features are extracted to characterize the singularities and self-similarities of spectra shapes. To this end, feature-level fusion is applied to the extracted spectral-spatial features along with the higher-order statistics and multifractal spectrum features. Finally, a stacked sparse autoencoder is utilized to learn more abstract and invariant high-level features from the multiple feature sets, and then a random forest classifier is employed to perform supervised fine-tuning and classification. Experimental results on two real hyperspectral data sets demonstrate that the proposed method outperforms some traditional alternatives.

  3. Using GIS in ecological management: green assessment of the impacts of petroleum activities in the state of Texas.

    PubMed

    Merem, Edmund; Robinson, Bennetta; Wesley, Joan M; Yerramilli, Sudha; Twumasi, Yaw A

    2010-05-01

    Geo-information technologies are valuable tools for ecological assessment in stressed environments. Visualizing natural features prone to disasters from the oil sector spatially not only helps in focusing the scope of environmental management with records of changes in affected areas, but it also furnishes information on the pace at which resource extraction affects nature. Notwithstanding the recourse to ecosystem protection, geo-spatial analysis of the impacts remains sketchy. This paper uses GIS and descriptive statistics to assess the ecological impacts of petroleum extraction activities in Texas. While the focus ranges from issues to mitigation strategies, the results point to growth in indicators of ecosystem decline.

  4. Using GIS in Ecological Management: Green Assessment of the Impacts of Petroleum Activities in the State of Texas

    PubMed Central

    Merem, Edmund; Robinson, Bennetta; Wesley, Joan M.; Yerramilli, Sudha; Twumasi, Yaw A.

    2010-01-01

    Geo-information technologies are valuable tools for ecological assessment in stressed environments. Visualizing natural features prone to disasters from the oil sector spatially not only helps in focusing the scope of environmental management with records of changes in affected areas, but it also furnishes information on the pace at which resource extraction affects nature. Notwithstanding the recourse to ecosystem protection, geo-spatial analysis of the impacts remains sketchy. This paper uses GIS and descriptive statistics to assess the ecological impacts of petroleum extraction activities in Texas. While the focus ranges from issues to mitigation strategies, the results point to growth in indicators of ecosystem decline. PMID:20623014

  5. A study on building data warehouse of hospital information system.

    PubMed

    Li, Ping; Wu, Tao; Chen, Mu; Zhou, Bin; Xu, Wei-guo

    2011-08-01

    Existing hospital information systems with simple statistical functions cannot meet current management needs. It is well known that hospital resources are distributed with private property rights among hospitals, such as in the case of the regional coordination of medical services. In this study, to integrate and make full use of medical data effectively, we propose a data warehouse modeling method for the hospital information system. The method can also be employed for a distributed-hospital medical service system. To ensure that hospital information supports the diverse needs of health care, the framework of the hospital information system has three layers: datacenter layer, system-function layer, and user-interface layer. This paper discusses the role of a data warehouse management system in handling hospital information from the establishment of the data theme to the design of a data model to the establishment of a data warehouse. Online analytical processing tools assist user-friendly multidimensional analysis from a number of different angles to extract the required data and information. Use of the data warehouse improves online analytical processing and mitigates deficiencies in the decision support system. The hospital information system based on a data warehouse effectively employs statistical analysis and data mining technology to handle massive quantities of historical data, and summarizes from clinical and hospital information for decision making. This paper proposes the use of a data warehouse for a hospital information system, specifically a data warehouse for the theme of hospital information to determine latitude, modeling and so on. The processing of patient information is given as an example that demonstrates the usefulness of this method in the case of hospital information management. Data warehouse technology is an evolving technology, and more and more decision support information extracted by data mining and with decision-making technology is required for further research.

  6. Ghana Open Data Initiative

    Science.gov Websites

  7. Uncertainty relation based on unbiased parameter estimations

    NASA Astrophysics Data System (ADS)

    Sun, Liang-Liang; Song, Yong-Shun; Qiao, Cong-Feng; Yu, Sixia; Chen, Zeng-Bing

    2017-02-01

    Heisenberg's uncertainty relation has been extensively studied in the spirit of its well-known original form, in which the inaccuracy measures used exhibit some controversial properties and do not conform with quantum metrology, where the measurement precision is well defined in terms of estimation theory. In this paper, we treat the joint measurement of incompatible observables as a parameter estimation problem, i.e., estimating the parameters characterizing the statistics of the incompatible observables. Our crucial observation is that, in a sequential measurement scenario, the bias induced by the first unbiased measurement in the subsequent measurement can be eradicated by the information acquired, allowing one to extract unbiased information of the second measurement of an incompatible observable. In terms of Fisher information we propose a kind of information comparison measure and explore various types of trade-offs between the information gains and measurement precisions, which interpret the uncertainty relation as a surplus-variance trade-off over individual perfect measurements instead of a constraint on extracting complete information of incompatible observables.

  8. Audio feature extraction using probability distribution function

    NASA Astrophysics Data System (ADS)

    Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

    2015-05-01

    Voice recognition has been one of the popular applications in the robotics field. It is also known to have been used recently in biometric and multimedia information retrieval systems. This technology stems from successive research on audio feature extraction analysis. The Probability Distribution Function (PDF) is a statistical method which is usually used as one of the processes in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed which uses only the PDF as the feature extraction method itself for speech analysis purposes. Certain pre-processing techniques are performed prior to the proposed feature extraction method. Subsequently, the PDF values for each frame of the sampled voice signals, obtained from a certain number of individuals, are plotted. From the experimental results obtained, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.
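
    A minimal sketch of the idea the abstract describes: frame a sampled signal and use a normalised histogram (an empirical PDF) of each frame as its feature vector. The sine-plus-noise signal is a stand-in for recorded voice, and the frame length and bin count are arbitrary.

      import numpy as np

      def frame_pdf_features(signal, frame_len=256, n_bins=32):
          """Empirical PDF (normalised histogram) of each frame."""
          n_frames = len(signal) // frame_len
          feats = []
          for i in range(n_frames):
              frame = signal[i * frame_len:(i + 1) * frame_len]
              pdf, _ = np.histogram(frame, bins=n_bins, range=(-1, 1),
                                    density=True)
              feats.append(pdf)
          return np.array(feats)

      t = np.linspace(0, 1, 4096)
      voice_stand_in = (0.5 * np.sin(2 * np.pi * 220 * t)
                        + 0.05 * np.random.randn(4096))
      print(frame_pdf_features(voice_stand_in).shape)   # (frames, bins)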

  9. Kinetics from Replica Exchange Molecular Dynamics Simulations.

    PubMed

    Stelzl, Lukas S; Hummer, Gerhard

    2017-08-08

    Transitions between metastable states govern many fundamental processes in physics, chemistry and biology, from nucleation events in phase transitions to the folding of proteins. The free energy surfaces underlying these processes can be obtained from simulations using enhanced sampling methods. However, their altered dynamics makes kinetic and mechanistic information difficult or impossible to extract. Here, we show that, with replica exchange molecular dynamics (REMD), one can not only sample equilibrium properties but also extract kinetic information. For systems that strictly obey first-order kinetics, the procedure to extract rates is rigorous. For actual molecular systems whose long-time dynamics are captured by kinetic rate models, accurate rate coefficients can be determined from the statistics of the transitions between the metastable states at each replica temperature. We demonstrate the practical applicability of the procedure by constructing master equation (Markov state) models of peptide and RNA folding from REMD simulations.
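
    For systems that obey first-order kinetics, rate coefficients follow from transition counts and dwell times in a discrete state trajectory; a minimal sketch under that assumption, at a single replica temperature (not the authors' full REMD estimator, which combines statistics across replicas):

```python
import numpy as np

def rate_coefficients(traj, dt, n_states):
    """Estimate first-order rate coefficients k[i, j] as the number of
    observed i -> j transitions divided by the total time spent in i."""
    counts = np.zeros((n_states, n_states))
    dwell = np.zeros(n_states)
    for a, b in zip(traj[:-1], traj[1:]):
        dwell[a] += dt
        if a != b:
            counts[a, b] += 1
    return counts / dwell[:, None]  # off-diagonal entries are rates

# Toy two-state trajectory (0 = unfolded, 1 = folded), dt = 1 time unit.
traj = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1])
print(rate_coefficients(traj, dt=1.0, n_states=2))
```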

  10. Value assignment and uncertainty evaluation for single-element reference solutions

    NASA Astrophysics Data System (ADS)

    Possolo, Antonio; Bodnar, Olha; Butler, Therese A.; Molloy, John L.; Winchester, Michael R.

    2018-06-01

    A Bayesian statistical procedure is proposed for value assignment and uncertainty evaluation for the mass fraction of the elemental analytes in single-element solutions distributed as NIST standard reference materials. The principal novelty that we describe is the use of information about relative differences observed historically between the measured values obtained via gravimetry and via high-performance inductively coupled plasma optical emission spectrometry, to quantify the uncertainty component attributable to between-method differences. This information is encapsulated in a prior probability distribution for the between-method uncertainty component, and it is then used, together with the information provided by current measurement data, to produce a probability distribution for the value of the measurand from which an estimate and evaluation of uncertainty are extracted using established statistical procedures.

  11. Geomatic Methods for the Analysis of Data in the Earth Sciences: Lecture Notes in Earth Sciences, Vol. 95

    NASA Astrophysics Data System (ADS)

    Pavlis, Nikolaos K.

    Geomatics is a trendy term that has been used in recent years to describe academic departments that teach and research theories, methods, algorithms, and practices used in processing and analyzing data related to the Earth and other planets. Naming trends aside, geomatics could be considered as the mathematical and statistical “toolbox” that allows Earth scientists to extract information about physically relevant parameters from the available data and accompany such information with some measure of its reliability. This book is an attempt to present the mathematical-statistical methods used in data analysis within various disciplines—geodesy, geophysics, photogrammetry and remote sensing—from a unifying perspective that inverse problem formalism permits. At the same time, it allows us to stretch the relevance of statistical methods in achieving an optimal solution.

  12. Support Vector Feature Selection for Early Detection of Anastomosis Leakage From Bag-of-Words in Electronic Health Records.

    PubMed

    Soguero-Ruiz, Cristina; Hindberg, Kristian; Rojo-Alvarez, Jose Luis; Skrovseth, Stein Olav; Godtliebsen, Fred; Mortensen, Kim; Revhaug, Arthur; Lindsetmo, Rolv-Ole; Augestad, Knut Magne; Jenssen, Robert

    2016-09-01

    The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested in feature selection and the features' corresponding medical interpretation. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective colorectal cancer (CRC) surgery, using free text extracted from EHRs. We use a bag-of-words model to investigate the potential of feature selection strategies. The purpose is earlier detection of AL, predicting it from data generated in the EHR before the actual complication occurs. Due to the high dimensionality of the data, we derive feature selection strategies using the robust linear maximum-margin support vector machine classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) a computation-intensive statistical criterion (bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC surgery (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision-making phase.
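
    A hedged sketch of criterion 2 (bootstrap resampling) with a linear maximum-margin classifier, using synthetic stand-in data rather than EHR text; ranking features by weight stability across resamples is one simple way to realize the idea, not necessarily the authors' exact procedure:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils import resample

# Stand-in bag-of-words matrix X (documents x terms) and binary labels y.
rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(200, 50)).astype(float)
y = (X[:, 0] + X[:, 3] + rng.normal(size=200) > 1.0).astype(int)

# Bootstrap the linear maximum-margin classifier and rank features by how
# consistently large their weights are across resamples.
weights = []
for _ in range(100):
    Xb, yb = resample(X, y)
    clf = LinearSVC(C=1.0, dual=False).fit(Xb, yb)
    weights.append(clf.coef_.ravel())
stability = np.abs(np.mean(weights, axis=0)) / (np.std(weights, axis=0) + 1e-12)
print("top features:", np.argsort(stability)[::-1][:5])
```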

  13. Statistical Frequency-Dependent Analysis of Trial-to-Trial Variability in Single Time Series by Recurrence Plots.

    PubMed

    Tošić, Tamara; Sellers, Kristin K; Fröhlich, Flavio; Fedotenkova, Mariia; Beim Graben, Peter; Hutt, Axel

    2015-01-01

    For decades, research in neuroscience has supported the hypothesis that brain dynamics exhibits recurrent metastable states connected by transients, which together encode fundamental neural information processing. To understand the system's dynamics it is important to detect such recurrence domains, but it is challenging to extract them from experimental neuroscience datasets due to the large trial-to-trial variability. The proposed methodology extracts recurrent metastable states in univariate time series by transforming datasets into their time-frequency representations and computing recurrence plots based on instantaneous spectral power values in various frequency bands. Additionally, a new statistical inference analysis compares different trial recurrence plots with corresponding surrogates to obtain statistically significant recurrent structures. This combination of methods is validated by applying it to two artificial datasets. In a final study of visually-evoked Local Field Potentials in partially anesthetized ferrets, the methodology is able to reveal recurrence structures of neural responses with trial-to-trial variability. Focusing on different frequency bands, the δ-band activity is much less recurrent than α-band activity. Moreover, α-activity is susceptible to pre-stimuli, while δ-activity is much less sensitive to pre-stimuli. This difference in recurrence structures in different frequency bands indicates diverse underlying information processing steps in the brain.
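
    A minimal sketch of a recurrence plot built from a single band-power series (the study additionally uses full time-frequency representations and surrogate-based statistical inference, omitted here):

```python
import numpy as np

def recurrence_plot(power, eps):
    """Recurrence matrix R[i, j] = 1 where the instantaneous band power
    at times i and j differs by less than eps."""
    d = np.abs(power[:, None] - power[None, :])
    return (d < eps).astype(int)

# Toy band-power series: a metastable level that is revisited repeatedly.
t = np.linspace(0, 10, 200)
power = np.sin(t) ** 2 + 0.05 * np.random.default_rng(1).normal(size=t.size)
R = recurrence_plot(power, eps=0.1)
print(R.shape, R.mean())  # fraction of recurrent time pairs
```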

  14. Statistical Frequency-Dependent Analysis of Trial-to-Trial Variability in Single Time Series by Recurrence Plots

    PubMed Central

    Tošić, Tamara; Sellers, Kristin K.; Fröhlich, Flavio; Fedotenkova, Mariia; beim Graben, Peter; Hutt, Axel

    2016-01-01

    For decades, research in neuroscience has supported the hypothesis that brain dynamics exhibits recurrent metastable states connected by transients, which together encode fundamental neural information processing. To understand the system's dynamics it is important to detect such recurrence domains, but it is challenging to extract them from experimental neuroscience datasets due to the large trial-to-trial variability. The proposed methodology extracts recurrent metastable states in univariate time series by transforming datasets into their time-frequency representations and computing recurrence plots based on instantaneous spectral power values in various frequency bands. Additionally, a new statistical inference analysis compares different trial recurrence plots with corresponding surrogates to obtain statistically significant recurrent structures. This combination of methods is validated by applying it to two artificial datasets. In a final study of visually-evoked Local Field Potentials in partially anesthetized ferrets, the methodology is able to reveal recurrence structures of neural responses with trial-to-trial variability. Focusing on different frequency bands, the δ-band activity is much less recurrent than α-band activity. Moreover, α-activity is susceptible to pre-stimuli, while δ-activity is much less sensitive to pre-stimuli. This difference in recurrence structures in different frequency bands indicates diverse underlying information processing steps in the brain. PMID:26834580

  15. Effectiveness of feature and classifier algorithms in character recognition systems

    NASA Astrophysics Data System (ADS)

    Wilson, Charles L.

    1993-04-01

    At the first Census Optical Character Recognition Systems Conference, NIST generated accuracy data for a large number of character recognition systems. Most systems were tested on the recognition of isolated digits and upper- and lower-case alphabetic characters. The recognition experiments were performed on sample sizes of 58,000 digits and 12,000 upper- and lower-case alphabetic characters. The algorithms used by the 26 conference participants included rule-based methods, image-based methods, statistical methods, and neural networks. The neural network methods included Multi-Layer Perceptrons, Learning Vector Quantization, Neocognitrons, and cascaded neural networks. In this paper 11 different systems are compared using correlations between the answers of different systems, comparing the decrease in error rate as a function of confidence of recognition, and comparing the writer dependence of recognition. This comparison shows that methods that used different algorithms for feature extraction and recognition performed with very high levels of correlation. This is true for neural network systems, hybrid systems, and statistically based systems, and leads to the conclusion that neural networks have not yet demonstrated a clear superiority over more conventional statistical methods. Comparison of these results with the models of Vapnik (for estimation problems), MacKay (for Bayesian statistical models), Moody (for effective parameterization), and Boltzmann models (for information content) demonstrates that as the limits of training data variance are approached, all classifier systems have similar statistical properties. The limiting condition can only be approached for sufficiently rich feature sets because the accuracy limit is controlled by the available information content of the training set, which must pass through the feature extraction process prior to classification.

  16. Inferring Small Scale Dynamics from Aircraft Measurements of Tracers

    NASA Technical Reports Server (NTRS)

    Sparling, L. C.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    The millions of ER-2 and DC-8 aircraft measurements of long-lived tracers in the Upper Troposphere/Lower Stratosphere (UT/LS) hold enormous potential as a source of statistical information about subgrid scale dynamics. Extracting this information however can be extremely difficult because the measurements are made along a 1-D transect through fields that are highly anisotropic in all three dimensions. Some of the challenges and limitations posed by both the instrumentation and platform are illustrated within the context of the problem of using the data to obtain an estimate of the dissipation scale. This presentation will also include some tutorial remarks about the conditional and two-point statistics used in the analysis.

  17. Bayesian depth estimation from monocular natural images.

    PubMed

    Su, Che-Chun; Cormack, Lawrence K; Bovik, Alan C

    2017-05-01

    Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world that the vision system likely exploits to compute perceived depth, monocularly as well as binocularly. Toward understanding how this might be accomplished, we propose a Bayesian model of monocular depth computation that recovers detailed 3D scene structures by extracting reliable, robust, depth-sensitive statistical features from single natural images. These features are derived using well-accepted univariate natural scene statistics (NSS) models and recent bivariate/correlation NSS models that describe the relationships between 2D photographic images and their associated depth maps. This is accomplished by building a dictionary of canonical local depth patterns from which NSS features are extracted as prior information. The dictionary is used to create a multivariate Gaussian mixture (MGM) likelihood model that associates local image features with depth patterns. A simple Bayesian predictor is then used to form spatial depth estimates. The depth results produced by the model, despite its simplicity, correlate well with ground-truth depths measured by a current-generation terrestrial light detection and ranging (LIDAR) scanner. Such a strong form of statistical depth information could be used by the visual system when creating overall estimated depth maps incorporating stereopsis, accommodation, and other conditions. Indeed, even in isolation, the Bayesian predictor delivers depth estimates that are competitive with state-of-the-art "computer vision" methods that utilize highly engineered image features and sophisticated machine learning algorithms.

  18. DOA-informed source extraction in the presence of competing talkers and background noise

    NASA Astrophysics Data System (ADS)

    Taseska, Maja; Habets, Emanuël A. P.

    2017-12-01

    A desired speech signal in hands-free communication systems is often degraded by noise and interfering speech. Even though the number and locations of the interferers are often unknown in practice, it is justified to assume in certain applications that the direction-of-arrival (DOA) of the desired source is approximately known. Using the known DOA, fixed spatial filters such as the delay-and-sum beamformer can be steered to extract the desired source. However, it is well-known that fixed data-independent spatial filters do not provide sufficient reduction of directional interferers. Instead, the DOA information can be used to estimate the statistics of the desired and the undesired signals and to compute optimal data-dependent spatial filters. One way the DOA is exploited for optimal spatial filtering in the literature, is by designing DOA-based narrowband detectors to determine whether a desired or an undesired signal is dominant at each time-frequency (TF) bin. Subsequently, the statistics of the desired and the undesired signals can be estimated during the TF bins where the respective signal is dominant. In a similar manner, a Gaussian signal model-based detector which does not incorporate DOA information has been used in scenarios where the undesired signal consists of stationary background noise. However, when the undesired signal is non-stationary, resulting for example from interfering speakers, such a Gaussian signal model-based detector is unable to robustly distinguish desired from undesired speech. To this end, we propose a DOA model-based detector to determine the dominant source at each TF bin and estimate the desired and undesired signal statistics. We demonstrate that data-dependent spatial filters that use the statistics estimated by the proposed framework achieve very good undesired signal reduction, even when using only three microphones.

  19. Semantic information extracting system for classification of radiological reports in radiology information system (RIS)

    NASA Astrophysics Data System (ADS)

    Shi, Liehang; Ling, Tonghui; Zhang, Jianguo

    2016-03-01

    Radiologists currently use a variety of terminologies and standards in most hospitals in China, and there may even be multiple terminologies in use for different sections of one department. In this presentation, we introduce a medical semantic comprehension system (MedSCS) that extracts semantic information about clinical findings and conclusions from free-text radiology reports, so that the reports can be classified correctly against medical term indexing standards such as RadLex or SNOMED-CT. Our system (MedSCS) is based on both rule-based methods and statistics-based methods, which improves the performance and scalability of MedSCS. In order to evaluate the system overall and measure the accuracy of its outcomes, we developed computation methods to calculate precision rate, recall rate, F-score, and exact confidence intervals.
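
    For reference, these evaluation metrics reduce to simple count ratios; a minimal sketch with hypothetical counts:

```python
def prf(tp, fp, fn):
    """Precision, recall and F-score from raw classification counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts for one report class.
print(prf(tp=85, fp=10, fn=15))  # -> (0.894..., 0.85, 0.871...)
```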

  20. An Electronic Engineering Curriculum Design Based on Concept-Mapping Techniques

    ERIC Educational Resources Information Center

    Toral, S. L.; Martinez-Torres, M. R.; Barrero, F.; Gallardo, S.; Duran, M. J.

    2007-01-01

    Curriculum design is a concern in European Universities as they face the forthcoming European Higher Education Area (EHEA). This process can be eased by the use of scientific tools such as Concept-Mapping Techniques (CMT) that extract and organize the most relevant information from experts' experience using statistics techniques, and helps a…

  1. Web 2.0 in the Professional LIS Literature: An Exploratory Analysis

    ERIC Educational Resources Information Center

    Aharony, Noa

    2011-01-01

    This paper presents a statistical descriptive analysis and a thorough content analysis of descriptors and journal titles extracted from the Library and Information Science Abstracts (LISA) database, focusing on the subject of Web 2.0 and its main applications: blog, wiki, social network and tags. The primary research questions include: whether the…

  2. Non-Gaussian information from weak lensing data via deep learning

    NASA Astrophysics Data System (ADS)

    Gupta, Arushi; Matilla, José Manuel Zorrilla; Hsu, Daniel; Haiman, Zoltán

    2018-05-01

    Weak lensing maps contain information beyond two-point statistics on small scales. Much recent work has tried to extract this information through a range of different observables or via nonlinear transformations of the lensing field. Here we train and apply a two-dimensional convolutional neural network to simulated noiseless lensing maps covering 96 different cosmological models over a range of {Ωm,σ8} . Using the area of the confidence contour in the {Ωm,σ8} plane as a figure of merit, derived from simulated convergence maps smoothed on a scale of 1.0 arcmin, we show that the neural network yields ≈5 × tighter constraints than the power spectrum, and ≈4 × tighter than the lensing peaks. Such gains illustrate the extent to which weak lensing data encode cosmological information not accessible to the power spectrum or even other, non-Gaussian statistics such as lensing peaks.

  3. Fusion of LBP and SWLD using spatio-spectral information for hyperspectral face recognition

    NASA Astrophysics Data System (ADS)

    Xie, Zhihua; Jiang, Peng; Zhang, Shuai; Xiong, Jinquan

    2018-01-01

    Hyperspectral imaging, which records intrinsic spectral information of the skin across different spectral bands, has become an important approach for robust face recognition. However, the main challenges for hyperspectral face recognition are high data dimensionality, low signal-to-noise ratio, and inter-band misalignment. In this paper, hyperspectral face recognition based on LBP (Local Binary Pattern) and SWLD (Simplified Weber Local Descriptor) is proposed to extract discriminative local features from spatio-spectral fusion information. First, a spatio-spectral fusion strategy based on statistical information is used to obtain discriminative features of hyperspectral face images. Second, LBP is applied to extract the orientation of the fused face edges. Third, SWLD is proposed to encode the intensity information in hyperspectral images. Finally, we adopt a symmetric Kullback-Leibler distance to compare the encoded face images. The method is tested on the Hong Kong Polytechnic University Hyperspectral Face database (PolyUHSFD). Experimental results show that the proposed method has a higher recognition rate (92.8%) than state-of-the-art hyperspectral face recognition algorithms.

  4. On statistical independence of a contingency matrix

    NASA Astrophysics Data System (ADS)

    Tsumoto, Shusaku; Hirano, Shoji

    2005-03-01

    A contingency table summarizes the conditional frequencies of two attributes and shows how these two attributes are dependent on each other, with information on the partition of the universe generated by these attributes. Thus, this table can be viewed as a relation between two attributes with respect to information granularity. This paper focuses on several characteristics of linear and statistical independence in a contingency table from the viewpoint of granular computing, which shows that statistical independence in a contingency table is a special form of linear dependence. The discussion also shows that when a contingency table is viewed as a matrix, called a contingency matrix, its rank is equal to 1.0. Thus, the degree of independence, rank, plays a very important role in extracting a probabilistic model from a given contingency table. Furthermore, it is found that in some cases partial rows or columns will satisfy the condition of statistical independence, which can be viewed as a solving process for Diophantine equations.
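
    The rank observation is easy to verify numerically: under statistical independence the joint frequencies factor into an outer product of the marginals, so the contingency matrix has rank 1; a small sketch:

```python
import numpy as np

# Under independence the joint table is an outer product of the marginals.
row = np.array([0.2, 0.5, 0.3])   # marginal distribution of attribute A
col = np.array([0.6, 0.4])        # marginal distribution of attribute B
independent = np.outer(row, col)  # p(a, b) = p(a) * p(b)

print(np.linalg.matrix_rank(independent))  # 1

# A dependent table generally has full rank.
dependent = np.array([[0.30, 0.05],
                      [0.05, 0.60]])
print(np.linalg.matrix_rank(dependent))    # 2
```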

  5. Network-based statistical comparison of citation topology of bibliographic databases

    PubMed Central

    Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko

    2014-01-01

    Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions on their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or a scientific evaluation guideline for governments and research agencies. PMID:25263231

  6. The Exponentially Embedded Family of Distributions for Effective Data Representation, Information Extraction, and Decision Making

    DTIC Science & Technology

    2013-03-01

    ...information extraction and learning from data. First of all, it admits sufficient statistics and therefore provides the means for selecting good models... readily found, since the Kullback-Leibler divergence can be used to ascertain distances between PDFs for various hypothesis testing scenarios. We... The information content of T2(x) is D(p_{θ1,θ2}(t1, t2) || p_{θ1,θ2=0}(t1, t2)) = reduction in distance to the true PDF, where D(p1 || p2) is the Kullback-Leibler divergence.

  7. Modeling ECM fiber formation: structure information extracted by analysis of 2D and 3D image sets

    NASA Astrophysics Data System (ADS)

    Wu, Jun; Voytik-Harbin, Sherry L.; Filmer, David L.; Hoffman, Christoph M.; Yuan, Bo; Chiang, Ching-Shoei; Sturgis, Jennis; Robinson, Joseph P.

    2002-05-01

    Recent evidence supports the notion that biological functions of extracellular matrix (ECM) are highly correlated to its structure. Understanding this fibrous structure is very crucial in tissue engineering to develop the next generation of biomaterials for restoration of tissues and organs. In this paper, we integrate confocal microscopy imaging and image-processing techniques to analyze the structural properties of ECM. We describe a 2D fiber middle-line tracing algorithm and apply it via Euclidean distance maps (EDM) to extract accurate fibrous structure information, such as fiber diameter, length, orientation, and density, from single slices. Based on a 2D tracing algorithm, we extend our analysis to 3D tracing via Euclidean distance maps to extract 3D fibrous structure information. We use computer simulation to construct the 3D fibrous structure which is subsequently used to test our tracing algorithms. After further image processing, these models are then applied to a variety of ECM constructions from which results of 2D and 3D traces are statistically analyzed.
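
    A minimal sketch of the Euclidean distance map (EDM) step on a toy binary fiber image (the 2D/3D tracing algorithms themselves are not reproduced):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy binary image: a fiber 3 pixels thick on a zero background.
img = np.zeros((7, 7), dtype=int)
img[2:5, 1:6] = 1

# Euclidean distance map: distance of each fiber pixel to the nearest
# background pixel. Ridge maxima trace the fiber middle line, and their
# values reflect the local fiber radius (hence the diameter).
edm = distance_transform_edt(img)
print(edm[3, 1:6])  # largest values along the fiber's central row
```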

  8. Functional brain networks for learning predictive statistics.

    PubMed

    Giorgio, Joseph; Karlaftis, Vasilis M; Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew; Kourtzi, Zoe

    2017-08-18

    Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. This skill relies on extracting regular patterns in space and time by mere exposure to the environment (i.e., without explicit feedback). Yet, we know little about the functional brain networks that mediate this type of statistical learning. Here, we test whether changes in the processing and connectivity of functional brain networks due to training relate to our ability to learn temporal regularities. By combining behavioral training and functional brain connectivity analysis, we demonstrate that individuals adapt to the environment's statistics as they change over time from simple repetition to probabilistic combinations. Further, we show that individual learning of temporal structures relates to decision strategy. Our fMRI results demonstrate that learning-dependent changes in fMRI activation within and functional connectivity between brain networks relate to individual variability in strategy. In particular, extracting the exact sequence statistics (i.e., matching) relates to changes in brain networks known to be involved in memory and stimulus-response associations, while selecting the most probable outcomes in a given context (i.e., maximizing) relates to changes in frontal and striatal networks. Thus, our findings provide evidence that dissociable brain networks mediate individual ability in learning behaviorally-relevant statistics. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  9. Collecting and Analyzing Patient Experiences of Health Care From Social Media.

    PubMed

    Rastegar-Mojarad, Majid; Ye, Zhan; Wall, Daniel; Murali, Narayana; Lin, Simon

    2015-07-02

    Social Media, such as Yelp, provides rich information of consumer experience. Previous studies suggest that Yelp can serve as a new source to study patient experience. However, the lack of a corpus of patient reviews causes a major bottleneck for applying computational techniques. The objective of this study is to create a corpus of patient experience (COPE) and report descriptive statistics to characterize COPE. Yelp reviews about health care-related businesses were extracted from the Yelp Academic Dataset. Natural language processing (NLP) tools were used to split reviews into sentences, extract noun phrases and adjectives from each sentence, and generate parse trees and dependency trees for each sentence. Sentiment analysis techniques and Hadoop were used to calculate a sentiment score of each sentence and for parallel processing, respectively. COPE contains 79,173 sentences from 6914 patient reviews of 985 health care facilities near 30 universities in the United States. We found that patients wrote longer reviews when they rated the facility poorly (1 or 2 stars). We demonstrated that the computed sentiment scores correlated well with consumer-generated ratings. A consumer vocabulary to describe their health care experience was constructed by a statistical analysis of word counts and co-occurrences in COPE. A corpus called COPE was built as an initial step to utilize social media to understand patient experiences at health care facilities. The corpus is available to download and COPE can be used in future studies to extract knowledge of patients' experiences from their perspectives. Such information can subsequently inform and provide opportunity to improve the quality of health care.

  10. Structural health monitoring feature design by genetic programming

    NASA Astrophysics Data System (ADS)

    Harvey, Dustin Y.; Todd, Michael D.

    2014-09-01

    Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and other high-capital or life-safety critical structures. Conventional data processing involves pre-processing and extraction of low-dimensional features from in situ time series measurements. The features are then input to a statistical pattern recognition algorithm to perform the relevant classification or regression task necessary to facilitate decisions by the SHM system. Traditional design of signal processing and feature extraction algorithms can be an expensive and time-consuming process requiring extensive system knowledge and domain expertise. Genetic programming, a heuristic program search method from evolutionary computation, was recently adapted by the authors to perform automated, data-driven design of signal processing and feature extraction algorithms for statistical pattern recognition applications. The proposed method, called Autofead, is particularly suitable to handle the challenges inherent in algorithm design for SHM problems where the manifestation of damage in structural response measurements is often unclear or unknown. Autofead mines a training database of response measurements to discover information-rich features specific to the problem at hand. This study provides experimental validation on three SHM applications including ultrasonic damage detection, bearing damage classification for rotating machinery, and vibration-based structural health monitoring. Performance comparisons with common feature choices for each problem area are provided demonstrating the versatility of Autofead to produce significant algorithm improvements on a wide range of problems.

  11. From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

    PubMed Central

    Cocco, Simona; Monasson, Remi; Weigt, Martin

    2013-01-01

    Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764

  12. Bayesian learning of visual chunks by human observers

    PubMed Central

    Orbán, Gergő; Fiser, József; Aslin, Richard N.; Lengyel, Máté

    2008-01-01

    Efficient and versatile processing of any hierarchically structured information requires a learning mechanism that combines lower-level features into higher-level chunks. We investigated this chunking mechanism in humans with a visual pattern-learning paradigm. We developed an ideal learner based on Bayesian model comparison that extracts and stores only those chunks of information that are minimally sufficient to encode a set of visual scenes. Our ideal Bayesian chunk learner not only reproduced the results of a large set of previous empirical findings in the domain of human pattern learning but also made a key prediction that we confirmed experimentally. In accordance with Bayesian learning but contrary to associative learning, human performance was well above chance when pair-wise statistics in the exemplars contained no relevant information. Thus, humans extract chunks from complex visual patterns by generating accurate yet economical representations and not by encoding the full correlational structure of the input. PMID:18268353

  13. Survey of the Methods and Reporting Practices in Published Meta-analyses of Test Performance: 1987 to 2009

    ERIC Educational Resources Information Center

    Dahabreh, Issa J.; Chung, Mei; Kitsios, Georgios D.; Terasawa, Teruhiko; Raman, Gowri; Tatsioni, Athina; Tobar, Annette; Lau, Joseph; Trikalinos, Thomas A.; Schmid, Christopher H.

    2013-01-01

    We performed a survey of meta-analyses of test performance to describe the evolution in their methods and reporting. Studies were identified through MEDLINE (1966-2009), reference lists, and relevant reviews. We extracted information on clinical topics, literature review methods, quality assessment, and statistical analyses. We reviewed 760…

  14. Statistics for Time-Series Spatial Data: Applying Survival Analysis to Study Land-Use Change

    ERIC Educational Resources Information Center

    Wang, Ninghua Nathan

    2013-01-01

    Traditional spatial analysis and data mining methods fall short of extracting temporal information from data. This inability makes their use difficult to study changes and the associated mechanisms of many geographic phenomena of interest, for example, land-use. On the other hand, the growing availability of land-change data over multiple time…

  15. Application of Ontology Technology in Health Statistic Data Analysis.

    PubMed

    Guo, Minjiang; Hu, Hongpu; Lei, Xingyun

    2017-01-01

    Research Purpose: to establish a health management ontology for the analysis of health statistics data. Proposed Methods: this paper established a health management ontology based on an analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of the health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and object properties and data properties were defined to establish the construction of these classes. Through ontology instantiation, we can integrate multi-source heterogeneous data and enable administrators to gain an overall understanding and analysis of the health statistics data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source, heterogeneous health system management data and enhancement of management efficiency.

  16. Time-dependent analysis of dosage delivery information for patient-controlled analgesia services.

    PubMed

    Kuo, I-Ting; Chang, Kuang-Yi; Juan, De-Fong; Hsu, Steen J; Chan, Chia-Tai; Tsou, Mei-Yung

    2018-01-01

    Pain relief is an essential part of perioperative care and plays an important role in medical quality improvement. Patient-controlled analgesia (PCA) is a method that allows a patient to self-administer small boluses of analgesic to relieve subjective pain. PCA logs from the infusion pump consist of many text messages that record all events during therapy. Dosage information can be extracted from PCA logs to provide easily understood features. Analyzing dosage information over time is of great help in characterizing changes in a patient's pain relief. To explore the trend of pain relief requirements, we developed a PCA dosage information generator (PCA DIG) to extract meaningful messages from PCA logs during the first 48 hours of therapy. PCA dosage information, including consumption, delivery, infusion rate, and the ratio between demand and delivery, is presented with corresponding values in 4 successive time frames. Time-dependent statistical analysis demonstrated that analgesia requirements decreased gradually over time. These findings are compatible with clinical observations and provide valuable information about strategies to customize postoperative pain management.

  17. Overlaid caption extraction in news video based on SVM

    NASA Astrophysics Data System (ADS)

    Liu, Manman; Su, Yuting; Ji, Zhong

    2007-11-01

    Overlaid captions in news video often carry condensed semantic information that provides key cues for content-based video indexing and retrieval. However, extracting captions from video remains challenging because of complex backgrounds and low resolution. In this paper, we propose an effective overlaid caption extraction approach for news video. We first scan the video key frames using a small window, and then classify the blocks into text and non-text ones via a support vector machine (SVM), with statistical features extracted from the gray-level co-occurrence matrices, the LH and HL sub-band wavelet coefficients, and the oriented edge intensity ratios. Finally, morphological filtering and projection profile analysis are employed to localize and refine the candidate caption regions. Experiments show high performance on four 30-minute news video programs.
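
    A hedged sketch of the block-classification step, pairing gray-level co-occurrence matrix (GLCM) texture statistics with an SVM on stand-in data; the wavelet and edge features are omitted, and the function is spelled greycomatrix in older scikit-image releases:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(block):
    """Texture statistics from the gray-level co-occurrence matrix."""
    glcm = graycomatrix(block, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy")])

# Stand-in training data: blocks scanned from key frames with
# text / non-text labels (random placeholders here).
rng = np.random.default_rng(0)
blocks = rng.integers(0, 256, size=(40, 16, 16), dtype=np.uint8)
labels = rng.integers(0, 2, size=40)

X = np.array([glcm_features(b) for b in blocks])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```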

  18. Volume and Value of Big Healthcare Data.

    PubMed

    Dinov, Ivo D

    Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions.

  19. Volume and Value of Big Healthcare Data

    PubMed Central

    Dinov, Ivo D.

    2016-01-01

    Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions. PMID:26998309

  20. SU-E-J-71: Spatially Preserving Prior Knowledge-Based Treatment Planning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, H; Xing, L

    2015-06-15

    Purpose: Prior knowledge-based treatment planning is impeded by the use of a single dose volume histogram (DVH) curve. Critical spatial information is lost when the dose distribution is collapsed into a histogram. Even similar patients possess geometric variations that become inaccessible in the form of a single DVH. We propose a simple prior knowledge-based planning scheme that extracts features from a prior dose distribution while still preserving the spatial information. Methods: A prior patient plan is not used as a mere starting point for a new patient; rather, stopping criteria are constructed from it. Each structure from the prior patient is partitioned into multiple shells. For instance, the PTV is partitioned into an inner, middle, and outer shell. Prior dose statistics are then extracted for each shell and translated into the appropriate Dmin and Dmax parameters for the new patient. Results: The partitioned dose information from a prior case was applied to 14 2-D prostate cases. Using the prior case yielded final DVHs comparable to manual planning, even though the DVH for the prior case differed from the DVHs for the 14 cases. Solely using a single DVH for the entire organ was also tested for comparison but showed much poorer performance. Different ways of translating the prior dose statistics into parameters for the new patient were also tested. Conclusion: Prior knowledge-based treatment planning needs to salvage the spatial information without transforming the patients on a voxel-to-voxel basis. An efficient balance between the anatomy and dose domains is gained by partitioning the organs into multiple shells. The prior knowledge not only serves as a starting point for a new case; the information extracted from the partitioned shells is also translated into stopping criteria for the optimization problem at hand.

  1. Role of warm saline mouth rinse in prevention of alveolar osteitis: a randomized controlled trial.

    PubMed

    Osunde, Otasowie Daniel; Bassey, Godwin Obi

    2015-01-01

    The present study aimed to determine the role of warm saline mouth rinse in the prevention of alveolar osteitis following dental extractions. Apparently healthy patients aged 16 and above who were referred to the Oral Surgery Clinic of our institution with an indication for non-surgical extraction of pathologic teeth were prospectively and uniformly randomized into a warm saline group and a control group. The experimental group (n = 80) was instructed to gargle 6 times daily with warm saline; no such instructions were given to the second group (n = 80), which served as controls. Information on demographics, indications for extraction, and development of alveolar osteitis was obtained and analyzed. Comparative statistics were computed using Pearson's chi-square or Fisher's exact test as appropriate. A p value of less than 0.05 was considered significant. The demographic and other baseline parameters, such as indications for extraction, were comparable between the study groups (p > 0.05). The overall prevalence of alveolar osteitis was 13.7%. There was a statistically significant difference between the study groups with respect to development of alveolar osteitis (X2 = 15.00, df = 1, p = 0.001). The risk of developing alveolar osteitis was 4 times higher in the control group (OR = 4.33, p = 0.001). Warm saline mouth rinse instruction is beneficial in preventing alveolar osteitis after dental extractions.

  2. Towards Solving the Mixing Problem in the Decomposition of Geophysical Time Series by Independent Component Analysis

    NASA Technical Reports Server (NTRS)

    Aires, Filipe; Rossow, William B.; Chedin, Alain; Hansen, James E. (Technical Monitor)

    2000-01-01

    The use of the Principal Component Analysis technique for the analysis of geophysical time series has been questioned, in particular for its tendency to extract components that mix several physical phenomena even when the signal is just their linear sum. We demonstrate with a data simulation experiment that Independent Component Analysis, a recently developed technique, is able to solve this problem. This new technique requires statistical independence of the components, a stronger constraint that uses higher-order statistics, instead of the classical decorrelation, a weaker constraint that uses only second-order statistics. Furthermore, ICA does not require additional a priori information such as the localization constraint used in rotational techniques.
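
    The contrast is easy to demonstrate on simulated mixtures: PCA only decorrelates, so its components can remain mixed, while ICA recovers the independent sources (up to order and sign) via higher-order statistics; a minimal sketch with toy sources, not the paper's geophysical data:

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

# Two independent sources, linearly mixed (toy stand-ins for geophysics).
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # independent sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])            # mixing matrix
X = S @ A.T                                        # observed time series

S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)
S_pca = PCA(n_components=2).fit_transform(X)

# Each ICA component correlates strongly with exactly one source
# (up to order and sign); the PCA components generally do not.
C = np.abs(np.corrcoef(np.vstack([S_ica.T, S.T])))[:2, 2:]
print(C.round(2))
```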

  3. Refractive index variance of cells and tissues measured by quantitative phase imaging.

    PubMed

    Shan, Mingguang; Kandel, Mikhail E; Popescu, Gabriel

    2017-01-23

    The refractive index distribution of cells and tissues governs their interaction with light and can report on morphological modifications associated with disease. Through intensity-based measurements, refractive index information can be extracted only via scattering models that approximate light propagation. As a result, current knowledge of refractive index distributions across various tissues and cell types remains limited. Here we use quantitative phase imaging and the statistical dispersion relation (SDR) to extract information about the refractive index variance in a variety of specimens. Due to the phase-resolved measurement in three-dimensions, our approach yields refractive index results without prior knowledge about the tissue thickness. With the recent progress in quantitative phase imaging systems, we anticipate that using SDR will become routine in assessing tissue optical properties.

  4. Flies and humans share a motion estimation strategy that exploits natural scene statistics

    PubMed Central

    Clark, Damon A.; Fitzgerald, James E.; Ales, Justin M.; Gohl, Daryl M.; Silies, Marion A.; Norcia, Anthony M.; Clandinin, Thomas R.

    2014-01-01

    Sighted animals extract motion information from visual scenes by processing spatiotemporal patterns of light falling on the retina. The dominant models for motion estimation exploit intensity correlations only between pairs of points in space and time. Moving natural scenes, however, contain more complex correlations. Here we show that fly and human visual systems encode the combined direction and contrast polarity of moving edges using triple correlations that enhance motion estimation in natural environments. Both species extract triple correlations with neural substrates tuned for light or dark edges, and sensitivity to specific triple correlations is retained even as light and dark edge motion signals are combined. Thus, both species separately process light and dark image contrasts to capture motion signatures that can improve estimation accuracy. This striking convergence argues that statistical structures in natural scenes have profoundly affected visual processing, driving a common computational strategy over 500 million years of evolution. PMID:24390225

  5. Metabolite profiling on apple volatile content based on solid phase microextraction and gas-chromatography time of flight mass spectrometry.

    PubMed

    Aprea, Eugenio; Gika, Helen; Carlin, Silvia; Theodoridis, Georgios; Vrhovsek, Urska; Mattivi, Fulvio

    2011-07-15

    A headspace SPME GC-TOF-MS method was developed for the acquisition of metabolite profiles of apple volatiles. As a first step, an experimental design was applied to find out the most appropriate conditions for the extraction of apple volatile compounds by SPME. The selected SPME method was applied in profiling of four different apple varieties by GC-EI-TOF-MS. Full scan GC-MS data were processed by MarkerLynx software for peak picking, normalisation, alignment and feature extraction. Advanced chemometric/statistical techniques (PCA and PLS-DA) were used to explore data and extract useful information. Characteristic markers of each variety were successively identified using the NIST library thus providing useful information for variety classification. The developed HS-SPME sampling method is fully automated and proved useful in obtaining the fingerprint of the volatile content of the fruit. The described analytical protocol can aid in further studies of the apple metabolome. Copyright © 2011 Elsevier B.V. All rights reserved.

  6. Recognizing stationary and locomotion activities using combinational of spectral analysis with statistical descriptors features

    NASA Astrophysics Data System (ADS)

    Zainudin, M. N. Shah; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

    2017-10-01

    Prior knowledge in pervasive computing has recently garnered a lot of attention due to its high demand in various application domains. Human activity recognition (HAR) is one of the most widely explored applications, providing valuable information about people. Accelerometer-based approaches are commonly used in HAR research because the sensors are small and already built into various types of smartphones. However, high inter-class similarity tends to degrade recognition performance. Hence, this work presents a method for activity recognition using our proposed features, a combination of spectral analysis with statistical descriptors, that is able to tackle the issue of differentiating stationary and locomotion activities. The noisy signal is filtered using the Fourier transform before two different groups of features are extracted: spectral frequency features and statistical descriptors. The extracted features are then classified using a random forest ensemble classifier. The recognition results show good accuracy for stationary and locomotion activities on the USC-HAD dataset.
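
    A minimal sketch of this feature combination, pairing low-frequency spectral magnitudes with statistical descriptors and a random forest, on synthetic windows standing in for USC-HAD accelerometer data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def features(window):
    """Combine spectral-analysis features with statistical descriptors."""
    spec = np.abs(np.fft.rfft(window))[:5]          # low-frequency magnitudes
    stats = [window.mean(), window.std(),
             window.min(), window.max()]            # statistical descriptors
    return np.hstack([spec, stats])

# Toy windows: "standing" (noise) vs "walking" (oscillation plus noise).
rng = np.random.default_rng(0)
t = np.arange(128)
walk = [np.sin(0.3 * t + rng.uniform(0, 6)) + 0.1 * rng.normal(size=128)
        for _ in range(30)]
stand = [0.05 * rng.normal(size=128) for _ in range(30)]

X = np.array([features(w) for w in walk + stand])
y = np.array([1] * 30 + [0] * 30)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```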

  7. Statistical Modeling of Zr/Hf Extraction using TBP-D2EHPA Mixtures

    NASA Astrophysics Data System (ADS)

    Rezaeinejhad Jirandehi, Vahid; Haghshenas Fatmehsari, Davoud; Firoozi, Sadegh; Taghizadeh, Mohammad; Keshavarz Alamdari, Eskandar

    2012-12-01

    In the present work, response surface methodology was employed for the study and prediction of Zr/Hf extraction curves in a solvent extraction system using D2EHPA-TBP mixtures. The effect of change in the levels of temperature, nitric acid concentration, and TBP/D2EHPA ratio (T/D) on the Zr/Hf extraction/separation was studied by the use of central composite design. The results showed a statistically significant effect of T/D, nitric acid concentration, and temperature on the extraction percentage of Zr and Hf. In the case of Zr, a statistically significant interaction was found between T/D and nitric acid, whereas for Hf, both interactive terms between temperature and T/D and nitric acid were significant. Additionally, the extraction curves were profitably predicted applying the developed statistical regression equations; this approach is faster and more economical compared with experimentally obtained curves.

  8. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
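
    A sketch of the core quantities in our own notation (the parameter symbols here are illustrative, not necessarily the paper's):

```latex
% 1D Ising model over binary methylation states x_n at N neighboring sites:
P(x_1,\dots,x_N) = \frac{1}{Z}\exp\!\Big(\sum_{n=1}^{N} a_n x_n
                 + \sum_{n=1}^{N-1} c_n x_n x_{n+1}\Big)
% nearest-neighbor couplings c_n capture correlated methylation.

% Methylation stochasticity quantified by the Shannon entropy of P:
H(P) = -\sum_{x} P(x)\log_2 P(x)

% Test-vs-reference differences via the Jensen-Shannon distance:
JSD(P,Q) = \sqrt{\tfrac{1}{2} D_{KL}(P\,\|\,M) + \tfrac{1}{2} D_{KL}(Q\,\|\,M)},
\qquad M = \tfrac{1}{2}(P+Q)
```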

  9. Assessment of metals bioavailability to vegetables under field conditions using DGT, single extractions and multivariate statistics

    PubMed Central

    2012-01-01

    Background: The bioavailability of metals in soils is commonly assessed by chemical extractions; however, a generally accepted method is not yet established. In this study, the effectiveness of the Diffusive Gradients in Thin-films (DGT) technique and single extractions in the assessment of metal bioaccumulation in vegetables, and the influence of soil parameters on phytoavailability, were evaluated using multivariate statistics. Soil and plants grown in vegetable gardens from mining-affected rural areas of NW Romania were collected and analysed. Results: Pseudo-total metal contents of Cu, Zn and Cd in soil ranged between 17.3-146 mg kg-1, 141-833 mg kg-1 and 0.15-2.05 mg kg-1, respectively, showing enriched contents of these elements. High degrees of metal extractability in 1M HCl and even in 1M NH4Cl were observed. Despite the relatively high total metal concentrations in soil, those found in vegetables were comparable to values typically reported for agricultural crops, probably due to the low concentrations of metals in soil solution (Csoln) and low effective concentrations (CE), assessed by the DGT technique. Among the analysed vegetables, the highest metal concentrations were found in carrot roots. By applying multivariate statistics, it was found that CE, Csoln and extraction in 1M NH4Cl were better predictors of metal bioavailability than the acid extractions applied in this study. Copper transfer to vegetables was strongly influenced by soil organic carbon (OC) and cation exchange capacity (CEC), while pH had a greater influence on Cd transfer from soil to plants. Conclusions: The results showed that DGT can be used for a general evaluation of the risks associated with soil contamination with Cu, Zn and Cd in field conditions, although it did not provide quantitative information on metal transfer from soil to vegetables. PMID:23079133

  10. Delayed Implants Outcome in Maxillary Molar Region.

    PubMed

    Crespi, Roberto; Capparè, Paolo; Crespi, Giovanni; Gastaldi, Giorgio; Gherlone, Enrico F

    2017-04-01

    The aim of the present study was to assess bone volume changes in maxillary molar regions after delayed implant placement. Patients presented with large bone defects after tooth extraction. Reactive soft tissue was left in the defects, and no grafts were used. Cone beam computed tomography (CBCT) scans were performed before tooth extraction, at implant placement (3 months after extraction), and 3 years after implant placement, and bone volume measurements were assessed at each time point. Bucco-lingual width showed a statistically significant decrease (p = .013) at implant placement, 3 months after extraction. Moreover, a statistically significant increase (p < .01) was measured 3 years after implant placement. No statistically significant differences (p > .05) were found between baseline values (before extraction) and values at 3 years from implant placement. Vertical dimension showed no statistically significant differences (p > .05) at implant placement, 3 months after extraction. Statistically significant differences (p < .0001) were found between baseline values (before extraction) and values at 3 months from implant placement, as well as between implant placement values and those 3 years later. CBCT scans showed a successful outcome for delayed implants placed in large bone defects at 3-year follow-up. © 2016 Wiley Periodicals, Inc.

  11. Toward Reflective Judgment in Exploratory Factor Analysis Decisions: Determining the Extraction Method and Number of Factors To Retain.

    ERIC Educational Resources Information Center

    Knight, Jennifer L.

    This paper considers some decisions that must be made by the researcher conducting an exploratory factor analysis. The primary purpose is to aid the researcher in making informed decisions during the factor analysis instead of relying on defaults in statistical programs or traditions of previous researchers. Three decision areas are addressed.…

  12. NASA STI program database: Journal coverage (1990-1992)

    NASA Technical Reports Server (NTRS)

    1993-01-01

    Data are given in tabular form on the extent of recent journal accessions (1990-1992) to the NASA Scientific and Technical Information (STI) Database. Journals are presented by country in two ways: first in an alphabetical listing, and second by the decreasing number of citations extracted from these journals during this period. An appendix containing a statistical summary is included.

  13. Forest Fire History... A Computer Method of Data Analysis

    Treesearch

    Romain M. Meese

    1973-01-01

    A series of computer programs is available to extract information from the individual Fire Reports (U.S. Forest Service Form 5100-29). The programs use a statistical technique to fit a continuous distribution to a set of sampled data. The goodness-of-fit program is applicable to data other than the fire history. Data summaries illustrate analysis of fire occurrence,...
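    The abstract does not name the fitted distribution or the goodness-of-fit test, so the following is only a hedged sketch of the general recipe it describes, assuming a gamma distribution fitted by maximum likelihood and checked with a Kolmogorov-Smirnov test on stand-in data.

```python
# Sketch: fitting a continuous distribution to sampled data and checking
# goodness of fit, in the spirit described above (the Fire Report
# programs' actual distribution and test are not specified here).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
intervals = rng.gamma(shape=2.0, scale=10.0, size=200)  # stand-in data

shape, loc, scale = stats.gamma.fit(intervals, floc=0)  # MLE fit
ks = stats.kstest(intervals, "gamma", args=(shape, loc, scale))

print(f"fitted shape={shape:.2f}, scale={scale:.2f}")
print(f"KS statistic={ks.statistic:.3f}, p-value={ks.pvalue:.3f}")
```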

  14. Hand and goods judgment algorithm based on depth information

    NASA Astrophysics Data System (ADS)

    Li, Mingzhu; Zhang, Jinsong; Yan, Dan; Wang, Qin; Zhang, Ruiqi; Han, Jing

    2016-03-01

    A tablet computer with a depth camera and a color camera is mounted on a traditional shopping cart, and the interior of the cart is observed by the two cameras. In shopping cart monitoring, it is very important to determine whether a customer's hand is empty or holding goods as it moves into or out of the cart. This paper establishes a basic framework for making this judgment. It comprises hand extraction based on depth information, a skin color model built with WPCA (Weighted Principal Component Analysis), an algorithm for judging handheld products based on motion and skin color information, and a statistical process. The first step ensures the integrity of the hand information and effectively avoids the influence of sleeves and other clutter. The second step accurately extracts skin color and eliminates interference from similar colors; lighting has little effect on its results, and it offers fast computation and high efficiency. The third step greatly reduces noise interference and improves accuracy.

  15. A statistical analysis of the impact of advertising signs on road safety.

    PubMed

    Yannis, George; Papadimitriou, Eleonora; Papantoniou, Panagiotis; Voulgari, Chrisoula

    2013-01-01

    This research aims to investigate the impact of advertising signs on road safety. An exhaustive review of the international literature was carried out on the effect of advertising signs on driver behaviour and safety. Moreover, a before-and-after statistical analysis with control groups was applied to several road sites with different characteristics in the Athens metropolitan area, in Greece, in order to investigate the correlation between the placement or removal of advertising signs and the related occurrence of road accidents. Road accident data for the 'before' and 'after' periods on the test sites and the control sites were extracted from the database of the Hellenic Statistical Authority, and the selected 'before' and 'after' periods vary from 2.5 to 6 years. The statistical analysis shows no correlation between road accidents and advertising signs at any of the nine sites examined, as the estimated safety effects are non-significant at the 95% confidence level. This can be explained by the fact that, on the examined road sites, drivers are already overloaded with information (traffic signs, direction signs, shop labels, pedestrians and other vehicles, etc.), so that the additional information load from advertising signs may not further distract them.
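    A common estimator for before-and-after studies with control groups, of the kind described above, compares the test-site change against the control-site change. The sketch below uses hypothetical accident counts and a standard log-scale confidence interval; it is not the study's actual estimator or data.

```python
# Sketch: a simple before-and-after safety-effect estimate with a control
# group (hypothetical counts; the study's exact estimator is not given).
import math

before_test, after_test = 40, 36       # accidents at treated sites
before_ctrl, after_ctrl = 50, 48       # accidents at control sites

theta = (after_test / before_test) / (after_ctrl / before_ctrl)
# Approximate variance of log(theta) for Poisson counts:
se_log = math.sqrt(1/after_test + 1/before_test + 1/after_ctrl + 1/before_ctrl)
lo, hi = theta * math.exp(-1.96 * se_log), theta * math.exp(1.96 * se_log)

print(f"effect estimate: {theta:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# A confidence interval containing 1.0 indicates a non-significant effect.
```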

  16. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is rarely considered simultaneously. In this paper, we propose a new NR-VQA metric based on spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; and 3) the proposed method is universal for multiple types of distortions and robust across different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with state-of-the-art NR-VQA metrics and with the top-performing FR-VQA and RR-VQA metrics.
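    As a rough illustration of the pipeline described above, the sketch below extracts simple statistics of 3D-DCT coefficients over spatiotemporal blocks and feeds them to a linear SVR; the block size, feature set, and data are placeholders, not the paper's NVS features.

```python
# Sketch: NVS-style features from 3D-DCT blocks plus linear SVR, as a
# rough stand-in for the paper's pipeline (feature set is illustrative).
import numpy as np
from scipy.fft import dctn
from sklearn.svm import SVR

def block_features(video, bs=4):
    """Statistics of 3D-DCT coefficients over bs x bs x bs blocks."""
    feats = []
    T, H, W = video.shape
    for t in range(0, T - bs + 1, bs):
        for y in range(0, H - bs + 1, bs):
            for x in range(0, W - bs + 1, bs):
                c = dctn(video[t:t+bs, y:y+bs, x:x+bs], norm="ortho").ravel()
                ac = c[1:]                       # drop the DC coefficient
                feats.append([np.mean(np.abs(ac)), np.std(ac)])
    return np.asarray(feats).mean(axis=0)        # pool over blocks

rng = np.random.default_rng(1)
videos = [rng.normal(size=(8, 16, 16)) for _ in range(20)]  # toy clips
X = np.array([block_features(v) for v in videos])
y = rng.uniform(1, 5, size=20)                   # toy quality scores

model = SVR(kernel="linear").fit(X, y)
print(model.predict(X[:3]))
```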

  17. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data

    PubMed Central

    Zhu, Yun; Fan, Ruzong; Xiong, Momiao

    2017-01-01

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for achieving a deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (the number of phenotypes and genetic variants jointly analyzed at the same time) and depth (the hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore the correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method, referred to as quadratically regularized functional CCA (QRFCCA), for association analysis. It combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has much higher power than the ten competing statistics while retaining appropriate type 1 error rates. To further evaluate performance, QRFCCA and the ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that QRFCCA substantially outperforms the ten other statistics. PMID:29040274

  18. Wastewater-Based Epidemiology of Stimulant Drugs: Functional Data Analysis Compared to Traditional Statistical Methods.

    PubMed

    Salvatore, Stefania; Bramness, Jørgen Gustav; Reid, Malcolm J; Thomas, Kevin Victor; Harman, Christopher; Røislien, Jo

    2015-01-01

    Wastewater-based epidemiology (WBE) is a new methodology for estimating the drug load in a population. Simple summary statistics and specification tests have typically been used to analyze WBE data, comparing differences between weekday and weekend loads. Such standard statistical methods may, however, overlook important nuanced information in the data. In this study, we apply functional data analysis (FDA) to WBE data and compare the results to those obtained from more traditional summary measures. We analysed temporal WBE data from 42 European cities, using sewage samples collected daily for one week in March 2013. For each city, the main temporal features of two selected drugs were extracted using functional principal component (FPC) analysis, along with simpler measures such as the area under the curve (AUC). The individual cities' scores on each of the temporal FPCs were then used as outcome variables in multiple linear regression analyses with various city and country characteristics as predictors. The results were compared to those of functional analysis of variance (FANOVA). The first three FPCs explained more than 99% of the temporal variation. The first component (FPC1) represented the level of the drug load, while the second and third temporal components represented the level and the timing of a weekend peak. AUC was highly correlated with FPC1, but other temporal characteristics were not captured by the simple summary measures. FANOVA was less flexible than the FPCA-based regression, although it gave concordant results. Geographical location was the main predictor of the general level of the drug load. FDA of WBE data extracts more detailed information about drug load patterns during the week that is not identified by more traditional statistical methods. The results also suggest that regression based on FPC scores is a valuable addition to FANOVA for estimating associations between temporal patterns and covariate information.
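    For curves sampled on a common daily grid, FPCA reduces to ordinary PCA of the cities-by-days data matrix. The sketch below, on synthetic weekly loads with a weekend peak, is a simplified stand-in for the smoothing-based FPCA a full analysis would use.

```python
# Sketch: functional PCA on weekly drug-load curves. For curves sampled
# on a common daily grid, FPCA reduces to PCA of the data matrix
# (cities x 7 days); the numbers here are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_cities, n_days = 42, 7
base = np.array([1.0, 1.0, 1.0, 1.0, 1.3, 1.8, 1.6])   # weekend-peak shape
loads = base * rng.uniform(0.5, 2.0, size=(n_cities, 1)) \
        + rng.normal(0, 0.05, size=(n_cities, n_days))

fpca = PCA(n_components=3).fit(loads)
scores = fpca.transform(loads)          # per-city FPC scores
print("explained variance:", fpca.explained_variance_ratio_.round(3))
# scores[:, 0] can then serve as an outcome in a regression on
# city/country characteristics, as described above.
```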

  19. Maximum likelihood: Extracting unbiased information from complex networks

    NASA Astrophysics Data System (ADS)

    Garlaschelli, Diego; Loffredo, Maria I.

    2008-07-01

    The choice of free parameters in network models is subjective, since it depends on what topological properties are being monitored. However, we show that the maximum likelihood (ML) principle indicates a unique, statistically rigorous parameter choice, associated with a well-defined topological feature. We then find that, if the ML condition is incompatible with the built-in parameter choice, network models turn out to be intrinsically ill defined or biased. To overcome this problem, we construct a class of safely unbiased models. We also propose an extension of these results that leads to the fascinating possibility of extracting, only from topological data, the “hidden variables” underlying network organization, making them “no longer hidden.” We test our method on World Trade Web data, where we recover the empirical gross domestic product using only topological information.
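    The ML recipe is easiest to see in the simplest network model. The sketch below assumes an undirected Erdős-Rényi graph, for which the likelihood of the observed link count is maximized analytically at p = 2L/(n(n-1)); richer hidden-variable models follow the same recipe with more parameters.

```python
# Sketch: the ML principle for the simplest network model, assuming an
# undirected Erdos-Renyi graph G(n, p). The likelihood of the observed
# link count L is maximized analytically at p = 2L / (n(n-1)).
import numpy as np

n, L = 100, 310                 # nodes and observed undirected links
pairs = n * (n - 1) / 2

p_grid = np.linspace(0.01, 0.99, 981)
loglik = L * np.log(p_grid) + (pairs - L) * np.log(1 - p_grid)
p_ml = p_grid[np.argmax(loglik)]

print(f"grid ML estimate: {p_ml:.4f}, analytic: {L / pairs:.4f}")
# Richer models (e.g., fitness/'hidden variable' models) follow the same
# recipe: write the graph likelihood and maximize over the parameters.
```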

  20. Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).

    PubMed

    Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie

    2017-01-01

    This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and in the original image, and the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficients are calculated based on the correspondence between mark points in the original image and those in the a priori model, and the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built from the intensity information and the specific boundary information of the original image, and the sparse matching constraint model is established based on sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. Using the sparse dictionary, this method addresses the problems of deformation model initialization and the accuracy of a priori methods. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.

  1. Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge

    PubMed Central

    Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

    2016-01-01

    This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007

  2. Precipitate statistics in an Al-Mg-Si-Cu alloy from scanning precession electron diffraction data

    NASA Astrophysics Data System (ADS)

    Sunde, J. K.; Paulsen, Ø.; Wenner, S.; Holmestad, R.

    2017-09-01

    The key microstructural feature providing strength to age-hardenable Al alloys is nanoscale precipitates. Alloy development requires a reliable statistical assessment of these precipitates in order to link the microstructure with material properties. Here, it is demonstrated that scanning precession electron diffraction combined with computational analysis enables the semi-automated extraction of precipitate statistics in an Al-Mg-Si-Cu alloy. Among the main findings is the precipitate number density, which agrees well with a conventional method based on manual counting and measurements. By virtue of its objectivity in data analysis, our methodology is therefore seen as an advantageous alternative to existing routines, offering reproducibility and efficiency in alloy statistics. Additional results include improved qualitative information on phase distributions. The developed procedure is generic and applicable to any material containing nanoscale precipitates.

  3. Accidents and undetermined deaths: re-evaluation of nationwide samples from the Scandinavian countries.

    PubMed

    Tøllefsen, Ingvild Maria; Thiblin, Ingemar; Helweg-Larsen, Karin; Hem, Erlend; Kastrup, Marianne; Nyberg, Ullakarin; Rogde, Sidsel; Zahl, Per-Henrik; Østevold, Gunvor; Ekeberg, Øivind

    2016-05-27

    National mortality statistics should be comparable between countries that use the World Health Organization's International Classification of Diseases. Distinguishing between manners of death, especially suicides and accidents, is a challenge. Knowledge about accidents is important in the prevention of both accidents and suicides. The aim of the present study was to assess the reliability of classifying deaths as accidents and undetermined manner of death in the three Scandinavian countries and to compare cross-national differences. The cause of death registers in Norway, Sweden and Denmark provided data from 2008 for samples of 600 deaths from each country, of which 200 were registered as suicides, 200 as accidents or undetermined manner of death and 200 as natural deaths. The information given to the eight experts was identical to the information used by the Cause of Death Register; this included death certificates and, if available, external post-mortem examinations, forensic autopsy reports and police reports. In total, 69 % (Sweden and Norway) and 78 % (Denmark) of deaths registered in the official mortality statistics as accidents were confirmed by the experts. In the majority of the cases where disagreement was seen, the experts reclassified accidents to undetermined manner of death, in 26, 25 and 19 % of cases, respectively. Few cases were reclassified as suicides or natural deaths. Among the extracted accidents, the experts agreed least with the official mortality statistics concerning drowning and poisoning accidents; they also reported the most uncertainty in these categories of accidents. In a second re-evaluation, where more information was made available, the Norwegian psychiatrist and forensic pathologist increased their agreement with the official mortality statistics from 76 to 87 % and from 85 to 88 %, respectively, regarding the Norwegian and Swedish datasets. Among the extracted undetermined deaths in the Swedish dataset, the two experts reclassified 22 and 51 %, respectively, to accidents. There was moderate agreement in the reclassification of accidents between the official mortality statistics and the experts. In the majority of cases where there was disagreement, accidents were reclassified as undetermined manner of death, and only a small proportion as suicides.
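    The abstract reports "moderate agreement" without naming its statistic; a standard choice for such rater-agreement questions is Cohen's kappa, sketched below on synthetic manner-of-death labels (not the study's data).

```python
# Sketch: quantifying rater agreement on manner-of-death labels with
# Cohen's kappa (synthetic labels; the study's own agreement statistic
# is not named in the abstract).
from sklearn.metrics import cohen_kappa_score

official = ["accident"] * 14 + ["undetermined"] * 4 + ["suicide"] * 2
expert   = ["accident"] * 10 + ["undetermined"] * 8 + ["suicide"] * 2

kappa = cohen_kappa_score(official, expert)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.41-0.60 is conventionally 'moderate'
```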

  4. Probability Distribution Extraction from TEC Estimates based on Kernel Density Estimation

    NASA Astrophysics Data System (ADS)

    Demir, Uygar; Toker, Cenk; Çenet, Duygu

    2016-07-01

    Statistical analysis of the ionosphere, specifically the Total Electron Content (TEC), may reveal important information about its temporal and spatial characteristics. One of the core metrics that express the statistical properties of a stochastic process is its Probability Density Function (pdf). Furthermore, statistical parameters such as mean, variance and kurtosis, which can be derived from the pdf, may provide information about the spatial uniformity or clustering of the electron content. For example, the variance differentiates between a quiet ionosphere and a disturbed one, whereas kurtosis differentiates between a geomagnetic storm and an earthquake. Therefore, valuable information about the state of the ionosphere (and the natural phenomena that cause the disturbance) can be obtained by looking at the statistical parameters. In the literature, there are publications that try to fit the histogram of TEC estimates to some well-known pdfs such as the Gaussian, the exponential, etc. However, constraining a histogram to fit a function with a fixed shape will increase the estimation error, and all the information extracted from such a pdf will continue to contain this error. With such techniques, it is highly likely to observe some artificial characteristics in the estimated pdf that are not present in the original data. In the present study, we use the Kernel Density Estimation (KDE) technique to estimate the pdf of the TEC. KDE is a non-parametric approach which does not impose a specific form on the TEC. As a result, better pdf estimates that almost perfectly fit the observed TEC values can be obtained as compared to the techniques mentioned above. KDE is particularly good at representing the tail probabilities and outliers. We also calculate the mean, variance and kurtosis of the measured TEC values. The technique is applied to the ionosphere over Turkey, where the TEC values are estimated from GNSS measurements from the TNPGN-Active (Turkish National Permanent GNSS Network) network. This study is supported by TUBITAK 115E915 and joint TUBITAK 114E092 and AS CR14/001 projects.
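    A minimal sketch of the KDE step, assuming stand-in TEC values: SciPy's gaussian_kde imposes no fixed pdf shape, and the moment statistics mentioned above come directly from the sample.

```python
# Sketch: non-parametric pdf estimation for TEC with Gaussian KDE, plus
# the moment statistics mentioned above (synthetic stand-in TEC values).
import numpy as np
from scipy.stats import gaussian_kde, kurtosis

rng = np.random.default_rng(3)
tec = rng.gamma(shape=9.0, scale=2.5, size=1000)   # stand-in TEC, in TECU

kde = gaussian_kde(tec)                  # bandwidth chosen automatically
grid = np.linspace(tec.min(), tec.max(), 200)
pdf = kde(grid)                          # smooth pdf, no fixed shape imposed

print(f"pdf peak near TEC = {grid[pdf.argmax()]:.1f} TECU")
print(f"mean={tec.mean():.2f}, var={tec.var():.2f}, kurtosis={kurtosis(tec):.2f}")
```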

  5. Cancer Imaging Phenomics Software Suite: Application to Brain and Breast Cancer | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    The transition of oncologic imaging from its “industrial era” to its “information era” demands analytical methods that 1) extract information from this data that is clinically and biologically relevant; 2) integrate imaging, clinical, and genomic data via rigorous statistical and computational methodologies in order to derive models valuable for understanding cancer mechanisms, diagnosis, prognostic assessment, response evaluation, and personalized treatment management; 3) are available to the biomedical community for easy use and application, with the aim of understanding, diagnosing, an

  6. Characterization of Structural and Configurational Properties of DNA by Atomic Force Microscopy.

    PubMed

    Meroni, Alice; Lazzaro, Federico; Muzi-Falconi, Marco; Podestà, Alessandro

    2018-01-01

    We describe a method to extract quantitative information on DNA structural and configurational properties from high-resolution topographic maps recorded by atomic force microscopy (AFM). DNA molecules are deposited on mica surfaces from an aqueous solution, carefully dehydrated, and imaged in air in Tapping Mode. Upon extraction of the spatial coordinates of the DNA backbones from AFM images, several parameters characterizing DNA structure and configuration can be calculated. Here, we explain how to obtain the distributions of contour lengths, end-to-end distances, and gyration radii. This modular protocol can also be used to characterize other statistical parameters from AFM topographies.
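    Once backbone coordinates are extracted, the three distributions mentioned above reduce to simple vector arithmetic. Here is a minimal sketch on a toy 2D chain (real traces would come from the AFM images).

```python
# Sketch: contour length, end-to-end distance, and gyration radius from
# backbone (x, y) coordinates traced out of an AFM topography (toy chain).
import numpy as np

rng = np.random.default_rng(4)
steps = rng.normal(scale=2.0, size=(300, 2))       # nm-scale increments
backbone = np.cumsum(steps, axis=0)                # traced backbone points

contour_length = np.sum(np.linalg.norm(np.diff(backbone, axis=0), axis=1))
end_to_end = np.linalg.norm(backbone[-1] - backbone[0])
gyration_radius = np.sqrt(np.mean(np.sum(
    (backbone - backbone.mean(axis=0))**2, axis=1)))

print(f"L={contour_length:.1f} nm, R_ee={end_to_end:.1f} nm, "
      f"R_g={gyration_radius:.1f} nm")
```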

  7. The implementation of CMOS sensors within a real time digital mammography intelligent imaging system: The I-ImaS System

    NASA Astrophysics Data System (ADS)

    Esbrand, C.; Royle, G.; Griffiths, J.; Speller, R.

    2009-07-01

    The integration of technology with healthcare has undoubtedly propelled the medical imaging sector well into the twenty-first century. The concept of digital imaging, introduced during the 1970s, has since paved the way for established imaging techniques, of which digital mammography, phase contrast imaging and CT imaging are just a few examples. This paper presents a prototype intelligent digital mammography system designed and developed by a European consortium. The final system, the I-ImaS system, utilises CMOS monolithic active pixel sensor (MAPS) technology promoting on-chip data processing, enabling data processing and image acquisition to be performed simultaneously; consequently, statistical analysis of tissue is achievable in real time for the purpose of x-ray beam modulation via a feedback mechanism during the image acquisition procedure. The imager implements a dual array of twenty 520 pixel × 40 pixel CMOS MAPS sensing devices with a 32 μm pixel size, each individually coupled to a 100 μm thick thallium-doped structured CsI scintillator. This paper presents the first intelligent images of real excised breast tissue obtained from the prototype system, where the x-ray exposure was modulated via the statistical information extracted from the breast tissue itself. Conventional images were experimentally acquired and the statistical analysis of the data was done off-line, resulting in the production of simulated real-time intelligently optimised images. The results obtained indicate that real-time image optimisation, using the statistical information extracted from the breast as part of a feedback mechanism, is beneficial and foreseeable in the near future.

  8. Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding

    PubMed Central

    Wang, Xiang; Zheng, Yuan; Zhao, Zhenzhou; Wang, Jinping

    2015-01-01

    Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples usually lie on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so how to implement feature extraction and dimensionality reduction and improve recognition performance is a crucial task. In this paper, a novel machinery fault diagnosis approach is proposed, based on a statistical locally linear embedding (S-LLE) algorithm, which is an extension of LLE that exploits the fault class label information. The fault diagnosis approach first extracts the intrinsic manifold features from high-dimensional feature vectors, which are obtained from vibration signals by time-domain, frequency-domain and empirical mode decomposition (EMD) feature extraction, and then translates the complex mode space into a salient low-dimensional feature space by the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally, in the reduced feature space, pattern classification and fault diagnosis by a classifier are carried out easily and rapidly. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach markedly improves the classification performance of fault pattern recognition and outperforms the other traditional approaches. PMID:26153771
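    As a hedged stand-in for the supervised S-LLE variant (which is not available in standard libraries), the sketch below runs ordinary LLE from scikit-learn on random stand-in fault features, followed by a simple classifier in the reduced space.

```python
# Sketch: manifold-based fault diagnosis with standard LLE as a stand-in
# for the paper's supervised S-LLE variant (features here are random).
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 30))          # high-dimensional fault features
y = rng.integers(0, 3, size=120)        # three fault classes (toy labels)

emb = LocallyLinearEmbedding(n_neighbors=10, n_components=3)
X_low = emb.fit_transform(X)            # salient low-dimensional space

clf = KNeighborsClassifier(n_neighbors=5).fit(X_low, y)
print("training accuracy:", clf.score(X_low, y))
```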

  9. Identifying Nanoscale Structure-Function Relationships Using Multimodal Atomic Force Microscopy, Dimensionality Reduction, and Regression Techniques.

    PubMed

    Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S

    2018-05-31

    Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data analytic approach combining multiple microscopy modes, using compositional information in infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) together with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra are different from FTIR spectra, but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure with any one technique.
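    A minimal sketch of the PCA-plus-regression step described above, using synthetic per-pixel spectra and currents in place of the PiFM and conductive-AFM data.

```python
# Sketch: principal component regression linking spectra to local
# current, mirroring the analysis described above (synthetic spectra).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
spectra = rng.normal(size=(500, 128))   # per-pixel IR spectra (stand-in)
current = spectra[:, :5] @ rng.normal(size=5) + rng.normal(0, 0.1, 500)

pcr = make_pipeline(PCA(n_components=5), LinearRegression())
pcr.fit(spectra, current)
print("R^2 on training data:", round(pcr.score(spectra, current), 3))
```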

  10. Effects of band selection on endmember extraction for forestry applications

    NASA Astrophysics Data System (ADS)

    Karathanassi, Vassilia; Andreou, Charoula; Andronis, Vassilis; Kolokoussis, Polychronis

    2014-10-01

    In spectral unmixing theory, data reduction techniques play an important role, as hyperspectral imagery contains an immense amount of data, posing many challenging problems such as data storage, computational efficiency, and the so-called "curse of dimensionality". Feature extraction and feature selection are the two main approaches to dimensionality reduction. Feature extraction techniques reduce the dimensionality of hyperspectral data by applying transforms to the data. Feature selection techniques retain the physical meaning of the data by selecting a set of bands from the input hyperspectral dataset that mainly contains the information needed for spectral unmixing. Although feature selection techniques are well known for their dimensionality reduction potential, they are rarely used in the unmixing process. The majority of the existing state-of-the-art dimensionality reduction methods set criteria on the spectral information derived from the whole wavelength range in order to define the optimum spectral subspace. These criteria are not associated with any particular application but with the data statistics, such as correlation and entropy values. However, each application is associated with specific land cover materials, whose spectral characteristics present variations at specific wavelengths. In forestry, for example, many applications focus on tree leaves, in which specific pigments such as chlorophyll, xanthophyll, etc. determine the wavelengths where tree species, diseases, etc. can be detected. For such applications, when the unmixing process is applied, the tree species, diseases, etc. are considered as the endmembers of interest. This paper focuses on investigating the effects of band selection on endmember extraction by exploiting the information in the vegetation absorbance spectral zones. More precisely, it explores whether endmember extraction can be optimized when specific sets of initial bands related to leaf spectral characteristics are selected. The experiments comprise the application of well-known signal subspace estimation and endmember extraction methods to hyperspectral imagery of a forest area. Evaluation of the extracted endmembers showed that more forest species can be extracted as endmembers using selected bands.

  11. Road boundary detection

    NASA Technical Reports Server (NTRS)

    Sowers, J.; Mehrotra, R.; Sethi, I. K.

    1989-01-01

    A method for extracting road boundaries using a monochrome image of a visual road scene is presented. Statistical information about the intensity levels present in the image, along with some geometrical constraints concerning the road, forms the basis of this approach. Results are presented and the advantages of this technique compared to others are discussed. The major advantages are its ability to process the image in only one pass, to limit the area searched in the image using only knowledge of the road geometry and previous boundary information, and to adjust dynamically for inconsistencies in the located boundary information, all of which helps to increase the efficacy of this technique.

  12. Depressed Mothers as Informants on Child Behavior: Methodological Issues

    PubMed Central

    Ordway, Monica Roosa

    2011-01-01

    Mothers with depressive symptoms report behavioral problems among their children more frequently than non-depressed mothers, leading to a debate regarding the accuracy of depressed mothers as informants on children's behavior. The purpose of this integrative review was to identify methodological challenges in research related to the debate. Data were extracted from 43 papers (6 theoretical, 36 research reports, and 1 instrument scoring manual). The analysis focused on the methodologies considered when using depressed mothers as informants. Nine key themes were identified, and I concluded that researchers should incorporate multiple informants, identify the characteristics of maternal depression, and incorporate advanced statistical methodology. The use of a conceptual framework to understand informant discrepancies within child behavior evaluations is suggested for future research. PMID:21964958

  13. A defocus-information-free autostereoscopic three-dimensional (3D) digital reconstruction method using direct extraction of disparity information (DEDI)

    NASA Astrophysics Data System (ADS)

    Li, Da; Cheung, Chifai; Zhao, Xing; Ren, Mingjun; Zhang, Juan; Zhou, Liqiu

    2016-10-01

    Autostereoscopy-based three-dimensional (3D) digital reconstruction has been widely applied in the fields of medical science, entertainment, design, industrial manufacture, precision measurement and many other areas. The 3D digital model of the target can be reconstructed from a series of two-dimensional (2D) information acquired by the autostereoscopic system, which consists of multiple lenses and can provide information on the target from multiple angles. This paper presents a generalized and precise autostereoscopic 3D digital reconstruction method based on Direct Extraction of Disparity Information (DEDI), which can be applied to any autostereoscopic system and provides accurate 3D reconstruction results through an error elimination process based on statistical analysis. The feasibility of the DEDI method has been successfully verified through a series of optical 3D digital reconstruction experiments on different autostereoscopic systems; the method efficiently performs direct full 3D digital model construction through a tomography-like operation on every depth plane, excluding the defocused information. With the focused information processed by the DEDI method, the 3D digital model of the target can be directly and precisely formed along the axial direction with the depth information.

  14. Rescue karyotyping: a case series of array-based comparative genomic hybridization evaluation of archival conceptual tissue

    PubMed Central

    2014-01-01

    Background: Determination of fetal aneuploidy is central to the evaluation of recurrent pregnancy loss (RPL). However, obtaining this information at the time of a miscarriage is not always possible, or the testing may not have been ordered. Here we report on “rescue karyotyping”, wherein DNA extracted from archived paraffin-embedded pregnancy loss tissue from a prior dilation and curettage (D&C) is evaluated by array-based comparative genomic hybridization (aCGH). Methods: A retrospective case series was conducted at an academic medical center. Patients included had unexplained RPL and a prior pregnancy loss for which karyotype information would be clinically informative but was unavailable. After extracting DNA from slides of archived tissue, aCGH with a reduced-stringency approach was performed, allowing for analysis of partially degraded DNA. Statistics were computed using STATA v12.1 (College Station, TX). Results: Rescue karyotyping was attempted on 20 specimens from 17 women. DNA was successfully extracted in 16 samples (80.0%), enabling analysis at either high or low resolution. The longest interval from tissue collection to DNA extraction was 4.2 years. There was no significant difference in specimen sufficiency for analysis by collection-to-extraction interval (p = 0.14) or by gestational age at pregnancy loss (p = 0.32). Eight specimens showed copy number variants: 3 trisomies, 2 partial chromosomal deletions, 1 mosaic abnormality and 2 unclassified variants. Conclusions: Rescue karyotyping using aCGH on DNA extracted from paraffin-embedded tissue provides the opportunity to obtain critical fetal cytogenetic information from a prior loss, even if it occurred years earlier. Given the ubiquitous archiving of paraffin-embedded tissue obtained during a D&C and the ease of obtaining results despite long loss-to-testing intervals or early gestational age at the time of fetal demise, this may provide a useful technique in the evaluation of couples with recurrent pregnancy loss. PMID:24589081

  15. DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS.

    PubMed

    Hollister, Brittany M; Restrepo, Nicole A; Farber-Eger, Eric; Crawford, Dana C; Aldrich, Melinda C; Non, Amy

    2017-01-01

    Socioeconomic status (SES) is a fundamental contributor to health and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies, due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields but rather in the free text of clinical notes and communications, and the content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables for racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithms, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting that some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, the incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.

  16. Global Profiling and Novel Structure Discovery Using Multiple Neutral Loss/Precursor Ion Scanning Combined with Substructure Recognition and Statistical Analysis (MNPSS): Characterization of Terpene-Conjugated Curcuminoids in Curcuma longa as a Case Study.

    PubMed

    Qiao, Xue; Lin, Xiong-hao; Ji, Shuai; Zhang, Zheng-xiang; Bo, Tao; Guo, De-an; Ye, Min

    2016-01-05

    To fully understand the chemical diversity of an herbal medicine is challenging. In this work, we describe a new approach to globally profile and discover novel compounds from an herbal extract using multiple neutral loss/precursor ion scanning combined with substructure recognition and statistical analysis. Turmeric (the rhizomes of Curcuma longa L.) was used as an example. This approach consists of three steps: (i) multiple neutral loss/precursor ion scanning to obtain substructure information; (ii) targeted identification of new compounds by extracted ion current and substructure recognition; and (iii) untargeted identification using total ion current and multivariate statistical analysis to discover novel structures. Using this approach, 846 terpecurcumins (terpene-conjugated curcuminoids) were discovered from turmeric, including a number of potentially novel compounds. Furthermore, two unprecedented compounds (terpecurcumins X and Y) were purified, and their structures were identified by NMR spectroscopy. This study extended the application of mass spectrometry to global profiling of natural products in herbal medicines and could help chemists to rapidly discover novel compounds from a complex matrix.

  17. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov

    PubMed Central

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V

    2016-01-01

    Objective: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. Methods: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. Results and Discussion: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. PMID:27013523

  18. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.

    PubMed

    Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua

    2016-07-01

    Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Extraction of phase information in daily stock prices

    NASA Astrophysics Data System (ADS)

    Fujiwara, Yoshi; Maekawa, Satoshi

    2000-06-01

    It is known that, on an intermediate time scale such as days, stock market fluctuations possess several statistical properties that are common to different markets. Namely, logarithmic returns of an asset price have (i) a truncated Pareto-Lévy distribution, (ii) vanishing linear correlation, and (iii) volatility clustering with power-law autocorrelation. Fact (ii) is a consequence of the nonexistence of arbitragers with simple strategies, but this does not mean statistical independence of market fluctuations. Little attention has been paid to the temporal structure of higher-order statistics, although it contains important information on market dynamics. We applied a signal separation technique, called Independent Component Analysis (ICA), to actual daily stock price data from the Tokyo and New York Stock Exchanges (TSE/NYSE). ICA applies a linear transformation to lag vectors from the time series to find independent components by a nonlinear algorithm. We obtained a similar impulse response for these datasets. If the process were a martingale, it can be shown that the impulse response should be a delta function, under a few conditions that could be checked numerically, as was verified with surrogate data. This result provides information on market dynamics, including speculative bubbles and arbitrage processes.
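    A hedged sketch of the lag-vector ICA described above, using simulated heavy-tailed returns rather than TSE/NYSE data; the columns of the estimated mixing matrix play the role of impulse-response-like filters over the lags.

```python
# Sketch: ICA applied to lag vectors of daily log-returns, in the spirit
# of the analysis above (returns are simulated, not TSE/NYSE data).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
returns = rng.standard_t(df=4, size=2000) * 0.01   # heavy-tailed returns

lag = 20
# Build lag vectors: each row holds `lag` consecutive returns.
X = np.lib.stride_tricks.sliding_window_view(returns, lag)

ica = FastICA(n_components=5, random_state=0)
sources = ica.fit_transform(X)
# Columns of mixing_ act as impulse-response-like filters over the lags.
print("mixing matrix shape:", ica.mixing_.shape)
```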

  20. Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD.

    PubMed

    Bullinaria, John A; Levy, Joseph P

    2012-09-01

    In a previous article, we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of pointwise mutual information values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This article extends that study by investigating the use of three further factors--namely, the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD)--that have been used to provide improved performance elsewhere. It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations (that provide new state-of-the-art performance on a standard TOEFL task) and the identification and discussion of problems and misleading results that can arise without a full systematic study.
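    A minimal sketch of the core recipe, building positive PMI vectors from a tiny toy corpus with a one-word window and comparing them by cosine similarity; stop-lists, stemming, and SVD would be layered on top of this.

```python
# Sketch: positive PMI vectors from word co-occurrence counts and a
# cosine similarity, following the recipe described above (tiny corpus).
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

window = 1                                  # very small window, as above
counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            counts[idx[w], idx[corpus[j]]] += 1

p = counts / counts.sum()                   # joint probabilities
pw = p.sum(axis=1, keepdims=True)           # marginal word probabilities
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p / (pw * pw.T))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)  # positive PMI

def cos(a, b):
    va, vb = ppmi[idx[a]], ppmi[idx[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12)

print(f"cos(cat, dog) = {cos('cat', 'dog'):.3f}")
```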

  1. Making adjustments to event annotations for improved biological event extraction.

    PubMed

    Baek, Seung-Cheol; Park, Jong C

    2016-09-16

    Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on corpora annotated with event triggers and event-argument relations. Inspecting such corpora, we observe that there is ambiguity in the span of event triggers (e.g., "transcriptional activity" vs. "transcriptional"), leading to inconsistencies across event trigger annotations. Such inconsistencies make it quite likely that similar phrases are annotated with different spans of event triggers, suggesting the possibility that a statistical learning algorithm misses an opportunity for generalizing from such event triggers. We anticipate that adjustments to the spans of event triggers to reduce these inconsistencies would meaningfully improve the present performance of event extraction systems. In this study, we look into this possibility with the corpora provided by the 2009 BioNLP shared task as a proof of concept. We propose an Informed Expectation-Maximization (EM) algorithm, which trains models using the EM algorithm with a posterior regularization technique that consults the gold-standard event trigger annotations in the form of constraints. We further propose four constraints on the possible event trigger annotations to be explored by the EM algorithm. The algorithm is shown to outperform the state-of-the-art algorithm on the development corpus in a statistically significant manner, and on the test corpus by a narrow margin. The analysis of the annotations generated by the algorithm shows that there are various types of ambiguity in event annotations, even though they may be small in number.

  2. Population Estimation in Singapore Based on Remote Sensing and Open Data

    NASA Astrophysics Data System (ADS)

    Guo, H.; Cao, K.; Wang, P.

    2017-09-01

    Population estimation statistics are widely used in government, commercial and educational sectors for a variety of purposes. With a growing emphasis on real-time and detailed population information, data users nowadays have switched from traditional census data to more technology-based data sources such as LiDAR point clouds and high-resolution satellite imagery. Nevertheless, such data are costly and periodically unavailable. In this paper, the authors use the West Coast District of Singapore as a case study to investigate the applicability and effectiveness of using satellite imagery from Google Earth for building footprint extraction and population estimation. At the same time, volunteered geographic information (VGI) is utilized as ancillary data for building footprint extraction; open data such as OpenStreetMap (OSM) can be employed to enhance the extraction process. In view of the challenges in building shadow extraction, this paper discusses several methods, including buffer, mask and shape index approaches, to improve accuracy. It also illustrates population estimation methods based on building height and number-of-floor estimates, as sketched below. The results show that the accuracy of the housing unit method for population estimation can reach 92.5 %, which is remarkably accurate. This paper thus provides insights into techniques for building extraction and fine-scale population estimation, which will benefit users such as urban planners in policymaking and the urban planning of Singapore.
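    A minimal sketch of the housing unit method under stated assumptions: the storey height, floor area per dwelling, and household size below are illustrative constants, not the paper's calibrated values.

```python
# Sketch: the housing unit method, assuming floor count is estimated from
# building height and population from dwelling units (illustrative values).
FLOOR_HEIGHT_M = 3.0          # assumed storey height
AREA_PER_UNIT_M2 = 90.0       # assumed floor area per dwelling unit
PERSONS_PER_UNIT = 3.2        # assumed average household size

def estimate_population(footprint_m2: float, height_m: float) -> float:
    floors = max(1, round(height_m / FLOOR_HEIGHT_M))
    units = footprint_m2 * floors / AREA_PER_UNIT_M2
    return units * PERSONS_PER_UNIT

# A hypothetical 600 m2 footprint, 36 m tall residential block:
print(f"estimated residents: {estimate_population(600, 36):.0f}")
```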

  3. Knowledge Extraction from Atomically Resolved Images.

    PubMed

    Vlcek, Lukas; Maksov, Artem; Pan, Minghu; Vasudevan, Rama K; Kalinin, Sergei V

    2017-10-24

    Tremendous strides in experimental capabilities of scanning transmission electron microscopy and scanning tunneling microscopy (STM) over the past 30 years made atomically resolved imaging routine. However, consistent integration and use of atomically resolved data with generative models is unavailable, so information on local thermodynamics and other microscopic driving forces encoded in the observed atomic configurations remains hidden. Here, we present a framework based on statistical distance minimization to consistently utilize the information available from atomic configurations obtained from an atomically resolved image and extract meaningful physical interaction parameters. We illustrate the applicability of the framework on an STM image of a FeSexTe1-x superconductor, with the segregation of the chalcogen atoms investigated using a nonideal interacting solid solution model. This universal method makes full use of the microscopic degrees of freedom sampled in an atomically resolved image and can be extended via Bayesian inference toward unbiased model selection with uncertainty quantification.

  4. Exploring patterns of epigenetic information with data mining techniques.

    PubMed

    Aguiar-Pulido, Vanessa; Seoane, José A; Gestal, Marcos; Dorado, Julián

    2013-01-01

    Data mining, a part of the Knowledge Discovery in Databases process (KDD), is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable determining gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in losing control over cell growth and, thus, causing cancer development. Data mining techniques could be then used to extract the previous patterns. This work reviews some of the most important applications of data mining to epigenetics.

  5. Mutual information, neural networks and the renormalization group

    NASA Astrophysics Data System (ADS)

    Koch-Janusz, Maciej; Ringel, Zohar

    2018-06-01

    Physical systems differing in their microscopic details often display strikingly similar behaviour when probed at macroscopic scales. Those universal properties, largely determining their physical characteristics, are revealed by the powerful renormalization group (RG) procedure, which systematically retains `slow' degrees of freedom and integrates out the rest. However, the important degrees of freedom may be difficult to identify. Here we demonstrate a machine-learning algorithm capable of identifying the relevant degrees of freedom and executing RG steps iteratively without any prior knowledge about the system. We introduce an artificial neural network based on a model-independent, information-theoretic characterization of a real-space RG procedure, which performs this task. We apply the algorithm to classical statistical physics problems in one and two dimensions. We demonstrate RG flow and extract the Ising critical exponent. Our results demonstrate that machine-learning techniques can extract abstract physical concepts and consequently become an integral part of theory- and model-building.

  6. Fusion of Local Statistical Parameters for Buried Underwater Mine Detection in Sonar Imaging

    NASA Astrophysics Data System (ADS)

    Maussang, F.; Rombaut, M.; Chanussot, J.; Hétet, A.; Amate, M.

    2008-12-01

    Detection of buried underwater objects, and especially mines, is a crucial strategic task. Images provided by sonar systems able to penetrate the sea floor, such as synthetic aperture sonar (SAS), are of great interest for the detection and classification of such objects. However, the signal-to-noise ratio is fairly low, and advanced information processing is required for correct and reliable detection of the echoes generated by the objects. The detection method proposed in this paper is based on a data-fusion architecture using belief theory. The input data of this architecture are local statistical characteristics extracted from SAS data, corresponding to the first-, second-, third-, and fourth-order statistical properties of the sonar images, respectively. The relevance of these parameters is derived from a statistical model of the sonar data. Numerical criteria are also proposed to estimate the detection performance and to validate the method.

  7. Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

    PubMed

    Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

    2015-12-01

    This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Statistics of dislocation pinning at localized obstacles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dutta, A.; Bhattacharya, M., E-mail: mishreyee@vecc.gov.in; Barat, P.

    2014-10-14

    Pinning of dislocations at nanosized obstacles like precipitates, voids, and bubbles is a crucial mechanism in the context of phenomena like hardening and creep. The interaction between such an obstacle and a dislocation is often studied at a fundamental level by means of analytical tools, atomistic simulations, and finite element methods. Nevertheless, the information extracted from such studies cannot be utilized to its maximum extent on account of insufficient information about the underlying statistics of this process, which comprises a large number of dislocations and obstacles in a system. Here, we propose a new statistical approach, where the statistics of pinning of dislocations by idealized spherical obstacles is explored by taking into account the generalized size-distribution of the obstacles along with the dislocation density within a three-dimensional framework. Starting with a minimal set of material parameters, the framework employs the method of geometrical statistics with a few simple assumptions compatible with the real physical scenario. The application of this approach, in combination with the knowledge of fundamental dislocation-obstacle interactions, has successfully been demonstrated for dislocation pinning at nanovoids in neutron-irradiated type 316 stainless steel in regard to the non-conservative motion of dislocations. An interesting phenomenon of transition from rare pinning to multiple pinning regimes with increasing irradiation temperature is revealed.

  9. Statistical similarity measures for link prediction in heterogeneous complex networks

    NASA Astrophysics Data System (ADS)

    Shakibian, Hadi; Charkari, Nasrollah Moghadam

    2018-07-01

    The majority of link prediction measures in heterogeneous complex networks rely on the connectivities of the nodes, while less attention has been paid to the importance of the nodes and paths. In this paper, we propose new meta-path based statistical similarity measures to perform the link prediction task properly. The main idea of the proposed measures is to derive co-occurrence events, collected in a number of co-occurrence matrices, between the nodes visited along a meta-path. The extracted co-occurrence matrices are analyzed in terms of the energy, inertia, local homogeneity, correlation, and information measure of correlation to determine various information-theoretic measures. We evaluate the proposed measures, denoted as link energy, link inertia, link local homogeneity, link correlation, and link information measure of correlation, using a standard DBLP network data set. The results of the AUC score and Precision rate indicate the validity and accuracy of the proposed measures in comparison to popular meta-path based similarity measures.
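
    The first four matrix statistics named above have standard Haralick-style definitions; below is a sketch of computing them for an arbitrary co-occurrence matrix. The random count matrix is illustrative only, and the construction of such matrices from meta-path walks (the paper's actual contribution) is not reproduced.

```python
import numpy as np

def cooccurrence_measures(C):
    """Energy, inertia (contrast), local homogeneity and correlation of a
    co-occurrence matrix, using the classical Haralick definitions."""
    P = C / C.sum()                          # normalize counts to a joint distribution
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    si = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * P).sum())
    energy = (P ** 2).sum()
    inertia = ((i - j) ** 2 * P).sum()
    homogeneity = (P / (1.0 + (i - j) ** 2)).sum()
    correlation = ((i - mu_i) * (j - mu_j) * P).sum() / (si * sj)
    return energy, inertia, homogeneity, correlation

C = np.random.default_rng(1).integers(0, 10, (8, 8)).astype(float)  # toy counts
print(cooccurrence_measures(C))
```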

  10. Multi-layer cube sampling for liver boundary detection in PET-CT images.

    PubMed

    Liu, Xinxin; Yang, Jian; Song, Shuang; Song, Hong; Ai, Danni; Zhu, Jianjun; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Liver metabolic information is considered a crucial diagnostic marker for the diagnosis of fever of unknown origin, and liver recognition is the basis of automatic extraction of metabolic information. However, the poor quality of PET and CT images is a challenge for information extraction and target recognition in PET-CT images. Existing detection methods cannot meet the requirements of liver recognition in PET-CT images, which is the key problem in big data analysis of PET-CT images. A novel texture feature descriptor called multi-layer cube sampling (MLCS) is developed for liver boundary detection in low-dose CT and PET images. The cube sampling feature is proposed for extracting more texture information, using a bi-centric voxel strategy. Neighbour voxels are divided into three regions by the centre voxel and the reference voxel in the histogram, and the voxel distribution information is statistically classified as a texture feature. Multi-layer texture features are also used to improve the ability and adaptability of target recognition in volume data. The proposed feature is tested on PET and CT images for liver boundary detection. For the liver in the volume data, the mean detection rate (DR) and mean error rate (ER) reached 95.15% and 7.81% in low-quality PET images, and 83.10% and 21.08% in low-contrast CT images. The experimental results demonstrate that the proposed method is effective and robust for liver boundary detection.

  11. Assigning Polarity to Causal Information in Financial Articles on Business Performance of Companies

    NASA Astrophysics Data System (ADS)

    Sakai, Hiroyuki; Masuyama, Shigeru

    We propose a method of assigning polarity to causal information extracted from Japanese financial articles concerning the business performance of companies. Our method assigns polarity (positive or negative) to causal information in accordance with business performance, e.g. “zidousya no uriage ga koutyou (Sales of cars are good)” (the polarity positive is assigned in this example). Causal expressions assigned polarity by our method can be used, for example, to analyze the content of articles concerning business performance in detail. First, our method classifies articles concerning business performance into positive and negative articles. Using these, it assigns polarity (positive or negative) to causal information extracted from the set of articles. Although our method needs a training dataset for classifying articles into positive and negative ones, it does not need a training dataset for assigning polarity to causal information. Hence, even if a causal expression does not appear in the training dataset used for classifying articles, our method is able to assign it polarity by using statistical information from the classified sets of articles. We evaluated our method and confirmed that it attained 74.4% precision and 50.4% recall in assigning positive polarity, and 76.8% precision and 61.5% recall in assigning negative polarity.
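
    A minimal sketch of the kind of corpus statistics this implies, under the strong simplification that polarity is scored by comparing word frequencies between the two classified article sets; the toy phrases, the smoothing rule and the scoring function are all invented for illustration and are not taken from the paper.

```python
from collections import Counter

# Toy corpora standing in for articles already classified as positive/negative.
positive_articles = [["sales", "up"], ["profit", "strong"], ["sales", "strong"]]
negative_articles = [["sales", "down"], ["loss", "widened"]]

pos_counts = Counter(w for doc in positive_articles for w in doc)
neg_counts = Counter(w for doc in negative_articles for w in doc)

def polarity(cause_words):
    """Assign polarity to a causal expression by comparing how often its words
    occur in the positive vs. negative article sets (add-one smoothing)."""
    score = 1.0
    for w in cause_words:
        score *= (pos_counts[w] + 1) / (neg_counts[w] + 1)
    return "positive" if score > 1 else "negative"

print(polarity(["sales", "strong"]))   # -> positive
```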

  12. Mutual information-based feature selection for radiomics

    NASA Astrophysics Data System (ADS)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information based method for quantifying the reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method, and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions MI allowed us to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.
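
    A toy version of the underlying idea, with two simplifications: pairwise mutual information between discretized observer series replaces the paper's multi-information, and simulated numbers replace the CT features.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
n = 500                                   # toy sample size (the study has ~7 time points)
true_change = rng.normal(0.0, 1.0, n)     # underlying feature change over time
observers = [true_change + rng.normal(0.0, 0.3, n) for _ in range(5)]

def discretize(x, bins=10):
    """Bin a continuous series so that mutual_info_score can be applied."""
    return np.digitize(x, np.histogram_bin_edges(x, bins))

# Mean pairwise MI as a reproducibility score: a reproducible feature changes
# consistently across observers, which yields high mutual information.
mi = np.mean([mutual_info_score(discretize(a), discretize(b))
              for k, a in enumerate(observers) for b in observers[k + 1:]])
print(f"reproducibility score: {mi:.3f} nats")
```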

  13. Advances in Statistical Methods for Substance Abuse Prevention Research

    PubMed Central

    MacKinnon, David P.; Lockwood, Chondra M.

    2010-01-01

    The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467

  14. Extraction of lead and ridge characteristics from SAR images of sea ice

    NASA Technical Reports Server (NTRS)

    Vesecky, John F.; Smith, Martha P.; Samadani, Ramin

    1990-01-01

    Image-processing techniques for extracting the characteristics of lead and pressure ridge features in SAR images of sea ice are reported. The methods are applied to a SAR image of the Beaufort Sea collected from the Seasat satellite on October 3, 1978. Estimates of lead and ridge statistics are made, e.g., lead and ridge density (number of lead or ridge pixels per unit area of image) and the distribution of lead area and orientation as well as ridge length and orientation. The information derived is useful in both ice science and polar operations for such applications as albedo and heat and momentum transfer estimates, as well as ship routing and offshore engineering.

  15. Analysis on Time-Lag Effect of Research and Development Investment in the Pharmaceutical Industry in Korea

    PubMed Central

    Lee, Munjae; Choi, Mankyu

    2015-01-01

    Objectives The aim of this study is to analyze the influence of the research and development (R&D) investment of pharmaceutical companies on enterprise value. Methods The period of the empirical analysis is from 2000 to 2012, considering the period after the influence of the financial crisis. Financial statements and comments in general and internal transactions were extracted from TS-2000 of the Korea Listed Company Association, and data related to stock price were extracted from KISVALUE-III of National Information and Credit Evaluation Information Service Co., Ltd. STATA 12.0 was used as the statistical package for panel analysis. Results In the pharmaceutical firms, the influence of R&D intensity on Tobin's q was found to be positive. However, only the R&D expenditure intensities of previous years 2 and 5 (t–2 and t–5, respectively) were statistically significant (p < 0.1), whereas those of previous years 1, 3, and 4 (t–1, t–3, and t–4, respectively) were not. Conclusion R&D investment not only affects the enterprise value but is also evaluated as an investment activity that raises long-term enterprise value. The research findings will serve as valuable data for understanding the enterprise value of the Korean pharmaceutical industry and for strengthening reform measures. Investment and support should be provided not only for new drug development but also according to the specific factors suited to improving the competitiveness of each company, such as generics, incrementally modified drugs, and biosimilar products. PMID:26473091

  16. Analysis on Time-Lag Effect of Research and Development Investment in the Pharmaceutical Industry in Korea.

    PubMed

    Lee, Munjae; Choi, Mankyu

    2015-08-01

    The aim of this study is to analyze the influence of the research and development (R&D) investment of pharmaceutical companies on enterprise value. The period of the empirical analysis is from 2000 to 2012, considering the period after the influence of the financial crisis. Financial statements and comments in general and internal transactions were extracted from TS-2000 of the Korea Listed Company Association, and data related to stock price were extracted from KISVALUE-III of National Information and Credit Evaluation Information Service Co., Ltd. STATA 12.0 was used as the statistical package for panel analysis. In the pharmaceutical firms, the influence of R&D intensity on Tobin's q was found to be positive. However, only the R&D expenditure intensities of previous years 2 and 5 (t-2 and t-5, respectively) were statistically significant (p < 0.1), whereas those of previous years 1, 3, and 4 (t-1, t-3, and t-4, respectively) were not. R&D investment not only affects the enterprise value but is also evaluated as an investment activity that raises long-term enterprise value. The research findings will serve as valuable data for understanding the enterprise value of the Korean pharmaceutical industry and for strengthening reform measures. Investment and support should be provided not only for new drug development but also according to the specific factors suited to improving the competitiveness of each company, such as generics, incrementally modified drugs, and biosimilar products.

  17. Functional differences between statistical learning with and without explicit training

    PubMed Central

    Reber, Paul J.; Paller, Ken A.

    2015-01-01

    Humans are capable of rapidly extracting regularities from environmental input, a process known as statistical learning. This type of learning typically occurs automatically, through passive exposure to environmental input. The presumed function of statistical learning is to optimize processing, allowing the brain to more accurately predict and prepare for incoming input. In this study, we ask whether the function of statistical learning may be enhanced through supplementary explicit training, in which underlying regularities are explicitly taught rather than simply abstracted through exposure. Learners were randomly assigned either to an explicit group or an implicit group. All learners were exposed to a continuous stream of repeating nonsense words. Prior to this implicit training, learners in the explicit group received supplementary explicit training on the nonsense words. Statistical learning was assessed through a speeded reaction-time (RT) task, which measured the extent to which learners used acquired statistical knowledge to optimize online processing. Both RTs and brain potentials revealed significant differences in online processing as a function of training condition. RTs showed a crossover interaction; responses in the explicit group were faster to predictable targets and marginally slower to less predictable targets relative to responses in the implicit group. P300 potentials to predictable targets were larger in the explicit group than in the implicit group, suggesting greater recruitment of controlled, effortful processes. Taken together, these results suggest that information abstracted through passive exposure during statistical learning may be processed more automatically and with less effort than information that is acquired explicitly. PMID:26472644

  18. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops

    PubMed Central

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-01-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server, SA-Mot (Structural Alphabet Motif), for the extraction and location of structural motifs of interest from protein loops. In contrast to other methods, SA-Mot does not focus only on functional motifs: it extracts recurrent and conserved structural motifs involved in the structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamilies, together with sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, users can easily locate loop regions that are important for protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr. PMID:21665924

  19. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    PubMed

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server, SA-Mot (Structural Alphabet Motif), for the extraction and location of structural motifs of interest from protein loops. In contrast to other methods, SA-Mot does not focus only on functional motifs: it extracts recurrent and conserved structural motifs involved in the structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamilies, together with sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, users can easily locate loop regions that are important for protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  20. Steroid hormones in environmental matrices: extraction method comparison.

    PubMed

    Andaluri, Gangadhar; Suri, Rominder P S; Graham, Kendon

    2017-11-09

    The U.S. Environmental Protection Agency (EPA) has developed methods for the analysis of steroid hormones in water, soil, sediment, and municipal biosolids by HRGC/HRMS (EPA Method 1698). Following the guidelines provided in US-EPA Method 1698, the extraction methods were validated with reagent water and applied to municipal wastewater, surface water, and municipal biosolids, using GC/MS/MS for the analysis of the nine most commonly detected steroid hormones. This is the first reported comparison of the separatory funnel extraction (SFE), continuous liquid-liquid extraction (CLLE), and Soxhlet extraction methods developed by the U.S. EPA. Furthermore, a solid phase extraction (SPE) method was also developed in-house for the extraction of steroid hormones from aquatic environmental samples. This study provides valuable information regarding the robustness of the different extraction methods. Statistical analysis of the data showed that SPE-based methods provided better recovery efficiencies and lower variability for the steroid hormones, followed by SFE. The analytical methods developed in-house for extraction of biosolids showed a wide recovery range; however, the variability was low (≤ 7% RSD). Soxhlet extraction and CLLE are lengthy procedures and have been shown to provide highly variable recovery efficiencies. The results of this study provide guidance for better sample preparation strategies in analytical methods for steroid hormone analysis, and SPE adds to the choices in environmental sample analysis.

  1. Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis

    PubMed Central

    Montemurro, Marcelo A.; Zanette, Damián H.

    2013-01-01

    The Voynich manuscript has so far remained a mystery for linguists and cryptologists. While the text, written on medieval parchment using an unknown script system, shows basic statistical patterns that bear resemblance to those from real languages, there are features that suggested to some researchers that the manuscript was a forgery intended as a hoax. Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with that found in real language sequences. We are also able to extract some of the most significant semantic word networks in the text. These results, together with some previously known statistical features of the Voynich manuscript, give support to the presence of a genuine message inside the book. PMID:23805215
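
    A simplified surrogate for this kind of keyword analysis: words whose occurrences cluster in particular sections of a text carry more information about its large-scale structure than uniformly spread words, and a section-wise Shannon entropy can rank them. The input file name is hypothetical, and the actual Montemurro-Zanette measure (an entropy difference against a randomized text) is not reproduced.

```python
import numpy as np

def section_entropy(positions, n_tokens, parts=10):
    """Shannon entropy (bits) of a word's distribution over equal sections;
    clustered, topic-bearing words score lower than uniformly spread ones."""
    counts, _ = np.histogram(positions, bins=parts, range=(0, n_tokens))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

tokens = open("voynich_transcription.txt").read().split()  # hypothetical input file
positions = {}
for idx, word in enumerate(tokens):
    positions.setdefault(word, []).append(idx)

scores = {w: section_entropy(pos, len(tokens))
          for w, pos in positions.items() if len(pos) >= 20}
print(sorted(scores, key=scores.get)[:10])  # most clustered candidate keywords
```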

  2. Constructing networks with correlation maximization methods.

    PubMed

    Mellor, Joseph C; Wu, Jie; Delisi, Charles

    2004-01-01

    Problems of inference in systems biology are ideally reduced to formulations which can efficiently represent the features of interest. In the case of predicting gene regulation and pathway networks, an important feature which describes connected genes and proteins is the relationship between active and inactive forms, i.e. between the "on" and "off" states of the components. While not optimal at the limits of resolution, these logical relationships between discrete states can often yield good approximations of the behavior in larger complex systems, where exact representation of measurement relationships may be intractable. We explore techniques for extracting binary state variables from measurement of gene expression, and go on to describe robust measures for statistical significance and information that can be applied to many such types of data. We show how statistical strength and information are equivalent criteria in limiting cases, and demonstrate the application of these measures to simple systems of gene regulation.

  3. Accelerometry-based classification of human activities using Markov modeling.

    PubMed

    Mannini, Andrea; Sabatini, Angelo Maria

    2011-01-01

    Accelerometers are a popular choice as body-motion sensors: the reason lies partly in their capability of extracting information that is useful for automatically inferring the physical activity in which the human subject is involved, besides their role in feeding biomechanical parameter estimators. Automatic classification of human physical activities is highly attractive for pervasive computing systems, where contextual awareness may ease the human-machine interaction, and in biomedicine, where wearable sensor systems are proposed for long-term monitoring. This paper is concerned with the machine learning algorithms needed to perform the classification task. Hidden Markov Model (HMM) classifiers are studied by contrasting them with Gaussian Mixture Model (GMM) classifiers. HMMs incorporate the statistical information available on movement dynamics into the classification process, without discarding the time history of previous outcomes as GMMs do. An example of the benefits of the obtained statistical leverage is illustrated and discussed by analyzing two datasets of accelerometer time series.
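
    A sketch of the GMM half of that comparison, with invented two-dimensional feature windows for two activities; note that each window is scored independently, which is exactly the time-history limitation that the HMM classifiers address.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy accelerometer feature windows for two activities (walking vs. sitting).
rng = np.random.default_rng(0)
walk = rng.normal([1.0, 2.0], 0.5, (200, 2))
sit = rng.normal([0.0, 0.2], 0.3, (200, 2))

# One GMM per activity class, fitted to that class's training windows.
models = {label: GaussianMixture(n_components=2, random_state=0).fit(data)
          for label, data in {"walk": walk, "sit": sit}.items()}

def classify(x):
    """Pick the activity whose GMM gives the window the highest likelihood;
    no temporal context is used, unlike an HMM."""
    return max(models, key=lambda m: models[m].score(x.reshape(1, -1)))

print(classify(np.array([0.9, 1.8])))   # -> walk
```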

  4. Estimation of road profile variability from measured vehicle responses

    NASA Astrophysics Data System (ADS)

    Fauriat, W.; Mattrand, C.; Gayton, N.; Beakou, A.; Cembrzynski, T.

    2016-05-01

    When assessing the statistical variability of fatigue loads acting throughout the life of a vehicle, the question of the variability of road roughness naturally arises, as both quantities are strongly related. For car manufacturers, gathering information on the environment in which vehicles evolve is a long and costly but necessary process to adapt their products to durability requirements. In the present paper, a data processing algorithm is proposed in order to estimate the road profiles covered by a given vehicle, from the dynamic responses measured on this vehicle. The algorithm based on Kalman filtering theory aims at solving a so-called inverse problem, in a stochastic framework. It is validated using experimental data obtained from simulations and real measurements. The proposed method is subsequently applied to extract valuable statistical information on road roughness from an existing load characterisation campaign carried out by Renault within one of its markets.
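
    A minimal linear Kalman filter sketch of the machinery involved; the toy scalar model below merely tracks a random-walk signal from noisy responses, whereas the paper's filter embeds a vehicle model and augments the state with the road profile so that filtering the measured responses estimates it.

```python
import numpy as np

def kalman_filter(z, F, H, Q, R, x0, P0):
    """Minimal linear Kalman filter over a sequence of measurements z."""
    x, P, out = x0, P0, []
    for zk in z:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with measurement zk
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (zk - H @ x)
        P = (np.eye(len(x0)) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)

# Toy example: recover a random-walk "profile" from noisy responses.
rng = np.random.default_rng(2)
truth = np.cumsum(rng.normal(0.0, 0.1, 200))
z = (truth + rng.normal(0.0, 0.5, 200)).reshape(-1, 1)
est = kalman_filter(z, F=np.eye(1), H=np.eye(1),
                    Q=np.eye(1) * 0.01, R=np.eye(1) * 0.25,
                    x0=np.zeros(1), P0=np.eye(1))
```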

  5. Trade-off between information and disturbance in qubit thermometry

    NASA Astrophysics Data System (ADS)

    Seveso, Luigi; Paris, Matteo G. A.

    2018-03-01

    We address the trade-off between information and disturbance in qubit thermometry from the perspective of quantum estimation theory. Given a quantum measurement, we quantify information via the Fisher information of the measurement and disturbance via four different figures of merit, which capture different aspects (statistical, thermodynamical, geometrical) of the trade-off. For each disturbance measure, the efficient measurements, i.e., the measurements that introduce a disturbance not greater than any other measurement extracting the same amount of information, are determined explicitly. The family of efficient measurements varies with the choice of the disturbance measure. On the other hand, commutativity between the elements of the probability operator-valued measure (POVM) and the equilibrium state of the thermometer is a necessary condition for efficiency with respect to any figure of disturbance.

  6. Remote Video Monitor of Vehicles in Cooperative Information Platform

    NASA Astrophysics Data System (ADS)

    Qin, Guofeng; Wang, Xiaoguo; Wang, Li; Li, Yang; Li, Qiyan

    Detection of vehicles plays an important role in modern intelligent traffic management, and pattern recognition is a hot issue in the area of computer vision. An auto-recognition system in a cooperative information platform is studied. In the cooperative platform, 3G wireless networks, including GPS, GPRS (CDMA), Internet (Intranet), remote video monitoring and M-DMB networks, are integrated. The remote video information can be taken from the terminals and sent to the cooperative platform, then detected by the auto-recognition system. The images are pretreated and segmented, including feature extraction, template matching and pattern recognition. The system identifies different models and gets vehicular traffic statistics. Finally, the implementation of the system is introduced.

  7. Extracting features of Gaussian self-similar stochastic processes via the Bandt-Pompe approach.

    PubMed

    Rosso, O A; Zunino, L; Pérez, D G; Figliola, A; Larrondo, H A; Garavaglia, M; Martín, M T; Plastino, A

    2007-12-01

    By recourse to appropriate information theory quantifiers (normalized Shannon entropy and Martín-Plastino-Rosso intensive statistical complexity measure), we revisit the characterization of Gaussian self-similar stochastic processes from a Bandt-Pompe viewpoint. We show that the ensuing approach exhibits considerable advantages with respect to other treatments. In particular, clear quantifiers gaps are found in the transition between the continuous processes and their associated noises.
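
    The Bandt-Pompe approach rests on ordinal patterns; below is a sketch of the standard permutation entropy (the normalized-Shannon-entropy half of the quantifier pair; the MPR statistical complexity is omitted), contrasted on white noise versus a smoothed, correlated surrogate.

```python
import math
from itertools import permutations
import numpy as np

def permutation_entropy(x, order=4, normalize=True):
    """Bandt-Pompe permutation entropy: Shannon entropy of the distribution
    of ordinal patterns of embedding vectors of length `order`."""
    counts = {p: 0 for p in permutations(range(order))}
    for k in range(len(x) - order + 1):
        counts[tuple(np.argsort(x[k:k + order]))] += 1   # ordinal pattern
    p = np.array([c for c in counts.values() if c > 0], dtype=float)
    p /= p.sum()
    h = float(-(p * np.log(p)).sum())
    return h / math.log(math.factorial(order)) if normalize else h

rng = np.random.default_rng(6)
white = rng.standard_normal(5000)
smooth = np.convolve(white, np.ones(10) / 10, mode="valid")  # correlated surrogate
print(permutation_entropy(white), permutation_entropy(smooth))  # near 1 vs. lower
```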

  8. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension.

    PubMed

    Yu, Hong; Agarwal, Shashank; Johnston, Mark; Cohen, Aaron

    2009-01-06

    Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension. Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of text summaries with the text-summarization evaluation method the ROUGE score. Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39-68% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30% of the information. When the full-text article is available, figure comprehension increased to 86-97%; this indicates that researchers felt that only 3-14% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles. We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond figure legend. Our work provides important guidance to the figure mining systems that extract information only from figure and figure legend.

  9. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension

    PubMed Central

    2009-01-01

    Background Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension. Methods Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of text summaries with the text-summarization evaluation method the ROUGE score. Results Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39–68% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30% of the information. When the full-text article is available, figure comprehension increased to 86–97%; this indicates that researchers felt that only 3–14% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles. Conclusion We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond figure legend. Our work provides important guidance to the figure mining systems that extract information only from figure and figure legend. PMID:19126221

  10. Statistical evaluation of fatty acid profile and cholesterol content in fish (common carp) lipids obtained by different sample preparation procedures.

    PubMed

    Spiric, Aurelija; Trbovic, Dejana; Vranic, Danijela; Djinovic, Jasna; Petronijevic, Radivoj; Matekalo-Sverak, Vesna

    2010-07-05

    Studies performed on lipid extraction from animal and fish tissues do not provide information on its influence on the fatty acid composition of the extracted lipids, nor on cholesterol content. Data presented in this paper indicate the impact of extraction procedures on the fatty acid profile of fish lipids extracted by the modified Soxhlet and ASE (accelerated solvent extraction) procedures. Cholesterol was also determined, by a direct saponification method. Student's paired t-test used for comparison of the total fat content in the carp fish population obtained by the two extraction methods shows that differences between values of the total fat content determined by ASE and the modified Soxhlet method are not statistically significant. Values obtained by three different methods (direct saponification, ASE and modified Soxhlet), used for determination of cholesterol content in carp, were compared by one-way analysis of variance (ANOVA). The obtained results show that the modified Soxhlet method gives results which differ significantly from those obtained by direct saponification and ASE; the results obtained by direct saponification and ASE, however, do not differ significantly from each other. The highest quantities of cholesterol (37.65 to 65.44 mg/100 g) in the analyzed fish muscle were obtained by applying the direct saponification method, as the less destructive one, followed by ASE (34.16 to 52.60 mg/100 g) and the modified Soxhlet extraction method (10.73 to 30.83 mg/100 g). The modified Soxhlet method for extraction of fish lipids gives higher values for n-6 fatty acids than the ASE method (t(paired) = 3.22, t(c) = 2.36), while there is no statistically significant difference in the n-3 content levels between the methods (t(paired) = 1.31). The UNSFA/SFA ratio obtained using the modified Soxhlet method is also higher than the ratio obtained using the ASE method (t(paired) = 4.88, t(c) = 2.36). Results of Principal Component Analysis (PCA) showed that the highest positive impact on the second principal component (PC2) is recorded by C18:3 n-3 and C20:3 n-6, being present in higher amounts in the samples treated by the modified Soxhlet extraction, while C22:5 n-3, C20:3 n-3, C22:1 and C20:4, C16 and C18 negatively influence the score values of PC2, showing significantly increased levels in the samples treated by the ASE method. Hotelling's paired T-square test used on the first three principal components for confirmation of differences in individual fatty acid content obtained by ASE and Soxhlet method in carp muscle showed a statistically significant difference between these two data sets (T(2)=161.308, p<0.001). Copyright 2010 Elsevier B.V. All rights reserved.

  11. The spread of scientific information: insights from the web usage statistics in PLoS article-level metrics.

    PubMed

    Yan, Koon-Kiu; Gerstein, Mark

    2011-01-01

    The presence of web-based communities is a distinctive signature of Web 2.0. The web-based feature means that information propagation within each community is highly facilitated, promoting complex collective dynamics in view of information exchange. In this work, we focus on a community of scientists and study, in particular, how the awareness of a scientific paper is spread. Our work is based on the web usage statistics obtained from the PLoS Article Level Metrics dataset compiled by PLoS. The cumulative number of HTML views was found to follow a long tail distribution which is reasonably well-fitted by a lognormal one. We modeled the diffusion of information by a random multiplicative process, and thus extracted the rates of information spread at different stages after the publication of a paper. We found that the spread of information displays two distinct decay regimes: a rapid downfall in the first month after publication, and a gradual power law decay afterwards. We identified these two regimes with two distinct driving processes: a short-term behavior driven by the fame of a paper, and a long-term behavior consistent with citation statistics. The patterns of information spread were found to be remarkably similar in data from different journals, but there are intrinsic differences for different types of web usage (HTML views and PDF downloads versus XML). These similarities and differences shed light on the theoretical understanding of different complex systems, as well as a better design of the corresponding web applications that is of high potential marketing impact.
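
    The modeling step lends itself to a compact illustration: a random multiplicative process (many independent multiplicative growth steps) produces lognormally distributed totals, which can be checked against data with a standard fit. The numbers below are simulated, not the PLoS usage data.

```python
import numpy as np
from scipy import stats

# Toy HTML-view counts from a random multiplicative process: the product of
# many small independent growth factors yields a lognormal total.
rng = np.random.default_rng(3)
views = np.exp(rng.normal(0.0, 0.2, (1000, 30)).sum(axis=1)) * 100

shape, loc, scale = stats.lognorm.fit(views, floc=0)
print(f"fitted sigma = {shape:.3f}, median = {scale:.1f}")
# Goodness of fit via a Kolmogorov-Smirnov test:
print(stats.kstest(views, "lognorm", args=(shape, loc, scale)))
```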

  12. The Spread of Scientific Information: Insights from the Web Usage Statistics in PLoS Article-Level Metrics

    PubMed Central

    Yan, Koon-Kiu; Gerstein, Mark

    2011-01-01

    The presence of web-based communities is a distinctive signature of Web 2.0. The web-based feature means that information propagation within each community is highly facilitated, promoting complex collective dynamics in view of information exchange. In this work, we focus on a community of scientists and study, in particular, how the awareness of a scientific paper is spread. Our work is based on the web usage statistics obtained from the PLoS Article Level Metrics dataset compiled by PLoS. The cumulative number of HTML views was found to follow a long tail distribution which is reasonably well-fitted by a lognormal one. We modeled the diffusion of information by a random multiplicative process, and thus extracted the rates of information spread at different stages after the publication of a paper. We found that the spread of information displays two distinct decay regimes: a rapid downfall in the first month after publication, and a gradual power law decay afterwards. We identified these two regimes with two distinct driving processes: a short-term behavior driven by the fame of a paper, and a long-term behavior consistent with citation statistics. The patterns of information spread were found to be remarkably similar in data from different journals, but there are intrinsic differences for different types of web usage (HTML views and PDF downloads versus XML). These similarities and differences shed light on the theoretical understanding of different complex systems, as well as a better design of the corresponding web applications that is of high potential marketing impact. PMID:21603617

  13. Developing a complex independent component analysis technique to extract non-stationary patterns from geophysical time-series

    NASA Astrophysics Data System (ADS)

    Forootan, Ehsan; Kusche, Jürgen

    2016-04-01

    Geodetic/geophysical observations, such as the time series of global terrestrial water storage change or sea level and temperature change, represent samples of physical processes and therefore contain information about complex physical interactions with many inherent time scales. Extracting relevant information from these samples, for example quantifying the seasonality of a physical process or its variability due to large-scale ocean-atmosphere interactions, is not possible with simple time series approaches. In the last decades, decomposition techniques have found increasing interest for extracting patterns from geophysical observations. Traditionally, principal component analysis (PCA) and more recently independent component analysis (ICA) are common techniques to extract statistically orthogonal (uncorrelated) and independent modes that represent the maximum variance of observations, respectively. PCA and ICA can be classified as stationary signal decomposition techniques, since they are based on decomposing the auto-covariance matrix or diagonalizing higher (than two)-order statistical tensors from centered time series. However, the stationary assumption is obviously not justifiable for many geophysical and climate variables, even after removing cyclic components, e.g., the seasonal cycles. In this paper, we present a new decomposition method, the complex independent component analysis (CICA, Forootan, PhD-2014), which can be applied to extract non-stationary (changing in space and time) patterns from geophysical time series. Here, CICA is derived as an extension of real-valued ICA (Forootan and Kusche, JoG-2012), where we (i) define a new complex data set using a Hilbert transformation. The complex time series contain the observed values in their real part, and the temporal rate of variability in their imaginary part. (ii) An ICA algorithm based on diagonalization of fourth-order cumulants is then applied to decompose the new complex data set in (i). (iii) Dominant non-stationary patterns are recognized as independent complex patterns that can be used to represent the space and time amplitude and phase propagations. We present the results of CICA on simulated and real cases, e.g., for quantifying the impact of large-scale ocean-atmosphere interaction on global mass changes. Forootan (PhD-2014) Statistical signal decomposition techniques for analyzing time-variable satellite gravimetry data, PhD Thesis, University of Bonn, http://hss.ulb.uni-bonn.de/2014/3766/3766.htm Forootan and Kusche (JoG-2012) Separation of global time-variable gravity signals into maximally independent components, Journal of Geodesy 86 (7), 477-497, doi: 10.1007/s00190-011-0532-5
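
    Step (i) has a direct one-liner in standard scientific Python: the Hilbert transform turns a real series into a complex analytic signal whose imaginary part encodes the temporal rate of variability. The sketch below stops there; the complex-ICA decomposition of steps (ii)-(iii) is not reproduced.

```python
import numpy as np
from scipy.signal import hilbert

# A toy "geophysical" series: a slow oscillation plus noise.
t = np.linspace(0.0, 10.0, 1000)
series = np.sin(2 * np.pi * 0.5 * t) \
         + 0.3 * np.random.default_rng(7).standard_normal(1000)

analytic = hilbert(series)              # real part: the observations;
imag = analytic.imag                    # imaginary part: the Hilbert transform,
                                        # i.e. the series phase-shifted by 90 degrees
amplitude = np.abs(analytic)            # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))   # instantaneous phase propagation
```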

  14. Walking through the statistical black boxes of plant breeding.

    PubMed

    Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin

    2016-10-01

    The main statistical procedures in plant breeding are based on Gaussian process and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.

  15. Evaluation of Supercritical Extracts of Algae as Biostimulants of Plant Growth in Field Trials.

    PubMed

    Michalak, Izabela; Chojnacka, Katarzyna; Dmytryk, Agnieszka; Wilk, Radosław; Gramza, Mateusz; Rój, Edward

    2016-01-01

    The aim of the field trials was to determine the influence of supercritical algal extracts on the growth and development of winter wheat (variety Akteur). As a raw material for the supercritical fluid extraction, the biomass of the microalga Spirulina plantensis, the brown seaweed Ascophyllum nodosum and Baltic green macroalgae was used. Forthial and Asahi SL constituted the reference products. It was found that the tested biostimulants did not statistically significantly influence the plant height, length of ear, or shank length. The ear number per m2 was the highest in the group where the Baltic macroalgae extract was applied at the dose 1.0 L/ha (statistically significant differences). The number of grains in ear (statistically significant differences) and the shank length were the highest in the group treated with Spirulina at the dose 1.5 L/ha. In the group with Ascophyllum at the dose 1.0 L/ha, the highest length of ear was observed. The yield was comparable in all the experimental groups (lack of statistically significant differences). Among the tested supercritical extracts, the best results were obtained for Spirulina (1.5 L/ha). The mass of 1000 grains was the highest for the extract from Baltic macroalgae and was 3.5% higher than for Asahi, 4.0% higher than for Forthial and 18.5% higher than for the control group (statistically significant differences). Future work is needed to fully characterize the chemical composition of the applied algal extracts. Special attention should be paid to the extracts obtained from Baltic algae because they are an inexpensive source of naturally occurring bioactive compounds, which can be used in sustainable agriculture and horticulture.

  16. Evaluation of Supercritical Extracts of Algae as Biostimulants of Plant Growth in Field Trials

    PubMed Central

    Michalak, Izabela; Chojnacka, Katarzyna; Dmytryk, Agnieszka; Wilk, Radosław; Gramza, Mateusz; Rój, Edward

    2016-01-01

    The aim of the field trials was to determine the influence of supercritical algal extracts on the growth and development of winter wheat (variety Akteur). As a raw material for the supercritical fluid extraction, the biomass of the microalga Spirulina plantensis, the brown seaweed – Ascophyllum nodosum – and Baltic green macroalgae was used. Forthial and Asahi SL constituted the reference products. It was found that the tested biostimulants did not statistically significantly influence the plant height, length of ear, or shank length. The ear number per m2 was the highest in the group where the Baltic macroalgae extract was applied at the dose 1.0 L/ha (statistically significant differences). The number of grains in ear (statistically significant differences) and the shank length were the highest in the group treated with Spirulina at the dose 1.5 L/ha. In the group with Ascophyllum at the dose 1.0 L/ha, the highest length of ear was observed. The yield was comparable in all the experimental groups (lack of statistically significant differences). Among the tested supercritical extracts, the best results were obtained for Spirulina (1.5 L/ha). The mass of 1000 grains was the highest for the extract from Baltic macroalgae and was 3.5% higher than for Asahi, 4.0% higher than for Forthial and 18.5% higher than for the control group (statistically significant differences). Future work is needed to fully characterize the chemical composition of the applied algal extracts. Special attention should be paid to the extracts obtained from Baltic algae because they are an inexpensive source of naturally occurring bioactive compounds, which can be used in sustainable agriculture and horticulture. PMID:27826310

  17. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there are also considerable time and effort savings. With the use of the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and a more accurate archival record of the sequence commanding for MRO is now provided.

  18. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  19. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  20. Statistics for laminar flamelet modeling

    NASA Technical Reports Server (NTRS)

    Cant, R. S.; Rutland, C. J.; Trouve, A.

    1990-01-01

    Statistical information required to support the modeling of turbulent premixed combustion by laminar flamelet methods is extracted from a database of results of Direct Numerical Simulation of turbulent flames. The simulations were carried out previously by Rutland (1989) using a pseudo-spectral code on a three-dimensional mesh of 128 points in each direction. One-step Arrhenius chemistry was employed, together with small heat release. A framework for the interpretation of the data is provided by the Bray-Moss-Libby model for the mean turbulent reaction rate. Probability density functions are obtained over surfaces of constant reaction progress variable for the tangential strain rate and the principal curvature. New insights are gained which will greatly aid the development of modeling approaches.

  1. Andreev Bound States Formation and Quasiparticle Trapping in Quench Dynamics Revealed by Time-Dependent Counting Statistics.

    PubMed

    Souto, R Seoane; Martín-Rodero, A; Yeyati, A Levy

    2016-12-23

    We analyze the quantum quench dynamics in the formation of a phase-biased superconducting nanojunction. We find that in the absence of an external relaxation mechanism and for very general conditions the system gets trapped in a metastable state, corresponding to a nonequilibrium population of the Andreev bound states. The use of the time-dependent full counting statistics analysis allows us to extract information on the asymptotic population of even and odd many-body states, demonstrating that a universal behavior, dependent only on the Andreev state energy, is reached in the quantum point contact limit. These results shed light on recent experimental observations on quasiparticle trapping in superconducting atomic contacts.

  2. Characterizing rainfall in the Tenerife island

    NASA Astrophysics Data System (ADS)

    Díez-Sierra, Javier; del Jesus, Manuel; Losada Rodriguez, Inigo

    2017-04-01

    In many locations, rainfall data are collected through networks of meteorological stations. The data collection process is nowadays automated in many places, leading to the development of big databases of rainfall data covering extensive areas of territory. However, managers, decision makers and engineering consultants tend not to extract most of the information contained in these databases, due to the lack of specific software tools for their exploitation. Here we present the modeling and development effort put in place on the Tenerife island in order to develop MENSEI-L, a software tool capable of automatically analyzing a complete rainfall database to simplify the extraction of information from observations. MENSEI-L makes use of weather-type information derived from atmospheric conditions to separate the complete time series into homogeneous groups, where statistical distributions are fitted. Normal and extreme regimes are obtained in this manner. MENSEI-L is also able to complete missing data in the time series and to generate synthetic stations by using Kriging techniques. These techniques also serve to generate the spatial regimes of precipitation, both normal and extreme. MENSEI-L makes use of weather-type information to also provide a stochastic three-day probability forecast for rainfall.

  3. Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity.

    PubMed

    Pecevski, Dejan; Maass, Wolfgang

    2016-01-01

    Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates, in ensembles of pyramidal cells with lateral inhibition, a fundamental building block for this: probabilistic associations between neurons that represent, through their firing, current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference.

  4. Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity

    PubMed Central

    Pecevski, Dejan

    2016-01-01

    Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent, through their firing, current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner, the resulting network is enabled to extract statistical information from complex input streams and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214

  5. Classification Features of US Images Liver Extracted with Co-occurrence Matrix Using the Nearest Neighbor Algorithm

    NASA Astrophysics Data System (ADS)

    Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen

    2011-12-01

    The co-occurrence matrix has been applied successfully for echographic image characterization because it contains information about the spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of an US image of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation, extracted with the co-occurrence matrix. The analyzed US images were grouped into two distinct sets: healthy liver and steatosis (fatty) liver. These two sets of echographic images of the liver build a database that includes only histologically confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects were used to compute the four textural indices and served as the control dataset. We chose to study this disease because steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and can be used to characterize the irregularity of tissues. The goal is to extract the information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool for classifying texture features, with a training set built from the healthy livers, on the one hand, and a holdout set built from the texture features of the steatosis livers, on the other. The results could be used to quantify the texture information and will allow a clear distinction between healthy and steatosis liver.
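
    The described pipeline maps naturally onto standard tooling. Below is a sketch using scikit-image's grey-level co-occurrence matrix and scikit-learn's k-NN classifier; the distance/angle choices, k = 3 and the random stand-in ROIs are assumptions, and entropy is computed directly from the normalized matrix since graycoprops does not provide it:

```python
# Sketch: GLCM texture features (entropy, contrast, dissimilarity, correlation)
# per region of interest, classified with k-nearest neighbors.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def glcm_features(roi_u8):
    """Four texture features from one grey-level region of interest."""
    glcm = graycomatrix(roi_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return [entropy,
            graycoprops(glcm, "contrast")[0, 0],
            graycoprops(glcm, "dissimilarity")[0, 0],
            graycoprops(glcm, "correlation")[0, 0]]

# Hypothetical ROIs standing in for the 10 healthy + 10 steatosis images.
rois = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(20)]
labels = [0] * 10 + [1] * 10                  # 0 = healthy, 1 = steatosis
X = np.array([glcm_features(r) for r in rois])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print(clf.predict(X[:2]))
```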

  6. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    NASA Astrophysics Data System (ADS)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that combines artificial intelligence, statistical modeling and data mining with the data generated by an educational institution. EDM applies computational approaches to educational data in order to investigate educational questions. To make a country stand out among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. Hidden patterns and data can be extracted from various information repositories by adopting the techniques of data mining. In order to summarize the performance of students with their credentials, we scrutinize the use of data mining in the field of academics. The Apriori algorithm is applied extensively to the student database for a wider classification based on various categories. The K-means procedure is applied to the same database in order to group records into specific categories. The Apriori algorithm mines rules in order to extract similar patterns and their associations across various sets of records, which can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly identified with such data mining frameworks. Thus, the algorithms prove effective in profiling students in any educational environment. The ultimate objective of the study is to detect whether a student is prone to violence or not.
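
    As a rough illustration of the two procedures named above, the sketch below runs mlxtend's Apriori and scikit-learn's K-means over a toy one-hot table of student attributes; the column names, thresholds and data are invented:

```python
# Sketch: association-rule mining plus clustering on toy student records.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from sklearn.cluster import KMeans

students = pd.DataFrame({
    "low_attendance": [1, 1, 0, 0, 1, 0],
    "aggressive_conduct": [1, 1, 0, 0, 1, 0],
    "high_grades": [0, 0, 1, 1, 0, 1],
}, dtype=bool)

# Apriori: which attributes co-occur across student records?
itemsets = apriori(students, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "confidence"]])

# K-means: group the same records into two behavioural profiles.
profiles = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(students)
print(profiles)
```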

  7. Automatic summarization of changes in biological image sequences using algorithmic information theory.

    PubMed

    Cohen, Andrew R; Bjornsson, Christopher S; Temple, Sally; Banker, Gary; Roysam, Badrinath

    2009-08-01

    An algorithmic information-theoretic method is presented for object-level summarization of meaningful changes in image sequences. Object extraction and tracking data are represented as an attributed tracking graph (ATG). Time courses of object states are compared using an adaptive information distance measure, aided by a closed-form multidimensional quantization. The notion of meaningful summarization is captured by using the gap statistic to estimate the randomness deficiency from algorithmic statistics. The summary is the clustering result and feature subset that maximize the gap statistic. This approach was validated on four bioimaging applications: 1) It was applied to a synthetic data set containing two populations of cells differing in the rate of growth, for which it correctly identified the two populations and the single feature out of 23 that separated them; 2) it was applied to 59 movies of three types of neuroprosthetic devices being inserted in the brain tissue at three speeds each, for which it correctly identified insertion speed as the primary factor affecting tissue strain; 3) when applied to movies of cultured neural progenitor cells, it correctly distinguished neurons from progenitors without requiring the use of a fixative stain; and 4) when analyzing intracellular molecular transport in cultured neurons undergoing axon specification, it automatically confirmed the role of kinesins in axon specification.
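
    The gap statistic at the heart of this summarization can be illustrated compactly: compare the log within-cluster dispersion of the data with its expectation under a uniform reference distribution, and prefer the clustering that maximizes the gap. The sketch below is the generic Tibshirani-style construction on synthetic two-population data, not the authors' algorithmic-statistics variant:

```python
# Sketch: gap statistic for choosing a meaningful number of clusters.
import numpy as np
from sklearn.cluster import KMeans

def within_dispersion(X, k, seed=0):
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return np.log(km.inertia_)

def gap_statistic(X, k, n_ref=10, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = [within_dispersion(rng.uniform(lo, hi, X.shape), k, seed)
           for _ in range(n_ref)]                 # uniform reference datasets
    return np.mean(ref) - within_dispersion(X, k, seed)

# Two synthetic populations differing in their means, as in application 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
print({k: round(gap_statistic(X, k), 3) for k in (1, 2, 3, 4)})  # peaks at k=2
```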

  8. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  9. Extracting Temporal and Spatial Distributions Information about Algal Blooms Based on Multitemporal Modis

    NASA Astrophysics Data System (ADS)

    Chunguang, L.; Qingjiu, T.

    2012-07-01

    Based on MODIS remote sensing data, a method and technology for extracting the temporal and spatial distribution information of algal blooms is studied and established. With this extraction method, the spatiotemporal dynamics of Taihu Lake from 2009 to 2011 can be obtained. The variation of cyanobacterial blooms in Taihu Lake is analyzed and discussed. The algae bloom frequency index (AFI) and algae bloom sustainability index (ASI) are important criteria that show the interannual and inter-monthly variation over the whole area or subregions of Taihu Lake. Using the AFI and ASI from 2009 to 2011, several phenomena were found: bloom frequency decreased from the north and west to the east and south of Taihu Lake; the inter-monthly variation of AFI shows that blooms exhibited twin peaks at a high level and, in general, a lagging trend. In the subregion statistics, the IBD and ASI in 2011 show an abnormal condition on the border between Gongshan Bay and the Central Lake: the date is clearly earlier than in previous years for the same subregion and than in other subregions of the same year.

  10. A statistical framework for multiparameter analysis at the single-cell level.

    PubMed

    Torres-García, Wandaliz; Ashili, Shashanka; Kelbauskas, Laimonas; Johnson, Roger H; Zhang, Weiwen; Runger, George C; Meldrum, Deirdre R

    2012-03-01

    Phenotypic characterization of individual cells provides crucial insights into intercellular heterogeneity and enables access to information that is unavailable from ensemble averaged, bulk cell analyses. Single-cell studies have attracted significant interest in recent years and spurred the development of a variety of commercially available and research-grade technologies. To quantify cell-to-cell variability of cell populations, we have developed an experimental platform for real-time measurements of oxygen consumption (OC) kinetics at the single-cell level. Unique challenges inherent to these single-cell measurements arise, and no existing data analysis methodology is available to address them. Here we present a data processing and analysis method that addresses challenges encountered with this unique type of data in order to extract biologically relevant information. We applied the method to analyze OC profiles obtained with single cells of two different cell lines derived from metaplastic and dysplastic human Barrett's esophageal epithelium. In terms of method development, three main challenges were considered for this heterogeneous dynamic system: (i) high levels of noise, (ii) the lack of a priori knowledge of single-cell dynamics, and (iii) the role of intercellular variability within and across cell types. Several strategies and solutions to address each of these three challenges are presented. Features such as slopes, intercepts, and breakpoints (change-points) were extracted for every OC profile and compared across individual cells and cell types. The results demonstrated that the extracted features facilitated exposition of subtle differences between individual cells and their responses to cell-cell interactions. With minor modifications, this method can be used to process and analyze data from other acquisition and experimental modalities at the single-cell level, providing a valuable statistical framework for single-cell analysis.
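
    One of the extracted features, the breakpoint (change-point) with its flanking slopes, can be estimated by a simple grid search over two-segment linear fits, as sketched below; the estimator and the synthetic oxygen-consumption profile are assumptions, not the authors' exact procedure:

```python
# Sketch: two-segment linear fit, returning the change point and both slopes.
import numpy as np

def two_segment_fit(t, y, min_pts=5):
    best = None
    for b in range(min_pts, len(t) - min_pts):       # candidate breakpoints
        s1, i1 = np.polyfit(t[:b], y[:b], 1)
        s2, i2 = np.polyfit(t[b:], y[b:], 1)
        sse = (np.sum((y[:b] - (s1 * t[:b] + i1)) ** 2)
               + np.sum((y[b:] - (s2 * t[b:] + i2)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t[b], s1, s2)
    return best[1:]                                  # (breakpoint, slope1, slope2)

rng = np.random.default_rng(0)
t = np.linspace(0, 60, 120)                              # minutes
y = np.where(t < 25, -0.8 * t, -20 - 0.2 * (t - 25))     # two-regime O2 signal
bp, s1, s2 = two_segment_fit(t, y + rng.normal(0, 0.5, t.size))
print(f"change point ~ {bp:.1f} min, slopes {s1:.2f} -> {s2:.2f}")
```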

  11. Mutual interference between statistical summary perception and statistical learning.

    PubMed

    Zhao, Jiaying; Ngo, Nhi; McKendrick, Ryan; Turk-Browne, Nicholas B

    2011-09-01

    The visual system is an efficient statistician, extracting statistical summaries over sets of objects (statistical summary perception) and statistical regularities among individual objects (statistical learning). Although these two kinds of statistical processing have been studied extensively in isolation, their relationship is not yet understood. We first examined how statistical summary perception influences statistical learning by manipulating the task that participants performed over sets of objects containing statistical regularities (Experiment 1). Participants who performed a summary task showed no statistical learning of the regularities, whereas those who performed control tasks showed robust learning. We then examined how statistical learning influences statistical summary perception by manipulating whether the sets being summarized contained regularities (Experiment 2) and whether such regularities had already been learned (Experiment 3). The accuracy of summary judgments improved when regularities were removed and when learning had occurred in advance. In sum, calculating summary statistics impeded statistical learning, and extracting statistical regularities impeded statistical summary perception. This mutual interference suggests that statistical summary perception and statistical learning are fundamentally related.

  12. Kansas environmental and resource study: A Great Plains model, tasks 1-6

    NASA Technical Reports Server (NTRS)

    Haralick, R. M.; Kanemasu, E. T.; Morain, S. A.; Yarger, H. L. (Principal Investigator); Ulaby, F. T.; Shanmugam, K. S.; Williams, D. L.; Mccauley, J. R.; Mcnaughton, J. L.

    1972-01-01

    There are no author identified significant results in this report. Environmental and resources investigations in Kansas utilizing ERTS-1 imagery are summarized for the following areas: (1) use of feature extraction techniques for texture context information in ERTS imagery; (2) interpretation and automatic image enhancement; (3) water use, production, and disease detection and predictions for wheat; (4) ERTS-1 agricultural statistics; (5) monitoring fresh water resources; and (6) ground pattern analysis in the Great Plains.

  13. Forensic DNA testing.

    PubMed

    Butler, John M

    2011-12-01

    Forensic DNA testing has a number of applications, including parentage testing, identifying human remains from natural or man-made disasters or terrorist attacks, and solving crimes. This article provides background information followed by an overview of the process of forensic DNA testing, including sample collection, DNA extraction, PCR amplification, short tandem repeat (STR) allele separation and sizing, typing and profile interpretation, statistical analysis, and quality assurance. The article concludes with discussions of possible problems with the data and other forensic DNA testing techniques.

  14. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information.

    PubMed

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Yan, Bin; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in the classification accuracies of both experiments, namely, motor imagery and emotion recognition.
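
    The identification-and-reconstruction step can be sketched compactly with scikit-learn's FastICA: separate components, flag the one most correlated with an a priori artifact template, zero it, and invert. The wavelet-decomposition stage of the full wavelet-ICA method is omitted here for brevity, and the three-channel toy signals are invented:

```python
# Sketch: ICA separation, a-priori artifact identification, reconstruction.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 512)
blink = np.exp(-((t - 1.0) / 0.05) ** 2)          # a priori artifact template
neural = np.sin(2 * np.pi * 10 * t)               # 10 Hz activity
eeg = np.c_[neural + 0.8 * blink,
            0.5 * neural + 1.2 * blink,
            neural - 0.3 * blink]                 # 3 toy channels

ica = FastICA(n_components=3, random_state=0)
sources = ica.fit_transform(eeg)                  # (samples, components)

# Flag the component most correlated with the known artifact template.
corr = [abs(np.corrcoef(sources[:, k], blink)[0, 1]) for k in range(3)]
sources[:, int(np.argmax(corr))] = 0.0            # remove artifact component
clean = ica.inverse_transform(sources)
print("residual blink correlation:",
      round(abs(np.corrcoef(clean[:, 1], blink)[0, 1]), 3))
```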

  15. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information

    PubMed Central

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in the classification accuracies of both experiments, namely, motor imagery and emotion recognition. PMID:26380294

  16. Monitoring and Evaluation: Statistical Support for Life-cycle Studies, Annual Report 2003.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skalski, John

    2003-11-01

    The ongoing mission of this project is the development of statistical tools for analyzing fisheries tagging data in the most precise and appropriate manner possible. This mission also includes providing statistical guidance on the best ways to design large-scale tagging studies. This mission continues because the technologies for conducting fish tagging studies continuously evolve. In just the last decade, fisheries biologists have seen the evolution from freeze-brands and coded wire tags (CWT) to passive integrated transponder (PIT) tags, balloon-tags, radiotelemetry, and now, acoustic-tags. With each advance, the technology holds the promise of more detailed and precise information. However, the technology for analyzing and interpreting the data also becomes more complex as the tagging techniques become more sophisticated. The goal of the project is to develop the analytical tools in parallel with the technical advances in tagging studies, so that maximum information can be extracted on a timely basis. Associated with this mission is the transfer of these analytical capabilities to the field investigators to assure consistency and the highest levels of design and analysis throughout the fisheries community. Consequently, this project provides detailed technical assistance on the design and analysis of tagging studies to groups requesting assistance throughout the fisheries community. Ideally, each project and each investigator would invest in the statistical support needed for the successful completion of their study. However, this is an ideal that is rarely if ever attained. Furthermore, there is only a small pool of highly trained scientists in this specialized area of tag analysis here in the Northwest. Project 198910700 provides the financial support to sustain this local expertise on the statistical theory of tag analysis at the University of Washington and make it available to the fisheries community. Piecemeal and fragmented support from various agencies and organizations would be incapable of maintaining a center of expertise. The mission of the project is to help assure tagging studies are designed and analyzed from the onset to extract the best available information using state-of-the-art statistical methods. The overarching goal of the project is to assure statistically sound survival studies so that fish managers can focus on the management implications of their findings and not be distracted by concerns about whether the studies are statistically reliable or not. Specific goals and objectives of the study include the following: (1) Provide consistent application of statistical methodologies for survival estimation across all salmon life cycle stages to assure comparable performance measures and assessment of results through time, to maximize learning and adaptive management opportunities, and to improve and maintain the ability to responsibly evaluate the success of implemented Columbia River FWP salmonid mitigation programs and identify future mitigation options. (2) Improve analytical capabilities to conduct research on survival processes of wild and hatchery chinook and steelhead during smolt outmigration, to improve monitoring and evaluation capabilities and assist in-season river management to optimize operational and fish passage strategies to maximize survival. (3) Extend statistical support to estimate ocean survival and in-river survival of returning adults. Provide statistical guidance in implementing a river-wide adult PIT-tag detection capability.
(4) Develop statistical methods for survival estimation for all potential users and make this information available through peer-reviewed publications, statistical software, and technology transfers to organizations such as NOAA Fisheries, the Fish Passage Center, US Fish and Wildlife Service, US Geological Survey (USGS), US Army Corps of Engineers (USACE), Public Utility Districts (PUDs), the Independent Scientific Advisory Board (ISAB), and other members of the Northwest fisheries community. (5) Provide and maintain statistical software for tag analysis and user support. (6) Provide improvements in statistical theory and software as requested by user groups. These improvements include extending software capabilities to address new research issues, adapting tagging techniques to new study designs, and extending the analysis capabilities to new technologies such as radio-tags and acoustic-tags.

  17. To assess the efficacy of socket plug technique using platelet rich fibrin with or without the use of bone substitute in alveolar ridge preservation: a prospective randomised controlled study.

    PubMed

    Girish Kumar, N; Chaudhary, Rupanzal; Kumar, Ish; Arora, Srimathy S; Kumar, Nilesh; Singh, Hem

    2018-06-01

    The purpose of this study is to evaluate the efficacy of Platelet Rich Fibrin (PRF) as a socket plug, with or without the use of Plaster of Paris (POP) as a bone substitute, in preserving the alveolar ridge post-extraction. A prospective randomised single-blind controlled study was conducted for 18 months, from November 2014 to May 2016, on 48 patients requiring extraction. All teeth were extracted atraumatically using periotomes and luxators without raising a mucoperiosteal flap. Sockets were randomly allotted to groups A, B and C. Group A sockets served as controls, where a figure-of-eight suture was placed. In group B sockets, PRF obtained by centrifugation was used as a socket plug and stabilised with a figure-of-eight suture. Group C sockets were filled with POP and then covered with PRF; the socket was then closed with a figure-of-eight suture. Patients were informed of the need for 6 months of follow-up. Ninety sockets in 48 patients were included in the study. We found that the sockets grafted with POP showed better ridge preservation and post-operative comfort, even though the difference in ridge resorption between the three groups was not statistically significant. Atraumatic extraction may minimise post-operative pain and discomfort to the patient as well as post-extraction changes in alveolar height and width. Although the use of PRF and/or a bone substitute clinically contributes to better post-operative healing and minimal loss of alveolar width and height, the differences were not statistically significant.

  18. Entropic Profiler – detection of conservation in genomes using information theory

    PubMed Central

    Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

    2009-01-01

    Background In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighted relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes them very efficiently by means of improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows the user to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or by resuming a previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion EP are directly related to the statistical significance of motifs and can be considered a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
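
    The motif-significance idea behind EP can be illustrated with a plain k-mer scan: count occurrences and score over- or under-representation as a z-score against a background model. The i.i.d. background below is a simplifying assumption; EP itself computes position-wise weighted abundances on suffix-tree-like structures:

```python
# Sketch: k-mer over/under-representation z-scores against an i.i.d. background.
from collections import Counter
import math

def kmer_zscores(seq, k):
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    base = Counter(seq)                              # single-letter frequencies
    z = {}
    for kmer, obs in counts.items():
        p = math.prod(base[c] / len(seq) for c in kmer)  # i.i.d. expectation
        mu, sd = n * p, math.sqrt(n * p * (1 - p))
        z[kmer] = (obs - mu) / sd
    return z

seq = "ACGT" * 200 + "GGGGGGGG"      # toy genome with an over-represented run
top = sorted(kmer_zscores(seq, 4).items(), key=lambda kv: -kv[1])[:3]
print(top)                            # GGGG-containing 4-mers score highest
```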

  19. The minimal local-asperity hypothesis of early retinal lateral inhibition.

    PubMed

    Balboa, R M; Grzywacz, N M

    2000-07-01

    Recently we found that the theories related to information theory existing in the literature cannot explain the behavior of the extent of the lateral inhibition mediated by retinal horizontal cells as a function of background light intensity. These theories can explain the fall of the extent from intermediate to high intensities, but not its rise from dim to intermediate intensities. We propose an alternate hypothesis that accounts for the extent's bell-shaped behavior. This hypothesis proposes that the lateral-inhibition adaptation in the early retina is part of a system to extract several image attributes, such as occlusion borders and contrast. To do so, this system would use prior probabilistic knowledge about the biological processing and relevant statistics in natural images. A key novel statistic used here is the probability of the presence of an occlusion border as a function of local contrast. Using this probabilistic knowledge, the retina would optimize the spatial profile of lateral inhibition to minimize attribute-extraction error. The two significant errors that this minimization process must reduce are due to the quantal noise in photoreceptors and the straddling of occlusion borders by lateral inhibition.

  20. The Peroxidation of Leukocytes Index Ratio Reveals the Prooxidant Effect of Green Tea Extract

    PubMed Central

    Manafikhi, Husseen; Raguzzini, Anna; Longhitano, Yaroslava; Reggi, Raffaella; Zanza, Christian

    2016-01-01

    Although tea increases plasma nonenzymatic antioxidant capacity, the European Food Safety Authority (EFSA) denied claims related to tea and its protection from oxidative damage. Furthermore, the Dietary Supplement Information Expert Committee (DSI EC) expressed some doubts on the safety of green tea extract (GTE). We performed a pilot study in order to evaluate the effect of a single dose of two capsules of a GTE supplement (200 mg × 2) on the peroxidation of leukocytes index ratio (PLIR) in relation to uric acid (UA) and ferric reducing antioxidant potential (FRAP), as well as the sample size needed to reach statistical significance. GTE induced a prooxidant effect on leukocytes, whereas FRAP did not change, in agreement with the EFSA and DSI EC conclusions. Besides, our results confirm the primary role of UA in the antioxidant defences. The ratio-based calculation of the PLIR reduced the sample size needed to reach statistical significance, compared to the resistance to an exogenous oxidative stress and to the functional capacity of the oxidative burst. Therefore, PLIR could be a sensitive marker of redox status. PMID:28101300

  1. The Peroxidation of Leukocytes Index Ratio Reveals the Prooxidant Effect of Green Tea Extract.

    PubMed

    Peluso, Ilaria; Manafikhi, Husseen; Raguzzini, Anna; Longhitano, Yaroslava; Reggi, Raffaella; Zanza, Christian; Palmery, Maura

    2016-01-01

    Although tea increases plasma nonenzymatic antioxidant capacity, the European Food Safety Authority (EFSA) denied claims related to tea and its protection from oxidative damage. Furthermore, the Dietary Supplement Information Expert Committee (DSI EC) expressed some doubts on the safety of green tea extract (GTE). We performed a pilot study in order to evaluate the effect of a single dose of two capsules of a GTE supplement (200 mg × 2) on the peroxidation of leukocytes index ratio (PLIR) in relation to uric acid (UA) and ferric reducing antioxidant potential (FRAP), as well as the sample size needed to reach statistical significance. GTE induced a prooxidant effect on leukocytes, whereas FRAP did not change, in agreement with the EFSA and DSI EC conclusions. Besides, our results confirm the primary role of UA in the antioxidant defences. The ratio-based calculation of the PLIR reduced the sample size needed to reach statistical significance, compared to the resistance to an exogenous oxidative stress and to the functional capacity of the oxidative burst. Therefore, PLIR could be a sensitive marker of redox status.

  2. Beyond δ: Tailoring marked statistics to reveal modified gravity

    NASA Astrophysics Data System (ADS)

    Valogiannis, Georgios; Bean, Rachel

    2018-01-01

    Models that seek to explain cosmic acceleration through modifications to general relativity (GR) evade stringent Solar System constraints through a restoring, screening mechanism. Down-weighting the high-density, screened regions in favor of the low-density, unscreened ones offers the potential to enhance the amount of information carried in such modified gravity models. In this work, we assess the performance of a new "marked" transformation and perform a systematic comparison with the clipping and logarithmic transformations, in the context of ΛCDM and the symmetron and f(R) modified gravity models. Performance is measured in terms of the fractional boost in the Fisher information and the signal-to-noise ratio (SNR) for these models relative to the statistics derived from the standard density distribution. We find that all three statistics provide improved Fisher boosts over the basic density statistics. The model parameters for the marked and clipped transformations that best enhance signals and the Fisher boosts are determined. We also show that the mark is useful both as a Fourier- and real-space transformation; a marked correlation function also enhances the SNR relative to the standard correlation function and can, on mildly nonlinear scales, show a significant difference between the ΛCDM and the modified gravity models. Our results demonstrate how a series of simple analytical transformations could dramatically increase the information extracted on deviations from GR from large-scale surveys, and give the prospect for a much more feasible potential detection.
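
    A sketch of a marked density transform of this kind follows; the functional form m = [(1 + δ_s)/(1 + δ_s + δ)]^p, which up-weights underdense regions, and all parameter values are assumptions made for illustration, not necessarily the paper's exact mark:

```python
# Sketch: mark a 1D density field to up-weight unscreened, low-density regions,
# then compare the power spectra of the plain and marked fields.
import numpy as np

def mark(delta, delta_s=0.25, p=1.0):
    return ((1.0 + delta_s) / (1.0 + delta_s + delta)) ** p

def power_spectrum_1d(field):
    fk = np.fft.rfft(field - field.mean())
    return np.abs(fk) ** 2 / field.size

rng = np.random.default_rng(0)
delta = np.clip(rng.lognormal(0.0, 0.8, 4096) - 1.0, -0.99, None)  # toy density
marked = mark(delta) * (1.0 + delta)     # density field weighted by its mark

pk_plain = power_spectrum_1d(1.0 + delta)
pk_marked = power_spectrum_1d(marked)
print("P(k) ratio at low k:", pk_marked[1:4] / pk_plain[1:4])
```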

  3. Multisensor multiresolution data fusion for improvement in classification

    NASA Astrophysics Data System (ADS)

    Rubeena, V.; Tiwari, K. C.

    2016-04-01

    The rapid advancements in technology have facilitated the easy availability of multisensor and multiresolution remote sensing data. Multisensor, multiresolution data contain complementary information, and fusion of such data may yield application-dependent significant information which may otherwise remain trapped within. The present work aims at improving classification by fusing features of coarse-resolution (1 m) hyperspectral LWIR and fine-resolution (20 cm) RGB data. The classification map comprises eight classes: Road, Trees, Red Roof, Grey Roof, Concrete Roof, Vegetation, Bare Soil and Unclassified. The processing methodology for the hyperspectral LWIR data comprises dimensionality reduction, resampling by interpolation to register the two images at the same spatial resolution, and extraction of spatial features to improve classification accuracy. For the fine-resolution RGB data, a vegetation index is computed for classifying the vegetation class and a morphological building index is calculated for buildings. To extract textural features, occurrence and co-occurrence statistics are considered, and the features are extracted from all three bands of the RGB data. After extracting the features, Support Vector Machines (SVMs) are used for training and classification. To increase classification accuracy, post-processing steps such as the removal of spurious (salt-and-pepper) noise are performed, followed by majority-voting filtering within objects for better object classification.
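
    A compressed sketch of the fuse-then-classify step: per-pixel feature vectors stacking spectral bands, an NDVI-style index and a local occurrence (first-order) statistic, fed to an SVM. Band roles, window size and the placeholder labels are assumptions:

```python
# Sketch: stack spectral, index and texture features per pixel, train an SVM.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.svm import SVC

rng = np.random.default_rng(0)
red, nir = rng.random((2, 64, 64))                     # toy co-registered bands
ndvi = (nir - red) / (nir + red + 1e-6)
local_var = uniform_filter(red**2, 5) - uniform_filter(red, 5)**2  # occurrence

X = np.stack([red, nir, ndvi, local_var], axis=-1).reshape(-1, 4)
y = (ndvi.reshape(-1) > 0.1).astype(int)               # placeholder labels

svm = SVC(kernel="rbf", gamma="scale").fit(X[::8], y[::8])  # subsampled training
print("training-style accuracy:", svm.score(X, y))
```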

  4. The information extraction of Gannan citrus orchard based on the GF-1 remote sensing image

    NASA Astrophysics Data System (ADS)

    Wang, S.; Chen, Y. L.

    2017-02-01

    The production of Gannan oranges is the largest in China and occupies an important position in the world. Extracting citrus orchards quickly and effectively is of great significance for fruit pathogen defense, fruit production and industrial planning. The traditional pixel-based spectral extraction of citrus orchards has low classification accuracy and can hardly avoid the “pepper phenomenon”. Under the influence of noise, the problem of different objects sharing the same spectrum is severe. Taking the citrus planting area of Xunwu County, Ganzhou, as the research object, and addressing the low accuracy of the traditional pixel-based classification method, a decision tree classification method based on an object-oriented rule set is proposed. Firstly, multi-scale segmentation is performed on the GF-1 remote sensing image data of the study area. Subsequently, sample objects are selected for statistical analysis of spectral features and geometric features. Finally, combined with the concept of decision tree classification, empirical values of single-band thresholds, NDVI, band combinations and object geometry characteristics are used hierarchically to extract information over the research area, implementing multi-scale segmentation with hierarchical decision tree classification. The classification results are verified with a confusion matrix, and the overall Kappa index is 87.91%.
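
    The hierarchical rule set can be pictured as a small decision function over segmented objects carrying mean spectral and geometric attributes, as in the sketch below; all thresholds and attribute names are invented for illustration:

```python
# Sketch: threshold rules applied hierarchically to segmented image objects.
def classify_object(obj):
    if obj["ndvi"] > 0.5 and obj["area_m2"] > 2000:
        return "citrus orchard" if obj["texture"] < 0.2 else "forest"
    if obj["nir"] < 0.1:
        return "water"
    return "other"

segments = [                                  # toy outputs of segmentation
    {"ndvi": 0.62, "area_m2": 5400, "texture": 0.15, "nir": 0.4},
    {"ndvi": 0.55, "area_m2": 3100, "texture": 0.35, "nir": 0.5},
    {"ndvi": 0.05, "area_m2": 900,  "texture": 0.05, "nir": 0.06},
]
print([classify_object(s) for s in segments])
```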

  5. Research of facial feature extraction based on MMC

    NASA Astrophysics Data System (ADS)

    Xue, Donglin; Zhao, Jiufen; Tang, Qinhong; Shi, Shaokun

    2017-07-01

    Based on the maximum margin criterion (MMC), a new algorithm for statistically uncorrelated optimal discriminant vectors and a new algorithm for orthogonal optimal discriminant vectors for feature extraction are proposed. The purpose of the maximum margin criterion is to maximize the inter-class scatter while simultaneously minimizing the intra-class scatter after projection. Compared with the original MMC method and the principal component analysis (PCA) method, the proposed methods are better at reducing or eliminating the statistical correlation between features and improving the recognition rate. Experimental results on the Olivetti Research Laboratory (ORL) face database show that the new feature extraction method based on the statistically uncorrelated maximum margin criterion (SUMMC) is better in terms of recognition rate and stability. Besides, the relations between the maximum margin criterion and the Fisher criterion for feature extraction are revealed.
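
    The criterion itself reduces to a symmetric eigenproblem: project onto the leading eigenvectors of S_b − S_w, maximizing between-class scatter minus within-class scatter. The sketch below shows this core step only; the whitening constraint that yields the statistically uncorrelated variant is not included:

```python
# Sketch: MMC projection from the eigenvectors of (Sb - Sw).
import numpy as np

def mmc_projection(X, y, n_dims):
    mean_all = X.mean(axis=0)
    Sb = np.zeros((X.shape[1], X.shape[1]))   # between-class scatter
    Sw = np.zeros_like(Sb)                    # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    evals, evecs = np.linalg.eigh(Sb - Sw)    # symmetric eigenproblem
    return evecs[:, np.argsort(evals)[::-1][:n_dims]]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 10)), rng.normal(1.5, 1, (40, 10))])
y = np.repeat([0, 1], 40)
W = mmc_projection(X, y, n_dims=2)
print("projected shape:", (X @ W).shape)
```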

  6. Evaluation of multiband photography for rock discrimination

    NASA Technical Reports Server (NTRS)

    Raines, G. L.

    1974-01-01

    An evaluation is presented of the multiband photography concept that tonal differences between rock formations on aerial photography can be improved through the selection of the appropriate bands. The concept involves: (1) acquiring band reference data for the rocks being considered; (2) selecting the best combination of bands to discriminate the rocks using these reference data; (3) acquiring aerial photography using these selected bands; and (4) extracting the desired geologic information in an optimum manner. The test site geology and rock reflectance are discussed in detail. The evaluation found that the differences in contrast ratios are not statistically significant, and the spectral information in different bands is not advantageous.

  7. Observation of the immune response of cells and tissue through multimodal label-free microscopy

    NASA Astrophysics Data System (ADS)

    Pavillon, Nicolas; Smith, Nicholas I.

    2017-02-01

    We present applications of a label-free approach to assess the immune response based on the combination of interferometric microscopy and Raman spectroscopy, which makes it possible to simultaneously acquire morphological and molecular information of live cells. We employ this approach to derive statistical models for predicting the activation state of macrophage cells based both on morphological parameters extracted from the high-throughput full-field quantitative phase imaging, and on the molecular content information acquired through Raman spectroscopy. We also employ a system for 3D imaging based on coherence gating, enabling specific targeting of the Raman channel to structures of interest within tissue.

  8. Collaborative classification of hyperspectral and visible images with convolutional neural network

    NASA Astrophysics Data System (ADS)

    Zhang, Mengmeng; Li, Wei; Du, Qian

    2017-10-01

    Recent advances in remote sensing technology have made multisensor data available for the same area, and it is well-known that remote sensing data processing and analysis often benefit from multisource data fusion. Specifically, low spatial resolution of hyperspectral imagery (HSI) degrades the quality of the subsequent classification task while using visible (VIS) images with high spatial resolution enables high-fidelity spatial analysis. A collaborative classification framework is proposed to fuse HSI and VIS images for finer classification. First, the convolutional neural network model is employed to extract deep spectral features for HSI classification. Second, effective binarized statistical image features are learned as contextual basis vectors for the high-resolution VIS image, followed by a classifier. The proposed approach employs diversified data in a decision fusion, leading to an integration of the rich spectral information, spatial information, and statistical representation information. In particular, the proposed approach eliminates the potential problems of the curse of dimensionality and excessive computation time. The experiments evaluated on two standard data sets demonstrate better classification performance offered by this framework.

  9. Data Assimilation to Extract Soil Moisture Information From SMAP Observations

    NASA Technical Reports Server (NTRS)

    Kolassa, J.; Reichle, R. H.; Liu, Q.; Alemohammad, S. H.; Gentine, P.

    2017-01-01

    Statistical techniques permit the retrieval of soil moisture estimates in a model climatology while retaining the spatial and temporal signatures of the satellite observations. As a consequence, they can be used to reduce the need for localized bias correction techniques typically implemented in data assimilation (DA) systems that tend to remove some of the independent information provided by satellite observations. Here, we use a statistical neural network (NN) algorithm to retrieve SMAP (Soil Moisture Active Passive) surface soil moisture estimates in the climatology of the NASA Catchment land surface model. Assimilating these estimates without additional bias correction is found to significantly reduce the model error and increase the temporal correlation against SMAP CalVal in situ observations over the contiguous United States. A comparison with assimilation experiments using traditional bias correction techniques shows that the NN approach better retains the independent information provided by the SMAP observations and thus leads to larger model skill improvements during the assimilation. A comparison with the SMAP Level 4 product shows that the NN approach is able to provide comparable skill improvements and thus represents a viable assimilation approach.

  10. Markov Logic Networks for Adverse Drug Event Extraction from Text.

    PubMed

    Natarajan, Sriraam; Bangera, Vishal; Khot, Tushar; Picado, Jose; Wazalwar, Anurag; Costa, Vitor Santos; Page, David; Caldwell, Michael

    2017-05-01

    Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society. A diverse set of techniques from epidemiology, statistics, and computer science are being proposed and studied for ADE discovery from observational health data (e.g., EHR and claims data), social network data (e.g., Google and Twitter posts), and other information sources. Methodologies are needed for evaluating, quantitatively measuring, and comparing the ability of these various approaches to accurately discover ADEs. This work is motivated by the observation that text sources such as the Medline/Medinfo library provide a wealth of information on human health. Unfortunately, ADEs often result from unexpected interactions, and the connection between conditions and drugs is not explicit in these sources. Thus, in this work we address the question of whether we can quantitatively estimate relationships between drugs and conditions from the medical literature. This paper proposes and studies a state-of-the-art NLP-based extraction of ADEs from text.

  11. Pattern recognition of satellite cloud imagery for improved weather prediction

    NASA Technical Reports Server (NTRS)

    Gautier, Catherine; Somerville, Richard C. J.; Volfson, Leonid B.

    1986-01-01

    The major accomplishment was the successful development of a method for extracting time derivative information from geostationary meteorological satellite imagery. This research is a proof-of-concept study which demonstrates the feasibility of using pattern recognition techniques and a statistical cloud classification method to estimate time rate of change of large-scale meteorological fields from remote sensing data. The cloud classification methodology is based on typical shape function analysis of parameter sets characterizing the cloud fields. The three specific technical objectives, all of which were successfully achieved, are as follows: develop and test a cloud classification technique based on pattern recognition methods, suitable for the analysis of visible and infrared geostationary satellite VISSR imagery; develop and test a methodology for intercomparing successive images using the cloud classification technique, so as to obtain estimates of the time rate of change of meteorological fields; and implement this technique in a testbed system incorporating an interactive graphics terminal to determine the feasibility of extracting time derivative information suitable for comparison with numerical weather prediction products.

  12. A biomedical information system for retrieval and manipulation of NHANES data.

    PubMed

    Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A

    2013-01-01

    The retrieval and manipulation of data from large public databases like the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system to handle all processes of data extraction and cleaning and then join different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which the input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried and extracted efficiently. The current pilot implementation of the system provides access to a limited portion of the NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from the CDU archive, which currently holds about 53 databases.

  13. A sensitive continuum analysis method for gamma ray spectra

    NASA Technical Reports Server (NTRS)

    Thakur, Alakh N.; Arnold, James R.

    1993-01-01

    In this work we examine ways to improve the sensitivity of the analysis procedure for gamma ray spectra with respect to small differences in the continuum (Compton) spectra. The method developed is applied to analyze gamma ray spectra obtained from planetary mapping by the Mars Observer spacecraft launched in September 1992. Calculated Mars simulation spectra and actual thick target bombardment spectra have been taken as test cases. The principle of the method rests on the extraction of continuum information from Fourier transforms of the spectra. We study how a better estimate of the spectrum from larger regions of the Mars surface will improve the analysis for smaller regions with poorer statistics. Estimation of signal within the continuum is done in the frequency domain which enables efficient and sensitive discrimination of subtle differences between two spectra. The process is compared to other methods for the extraction of information from the continuum. Finally we explore briefly the possible uses of this technique in other applications of continuum spectra.
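
    The frequency-domain idea can be sketched briefly: smooth continuum structure concentrates in the low-frequency Fourier coefficients of a spectrum, so comparing two spectra there discriminates subtle continuum differences against counting noise. The synthetic Compton-like shapes and the coefficient cutoff are assumptions:

```python
# Sketch: compare two gamma-ray continua in the low-frequency Fourier band.
import numpy as np

rng = np.random.default_rng(0)
ch = np.arange(1024)                               # spectrometer channels
continuum_a = np.exp(-ch / 400.0)
continuum_b = np.exp(-ch / 380.0)                  # subtly different shape
spec_a = rng.poisson(5000 * continuum_a)           # Poisson counting noise
spec_b = rng.poisson(5000 * continuum_b)

def low_freq(spec, n_coef=20):
    return np.fft.rfft(spec)[:n_coef]              # continuum lives here

diff = np.abs(low_freq(spec_a) - low_freq(spec_b))
noise = np.abs(low_freq(spec_a) - low_freq(rng.poisson(5000 * continuum_a)))
print("signal vs noise in low-freq band:", diff.sum() / noise.sum())
```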

  14. Feature extraction and classification algorithms for high dimensional data

    NASA Technical Reports Server (NTRS)

    Lee, Chulhee; Landgrebe, David

    1993-01-01

    Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized. By investigating the characteristics of high dimensional data, the reason why the second order statistics must be taken into account in high dimensional data is suggested. Recognizing the importance of the second order statistics, there is a need to represent the second order statistics. A method to visualize statistics using a color code is proposed. By representing statistics using color coding, one can easily extract and compare the first- and second-order statistics.
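
    A heavily simplified numerical sketch of decision-boundary feature extraction follows: locate boundary points by bisection between opposite-class samples, estimate boundary normals by finite differences of the decision function, and eigendecompose their scatter; the leading eigenvectors then span the discriminantly informative subspace. The classifier choice and sampling scheme are assumptions:

```python
# Sketch: estimate the effective decision boundary dimensionality numerically.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 5)),
               rng.normal([2, 2, 0, 0, 0], 1, (60, 5))])
y = np.repeat([0, 1], 60)
f = SVC(kernel="rbf", gamma="scale").fit(X, y).decision_function

def boundary_normal(a, b, eps=1e-4):
    """Bisect between opposite-class points, then estimate the unit normal."""
    for _ in range(40):
        m = 0.5 * (a + b)
        if f([m])[0] * f([a])[0] > 0:
            a = m
        else:
            b = m
    m = 0.5 * (a + b)
    g = np.array([(f([m + eps * e])[0] - f([m - eps * e])[0]) / (2 * eps)
                  for e in np.eye(5)])
    return g / np.linalg.norm(g)

pairs = [(X[i], X[60 + i]) for i in range(60)
         if f([X[i]])[0] * f([X[60 + i]])[0] < 0][:30]
normals = np.array([boundary_normal(a, b) for a, b in pairs])
evals = np.linalg.eigvalsh(normals.T @ normals / len(normals))
print("effective boundary dimensionality:",
      int((evals > 0.05 * evals.max()).sum()))
```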

  15. Preventive Effect of Phosphodiesterase Inhibitor Pentoxifylline Against Medication-Related Osteonecrosis of the Jaw: An Animal Study.

    PubMed

    Yalcin-Ulker, Gül Merve; Cumbul, Alev; Duygu-Capar, Gonca; Uslu, Ünal; Sencift, Kemal

    2017-11-01

    The aim of this experimental study was to investigate the prophylactic effect of pentoxifylline (PTX) on medication-related osteonecrosis of the jaw (MRONJ). Female Sprague-Dawley rats (n = 33) received zoledronic acid (ZA) for 8 weeks to create an osteonecrosis model. The left mandibular second molars were extracted and the recovery period lasted 8 weeks before sacrifice. PTX was intraperitoneally administered to prevent MRONJ. The specimens were histopathologically and histomorphometrically evaluated. Histomorphometrically, between the control and ZA groups, there was no statistically significant difference in total bone volume (P = .999), but there was a statistically significant difference in bone ratio in the extraction sockets (P < .001). A comparison of the bone ratio of the ZA group with the ZA/PTX group (PTX administered after extraction) showed no statistically significant difference (P = .69), but there was a statistically significant difference with the ZA/PTX/PTX group (PTX administered before and after extraction; P = .008). Histopathologically, between the control and ZA groups, there were statistically significant differences for inflammation (P = .013), vascularization (P = .022), hemorrhage (P = .025), and regeneration (P = .008). Between the ZA and ZA/PTX groups, there were no statistically significant differences for inflammation (P = .536), vascularization (P = .642), hemorrhage (P = .765), and regeneration (P = .127). Between the ZA and ZA/PTX/PTX groups, there were statistically significant differences for inflammation (P = .017), vascularization (P = .04), hemorrhage (P = .044), and regeneration (P = .04). In this experimental model of MRONJ, it might be concluded that although PTX, given after tooth extraction, improves new bone formation that positively affects bone healing, it is not prophylactic. However, PTX given before tooth extraction is prophylactic. Therefore, PTX might affect healing in a positive way by optimizing the inflammatory response.

  16. Extreme event statistics in a drifting Markov chain

    NASA Astrophysics Data System (ADS)

    Kindermann, Farina; Hohmann, Michael; Lausch, Tobias; Mayer, Daniel; Schmidt, Felix; Widera, Artur

    2017-07-01

    We analyze extreme event statistics of experimentally realized Markov chains with various drifts. Our Markov chains are individual trajectories of a single atom diffusing in a one-dimensional periodic potential. Based on more than 500 individual atomic traces we verify the applicability of the Sparre Andersen theorem to our system despite the presence of a drift. We present detailed analysis of four different rare-event statistics for our system: the distributions of extreme values, of record values, of extreme value occurrence in the chain, and of the number of records in the chain. We observe that, for our data, the shape of the extreme event distributions is dominated by the underlying exponential distance distribution extracted from the atomic traces. Furthermore, we find that even small drifts influence the statistics of extreme events and record values, which is supported by numerical simulations, and we identify cases in which the drift can be determined without information about the underlying random variable distributions. Our results facilitate the use of extreme event statistics as a signal for small drifts in correlated trajectories.
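
    Two of the rare-event statistics listed above, the number of records and the position of the maximum, are easy to reproduce on simulated drifting chains, as sketched below; the Laplace (exponential-tailed) step distribution and the drift value are illustrative assumptions:

```python
# Sketch: record counts and argmax times over an ensemble of drifting chains.
import numpy as np

rng = np.random.default_rng(0)
n_traj, n_steps, drift = 500, 200, 0.05
steps = rng.laplace(drift, 1.0, (n_traj, n_steps))   # exponential-tailed steps
x = np.cumsum(steps, axis=1)                         # Markov-chain trajectories

# A record occurs whenever the chain exceeds its running maximum so far.
n_records = [(xi > np.maximum.accumulate(np.r_[-np.inf, xi[:-1]])).sum()
             for xi in x]
argmax_time = x.argmax(axis=1)                       # when the extreme occurs

print("mean number of records:", np.mean(n_records))
print("mean fraction of chain before the maximum:",
      np.mean(argmax_time) / n_steps)
```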

  17. Utility of linking primary care electronic medical records with Canadian census data to study the determinants of chronic disease: an example based on socioeconomic status and obesity.

    PubMed

    Biro, Suzanne; Williamson, Tyler; Leggett, Jannet Ann; Barber, David; Morkem, Rachael; Moore, Kieran; Belanger, Paul; Mosley, Brian; Janssen, Ian

    2016-03-11

    Electronic medical records (EMRs) used in primary care contain a breadth of data that can be used in public health research. Patient data from EMRs could be linked with other data sources, such as a postal code linkage with Census data, to obtain additional information on environmental determinants of health. While promising, successful linkages between primary care EMRs with geographic measures is limited due to ethics review board concerns. This study tested the feasibility of extracting full postal code from primary care EMRs and linking this with area-level measures of the environment to demonstrate how such a linkage could be used to examine the determinants of disease. The association between obesity and area-level deprivation was used as an example to illustrate inequalities of obesity in adults. The analysis included EMRs of 7153 patients aged 20 years and older who visited a single, primary care site in 2011. Extracted patient information included demographics (date of birth, sex, postal code) and weight status (height, weight). Information extraction and management procedures were designed to mitigate the risk of individual re-identification when extracting full postal code from source EMRs. Based on patients' postal codes, area-based deprivation indexes were created using the smallest area unit used in Canadian censuses. Descriptive statistics and socioeconomic disparity summary measures of linked census and adult patients were calculated. The data extraction of full postal code met technological requirements for rendering health information extracted from local EMRs into anonymized data. The prevalence of obesity was 31.6 %. There was variation of obesity between deprivation quintiles; adults in the most deprived areas were 35 % more likely to be obese compared with adults in the least deprived areas (Chi-Square = 20.24(1), p < 0.0001). Maps depicting spatial representation of regional deprivation and obesity were created to highlight high risk areas. An area based socio-economic measure was linked with EMR-derived objective measures of height and weight to show a positive association between area-level deprivation and obesity. The linked dataset demonstrates a promising model for assessing health disparities and ecological factors associated with the development of chronic diseases with far reaching implications for informing public health and primary health care interventions and services.
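
    The form of the reported disparity analysis can be reproduced on hypothetical counts with scipy; only the structure, obesity prevalence by deprivation quintile plus a chi-square test, follows the study, while all numbers below are invented:

```python
# Sketch: obesity prevalence by area-deprivation quintile, chi-square test.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: quintile Q1 (least deprived) .. Q5 (most deprived); invented counts.
obese = np.array([380, 420, 450, 480, 510])
not_obese = np.array([1050, 1020, 980, 940, 880])

chi2, p, dof, _ = chi2_contingency(np.c_[obese, not_obese])
prevalence = obese / (obese + not_obese)
print("prevalence by quintile:", np.round(prevalence, 3))
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.2g}")
```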

  18. The statistical analysis of circadian phase and amplitude in constant-routine core-temperature data

    NASA Technical Reports Server (NTRS)

    Brown, E. N.; Czeisler, C. A.

    1992-01-01

    Accurate estimation of the phases and amplitude of the endogenous circadian pacemaker from constant-routine core-temperature series is crucial for making inferences about the properties of the human biological clock from data collected under this protocol. This paper presents a set of statistical methods based on a harmonic-regression-plus-correlated-noise model for estimating the phases and the amplitude of the endogenous circadian pacemaker from constant-routine core-temperature data. The methods include a Bayesian Monte Carlo procedure for computing the uncertainty in these circadian functions. We illustrate the techniques with a detailed study of a single subject's core-temperature series and describe their relationship to other statistical methods for circadian data analysis. In our laboratory, these methods have been successfully used to analyze more than 300 constant routines and provide a highly reliable means of extracting phase and amplitude information from core-temperature data.
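
    The harmonic-regression core of such methods fits y = μ + A·cos(2πt/τ) + B·sin(2πt/τ) by least squares and recovers amplitude √(A² + B²) and acrophase atan2(B, A)·τ/2π. The sketch below uses a fixed period and ordinary least squares; the correlated-noise model and the Bayesian Monte Carlo uncertainty step of the paper are omitted:

```python
# Sketch: circadian phase and amplitude from harmonic regression.
import numpy as np

tau = 24.2                                  # assumed circadian period, hours
t = np.arange(0, 40, 0.25)                  # constant-routine sampling times
rng = np.random.default_rng(0)
temp = (37.0 + 0.3 * np.cos(2 * np.pi * (t - 4.5) / tau)
        + rng.normal(0, 0.05, t.size))      # synthetic core temperature

w = 2 * np.pi * t / tau
design = np.c_[np.ones_like(t), np.cos(w), np.sin(w)]
mu, a, b = np.linalg.lstsq(design, temp, rcond=None)[0]

amplitude = np.hypot(a, b)
phase_hours = (np.arctan2(b, a) * tau / (2 * np.pi)) % tau  # time of maximum
print(f"amplitude ~ {amplitude:.3f} degC, acrophase ~ {phase_hours:.2f} h")
```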

  19. Eigenvalue statistics for the sum of two complex Wishart matrices

    NASA Astrophysics Data System (ADS)

    Kumar, Santosh

    2014-09-01

    The sum of independent Wishart matrices, taken from distributions with unequal covariance matrices, plays a crucial role in multivariate statistics, and has applications in the fields of quantitative finance and telecommunication. However, analytical results concerning the corresponding eigenvalue statistics have remained unavailable, even for the sum of two Wishart matrices. This can be attributed to the complicated and rotationally noninvariant nature of the matrix distribution, which makes extracting information about the eigenvalues a nontrivial task. Using a generalization of the Harish-Chandra-Itzykson-Zuber integral, we find an exact solution to this problem for the complex Wishart case when one of the covariance matrices is proportional to the identity matrix, while the other is arbitrary. We derive exact and compact expressions for the joint probability density and marginal density of eigenvalues. The analytical results are compared with numerical simulations, and we find perfect agreement.
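
    As a numerical companion to the analytical result, the following sketch simulates the eigenvalue statistics of the sum of two complex Wishart matrices in the setting described (one covariance proportional to the identity, one arbitrary). The dimension, degrees of freedom, and covariances are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n, n1, n2 = 4, 8, 10                    # dimension and degrees of freedom (assumed)
        sigma1 = np.eye(n)                      # covariance proportional to the identity
        sigma2 = np.diag([0.5, 1.0, 1.5, 2.0])  # arbitrary covariance (illustrative)

        def complex_wishart(sigma, dof):
            """Draw one complex Wishart matrix A A^dagger with the given covariance."""
            L = np.linalg.cholesky(sigma)
            G = (rng.standard_normal((len(sigma), dof))
                 + 1j * rng.standard_normal((len(sigma), dof))) / np.sqrt(2)
            A = L @ G
            return A @ A.conj().T

        eigs = np.concatenate([
            np.linalg.eigvalsh(complex_wishart(sigma1, n1) + complex_wishart(sigma2, n2))
            for _ in range(5000)])
        # A histogram of `eigs` approximates the marginal eigenvalue density that the
        # paper derives analytically.
        print(f"{eigs.min():.2f} .. {eigs.max():.2f}, mean {eigs.mean():.2f}")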

  20. Annual Research Briefs, 1987

    NASA Technical Reports Server (NTRS)

    Moin, Parviz; Reynolds, William C.

    1988-01-01

    Lagrangian techniques have found widespread application to the prediction and understanding of turbulent transport phenomena and have yielded satisfactory results for different cases of shear flow problems. However, it must be kept in mind that in most experiments what is really available are Eulerian statistics, and it is far from obvious how to extract from them the information relevant to the Lagrangian behavior of the flow; in consequence, Lagrangian models still include some hypotheses for which no adequate supporting evidence has been available until now. Direct numerical simulation of turbulence offers a new way to obtain Lagrangian statistics and so verify the validity of current predictive models and the accuracy of their results. After the pioneering work of Riley (Riley and Patterson, 1974) in the 1970s, some such results have recently appeared in the literature (Lee et al.; Yeung and Pope). The present contribution follows in part similar lines, but focuses on two-particle statistics and comparison with existing models.

  1. Spatially Pooled Contrast Responses Predict Neural and Perceptual Similarity of Naturalistic Image Categories

    PubMed Central

    Groen, Iris I. A.; Ghebreab, Sennay; Lamme, Victor A. F.; Scholte, H. Steven

    2012-01-01

    The visual world is complex and continuously changing. Yet, our brain transforms patterns of light falling on our retina into a coherent percept within a few hundred milliseconds. Possibly, low-level neural responses already carry substantial information to facilitate rapid characterization of the visual input. Here, we computationally estimated low-level contrast responses to computer-generated naturalistic images, and tested whether spatial pooling of these responses could predict image similarity at the neural and behavioral level. Using EEG, we show that statistics derived from pooled responses explain a large amount of variance between single-image evoked potentials (ERPs) in individual subjects. Dissimilarity analysis on multi-electrode ERPs demonstrated that large differences between images in pooled response statistics are predictive of more dissimilar patterns of evoked activity, whereas images with little difference in statistics give rise to highly similar evoked activity patterns. In a separate behavioral experiment, images with large differences in statistics were judged as different categories, whereas images with little differences were confused. These findings suggest that statistics derived from low-level contrast responses can be extracted in early visual processing and can be relevant for rapid judgment of visual similarity. We compared our results with two other, well-known contrast statistics: Fourier power spectra and higher-order properties of contrast distributions (skewness and kurtosis). Interestingly, whereas these statistics allow for accurate image categorization, they do not predict ERP response patterns or behavioral categorization confusions. These converging computational, neural and behavioral results suggest that statistics of pooled contrast responses contain information that corresponds with perceived visual similarity in a rapid, low-level categorization task. PMID:23093921

  2. Comparison of antibacterial activity of Talok (Muntingia calabura L) leaves ethanolic and n-hexane extracts on Propionibacterium acnes

    NASA Astrophysics Data System (ADS)

    Desrini, Sufi; Ghiffary, Hifzhan Maulana

    2018-04-01

    Muntingia calabura L., known locally as Talok or Kersen, is a plant widely used in traditional medicine in Indonesia. In this study, we evaluated the antibacterial activity of Muntingia calabura L. leaf ethanolic and n-hexane extracts against Propionibacterium acnes. Antibacterial activity was determined using the agar well diffusion method. Each extract (2 mg/mL, 8 mg/mL, 20 mg/mL, 30 mg/mL, and 40 mg/mL) was tested against Propionibacterium acnes. The zones of inhibition of the ethanolic and n-hexane extracts were measured, compared, and analyzed statistically. Phytochemical analyses of the plants were carried out using thin layer chromatography (TLC). The average diameter of the zone of inhibition at a concentration of 2 mg/mL was 9.97 mm for the ethanolic extract, while the n-hexane extract at the same concentration showed 0 mm. Statistical analysis used the non-parametric Kruskal-Wallis test, followed by Mann-Whitney tests to quantify the differences between concentrations among groups. The Kruskal-Wallis test revealed a significant difference (p < 0.001). Based on the post hoc Mann-Whitney tests, there are statistically significant differences between each concentration of the ethanolic extract, the n-hexane extract, and the positive control group (p < 0.05). Both extracts have antibacterial activity against P. acnes; however, the ethanolic extract of Muntingia calabura L. inhibits Propionibacterium acnes growth better than the n-hexane extract.
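
    The reported testing sequence (Kruskal-Wallis across concentration groups, then pairwise Mann-Whitney comparisons) can be sketched with scipy.stats; the inhibition-zone replicates below are hypothetical placeholders, not the study's data.

        from itertools import combinations
        from scipy.stats import kruskal, mannwhitneyu

        zones = {                              # inhibition zones in mm (hypothetical)
            "2 mg/mL":  [9.8, 10.1, 9.9],
            "20 mg/mL": [12.3, 12.9, 12.5],
            "40 mg/mL": [15.0, 14.6, 15.4],
        }
        H, p = kruskal(*zones.values())
        print(f"Kruskal-Wallis: H = {H:.2f}, p = {p:.4f}")
        for (g1, v1), (g2, v2) in combinations(zones.items(), 2):
            U, p = mannwhitneyu(v1, v2, alternative="two-sided")
            print(f"{g1} vs {g2}: U = {U}, p = {p:.4f}")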

  3. Proton radius from electron scattering data

    NASA Astrophysics Data System (ADS)

    Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; Meekins, David; Norum, Blaine; Sawatzky, Brad

    2016-05-01

    Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon, and Stanford. Methods: We make use of stepwise regression techniques using the F test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data, as well as to provide multivariate error estimates. Results: Starting with the precision, low four-momentum-transfer (Q2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F test as well as the Akaike information criterion justifies using a linear extrapolation, which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q2 data on GE to select functions which extrapolate to high Q2, we find that a Padé (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, GE(Q2) = (1 + Q2/0.66 GeV2)^-2. Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm, either from linear extrapolation of the extremely-low-Q2 data or by use of the Padé approximant for extrapolation over a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering results and the muonic hydrogen results are consistent; it is the atomic hydrogen results that are the outliers.
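
    The model-selection step, choosing the order of a Maclaurin-series fit by the Akaike information criterion and reading the radius from the slope at Q2 = 0, can be sketched as follows. The data are synthetic dipole values, not the Mainz or Saskatoon measurements, and the AIC form used is the standard least-squares variant.

        import numpy as np

        rng = np.random.default_rng(0)
        Q2 = np.linspace(0.005, 0.05, 30)   # GeV^2, illustrative low-Q2 range
        GE = (1 + Q2 / 0.66) ** -2 + 0.001 * rng.standard_normal(Q2.size)

        def aic_fit(k):
            """Fit a degree-k polynomial in Q^2 and return (AIC, coefficients)."""
            coeffs = np.polyfit(Q2, GE, k)
            rss = np.sum((GE - np.polyval(coeffs, Q2)) ** 2)
            return len(Q2) * np.log(rss / len(Q2)) + 2 * (k + 1), coeffs

        _, coeffs = min((aic_fit(k) for k in range(1, 5)), key=lambda t: t[0])
        slope = coeffs[-2]                   # dGE/dQ^2 at Q^2 = 0, in GeV^-2
        r2_fm2 = -6 * slope * 0.197327 ** 2  # (hbar c)^2 converts GeV^-2 to fm^2
        print(f"selected order {len(coeffs) - 1}, r_E ~ {np.sqrt(r2_fm2):.3f} fm")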

  4. Digital disaster evaluation and its application to 2015 Ms 8.1 Nepal Earthquake

    NASA Astrophysics Data System (ADS)

    WANG, Xiaoqing; LV, Jinxia; DING, Xiang; DOU, Aixia

    2016-11-01

    The purpose of this article is to explore techniques for extracting and evaluating disaster information from digital remote sensing (RS) images in an internet environment, aided by social and geographic information. The solution combines several methods: a fast post-disaster assessment system automatically estimates the disaster area and grade; multi-phase satellite and airborne high-resolution digital RS images provide the basis for extracting damaged areas or spots, assisted by fast positioning of potentially seriously damaged targets according to geographic, administrative, population, building, and other information in the estimated disaster region; and 2D digital map or 3D digital earth systems provide platforms for cooperative interpretation of damage information over the internet, and further for estimating the spatial distribution of damage index or intensity, casualties, or economic losses. These outputs are very useful for decision-making in emergency rescue and disaster relief, resettlement, and reconstruction. As an example of this solution, the spatial seismic damage distribution of the 2015 Ms 8.1 Nepal earthquake is evaluated using high-resolution digital RS images, auxiliary geographic information, and ground survey. The results are compared with statistical disaster information from field surveys and show good consistency.

  5. A research framework for pharmacovigilance in health social media: Identification and evaluation of patient adverse drug event reports.

    PubMed

    Liu, Xiao; Chen, Hsinchun

    2015-12-01

    Social media offer insights into patients' medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve current pharmacovigilance practice. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated, high-performance extraction of patient-reported adverse drug events. The framework consists of medical entity extraction for recognizing patient discussions of drugs and events, adverse drug event extraction with a shortest-dependency-path-kernel-based statistical learning method and semantic filtering with information from medical knowledge bases, and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments was conducted on a test bed of postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness, and that the framework significantly outperforms prior work. Published by Elsevier Inc.

  6. Level statistics of words: Finding keywords in literary texts and symbolic sequences

    NASA Astrophysics Data System (ADS)

    Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.

    2009-03-01

    Using a generalization of the level-statistics analysis of quantum disordered systems, we present an approach able to extract keywords from literary texts automatically. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works on generic symbolic sequences (continuous texts without spaces), suggesting its general applicability.
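
    A simplified version of the clustering statistic can be sketched as the normalized standard deviation of the gaps between successive occurrences of each word; values well above the Poisson expectation of about 1 indicate self-attraction. The paper's exact normalization is more refined than this sketch.

        import re
        from collections import defaultdict
        import numpy as np

        def clustering_scores(text, min_count=10):
            """Rank words by std/mean of the gaps between successive occurrences."""
            positions = defaultdict(list)
            for i, w in enumerate(re.findall(r"[a-z]+", text.lower())):
                positions[w].append(i)
            scores = {}
            for w, pos in positions.items():
                if len(pos) >= min_count:
                    gaps = np.diff(pos)
                    scores[w] = gaps.std() / gaps.mean()  # > 1 suggests clustering
            return sorted(scores.items(), key=lambda kv: -kv[1])

        # Usage: clustering_scores(open("novel.txt").read())[:20] lists keyword candidates.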

  7. Automatic identification of bullet signatures based on consecutive matching striae (CMS) criteria.

    PubMed

    Chu, Wei; Thompson, Robert M; Song, John; Vorburger, Theodore V

    2013-09-10

    The consecutive matching striae (CMS) numeric criteria for firearm and toolmark identifications have been widely accepted by forensic examiners, although there have been questions concerning its observer subjectivity and limited statistical support. In this paper, based on signal processing and extraction, a model for the automatic and objective counting of CMS is proposed. The position and shape information of the striae on the bullet land is represented by a feature profile, which is used for determining the CMS number automatically. Rapid counting of CMS number provides a basis for ballistics correlations with large databases and further statistical and probability analysis. Experimental results in this report using bullets fired from ten consecutively manufactured barrels support this developed model. Published by Elsevier Ireland Ltd.
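
    Once striae have been declared matching or non-matching from the extracted feature profiles, counting CMS reduces to finding the longest run of matches. A minimal sketch, with the upstream profile matching assumed done:

        def max_cms(matches):
            """Return the longest run of consecutive matching striae."""
            best = run = 0
            for m in matches:
                run = run + 1 if m else 0
                best = max(best, run)
            return best

        # Striae-by-striae match flags for a hypothetical bullet-land comparison:
        print(max_cms([True, True, False, True, True, True, True, False]))  # -> 4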

  8. Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data

    NASA Astrophysics Data System (ADS)

    Fard, Amin Milani

    Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work, a competitive deduction approach for a heterogeneous data grid environment is proposed, using classic data mining and statistical methods. By applying game theory concepts in a multi-agent model, we design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system, a sample multi-expert system has also been developed.

  9. Improving the Prognostic Ability through Better Use of Standard Clinical Data - The Nottingham Prognostic Index as an Example

    PubMed Central

    Winzer, Klaus-Jürgen; Buchholz, Anika; Schumacher, Martin; Sauerbrei, Willi

    2016-01-01

    Background Prognostic factors and prognostic models play a key role in medical research and patient management. The Nottingham Prognostic Index (NPI) is a well-established prognostic classification scheme for patients with breast cancer. In a very simple way, it combines the information from tumor size, lymph node stage and tumor grade. For the resulting index, cutpoints are proposed to classify patients into three to six groups with different prognoses. As not all the prognostic information from the three components and other standard factors is used, we consider improving the prognostic ability using suitable analysis approaches. Methods and Findings Reanalyzing overall survival data of 1560 patients from a clinical database using multivariable fractional polynomials and further modern statistical methods, we illustrate suitable multivariable modelling and methods to derive and assess the prognostic ability of an index. Using a REMARK-type profile, we summarize the relevant steps of the analysis. Adding the information from hormonal receptor status and using the full information from the three NPI components, specifically the number of positive lymph nodes, an extended NPI with improved prognostic ability is derived. Conclusions The prognostic ability of even one of the best-established prognostic indices in medicine can be improved by using suitable statistical methodology to extract the full information from standard clinical data. This extended version of the NPI can serve as a benchmark to assess the added value of new information, ranging from a new single clinical marker to an index derived from omics data. An established benchmark would also help to harmonize the statistical analyses of such studies and protect against the propagation of many false promises concerning the prognostic value of new measurements. The statistical methods used are generally available and can be used for similar analyses in other diseases. PMID:26938061
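
    For reference, the classical NPI and its customary three-group cutpoints can be written down directly; the paper's argument is that an extended index using the full component information outperforms this categorized score. A sketch, assuming the usual published formula and cutpoints:

        def npi(tumor_size_cm, grade, node_stage):
            """NPI = 0.2 x tumor size (cm) + grade (1-3) + lymph node stage (1-3)."""
            return 0.2 * tumor_size_cm + grade + node_stage

        def npi_group(score):
            if score <= 3.4:
                return "good prognosis"
            if score <= 5.4:
                return "moderate prognosis"
            return "poor prognosis"

        score = npi(tumor_size_cm=2.5, grade=2, node_stage=1)
        print(score, npi_group(score))   # 3.5 -> moderate prognosis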

  10. Spatio-Temporal Constrained Human Trajectory Generation from the PIR Motion Detector Sensor Network Data: A Geometric Algebra Approach

    PubMed Central

    Yu, Zhaoyuan; Yuan, Linwang; Luo, Wen; Feng, Linyao; Lv, Guonian

    2015-01-01

    Passive infrared (PIR) motion detectors, which can support long-term continuous observation, are widely used for human motion analysis. Extracting all possible trajectories from the PIR sensor networks is important. Because the PIR sensor does not log location and individual information, none of the existing methods can generate all possible human motion trajectories that satisfy various spatio-temporal constraints from the sensor activation log data. In this paper, a geometric algebra (GA)-based approach is developed to generate all possible human trajectories from the PIR sensor network data. Firstly, the geographical network, sensor activation response sequences and the human motion are represented as algebraic elements using GA. The human motion status of each sensor activation is labeled using GA-based trajectory tracking. Then, a matrix multiplication approach is developed to dynamically generate the human trajectories according to the sensor activation log and the spatio-temporal constraints. The method is tested with the MERL motion database. Experiments show that our method can flexibly extract the major statistical pattern of the human motion. Compared with direct statistical analysis and the tracklet graph method, our method can effectively extract all possible trajectories of the human motion, which makes it more accurate. Our method is also likely to provide a new way to filter other passive sensor log data in sensor networks. PMID:26729123
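
    The dynamic matrix-multiplication idea can be illustrated in miniature: mask the sensor-graph adjacency matrix by the sensors that fired at each time step, and chain the products to count constraint-consistent trajectories. This toy stands in for the paper's GA formulation; the graph and activation log below are invented.

        import numpy as np

        adj = np.array([[1, 1, 0, 0],       # sensor graph: who is reachable from whom
                        [1, 1, 1, 0],       # (self-loops allow lingering in a zone)
                        [0, 1, 1, 1],
                        [0, 0, 1, 1]])

        activations = [{0}, {1}, {1, 2}, {3}]   # sensors active at t = 0..3 (hypothetical)

        def masked(step_active):
            """Keep only transitions into sensors that actually fired this step."""
            m = np.zeros_like(adj)
            m[:, list(step_active)] = adj[:, list(step_active)]
            return m

        # paths[j] = number of constraint-consistent trajectories ending at sensor j
        paths = np.zeros(4, dtype=int)
        paths[list(activations[0])] = 1
        for active in activations[1:]:
            paths = paths @ masked(active)
        print(paths)   # -> [0 0 0 1]: one trajectory, ending at sensor 3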

  11. Statistically Enhanced Model of In Situ Oil Sands Extraction Operations: An Evaluation of Variability in Greenhouse Gas Emissions.

    PubMed

    Orellana, Andrea; Laurenzi, Ian J; MacLean, Heather L; Bergerson, Joule A

    2018-02-06

    Greenhouse gas (GHG) emissions associated with extraction of bitumen from oil sands can vary from project to project and over time. However, the nature and magnitude of this variability have yet to be incorporated into life cycle studies. We present a statistically enhanced life cycle based model (GHOST-SE) for assessing variability of GHG emissions associated with the extraction of bitumen using in situ techniques in Alberta, Canada. It employs publicly available, company-reported operating data, facilitating assessment of inter- and intraproject variability as well as the time evolution of GHG emissions from commercial in situ oil sands projects. We estimate the median GHG emissions associated with bitumen production via cyclic steam stimulation (CSS) to be 77 kg CO 2 eq/bbl bitumen (80% CI: 61-109 kg CO 2 eq/bbl), and via steam assisted gravity drainage (SAGD) to be 68 kg CO 2 eq/bbl bitumen (80% CI: 49-102 kg CO 2 eq/bbl). We also show that the median emissions intensity of Alberta's CSS and SAGD projects have been relatively stable from 2000 to 2013, despite greater than 6-fold growth in production. Variability between projects is the single largest source of variability (driven in part by reservoir characteristics) but intraproject variability (e.g., startups, interruptions), is also important and must be considered in order to inform research or policy priorities.

  12. Spatio-Temporal Constrained Human Trajectory Generation from the PIR Motion Detector Sensor Network Data: A Geometric Algebra Approach.

    PubMed

    Yu, Zhaoyuan; Yuan, Linwang; Luo, Wen; Feng, Linyao; Lv, Guonian

    2015-12-30

    Passive infrared (PIR) motion detectors, which can support long-term continuous observation, are widely used for human motion analysis. Extracting all possible trajectories from the PIR sensor networks is important. Because the PIR sensor does not log location and individual information, none of the existing methods can generate all possible human motion trajectories that satisfy various spatio-temporal constraints from the sensor activation log data. In this paper, a geometric algebra (GA)-based approach is developed to generate all possible human trajectories from the PIR sensor network data. Firstly, the geographical network, sensor activation response sequences and the human motion are represented as algebraic elements using GA. The human motion status of each sensor activation is labeled using GA-based trajectory tracking. Then, a matrix multiplication approach is developed to dynamically generate the human trajectories according to the sensor activation log and the spatio-temporal constraints. The method is tested with the MERL motion database. Experiments show that our method can flexibly extract the major statistical pattern of the human motion. Compared with direct statistical analysis and the tracklet graph method, our method can effectively extract all possible trajectories of the human motion, which makes it more accurate. Our method is also likely to provide a new way to filter other passive sensor log data in sensor networks.

  13. An exploratory study of air emissions associated with shale gas development and production in the Barnett Shale.

    PubMed

    Rich, Alisa; Grover, James P; Sattler, Melanie L

    2014-01-01

    Information regarding air emissions from shale gas extraction and production is critically important given that production is occurring in highly urbanized areas across the United States. The objectives of this exploratory study were to collect ambient air samples in residential areas within 61 m (200 feet) of shale gas extraction/production and determine whether a "fingerprint" of chemicals can be associated with shale gas activity. Statistical analyses correlating fingerprint chemicals with methane, equipment, and processes of extraction/production were performed. Ambient air sampling in residential areas of shale gas extraction and production was conducted in six counties in the Dallas/Fort Worth (DFW) Metroplex from 2008 to 2010. The 39 locations tested were identified by clients that requested monitoring. Seven sites were sampled on 2 days (typically months apart, in another season), and two sites were sampled on 3 days, resulting in 50 sets of monitoring data. Twenty-four-hour passive samples were collected using Summa canisters. Gas chromatography/mass spectrometry analysis was used to identify the organic compounds present. Methane was present in concentrations above laboratory detection limits in 49 of the 50 sampling data sets. Most of the areas investigated had atmospheric methane concentrations considerably higher than reported urban background concentrations (1.8-2.0 ppmv). Other chemical constituents were found to be correlated with the presence of methane. A principal components analysis (PCA) identified multivariate patterns of concentrations that potentially constitute signatures of emissions from different phases of operation at natural gas sites. The first factor identified through the PCA proved most informative: extreme negative values were strongly and statistically associated with the presence of compressors at sample sites. The seven chemicals strongly associated with this factor (o-xylene, ethylbenzene, 1,2,4-trimethylbenzene, m- and p-xylene, 1,3,5-trimethylbenzene, toluene, and benzene) thus constitute a potential fingerprint of emissions associated with compression. Methane, the primary shale gas constituent, contributes substantially to climate change, and other natural gas constituents are known to have adverse health effects. This study goes beyond previous Barnett Shale field studies by encompassing a wider variety of production equipment (wells, tanks, compressors, and separators) and a wider geographical region. The principal components analysis, unique to this study, provides valuable information regarding the ability to anticipate associated shale gas chemical constituents.
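
    The fingerprinting step is a standard PCA on a standardized samples-by-chemicals concentration matrix, with the first factor's loadings inspected. The sketch below uses the chemical names from the study but a random placeholder matrix in place of the measured concentrations.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        chemicals = ["o-xylene", "ethylbenzene", "1,2,4-trimethylbenzene", "m/p-xylene",
                     "1,3,5-trimethylbenzene", "toluene", "benzene"]
        X = np.abs(np.random.default_rng(1).normal(1.0, 0.5, (50, len(chemicals))))

        pca = PCA(n_components=3)
        scores = pca.fit_transform(StandardScaler().fit_transform(X))
        for name, loading in zip(chemicals, pca.components_[0]):
            print(f"{name:25s} {loading:+.2f}")      # first-factor loadings
        # Sites with extreme factor-1 scores would then be cross-tabulated against
        # the equipment present (e.g., compressors), as in the study.
        print("most negative factor-1 site:", scores[:, 0].argmin())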

  14. Statistical Approach To Extraction Of Texture In SAR

    NASA Technical Reports Server (NTRS)

    Rignot, Eric J.; Kwok, Ronald

    1992-01-01

    An improved statistical method for the extraction of textural features in synthetic-aperture-radar (SAR) images takes account of the effects of the scheme used to sample raw SAR data, system noise, the resolution of the radar equipment, and speckle. The treatment of speckle is incorporated into an overall statistical treatment of speckle, system noise, and natural variations in texture. The speckle autocorrelation function is computed from the system transfer function, which expresses the effect of the radar aperture and incorporates the range and azimuth resolutions.

  15. Emerging Concepts of Data Integration in Pathogen Phylodynamics.

    PubMed

    Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]

  16. Emerging Concepts of Data Integration in Pathogen Phylodynamics

    PubMed Central

    Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504

  17. Scalar Dissipation Modeling for Passive and Active Scalars: a priori Study Using Direct Numerical Simulation

    NASA Technical Reports Server (NTRS)

    Selle, L. C.; Bellan, Josette

    2006-01-01

    Transitional databases from Direct Numerical Simulation (DNS) of three-dimensional mixing layers for single-phase flows and two-phase flows with evaporation are analyzed and used to examine the typical hypothesis that the scalar dissipation Probability Distribution Function (PDF) may be modeled as a Gaussian. The databases encompass a single-component fuel and four multicomponent fuels, two initial Reynolds numbers (Re), two mass loadings for two-phase flows and two free-stream gas temperatures. Using the DNS-calculated moments of the scalar-dissipation PDF, it is shown, consistent with existing experimental information on single-phase flows, that the Gaussian is a modest approximation of the DNS-extracted PDF, particularly poor in the range of high scalar-dissipation values, which are significant for turbulent reaction rate modeling in non-premixed flows using flamelet models. With the same DNS-calculated moments of the scalar-dissipation PDF and making a change of variables, a model of this PDF is proposed in the form of the β-PDF, which is shown to approximate the DNS-extracted PDF much better, particularly in the regime of high scalar-dissipation values. Several types of statistical measures are calculated over the ensemble of the fourteen databases. For each statistical measure, the proposed β-PDF model is shown to be much superior to the Gaussian in approximating the DNS-extracted PDF. Additionally, the agreement between the DNS-extracted PDF and the β-PDF even improves when the comparison is performed for higher initial-Re layers, whereas the comparison with the Gaussian is independent of the initial Re values. For two-phase flows, the comparison between the DNS-extracted PDF and the β-PDF also improves with increasing free-stream gas temperature and mass loading. The higher-fidelity approximation of the DNS-extracted PDF by the β-PDF with increasing Re, gas temperature and mass loading bodes well for turbulent reaction rate modeling.
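
    The substitution underlying the β-PDF model is moment matching: given the mean and variance of the scalar dissipation rescaled to [0, 1], the two β parameters follow in closed form. A minimal sketch with illustrative moments:

        from scipy.stats import beta

        m, v = 0.3, 0.02                  # mean/variance of the rescaled variable (toy)
        common = m * (1 - m) / v - 1      # requires v < m(1 - m)
        a, b = m * common, (1 - m) * common
        dist = beta(a, b)
        print(a, b, dist.mean(), dist.var())   # mean and variance recovered exactly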

  18. Dimensional Changes of Fresh Sockets With Reactive Soft Tissue Preservation: A Cone Beam CT Study.

    PubMed

    Crespi, Roberto; Capparé, Paolo; Crespi, Giovanni; Gastaldi, Giorgio; Gherlone, Enrico Felice

    2017-06-01

    The aim of this study was to assess dimensional changes of fresh sockets grafted with collagen sheets and maintenance of reactive soft tissue, using cone beam computed tomography (CBCT). Tooth extractions were performed with maximum preservation of the alveolar housing; reactive soft tissue was left in the sockets and collagen sheets filled the bone defects. CBCT scans were performed before and 3 months after extraction. One hundred forty-five teeth, 60 monoradicular and 85 molars, were extracted; in total, 269 alveoli were evaluated. In Group A, no statistically significant differences were found between monoradicular teeth, whereas statistically significant differences (P < 0.05) were found between molars, for both mesial and distal alveoli. In Group B, no statistically significant differences were found between maxillary and mandibular bone change values (P > 0.05) for any tooth type. This study suggests that atraumatic tooth extraction, with reactive soft tissue left in situ and a grafted collagen sponge, may help reduce fresh socket collapse after extraction procedures.

  19. Songs as an aid for language acquisition.

    PubMed

    Schön, Daniele; Boyer, Maud; Moreno, Sylvain; Besson, Mireille; Peretz, Isabelle; Kolinsky, Régine

    2008-02-01

    In previous research, Saffran and colleagues [Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928; Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35, 606-621] have shown that adults and infants can use the statistical properties of syllable sequences to extract words from continuous speech. They also showed that a similar learning mechanism operates with musical stimuli [Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70, 27-52]. In this work we combined linguistic and musical information and compared language learning based on speech sequences to language learning based on sung sequences. We hypothesized that, compared to speech sequences, a consistent mapping of linguistic and musical information would enhance learning. The results confirmed the hypothesis, showing a strong learning facilitation of song compared to speech. Most importantly, the present results show that learning a new language, especially in the first learning phase wherein one needs to segment new words, may largely benefit from the motivational and structuring properties of music in song.

  20. Statistical Analysis of the Indus Script Using n-Grams

    PubMed Central

    Yadav, Nisha; Joglekar, Hrishikesh; Rao, Rajesh P. N.; Vahia, Mayank N.; Adhikari, Ronojoy; Mahadevan, Iravatham

    2010-01-01

    The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language has frustrated efforts at decipherment since the discovery of the remains of the Indus civilization. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically n-gram Markov chains, to analyze the syntax of the Indus script. We find that unigrams follow a Zipf-Mandelbrot distribution. Text beginner and ender distributions are unequal, providing internal evidence for syntax. We see clear evidence of strong bigram correlations and extract significant pairs and triplets using a log-likelihood measure of association. Highly frequent pairs and triplets are not always highly significant. The model performance is evaluated using information-theoretic measures and cross-validation. The model can restore doubtfully read texts with an accuracy of about 75%. We find that a quadrigram Markov chain saturates information theoretic measures against a held-out corpus. Our work forms the basis for the development of a stochastic grammar which may be used to explore the syntax of the Indus script in greater detail. PMID:20333254
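
    The significant-pair extraction can be sketched with the log-likelihood (G²) measure of bigram association; the sign sequence below is a toy stand-in for the Indus corpus, and the contingency-table construction is the standard one for adjacent pairs.

        from collections import Counter
        from math import log

        def g2_scores(tokens):
            """Score each adjacent pair by the G^2 log-likelihood association measure."""
            starts, ends = Counter(tokens[:-1]), Counter(tokens[1:])
            bigrams = Counter(zip(tokens, tokens[1:]))
            N = len(tokens) - 1
            scores = {}
            for (w1, w2), k11 in bigrams.items():
                k12 = starts[w1] - k11        # w1 followed by something else
                k21 = ends[w2] - k11          # something else followed by w2
                k22 = N - k11 - k12 - k21
                g2 = 0.0
                for o, row, col in [(k11, k11 + k12, k11 + k21),
                                    (k12, k11 + k12, k12 + k22),
                                    (k21, k21 + k22, k11 + k21),
                                    (k22, k21 + k22, k12 + k22)]:
                    if o > 0:
                        g2 += o * log(o / (row * col / N))
                scores[(w1, w2)] = 2 * g2
            return sorted(scores.items(), key=lambda kv: -kv[1])

        print(g2_scores(list("ABABABCADAABAB"))[:3])   # toy sign sequence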

  1. A risk-based approach to management of leachables utilizing statistical analysis of extractables.

    PubMed

    Stults, Cheryl L M; Mikl, Jaromir; Whelehan, Oliver; Morrical, Bradley; Duffield, William; Nagao, Lee M

    2015-04-01

    To incorporate quality by design concepts into the management of leachables, an emphasis is often put on understanding the extractable profile for the materials of construction for manufacturing disposables, container-closure, or delivery systems. Component manufacturing processes may also impact the extractable profile. An approach was developed to (1) identify critical components that may be sources of leachables, (2) enable an understanding of manufacturing process factors that affect extractable profiles, (3) determine if quantitative models can be developed that predict the effect of those key factors, and (4) evaluate the practical impact of the key factors on the product. A risk evaluation for an inhalation product identified injection molding as a key process. Designed experiments were performed to evaluate the impact of molding process parameters on the extractable profile from an ABS inhaler component. Statistical analysis of the resulting GC chromatographic profiles identified processing factors that were correlated with peak levels in the extractable profiles. The combination of statistically significant molding process parameters was different for different types of extractable compounds. ANOVA models were used to obtain optimal process settings and predict extractable levels for a selected number of compounds. The proposed paradigm may be applied to evaluate the impact of material composition and processing parameters on extractable profiles and utilized to manage product leachables early in the development process and throughout the product lifecycle.

  2. The data embedding method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sandford, M.T. II; Bradley, J.N.; Handel, T.G.

    Data embedding is a new steganographic method for combining digital information sets. This paper describes the data embedding method and gives examples of its application using software written in the C programming language. Sandford and Handel produced a computer program (BMPEMBED, Ver. 1.51, written for the IBM PC/AT or compatible, MS/DOS Ver. 3.3 or later) that implements data embedding in an application for digital imagery. Information is embedded into, and extracted from, Truecolor or color-pallet images in Microsoft® bitmap (.BMP) format. Hiding data in the noise component of a host, by means of an algorithm that modifies or replaces the noise bits, is termed 'steganography.' Data embedding differs markedly from conventional steganography, because it uses the noise component of the host to insert information with few or no modifications to the host data values or their statistical properties. Consequently, the entropy of the host data is affected little by using data embedding to add information. The data embedding method applies to host data compressed with transform, or 'lossy', compression algorithms, as for example ones based on discrete cosine transform and wavelet functions. Analysis of the host noise generates a key required for embedding and extracting the auxiliary data from the combined data. The key is stored easily in the combined data. Images without the key cannot be processed to extract the embedded information. To provide security for the embedded data, one can remove the key from the combined data and manage it separately. The image key can be encrypted and stored in the combined data or transmitted separately as a ciphertext much smaller in size than the embedded data. The key size is typically ten to one hundred bytes, and it is derived from the original host data by an analysis algorithm.

  3. Identification of Cichlid Fishes from Lake Malawi Using Computer Vision

    PubMed Central

    Joo, Deokjin; Kwan, Ye-seul; Song, Jongwoo; Pinho, Catarina; Hey, Jody; Won, Yong-Jin

    2013-01-01

    Background The explosively radiating evolution of cichlid fishes of Lake Malawi has yielded an amazing number of haplochromine species, estimated at as many as 500 to 800, with a surprising degree of diversity not only in color and stripe pattern but also in the shape of jaw and body. As these morphological diversities have been a central subject of adaptive speciation and taxonomic classification, such high diversity could serve as a foundation for automation of species identification of cichlids. Methodology/Principal Findings Here we demonstrate a method for automatic classification of the Lake Malawi cichlids based on computer vision and geometric morphometrics. To this end we developed a pipeline that integrates multiple image processing tools to automatically extract informative features of color and stripe patterns from a large set of photographic images of wild cichlids. The extracted information was evaluated by the statistical classifiers Support Vector Machine and Random Forests. Both classifiers performed better when body shape information was added to the color and stripe features. Besides the coloration and stripe pattern, body shape variables boosted the accuracy of classification by about 10%. The programs were able to classify 594 live cichlid individuals belonging to 12 different classes (species and sexes) with an average accuracy of 78%, compared with a mere 42% success rate by human eyes. The variables that contributed most to the accuracy were body height and the hue of the most frequent color. Conclusions Computer vision showed a notable performance in extracting information from the color and stripe patterns of Lake Malawi cichlids, although the information was not enough for errorless species identification. Our results indicate that there appears to be an unavoidable difficulty in automatic species identification of cichlid fishes, which may arise from short divergence times and gene flow between closely related species. PMID:24204918
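
    The evaluation stage, feeding extracted color/stripe and shape features to Support Vector Machine and Random Forest classifiers, can be sketched with scikit-learn; the feature matrix and labels below are random placeholders for the pipeline's real output.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(594, 20))    # stand-in color/stripe + shape features
        y = rng.integers(0, 12, 594)      # 12 classes (species x sex)

        for name, clf in [("SVM", make_pipeline(StandardScaler(), SVC())),
                          ("Random Forest", RandomForestClassifier(n_estimators=300))]:
            acc = cross_val_score(clf, X, y, cv=5).mean()
            print(f"{name}: {acc:.2f}")   # random features give chance (~0.08)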

  4. Data embedding method

    NASA Astrophysics Data System (ADS)

    Sandford, Maxwell T., II; Bradley, Jonathan N.; Handel, Theodore G.

    1996-01-01

    Data embedding is a new steganographic method for combining digital information sets. This paper describes the data embedding method and gives examples of its application using software written in the C-programming language. Sandford and Handel produced a computer program (BMPEMBED, Ver. 1.51 written for IBM PC/AT or compatible, MS/DOS Ver. 3.3 or later) that implements data embedding in an application for digital imagery. Information is embedded into, and extracted from, Truecolor or color-pallet images in Microsoft™ bitmap (BMP) format. Hiding data in the noise component of a host, by means of an algorithm that modifies or replaces the noise bits, is termed 'steganography.' Data embedding differs markedly from conventional steganography, because it uses the noise component of the host to insert information with few or no modifications to the host data values or their statistical properties. Consequently, the entropy of the host data is affected little by using data embedding to add information. The data embedding method applies to host data compressed with transform, or 'lossy' compression algorithms, as for example ones based on discrete cosine transform and wavelet functions. Analysis of the host noise generates a key required for embedding and extracting the auxiliary data from the combined data. The key is stored easily in the combined data. Images without the key cannot be processed to extract the embedded information. To provide security for the embedded data, one can remove the key from the combined data and manage it separately. The image key can be encrypted and stored in the combined data or transmitted separately as a ciphertext much smaller in size than the embedded data. The key size is typically ten to one-hundred bytes, and it is derived from the original host data by an analysis algorithm.

  5. Analysis of Landsat-4 Thematic Mapper data for classification of forest stands in Baldwin County, Alabama

    NASA Technical Reports Server (NTRS)

    Hill, C. L.

    1984-01-01

    A computer-implemented classification has been derived from Landsat-4 Thematic Mapper data acquired over Baldwin County, Alabama on January 15, 1983. One set of spectral signatures was developed from the data by utilizing a 3x3-pixel sliding-window approach. An analysis of the classification produced from this technique identified forested areas. Additional information regarding only the forested areas was extracted by employing a pixel-by-pixel signature development program, which derived spectral statistics only for pixels within the forested land covers. The spectral statistics from both approaches were integrated and the data classified. This classification was evaluated by comparing the spectral classes produced from the data against corresponding ground verification polygons. This iterative data analysis technique resulted in an overall classification accuracy of 88.4 percent correct for slash pine, young pine, loblolly pine, natural pine, and mixed hardwood-pine. An accuracy assessment matrix has been produced for the classification.

  6. Lead exposure in US worksites: A literature review and development of an occupational lead exposure database from the published literature

    PubMed Central

    Koh, Dong-Hee; Locke, Sarah J.; Chen, Yu-Cheng; Purdue, Mark P.; Friesen, Melissa C.

    2016-01-01

    Background Retrospective exposure assessment of occupational lead exposure in population-based studies requires historical exposure information from many occupations and industries. Methods We reviewed published US exposure monitoring studies to identify lead exposure measurement data. We developed an occupational lead exposure database from the 175 identified papers containing 1,111 sets of lead concentration summary statistics (21% area air, 47% personal air, 32% blood). We also extracted ancillary exposure-related information, including job, industry, task/location, year collected, sampling strategy, control measures in place, and sampling and analytical methods. Results Measurements were published between 1940 and 2010 and represented 27 2-digit standardized industry classification codes. The majority of the measurements were related to lead-based paint work, joining or cutting metal using heat, primary and secondary metal manufacturing, and lead acid battery manufacturing. Conclusions This database can be used in future statistical analyses to characterize differences in lead exposure across time, jobs, and industries. PMID:25968240

  7. Depth-tunable three-dimensional display with interactive light field control

    NASA Astrophysics Data System (ADS)

    Xie, Songlin; Wang, Peng; Sang, Xinzhu; Li, Chenyu; Dou, Wenhua; Xiao, Liquan

    2016-07-01

    A software-defined depth-tunable three-dimensional (3D) display with interactive 3D depth control is presented. With the proposed post-processing system, the disparity of multi-view media can be freely adjusted. Benefiting from the wealth of information inherently contained in dense multi-view images captured with a parallel-arrangement camera array, the 3D light field is built, and the light field structure is controlled to adjust the disparity without additionally acquired depth information, since the light field structure itself contains depth information. A statistical analysis based on least squares is carried out to extract the depth information inherent in the light field structure, and the accurate depth information can be used to re-parameterize light fields for the autostereoscopic display, guaranteeing smooth motion parallax. Experimental results show that the system is convenient and effective for adjusting 3D scene performance on a 3D display.

  8. Estimation of the Scatterer Distribution of the Cirrhotic Liver using Ultrasonic Image

    NASA Astrophysics Data System (ADS)

    Yamaguchi, Tadashi; Hachiya, Hiroyuki

    1998-05-01

    In the B-mode image of the liver obtained by an ultrasonic imaging system, the speckled pattern changes with the progression of diseases such as liver cirrhosis. In this paper we present the statistical characteristics of the echo envelope of the liver, and a technique to extract information on the scatterer distribution from normal and cirrhotic liver images using constant false alarm rate (CFAR) processing. We analyze the relationship between the extracted scatterer distribution and the stage of liver cirrhosis. The ratio of the area in which the amplitude of the processed signal exceeds the threshold to the entire processed image area is related quantitatively to the stage of liver cirrhosis. It is found that the proposed technique is valid for the quantitative diagnosis of liver cirrhosis.
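
    A one-dimensional cell-averaging CFAR pass over an envelope line conveys the thresholding idea; the window sizes and scale factor below are illustrative assumptions, not the paper's settings.

        import numpy as np

        def ca_cfar(envelope, guard=2, train=8, scale=3.0):
            """Flag cells whose amplitude exceeds scale x local noise estimate."""
            n = len(envelope)
            mask = np.zeros(n, dtype=bool)
            for i in range(guard + train, n - guard - train):
                left = envelope[i - guard - train:i - guard]
                right = envelope[i + guard + 1:i + guard + train + 1]
                noise = np.concatenate([left, right]).mean()
                mask[i] = envelope[i] > scale * noise
            return mask

        line = np.random.default_rng(0).rayleigh(1.0, 512)  # speckle-like envelope
        line[200] = 12.0                                    # implanted strong scatterer
        print(np.flatnonzero(ca_cfar(line)))                # index 200 is flagged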

  9. Skeletonization with hollow detection on gray image by gray weighted distance transform

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Prabir; Qian, Kai; Cao, Siqi; Qian, Yi

    1998-10-01

    A skeletonization algorithm is presented that can process non-uniformly distributed gray-scale images with hollows. The algorithm is based on the gray-weighted distance transform. The process includes a preliminary phase investigating the hollows in the gray-scale image; these hollows are treated as topological constraints for the skeleton structure depending on whether their depth is statistically significant. We then extract the resulting skeleton, which carries meaningful information for understanding the object in the image. This improved algorithm can overcome possible misinterpretations of complicated images in the extracted skeleton, especially images with asymmetric hollows and asymmetric features. The algorithm can be executed on a parallel machine, as all operations are local. Some examples are discussed to illustrate the algorithm.

  10. Text line extraction in free style document

    NASA Astrophysics Data System (ADS)

    Shen, Xiaolu; Liu, Changsong; Ding, Xiaoqing; Zou, Yanming

    2009-01-01

    This paper addresses text line extraction in free-style documents, such as business cards, envelopes, and posters. In a free-style document, global properties such as character size and line direction can hardly be assumed, which reveals a serious limitation of traditional layout analysis. The 'line' is the most prominent and highest-level structure in our bottom-up method. First, we apply a novel intensity function based on gradient information to locate text areas, where gradients within a window have large magnitude and various directions, and split such areas into text pieces. We build a probability model of lines consisting of text pieces via statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm with a cost function based on the probability model.

  11. Analysis of spike-wave discharges in rats using discrete wavelet transform.

    PubMed

    Ubeyli, Elif Derya; Ilbay, Gül; Sahin, Deniz; Ateş, Nurbay

    2009-03-01

    A feature is a distinctive or characteristic measurement, transform, structural component extracted from a segment of a pattern. Features are used to represent patterns with the goal of minimizing the loss of important information. The discrete wavelet transform (DWT) as a feature extraction method was used in representing the spike-wave discharges (SWDs) records of Wistar Albino Glaxo/Rijswijk (WAG/Rij) rats. The SWD records of WAG/Rij rats were decomposed into time-frequency representations using the DWT and the statistical features were calculated to depict their distribution. The obtained wavelet coefficients were used to identify characteristics of the signal that were not apparent from the original time domain signal. The present study demonstrates that the wavelet coefficients are useful in determining the dynamics in the time-frequency domain of SWD records.
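
    The feature-extraction scheme, DWT decomposition followed by simple statistics of the subband coefficients, can be sketched with PyWavelets; the wavelet family and decomposition level are assumptions, and the signal is synthetic rather than an SWD record.

        import numpy as np
        import pywt

        def dwt_features(segment, wavelet="db4", level=4):
            """Statistics of each subband's coefficients as a feature vector."""
            feats = []
            for coeffs in pywt.wavedec(segment, wavelet, level=level):
                feats += [np.mean(np.abs(coeffs)), np.std(coeffs),
                          np.max(np.abs(coeffs))]
            return np.array(feats)

        rng = np.random.default_rng(0)
        segment = (np.sin(np.linspace(0, 40 * np.pi, 1024))
                   + 0.3 * rng.standard_normal(1024))
        print(dwt_features(segment).round(3))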

  12. Use of a spatial scan statistic to identify clusters of births occurring outside Ghanaian health facilities for targeted intervention.

    PubMed

    Bosomprah, Samuel; Dotse-Gborgbortsi, Winfred; Aboagye, Patrick; Matthews, Zoe

    2016-11-01

    To identify and evaluate clusters of births that occurred outside health facilities in Ghana for targeted intervention, a retrospective study was conducted using a convenience sample of live births registered in Ghanaian health facilities from January 1 to December 31, 2014. Data were extracted from the district health information system. A spatial scan statistic was used to investigate clusters of home births through a discrete Poisson probability model. Scanning with a circular spatial window was conducted only for clusters with high rates of such deliveries. The district was used as the geographic unit of analysis, and the likelihood P value was estimated using Monte Carlo simulations. Ten statistically significant clusters with a high rate of home birth were identified. The relative risks ranged from 1.43 ("least likely" cluster; P=0.001) to 1.95 ("most likely" cluster; P=0.001). The relative risks of the top five "most likely" clusters ranged from 1.68 to 1.95; these clusters were located in the Ashanti, Brong Ahafo, Western, Eastern, and Greater Accra regions. Health facility records, geospatial techniques, and geographic information systems provided locally relevant information to assist policy makers in delivering targeted interventions to small geographic areas. Copyright © 2016 International Federation of Gynecology and Obstetrics. Published by Elsevier Ireland Ltd. All rights reserved.
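
    The machinery behind such a scan, a Kulldorff-style log-likelihood ratio for a candidate window under the discrete Poisson model with a Monte Carlo p-value, can be sketched in miniature; a real scan maximizes this statistic over many circular windows, and the counts below are invented.

        import numpy as np

        def llr(n_in, e_in, n_tot, e_tot):
            """Log-likelihood ratio for a high-rate candidate window."""
            if n_in <= e_in:
                return 0.0
            n_out, e_out = n_tot - n_in, e_tot - e_in
            return n_in * np.log(n_in / e_in) + n_out * np.log(n_out / e_out)

        observed = np.array([70, 78, 118, 61, 93])   # home births per district (toy)
        expected = np.array([50.0, 80, 120, 60, 90])
        expected *= observed.sum() / expected.sum()  # condition on the total count

        obs = llr(observed[0], expected[0], observed.sum(), expected.sum())
        rng = np.random.default_rng(0)
        sims = rng.multinomial(observed.sum(), expected / expected.sum(), size=999)
        exceed = sum(llr(s[0], expected[0], observed.sum(), expected.sum()) >= obs
                     for s in sims)
        print(f"LLR = {obs:.2f}, Monte Carlo p = {(1 + exceed) / 1000:.3f}")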

  13. High-resolution image reconstruction technique applied to the optical testing of ground-based astronomical telescopes

    NASA Astrophysics Data System (ADS)

    Jin, Zhenyu; Lin, Jing; Liu, Zhong

    2008-07-01

    By studying the classical techniques (such as the Shack-Hartmann wave-front sensor) adopted for testing the aberrations of ground-based astronomical optical telescopes, we put forward two testing methods founded on high-resolution image reconstruction technology: one based on the averaged short-exposure OTF, and the other based on the speckle interferometric transfer function (SITF) of Antoine Labeyrie. Research by J. Ohtsubo, F. Roddier, Richard Barakat, and J.-Y. Zhang indicated that the SITF statistics are affected by the telescope's optical aberrations; that is, the SITF statistics are a function of the optical system aberration and the atmospheric Fried parameter (seeing). Telescope diffraction-limited information can be obtained through two statistical treatments of a large set of speckle images: with the first method, we can extract low-frequency information such as the full width at half maximum (FWHM) of the telescope PSF to estimate the optical quality; with the second method, we can get a more precise description of the telescope PSF including high-frequency information. We apply the two testing methods to the 2.4 m optical telescope of the GMG Observatory in China to validate their repeatability and correctness, and compare the results with those obtained with the Shack-Hartmann wave-front sensor; this part is described in detail in the paper.

  14. Clinical research of persimmon leaf extract and ginkgo biloba extract in the treatment of vertebrobasilar insufficiency.

    PubMed

    Guo, S G; Guan, S H; Wang, G M; Liu, G Y; Sun, H; Wang, B J; Xu, F

    2015-01-01

    This paper compares the curative effects of persimmon leaf extract and ginkgo biloba extract in the treatment of headache and dizziness caused by vertebrobasilar insufficiency. Sixty patients were observed who underwent therapy with persimmon leaf extract or ginkgo biloba extract on top of treatment with nimodipine and aspirin. After 30 days, the 30 patients treated with persimmon leaf extract and the 30 patients treated with ginkgo biloba extract were examined for changes in hemodynamic indexes and in symptoms such as headache and dizziness. The results showed effective rates of 88.3% for persimmon leaf extract and 73.1% for ginkgo biloba extract, a statistically significant difference (P < 0.05). Compared with the ginkgo biloba extract group, the persimmon leaf extract group showed more apparent improvement in whole blood viscosity, plasma viscosity, fibrinogen, hematocrit, and platelet adhesion rate, and the difference was statistically significant (P < 0.05 or P < 0.01). Based on these analyses, it can be concluded that persimmon leaf extract is better than ginkgo biloba extract in many respects, such as improving cerebral circulation, expanding cerebral vessels, lowering the hypercoagulable state, and relieving vertebrobasilar insufficiency-induced headache and dizziness.

  15. Is There a Common Summary Statistical Process for Representing the Mean and Variance? A Study Using Illustrations of Familiar Items.

    PubMed

    Yang, Yi; Tokita, Midori; Ishiguchi, Akira

    2018-01-01

    A number of studies have revealed that our visual system can extract different types of summary statistics, such as the mean and variance, from sets of items. Although the extraction of such summary statistics has been studied well in isolation, the relationship between these statistics remains unclear. In this study, we explored this issue using an individual differences approach. Observers viewed illustrations of strawberries and lollipops varying in size or orientation and performed four tasks in a within-subject design, namely mean and variance discrimination tasks in the size and orientation domains. We found that performances in the mean and variance discrimination tasks were not correlated with each other, demonstrating that extractions of the mean and variance are mediated by different representational mechanisms. In addition, we tested the relationship between performances in the size and orientation domains for each summary statistic (i.e., mean and variance) and examined whether each summary statistic has distinct processes across perceptual domains. The results illustrated that statistical summary representations of size and orientation may share a common mechanism for representing the mean, and possibly for representing variance. Introspections of each observer performing the tasks were also examined and discussed.

  16. Information extraction and knowledge graph construction from geoscience literature

    NASA Astrophysics Data System (ADS)

    Wang, Chengbin; Ma, Xiaogang; Chen, Jianguo; Chen, Jingwen

    2018-03-01

    Geoscience literature published online is an important part of open data, and brings both challenges and opportunities for data analysis. Compared with studies of numerical geoscience data, there is limited work on information extraction and knowledge discovery from textual geoscience data. This paper presents a workflow and a few empirical case studies for that topic, with a focus on documents written in Chinese. First, we set up a hybrid corpus combining generic and geology terms from geology dictionaries to train the Chinese word segmentation rules of a Conditional Random Fields model. Second, we used the word segmentation rules to parse documents into individual words, and removed stop-words from the segmentation results to obtain a corpus of content words. Third, we used a statistical method to analyze the semantic links between content words, and selected chord and bigram graphs to visualize the content words and their links as the nodes and edges of a knowledge graph, respectively. The resulting graph presents a clear overview of the key information in an unstructured document. This study demonstrates the usefulness of the designed workflow, and shows the potential of leveraging natural language processing and knowledge graph technologies for geoscience.
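
    The third step of the workflow can be sketched compactly: count bigram links between content words and treat them as weighted edges of a graph. The token list below is a hypothetical stand-in for the output of segmentation and stop-word removal:

```python
from collections import Counter
import networkx as nx

tokens = ["granite", "intrusion", "granite", "pluton", "intrusion", "zircon",
          "zircon", "age", "pluton", "zircon"]

# Count adjacent content-word pairs (bigrams) as a simple statistical
# measure of the semantic link between two words.
bigrams = Counter(zip(tokens, tokens[1:]))

G = nx.Graph()
for (w1, w2), n in bigrams.items():
    if w1 != w2:
        G.add_edge(w1, w2, weight=n)

# Node degree gives a crude ranking of key terms in the document.
print(sorted(G.degree, key=lambda kv: kv[1], reverse=True))
```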

  17. MIMS - MEDICAL INFORMATION MANAGEMENT SYSTEM

    NASA Technical Reports Server (NTRS)

    Frankowski, J. W.

    1994-01-01

    MIMS, the Medical Information Management System, is an interactive, general-purpose information storage and retrieval system. It was first designed for medical data management and can be used to handle all aspects of data related to patient care. Other areas of application for MIMS include: managing occupational safety data in the public and private sectors; handling judicial information where speed and accuracy are high priorities; systematizing purchasing and procurement systems; and analyzing organizational cost structures. Because of its free-format design, MIMS can offer immediate assistance where manipulation of large databases is required. File structures, data categories, field lengths, and formats, including alphabetic and/or numeric, are all user defined. The user can quickly and efficiently extract, display, and analyze the data. Three means of extracting data are provided: certain short items of information, such as social security numbers, can be used to uniquely identify each record for quick access; records can be selected which match conditions defined by the user; and specific categories of data can be selected. Data may be displayed and analyzed in several ways, which include: generating tabular information assembled from comparison of all the records in the system; generating statistical information on numeric data, such as means, standard deviations, and standard errors; and displaying formatted listings of output data. The MIMS program is written in Microsoft FORTRAN-77. It was designed to operate on IBM Personal Computers and compatibles running under PC or MS DOS 2.00 or higher. MIMS was developed in 1987.

  18. Lung lobe segmentation based on statistical atlas and graph cuts

    NASA Astrophysics Data System (ADS)

    Nimura, Yukitaka; Kitasaka, Takayuki; Honma, Hirotoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi; Mori, Kensaku

    2012-03-01

    This paper presents a novel method that can extract the lung lobes by utilizing a probability atlas and multilabel graph cuts. Information about pulmonary structures plays a very important role in deciding the treatment strategy and in surgical planning. The human lungs are divided into five anatomical regions, the lung lobes. Precise segmentation and recognition of lung lobes are indispensable tasks in computer-aided diagnosis systems and computer-aided surgery systems. Many methods for lung lobe segmentation have been proposed. However, these methods target only normal cases, and therefore cannot extract the lung lobes in abnormal cases, such as COPD cases. To extract lung lobes in abnormal cases, this paper proposes a lung lobe segmentation method based on a probability atlas of lobe location and multilabel graph cuts. The process consists of three components: normalization based on the patient's physique, probability atlas generation, and segmentation based on graph cuts. We applied this method to six chest CT datasets, including COPD cases. The Jaccard index was 79.1%.

  19. Extracting built-up areas from TerraSAR-X data using object-oriented classification method

    NASA Astrophysics Data System (ADS)

    Wang, SuYun; Sun, Z. C.

    2017-02-01

    Based on single-polarized TerraSAR-X data, the approach generates homogeneous segments on an arbitrary number of scale levels by applying a region-growing algorithm which takes the intensity of backscatter and shape-related properties into account. The object-oriented procedure consists of three main steps: firstly, the analysis of the local speckle behavior in the SAR intensity data, leading to the generation of a texture image; secondly, a segmentation based on the intensity image; thirdly, the classification of each segment using the derived texture file and intensity information in order to identify and extract built-up areas. In our research, the distribution of built-up areas (BAs) in Dongying City is derived from a single-polarized TSX SM image (acquired on 17 June 2013) with an average ground resolution of 3 m using our proposed approach. By cross-validating randomly selected validation points against geo-referenced field sites and QuickBird high-resolution imagery, confusion matrices with statistical indicators were calculated and used to assess the classification results. The results demonstrate that an overall accuracy of 92.89% and a kappa coefficient of 0.85 could be achieved. We have shown that connecting texture information from the analysis of the local speckle divergence with intensity information makes built-up area extraction feasible, efficient, and rapid.
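
    The accuracy assessment described above reduces to a confusion matrix computation. A minimal sketch with illustrative counts (not the study's data), showing how overall accuracy and Cohen's kappa are derived from the matrix marginals:

```python
import numpy as np

# Rows: reference class, columns: mapped class (built-up, non-built-up).
cm = np.array([[420, 25],
               [ 32, 325]], dtype=float)

n = cm.sum()
overall_accuracy = np.trace(cm) / n

# Expected agreement by chance, from the row and column marginals.
p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
kappa = (overall_accuracy - p_e) / (1 - p_e)

print(f"OA = {overall_accuracy:.4f}, kappa = {kappa:.2f}")
```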

  20. A flexible data-driven comorbidity feature extraction framework.

    PubMed

    Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

    2016-06-01

    Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP), which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and on the UCI 130-US-hospitals diabetes dataset, which includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with commonly used comorbidity frameworks, including Charlson's index, Elixhauser's comorbidities, and their variations. The proposed approach showed significant gains of 10.7-22.1% in predictive accuracy for CHF severity-of-condition prediction and of 4.65-5.75% for diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
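
    A minimal sketch of the clustering-based feature extraction idea: embed each diagnosis code by its co-occurrence profile across admissions, cluster the codes, and use per-cluster counts as patient features. The codes, records, and cluster count below are hypothetical, and k-means stands in for whatever clustering procedure the authors used:

```python
import numpy as np
from sklearn.cluster import KMeans

codes = ["428.0", "401.9", "250.00", "585.9", "414.01"]
admissions = [["428.0", "401.9"], ["428.0", "414.01"],
              ["250.00", "585.9"], ["250.00", "401.9", "585.9"]]

# Binary admission-by-code matrix; its Gram matrix gives co-occurrence counts.
X = np.array([[c in adm for c in codes] for adm in admissions], dtype=float)
cooc = X.T @ X

# Group codes by the similarity of their co-occurrence profiles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(cooc)

def cluster_features(adm):
    """Patient feature vector: counts of each code cluster in a record."""
    labels = [kmeans.labels_[codes.index(c)] for c in adm]
    return np.bincount(labels, minlength=2)

print([cluster_features(adm) for adm in admissions])
```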

  1. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes

    PubMed Central

    2017-01-01

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926–1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745–2750; Thiessen & Yee 2010 Child Development 81, 1287–1303; Saffran 2002 Journal of Memory and Language 47, 172–196; Misyak & Christiansen 2012 Language Learning 62, 302–331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246–263; Thiessen et al. 2013 Psychological Bulletin 139, 792–814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik 2013 Cognitive Science 37, 310–343). This article is part of the themed issue ‘New frontiers for statistical learning in the cognitive sciences’. PMID:27872374

  2. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes.

    PubMed

    Thiessen, Erik D

    2017-01-05

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745-2750; Thiessen & Yee 2010 Child Development 81, 1287-1303; Saffran 2002 Journal of Memory and Language 47, 172-196; Misyak & Christiansen 2012 Language Learning 62, 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246-263; Thiessen et al. 2013 Psychological Bulletin 139, 792-814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik 2013 Cognitive Science 37, 310-343). This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).

  3. 2016 American Indian/Alaska Native/Native Hawaiian Areas (AIANNH) Michigan, Minnesota, and Wisconsin

    EPA Pesticide Factsheets

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The American Indian/Alaska Native/Native Hawaiian (AIANNH) Areas Shapefile includes the following legal entities: federally recognized American Indian reservations and off-reservation trust land areas, state-recognized American Indian reservations, and Hawaiian home lands (HHLs). The statistical entities included are Alaska Native village statistical areas (ANVSAs), Oklahoma tribal statistical areas (OTSAs), tribal designated statistical areas (TDSAs), and state designated tribal statistical areas (SDTSAs). Joint use areas, which are also included in this shapefile, refer to areas that are administered jointly and/or claimed by two or more American Indian tribes. The Census Bureau designates both legal and statistical joint use areas as unique geographic entities for the purpose of presenting statistical data. The Bureau of Indian Affairs (BIA) within the U.S. Department of the Interior (DOI) provides the list of federally recognized tribes and only provides legal boundary information.

  4. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on a random sequence generated from the generalized information domain and a permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image, while random sequences are treated as keystreams. A new factor called the drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method approximately a one-time pad. Experimental results show that the random sequences pass the NIST statistical tests at a high rate, and extensive analysis demonstrates that the new encryption scheme has superior security.
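
    The permutation-diffusion split can be illustrated in a few lines. In the sketch below a seeded generator stands in for the paper's key schedule and generalized-information-domain sequence; the P-box is a pixel-address permutation, and a keystream is XOR-ed in for diffusion:

```python
import numpy as np

rng = np.random.default_rng(seed=42)          # stand-in for the key schedule
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)

perm = rng.permutation(img.size)              # trajectory address sequence (P-box)
keystream = rng.integers(0, 256, img.size, dtype=np.uint8)

# Encryption: permute pixel addresses, then XOR with the keystream.
cipher = (img.ravel()[perm] ^ keystream).reshape(img.shape)

# Decryption inverts the XOR first, then the permutation.
inv = np.empty_like(perm)
inv[perm] = np.arange(perm.size)
plain = (cipher.ravel() ^ keystream)[inv].reshape(img.shape)
assert np.array_equal(plain, img)
```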

  5. Visual search in scenes involves selective and non-selective pathways

    PubMed Central

    Wolfe, Jeremy M; Vo, Melissa L-H; Evans, Karla K; Greene, Michelle R

    2010-01-01

    How do we find objects in scenes? For decades, visual search models have been built on experiments in which observers search for targets, presented among distractor items, isolated and randomly arranged on blank backgrounds. Are these models relevant to search in continuous scenes? This paper argues that the mechanisms that govern artificial, laboratory search tasks do play a role in visual search in scenes. However, scene-based information is used to guide search in ways that had no place in earlier models. Search in scenes may be best explained by a dual-path model: A “selective” path in which candidate objects must be individually selected for recognition and a “non-selective” path in which information can be extracted from global / statistical information. PMID:21227734

  6. Unfolding single RNA molecules: bridging the gap between equilibrium and non-equilibrium statistical thermodynamics.

    PubMed

    Bustamante, Carlos

    2005-11-01

    During the last 15 years, scientists have developed methods that permit the direct mechanical manipulation of individual molecules. Using this approach, they have begun to investigate the effect of force and torque on chemical and biochemical reactions. These studies span from the mechanical properties of macromolecules, to the characterization of molecular motors, to the mechanical unfolding of individual proteins and RNA. Here I present a review of some of our most recent results using mechanical force to unfold individual molecules of RNA. These studies make it possible to follow in real time the trajectory of each molecule as it unfolds and to characterize the various intermediates of the reaction. Moreover, if the process takes place reversibly, it is possible to extract both kinetic and thermodynamic information from these experiments while also characterizing the forces that maintain the three-dimensional structure of the molecule in solution. These studies bring us closer to the biological unfolding processes in the cell, as they simulate, in vitro, the mechanical unfolding of RNAs carried out in the cell by helicases. If the unfolding process occurs irreversibly, I show here that single-molecule experiments can still provide equilibrium thermodynamic information from non-equilibrium data by using recently discovered fluctuation theorems. Such theorems represent a bridge between equilibrium and non-equilibrium statistical mechanics. In fact, the first experimental demonstration of the validity of the fluctuation theorems, first derived in 1997, was obtained by mechanically unfolding a single molecule of RNA. It is perhaps a sign of the times that important physical results are these days used to extract information about biological systems, and that biological systems are being used to test and confirm fundamental new laws of physics.
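
    The fluctuation theorem referred to here (the Jarzynski equality) states that ⟨exp(-W/kT)⟩ = exp(-ΔF/kT), so equilibrium free energies can be estimated from irreversible work measurements. A minimal sketch with simulated Gaussian work values chosen to satisfy the equality; note that the exponential average converges slowly when the dissipated work is large:

```python
import numpy as np

kT = 4.1                                  # pN*nm at room temperature
rng = np.random.default_rng(1)

dF_true = 60.0                            # hypothetical unfolding free energy
sigma = 8.0                               # spread of the work distribution
# For Gaussian work, the Jarzynski equality fixes <W> = dF + sigma^2/(2 kT),
# i.e. the mean work exceeds dF by the average dissipation.
work = rng.normal(dF_true + sigma**2 / (2 * kT), sigma, size=5000)

dF_est = -kT * np.log(np.mean(np.exp(-work / kT)))
print(f"mean work = {work.mean():.1f}, Jarzynski dF = {dF_est:.1f} "
      f"(true {dF_true})")
```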

  7. Corpus-based Statistical Screening for Phrase Identification

    PubMed Central

    Kim, Won; Wilbur, W. John

    2000-01-01

    Purpose: The authors study the extraction of useful phrases from a natural language database by statistical methods. The aim is to leverage human effort by providing preprocessed phrase lists with a high percentage of useful material. Method: The approach is to develop six different scoring methods that are based on different aspects of phrase occurrence. The emphasis here is not on lexical information or syntactic structure but rather on the statistical properties of word pairs and triples that can be obtained from a large database. Measurements: The Unified Medical Language System (UMLS) incorporates a large list of humanly acceptable phrases in the medical field as a part of its structure. The authors use this list of phrases as a gold standard for validating their methods. A good method is one that ranks the UMLS phrases high among all phrases studied. Measurements are 11-point average precision values and precision-recall curves based on the rankings. Result: The authors find that each of the six scoring methods proves effective in identifying UMLS-quality phrases in a large subset of MEDLINE. These methods are applicable both to word pairs and word triples. All six methods are optimally combined to produce composite scoring methods that are more effective than any single method. The quality of the composite methods appears sufficient to support the automatic placement of hyperlinks in text at the site of highly ranked phrases. Conclusion: Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text. PMID:10984469
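
    One classic score of this kind, which may or may not coincide with the authors' six, is the pointwise mutual information (PMI) of a word pair. A minimal sketch over a toy corpus standing in for a large MEDLINE subset:

```python
import math
from collections import Counter

docs = [["myocardial", "infarction", "risk"],
        ["myocardial", "infarction", "therapy"],
        ["risk", "factor", "analysis"]]

unigrams = Counter(w for d in docs for w in d)
bigrams = Counter(b for d in docs for b in zip(d, d[1:]))
n_uni = sum(unigrams.values())
n_bi = sum(bigrams.values())

def pmi(w1, w2):
    """log P(w1,w2) / (P(w1) P(w2)); high values suggest a real phrase."""
    p12 = bigrams[(w1, w2)] / n_bi
    return math.log(p12 * n_uni**2 / (unigrams[w1] * unigrams[w2]))

# Rank observed word pairs by their PMI score.
ranked = sorted(bigrams, key=lambda b: pmi(*b), reverse=True)
print(ranked[:3])
```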

  8. Bootstrapping on Undirected Binary Networks Via Statistical Mechanics

    NASA Astrophysics Data System (ADS)

    Fushing, Hsieh; Chen, Chen; Liu, Shan-Yu; Koehl, Patrice

    2014-09-01

    We propose a new method inspired by statistical mechanics for extracting geometric information from undirected binary networks and generating random networks that conform to this geometry. In this method an undirected binary network is perceived as a thermodynamic system with a collection of permuted adjacency matrices as its states. The task of extracting information from the network is then reformulated as a discrete combinatorial optimization problem of searching for its ground state. To solve this problem, we apply multiple ensembles of temperature-regulated Markov chains to establish an ultrametric geometry on the network. This geometry is equipped with a tree hierarchy that captures the multiscale community structure of the network. We translate this geometry into a Parisi adjacency matrix, which has a relatively low energy level and is in the vicinity of the ground state. The Parisi adjacency matrix is then further optimized by making block permutations subject to the ultrametric geometry. The optimal matrix corresponds to the macrostate of the original network. An ensemble of random networks is then generated such that each of these networks conforms to this macrostate; the corresponding algorithm also provides an estimate of the size of this ensemble. By repeating this procedure at different scales of the ultrametric geometry of the network, it is possible to compute its evolution entropy, i.e., to estimate the evolution of its complexity as we move from a coarse to a fine description of its geometric structure. We demonstrate the performance of this method on simulated as well as real data networks.

  9. Extraction of indirectly captured information for use in a comparison of offline pH measurement technologies.

    PubMed

    Ritchie, Elspeth K; Martin, Elaine B; Racher, Andy; Jaques, Colin

    2017-06-10

    Understanding the causes of discrepancies in the pH readings of a sample can allow more robust pH control strategies to be implemented. It was found that 59.4% of the differences between two offline pH measurement technologies for a historical dataset lay outside an expected instrument error range of ±0.02 pH. A new variable, Osmo_Res, was created using multiple linear regression (MLR) to extract information indirectly captured in the recorded measurements of osmolality. Principal component analysis and time series analysis were used to validate the expansion of the historical dataset with the new variable Osmo_Res. MLR was used to identify variables strongly correlated (p < 0.05) with differences in the pH readings of the two offline pH measurement technologies. These included concentrations of specific chemicals (e.g. glucose) and Osmo_Res, indicating culture medium and bolus feed additions as possible causes of discrepancies between the offline pH measurement technologies. Temperature was also identified as statistically significant; it is suggested that this was a result of differences in the pH-temperature compensations employed by the pH measurement technologies. In summary, a method for extracting indirectly captured information has been demonstrated, and it has been shown that competing pH measurement technologies were not necessarily interchangeable at the desired level of control (±0.02 pH). Copyright © 2017 Elsevier B.V. All rights reserved.
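
    The construction of a variable like Osmo_Res can be sketched as follows: regress the recorded osmolality on the variables known to drive it, and keep the residual as a new, decorrelated variable. The driver variables and values below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 200
glucose = rng.normal(6.0, 1.0, n)          # example explanatory variables
sodium = rng.normal(140.0, 3.0, n)
osmolality = 2 * sodium + glucose + rng.normal(0.0, 2.0, n)

X = np.column_stack([glucose, sodium])
model = LinearRegression().fit(X, osmolality)

# The residual captures what osmolality measured beyond its known drivers.
osmo_res = osmolality - model.predict(X)
print(f"R^2 = {model.score(X, osmolality):.3f}, "
      f"residual SD = {osmo_res.std():.2f}")
```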

  10. Microwave-assisted extraction and mild saponification for determination of organochlorine pesticides in oyster samples.

    PubMed

    Carro, N; García, I; Ignacio, M-C; Llompart, M; Yebra, M-C; Mouteira, A

    2002-10-01

    A sample-preparation procedure (extraction and saponification) using microwave energy is proposed for the determination of organochlorine pesticides in oyster samples. A Plackett-Burman factorial design was used to optimize the microwave-assisted extraction and mild saponification on a freeze-dried sample spiked with a mixture of aldrin, endrin, dieldrin, heptachlor, heptachlor epoxide, isodrin, trans-nonachlor, p,p'-DDE, and p,p'-DDD. Six variables were considered in the optimization process: solvent volume, extraction time, extraction temperature, amount of acetone (%) in the extraction solvent, amount of sample, and volume of NaOH solution. The results show that the amount of sample is statistically significant for dieldrin, aldrin, p,p'-DDE, heptachlor, and trans-nonachlor, and solvent volume for dieldrin, aldrin, and p,p'-DDE. The volume of NaOH solution is statistically significant only for aldrin and p,p'-DDE. Extraction temperature and extraction time seem to be the main factors determining the efficiency of the extraction process for isodrin and p,p'-DDE, respectively. The optimized procedure was compared with conventional Soxhlet extraction.
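
    A minimal sketch of effect estimation in a two-level screening design of the Plackett-Burman type. Here an 8-run orthogonal design for the six variables is taken from the columns of a Sylvester Hadamard matrix (equivalent to an 8-run Plackett-Burman layout up to relabeling), and the recoveries are simulated:

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H8 = np.kron(np.kron(H2, H2), H2)          # 8x8 Hadamard matrix
X = H8[:, 1:7]                             # six orthogonal +/-1 factor columns

factors = ["solvent volume", "time", "temperature",
           "% acetone", "sample amount", "NaOH volume"]

rng = np.random.default_rng(3)
# Hypothetical recoveries: sample amount and solvent volume matter most.
y = 80 + 6 * X[:, 4] + 4 * X[:, 0] + rng.normal(0, 1, 8)

# Main effect = mean response at the high level minus mean at the low level;
# with 4 runs per level this is X^T y / 4.
effects = X.T @ y / 4
for name, eff in sorted(zip(factors, effects), key=lambda t: -abs(t[1])):
    print(f"{name:15s} {eff:+.2f}")
```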

  11. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    PubMed

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of effort to understanding the interactions between proteins and the production of RNA. This information might enrich current knowledge of drug reactions or the development of certain diseases. Nevertheless, because of its lack of explicit structure, the life science literature, one of the most important sources of this information, prevents computer-based systems from accessing it. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models and require large-scale annotated corpora to estimate the models' parameters precisely; such corpora are usually difficult to obtain in practice. Employing un-annotated data through semi-supervised learning is therefore a feasible solution for biomedical event extraction and has attracted growing interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences but also the hidden topics embedded in them are used to describe the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold-standard corpus. Experimental results show that the proposed framework achieves an improvement of more than 2.2% in F-score on biomedical event extraction when compared with the state-of-the-art approach. The results suggest that by incorporating un-annotated data the proposed framework indeed improves the performance of the state-of-the-art event extraction system, and that the similarity between sentences can be precisely described by their hidden topics and structures. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Automatic classification of animal vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.

    2005-11-01

    Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite automaton, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing to bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts. The development of a framework from which to build animal vocalization classifiers will provide bioacoustics researchers with a consistent platform to analyze and classify vocalizations. A common framework will also allow studies to compare results across species and institutions. In addition, the use of automated classification techniques can speed analysis and uncover behavioral correlations not readily apparent using traditional techniques.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bagher-Ebadian, H; Chetty, I; Liu, C

    Purpose: To examine the impact of image smoothing and noise on the robustness of textural information extracted from CBCT images for prediction of radiotherapy response for patients with head/neck (H/N) cancers. Methods: CBCT image datasets for 14 patients with H/N cancer treated with radiation (70 Gy in 35 fractions) were investigated. A deformable registration algorithm was used to fuse planning CTs to CBCTs. Tumor volume was automatically segmented on each CBCT image dataset. Local control at 1 year was used to classify 8 patients as responders (R) and 6 as non-responders (NR). A smoothing filter [2D Adaptive Wiener (2DAW) with 3 different windows (ψ=3, 5, and 7)] and two noise models (Poisson and Gaussian, SNR=25) were implemented and independently applied to the CBCT images. Twenty-two textural features, describing the spatial arrangement of voxel intensities calculated from gray-level co-occurrence matrices, were extracted for all tumor volumes. Results: Relative to CBCT images without smoothing, none of the 22 extracted textural features showed any significant differences when smoothing was applied (using the 2DAW with filtering parameters of ψ=3 and 5) in the responder and non-responder groups. When smoothing with the 2DAW at ψ=7 was applied, one textural feature, Information Measure of Correlation, was significantly different relative to no smoothing. Only 4 features (Energy, Entropy, Homogeneity, and Maximum Probability) were found to be statistically different between the R and NR groups (Table 1). These features remained statistically significant discriminators between the R and NR groups in the presence of noise and smoothing. Conclusion: This preliminary work suggests that textural classifiers for response prediction, extracted from H/N CBCT images, are robust to low-power noise and low-pass filtering. While other types of filters will alter the spatial frequencies differently, these results are promising. The current study is subject to Type II errors; a much larger cohort of patients is needed to confirm these results. This work was supported in part by a grant from Varian Medical Systems (Palo Alto, CA).
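
    The texture computation described above can be sketched with scikit-image: build gray-level co-occurrence matrices and read off Haralick-type features. The region of interest below is a random stand-in for a segmented tumor sub-image:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(5)
roi = rng.integers(0, 64, (48, 48), dtype=np.uint8)   # quantized image ROI

# Co-occurrence of gray levels at distance 1 in four directions, then
# symmetrized and normalized so entries are joint probabilities.
glcm = graycomatrix(roi, distances=[1],
                    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                    levels=64, symmetric=True, normed=True)

for prop in ("energy", "homogeneity", "contrast", "correlation"):
    print(prop, graycoprops(glcm, prop).mean())

# Entropy and maximum probability are simple functions of the same matrix.
p = glcm.mean(axis=(2, 3))
nz = p[p > 0]
print("entropy", -np.sum(nz * np.log2(nz)))
print("maximum probability", p.max())
```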

  14. Evaluation of Antioxidant Properties, Phenolic Compounds, Anthelmintic, and Cytotoxic Activities of Various Extracts Isolated from Nepeta cadmea: An Endemic Plant for Turkey.

    PubMed

    Kaska, Arzu; Deniz, Nahide; Çiçek, Mehmet; Mammadov, Ramazan

    2018-05-10

    Nepeta cadmea Boiss. is a species endemic to Turkey that belongs to the Nepeta genus. Several species of this genus are used in folk medicine. This study was designed to investigate the phenolic compounds and the antioxidant, anthelmintic, and cytotoxic activities of various extracts (ethanol, methanol, acetone, and water) of N. cadmea. The antioxidant activities of these extracts were analyzed using scavenging methods (DPPH, ABTS, and H2O2 scavenging activity), the β-carotene/linoleic acid test system, the phosphomolybdenum method, and metal chelating activity. Among the 4 different extracts of N. cadmea that were evaluated, the water extract showed the highest radical scavenging (DPPH, 25.54 μg/mL and ABTS, 14.51 μg/mL) and antioxidant (β-carotene, 86.91%) activities. In the metal chelating and H2O2 scavenging activities, the acetone extract was statistically different from the other extracts. For the phosphomolybdenum method, the antioxidant capacity of the extracts was in the range of 8.15 to 80.40 μg/mg. The phenolic content of the ethanol extract was examined using HPLC, and some phenolics were identified: epicatechin and chlorogenic and caffeic acids. With regard to the anthelmintic properties, dose-dependent activity was observed in each of the extracts of N. cadmea. All the extracts exhibited high cytotoxic activities. The results will provide additional information for further studies on the biological activities of N. cadmea, while also helping us to understand the importance of this species. Furthermore, based on the results obtained, N. cadmea may be considered a potentially useful supplement for the human diet, as well as a natural antioxidant for medicinal applications. The plants of the Nepeta genus have been extensively used as traditional herbal medicines. Nepeta cadmea Boiss., one of the species belonging to the Nepeta genus, is endemic to Turkey. In our study, we demonstrated the antioxidant capacities, total phenolic, flavonoid, and tannin contents, and the anthelmintic and cytotoxic activities of various extracts of Nepeta cadmea. The present study could well supply valuable data for future investigations and further information on the potential use of this endemic plant for humans, in both dietary and pharmacological applications. © 2018 Institute of Food Technologists®.

  15. Radial gradient and radial deviation radiomic features from pre-surgical CT scans are associated with survival among lung adenocarcinoma patients.

    PubMed

    Tunali, Ilke; Stringfield, Olya; Guvenis, Albert; Wang, Hua; Liu, Ying; Balagurunathan, Yoganand; Lambin, Philippe; Gillies, Robert J; Schabath, Matthew B

    2017-11-10

    The goal of this study was to extract features from radial deviation and radial gradient maps derived from thoracic CT scans of patients diagnosed with lung adenocarcinoma and to assess whether these features are associated with overall survival. We used two independent cohorts from different institutions for training (n = 61) and testing (n = 47) and focused our analyses on features that were non-redundant and highly reproducible. To reduce the number of features and covariates to a single parsimonious model, a backward elimination approach was applied. Of the 48 features that were extracted, 31 were eliminated because they were not reproducible or were redundant. We considered 17 features for statistical analysis and identified a final model containing the two most highly informative features associated with lung cancer survival. One of the two features, radial deviation outside-border separation standard deviation, was replicated in a test cohort, exhibiting a statistically significant association with lung cancer survival (multivariable hazard ratio = 0.40; 95% confidence interval 0.17-0.97). Additionally, we explored the biological underpinnings of these features and found that radial gradient and radial deviation image features were significantly associated with semantic radiological features.

  16. Human movement stochastic variability leads to diagnostic biomarkers In Autism Spectrum Disorders (ASD)

    NASA Astrophysics Data System (ADS)

    Wu, Di; Torres, Elizabeth B.; Jose, Jorge V.

    2015-03-01

    ASD is a spectrum of neurodevelopmental disorders. The high heterogeneity of the symptoms associated with the disorder impedes efficient diagnosis based on human observation. Recent advances in high-resolution MEMS wearable sensors enable accurate movement measurements that may escape the naked eye, and call for objective metrics to extract physiologically relevant information from the rapidly accumulating data. In this talk we'll discuss the statistical analysis of movement data continuously collected with high-resolution sensors at 240 Hz. We calculated statistical properties of speed fluctuations within the millisecond time range that closely correlate with the subjects' cognitive abilities. We computed the periodicity and synchronicity of the speed fluctuations from their power spectrum and the ensemble-averaged two-point cross-correlation function. We built a two-parameter phase space from the temporal statistical analyses of the nearest-neighbor fluctuations that provided a quantitative biomarker separating ASD from normal adult subjects and further classified ASD severity. We also found age-related developmental statistical signatures and potential ASD parental links in our movement dynamics studies. Our results may have direct clinical applications.
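
    The spectral statistics described in this abstract can be sketched directly: estimate the power spectrum of the speed fluctuations for periodicity, and the two-point cross-correlation for synchronicity. The 240 Hz signals below are synthetic stand-ins for sensor recordings:

```python
import numpy as np
from scipy import signal

fs = 240.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(11)

# Two limb-speed signals sharing a slow rhythmic component plus noise.
common = np.sin(2 * np.pi * 1.5 * t)
v1 = common + 0.8 * rng.standard_normal(t.size)
v2 = np.roll(common, 24) + 0.8 * rng.standard_normal(t.size)

# Periodicity: Welch power spectral density of the fluctuations.
f, pxx = signal.welch(v1 - v1.mean(), fs=fs, nperseg=1024)
print(f"dominant frequency = {f[np.argmax(pxx)]:.2f} Hz")

# Synchronicity: normalized two-point cross-correlation and its peak lag.
xc = signal.correlate(v1 - v1.mean(), v2 - v2.mean(), mode="full")
xc /= v1.std() * v2.std() * t.size
lags = signal.correlation_lags(t.size, t.size, mode="full")
print(f"peak correlation {xc.max():.2f} at lag "
      f"{lags[np.argmax(xc)] / fs:.3f} s")
```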

  17. Prognostic value of tissue-type plasminogen activator (tPA) and its complex with the type-1 inhibitor (PAI-1) in breast cancer

    PubMed Central

    Witte, J H de; Sweep, C G J; Klijn, J G M; Grebenschikov, N; Peters, H A; Look, M P; Tienoven, ThH van; Heuvel, J J T M; Vries, J Bolt-De; Benraad, ThJ; Foekens, J A

    1999-01-01

    The prognostic value of tissue-type plasminogen activator (tPA) measured in samples derived from 865 patients with primary breast cancer using a recently developed enzyme-linked immunosorbent assay (ELISA) was evaluated. Since the assay could easily be adapted to the assessment of the complex of tPA with its type-1 inhibitor (PAI-1), it was investigated whether the tPA:PAI-1 complex also provides prognostic information. To this end, cytosolic extracts and corresponding detergent extracts of 100 000 g pellets obtained after ultracentrifugation when preparing the cytosolic fractions for routine steroid hormone receptor determination were assayed. Statistically significant correlations were found between the cytosolic levels and those determined in the pellet extracts (Spearman correlation coefficient rs = 0.75, P < 0.001 for tPA, and rs = 0.50, P < 0.001 for the tPA:PAI-1 complex). In both Cox univariate and multivariate analyses, elevated levels of (total) tPA determined in the pellet extracts, but not in cytosols, were associated with prolonged relapse-free survival (RFS) and overall survival (OS). In contrast, high levels of the tPA:PAI-1 complex measured in cytosols, but not in the pellet extracts, were associated with a poor RFS and OS. The prognostic information provided by the cytosolic tPA:PAI-1 complex was comparable to that provided by cytosolic (total) PAI-1. Furthermore, the estimated levels of free, uncomplexed tPA and PAI-1, in cytosols and in pellet extracts, were related to patient prognosis in a similar way as the (total) levels of tPA and PAI-1, respectively. Determination of specific forms of components of the plasminogen activation system, i.e. the tPA:PAI-1 complex and free, uncomplexed tPA and/or PAI-1, may be considered a useful adjunct to the analyses of the separate components (tPA and/or PAI-1) and may provide valuable additional prognostic information with respect to the survival of breast cancer patients. © 1999 Cancer Research Campaign PMID:10390010

  18. Effects of first premolar extraction on maxillary and mandibular third molar angulation after orthodontic therapy

    PubMed Central

    Gohilot, Avinash; Pradhan, Tejashri; Keluskar, Kanhoba Mahabaleshwar

    2012-01-01

    Background/Aims To compare the change in the angulation of the developing mandibular third molar in first premolar extraction and non-extraction cases, and to determine whether premolar extraction results in more mesial movement of the mandibular buccal segment and causes favorable rotational changes in the mandibular third molar tilt, which could enhance later eruption of the third molars. Materials and methods Pretreatment (T1) and post-treatment (T2) panoramic radiographs were taken of 25 subjects (age 14–19 years) who had been treated with extraction of all first premolars and 25 subjects who had been treated without extraction. The horizontal reference plane was used to measure and compare the changes in the angles of the developing mandibular third molars. Results The mean uprighting of the maxillary third molars in the extraction group was 4 ± 9° on the left side and −17 ± 13° on the right side following treatment (T2 − T1). For the non-extraction group the mean difference was −16 ± 12° on the left side and 2 ± 13° on the right side. There was a statistically significant difference between the groups (P = 0.021 on the right side and P = 0.041 on the left side). Mandibular third molars in the extraction group showed no statistically significant change in angulation. Conclusion Premolar extractions had a positive influence on the developing maxillary third molar angulations on both the right and the left. The mandibular third molars showed changes in angulation that were not statistically significant. Non-extraction therapy did not have any adverse effect. PMID:25737843

  19. Effects of first premolar extraction on maxillary and mandibular third molar angulation after orthodontic therapy.

    PubMed

    Gohilot, Avinash; Pradhan, Tejashri; Keluskar, Kanhoba Mahabaleshwar

    2012-01-01

    To compare the change in the angulation of the developing mandibular third molar in first premolar extraction and non-extraction cases, and to determine whether premolar extraction results in more mesial movement of the mandibular buccal segment and causes favorable rotational changes in the mandibular third molar tilt, which could enhance later eruption of the third molars. Pretreatment (T1) and post-treatment (T2) panoramic radiographs were taken of 25 subjects (age 14-19 years) who had been treated with extraction of all first premolars and 25 subjects who had been treated without extraction. The horizontal reference plane was used to measure and compare the changes in the angles of the developing mandibular third molars. The mean uprighting of the maxillary third molars in the extraction group was 4 ± 9° on the left side and -17 ± 13° on the right side following treatment (T2 − T1). For the non-extraction group the mean difference was -16 ± 12° on the left side and 2 ± 13° on the right side. There was a statistically significant difference between the groups (P = 0.021 on the right side and P = 0.041 on the left side). Mandibular third molars in the extraction group showed no statistically significant change in angulation. Premolar extractions had a positive influence on the developing maxillary third molar angulations on both the right and the left. The mandibular third molars showed changes in angulation that were not statistically significant. Non-extraction therapy did not have any adverse effect.

  20. Variety and volatility in financial markets

    NASA Astrophysics Data System (ADS)

    Lillo, Fabrizio; Mantegna, Rosario N.

    2000-11-01

    We study the price dynamics of stocks traded in a financial market by considering the statistical properties of both a single time series and an ensemble of stocks traded simultaneously. We use the n stocks traded on the New York Stock Exchange to form a statistical ensemble of daily stock returns. For each trading day of our database, we study the ensemble return distribution. We find that a typical ensemble return distribution exists in most of the trading days, with the exception of crash and rally days and of the days following these extreme events. We analyze each ensemble return distribution by extracting its first two central moments. We observe that these moments fluctuate in time and are themselves stochastic processes. We characterize the statistical properties of the ensemble return distribution's central moments by investigating their probability density functions and temporal correlation properties. In general, time-averaged and portfolio-averaged price returns have different statistical properties. We infer from these differences information about the relative strength of the correlation between stocks and between different trading days. Lastly, we compare our empirical results with those predicted by the single-index model and conclude that this simple model cannot explain the statistical properties of the second moment of the ensemble return distribution.
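
    The ensemble statistics used here are straightforward to compute: for each trading day, take the cross-sectional mean and standard deviation (the "variety") of all stock returns. A minimal sketch with simulated heavy-tailed returns:

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_stocks = 250, 500
returns = rng.standard_t(df=4, size=(n_days, n_stocks)) * 0.01

# First two central moments of each day's ensemble return distribution;
# both are themselves time series, i.e. stochastic processes.
ensemble_mean = returns.mean(axis=1)
variety = returns.std(axis=1)

print(f"mean variety = {variety.mean():.4f}, "
      f"std of daily ensemble mean = {ensemble_mean.std():.4f}")
```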

  1. DB4US: A Decision Support System for Laboratory Information Management.

    PubMed

    Carmona-Cejudo, José M; Hortas, Maria Luisa; Baena-García, Manuel; Lana-Linati, Jorge; González, Carlos; Redondo, Maximino; Morales-Bueno, Rafael

    2012-11-14

    Until recently, laboratory automation has focused primarily on improving hardware. Future advances are concentrated on intelligent software, since laboratories performing clinical diagnostic testing require improved information systems to address their data processing needs. In this paper, we propose DB4US, an application that automates the management of information related to laboratory quality indicators. Currently, there is a lack of ready-to-use management quality measures. This application addresses this deficiency through the extraction, consolidation, statistical analysis, and visualization of data related to the use of demographics, reagents, and turn-around times. The design and implementation issues, as well as the technologies used for the implementation of this system, are discussed in this paper. The objective is to develop a general methodology that integrates the computation of ready-to-use management quality measures and a dashboard to easily analyze the overall performance of a laboratory, as well as to automatically detect anomalies or errors. The novelty of our approach lies in the application of integrated web-based dashboards as an information management system in hospital laboratories. We propose a new methodology for laboratory information management based on the extraction, consolidation, statistical analysis, and visualization of data related to demographics, reagents, and turn-around times, offering a dashboard-like user web interface to the laboratory manager. The methodology comprises a unified data warehouse that stores and consolidates multidimensional data from different data sources. The methodology is illustrated through the implementation and validation of DB4US, a novel web application based on this methodology that constructs an interface to obtain ready-to-use indicators, and offers the possibility to drill down from high-level metrics to more detailed summaries. The offered indicators are calculated beforehand so that they are ready to use when the user needs them. The design is based on a set of different parallel processes to precalculate indicators. The application displays information related to tests, requests, samples, and turn-around times. The dashboard is designed to show the set of indicators on a single screen. DB4US was deployed for the first time in the Hospital Costa del Sol in 2008. In our evaluation we show the positive impact of this methodology for laboratory professionals, since the use of our application has reduced the time needed for the elaboration of the different statistical indicators and has also provided information that has been used to optimize the usage of laboratory resources through the discovery of anomalies in the indicators. DB4US users benefit from Internet-based communication of results, since this information is available from any computer without having to install any additional software. The proposed methodology and the accompanying web application, DB4US, automate the processing of information related to laboratory quality indicators and offer a novel approach for managing laboratory-related information, benefiting from an Internet-based communication mechanism. The application of this methodology has been shown to improve the usage of time, as well as of other laboratory resources.

  2. Is There a Common Summary Statistical Process for Representing the Mean and Variance? A Study Using Illustrations of Familiar Items

    PubMed Central

    Yang, Yi; Tokita, Midori; Ishiguchi, Akira

    2018-01-01

    A number of studies have revealed that our visual system can extract different types of summary statistics, such as the mean and variance, from sets of items. Although the extraction of such summary statistics has been studied well in isolation, the relationship between these statistics remains unclear. In this study, we explored this issue using an individual differences approach. Observers viewed illustrations of strawberries and lollipops varying in size or orientation and performed four tasks in a within-subject design, namely mean and variance discrimination tasks in the size and orientation domains. We found that performances in the mean and variance discrimination tasks were not correlated with each other, demonstrating that extractions of the mean and variance are mediated by different representational mechanisms. In addition, we tested the relationship between performances in the size and orientation domains for each summary statistic (i.e., mean and variance) and examined whether each summary statistic has distinct processes across perceptual domains. The results illustrated that statistical summary representations of size and orientation may share a common mechanism for representing the mean, and possibly for representing variance. Introspections of each observer performing the tasks were also examined and discussed. PMID:29399318

  3. Mandibular third molar angulation in extraction and non extraction orthodontic cases.

    PubMed

    Ahmed, Imtiaz; Gul-e-Erum; Kumar, Naresh

    2011-01-01

    The purpose of this study was to determine the angulation of the mandibular third molar in orthodontic cases planned for extraction and non-extraction treatment. This was a cross-sectional descriptive study in which pre-treatment panoramic radiographs of 49 patients, age range 11-26 years, were taken from the OPD of the Department of Orthodontics, Dr. Ishrat-ul-Ebad Khan Institute of Oral and Health Sciences (DIKIOHS), Dow University of Health Sciences. The angles between the long axes of the second and third molars were measured. Descriptive statistics were applied, and the Mann-Whitney U-test was used for the intergroup comparison of extraction and non-extraction cases. The study comprised 49 patients with a mean age of 17.94 years. Overall, mandibular third molar angulations ranged from 8 to 94 degrees in extraction cases and from 10 to 73 degrees in non-extraction cases. However, the pre-treatment third molar angulation differences between extraction and non-extraction cases were not statistically significant (p-value > 0.05). This study evaluated third molar angulations in pre-treatment cases; the differences in angulation were like other morphological differences, but changes in angulation after treatment may or may not be related to extractions.
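
    The intergroup comparison reduces to a Mann-Whitney U test on the two sets of angulations. A minimal sketch with illustrative angle values (not the study's data):

```python
from scipy.stats import mannwhitneyu

extraction = [8, 21, 35, 42, 55, 61, 70, 94]        # degrees
non_extraction = [10, 18, 25, 33, 47, 52, 66, 73]

u_stat, p_value = mannwhitneyu(extraction, non_extraction,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")  # p > 0.05: no significant difference
```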

  4. Calcium (Ca2+) waves data calibration and analysis using image processing techniques

    PubMed Central

    2013-01-01

    Background Calcium (Ca2+) propagates within tissues, serving as an important information carrier. In particular, cilia beat frequency in oviduct cells is partially regulated by Ca2+ changes. Thus, measuring the calcium density and characterizing the traveling wave play a key role in understanding biological phenomena. However, current methods to measure propagation velocities and other wave characteristics involve several manual or time-consuming procedures. This limits the amount of information that can be extracted and the statistical quality of the analysis. Results Our work provides a framework based on image processing procedures that enables a fast, automatic, and robust characterization of data from two-filter fluorescence Ca2+ experiments. We calculate the mean velocity of the wave-front, and use theoretical models to extract meaningful parameters such as wave amplitude, decay rate, and time of excitation. Conclusions Measurements made by different operators showed a high degree of reproducibility. The framework is also extended to single-filter fluorescence experiments, allowing higher sampling rates and thus increased accuracy in velocity measurements. PMID:23679062

  5. [Identification of green tea brand based on hyperspectra imaging technology].

    PubMed

    Zhang, Hai-Liang; Liu, Xiao-Li; Zhu, Feng-Le; He, Yong

    2014-05-01

    Hyperspectral imaging technology was developed to identify different brands of famous green tea based on the fusion of PCA information and image information. First, 512 spectral images of six brands of famous green tea in the 380-1023 nm wavelength range were collected, and principal component analysis (PCA) was performed with the goal of selecting two characteristic bands (545 and 611 nm) that could potentially be used for a classification system. Then, 12 gray-level co-occurrence matrix (GLCM) features based on statistical moments (i.e., mean, covariance, homogeneity, energy, contrast, correlation, entropy, inverse gap, contrast, difference from the second order, and autocorrelation) were extracted from each characteristic band image. Finally, the integration of the 12 texture features and three PCA spectral characteristics for each green tea sample was used as the input of an LS-SVM. Experimental results showed that the discrimination rate was 100% in the prediction set. Receiver operating characteristic (ROC) curve assessment methods were used to evaluate the LS-SVM classification algorithm. The overall results sufficiently demonstrate that hyperspectral imaging technology can be used to perform classification of green tea.

  6. Biologically-inspired data decorrelation for hyper-spectral imaging

    NASA Astrophysics Data System (ADS)

    Picon, Artzai; Ghita, Ovidiu; Rodriguez-Vaamonde, Sergio; Iriondo, Pedro Ma; Whelan, Paul F.

    2011-12-01

    Hyper-spectral data allows the construction of more robust statistical models to sample the material properties than the standard tri-chromatic color representation. However, because of the large dimensionality and complexity of the hyper-spectral data, the extraction of robust features (image descriptors) is not a trivial issue. Thus, to facilitate efficient feature extraction, decorrelation techniques are commonly applied to reduce the dimensionality of the hyper-spectral data with the aim of generating compact and highly discriminative image descriptors. Current methodologies for data decorrelation, such as principal component analysis (PCA), linear discriminant analysis (LDA), wavelet decomposition (WD), or band selection methods, require complex and subjective training procedures, and in addition the compressed spectral information is not directly related to the physical (spectral) characteristics associated with the analyzed materials. The major objective of this article is to introduce and evaluate a new data decorrelation methodology using an approach that closely emulates human vision. The proposed data decorrelation scheme has been employed to optimally minimize the amount of redundant information contained in the highly correlated hyper-spectral bands, and has been comprehensively evaluated in the context of non-ferrous material classification.

  7. Language extraction from zinc sulfide

    NASA Astrophysics Data System (ADS)

    Varn, Dowman Parks

    2001-09-01

    Recent advances in the analysis of one-dimensional temporal and spatial series allow for detailed characterization of disorder and computation in physical systems. One such system that has defied theoretical understanding since its discovery in 1912 is polytypism. Polytypes are layered compounds, exhibiting crystallinity in two dimensions yet having complicated stacking sequences in the third direction. They can show both ordered and disordered sequences, sometimes each in the same specimen. We demonstrate a method for extracting two-layer correlation information from ZnS diffraction patterns and employ a novel technique for epsilon-machine reconstruction. We solve a long-standing problem, that of determining structural information for disordered materials from their diffraction patterns, for this special class of disorder. Our solution offers the most complete possible statistical description of the disorder. Furthermore, from our reconstructed epsilon-machines we find the effective range of the interlayer interaction in these materials, as well as the configurational energy of both ordered and disordered specimens. Finally, we can determine the 'language' (in terms of the Chomsky hierarchy) these small rocks speak, and we find that regular languages are sufficient to describe them.

  8. Information Theory Filters for Wavelet Packet Coefficient Selection with Application to Corrosion Type Identification from Acoustic Emission Signals

    PubMed Central

    Van Dijck, Gert; Van Hulle, Marc M.

    2011-01-01

    The damage caused by corrosion in chemical process installations can lead to unexpected plant shutdowns and the leakage of potentially toxic chemicals into the environment. When subjected to corrosion, structural changes in the material occur, leading to energy releases as acoustic waves. This acoustic activity can in turn be used for corrosion monitoring, and even for predicting the type of corrosion. Here we apply wavelet packet decomposition to extract features from acoustic emission signals. We then use the extracted wavelet packet coefficients for distinguishing between the most important types of corrosion processes in the chemical process industry: uniform corrosion, pitting and stress corrosion cracking. The local discriminant basis selection algorithm can be considered as a standard for the selection of the most discriminative wavelet coefficients. However, it does not take the statistical dependencies between wavelet coefficients into account. We show that, when these dependencies are ignored, a lower accuracy is obtained in predicting the corrosion type. We compare several mutual information filters to take these dependencies into account in order to arrive at a more accurate prediction. PMID:22163921
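
    A minimal sketch, not the authors' pipeline, of the two building blocks named above: wavelet packet features from acoustic emission signals (PyWavelets) ranked by a mutual information filter against the corrosion label (scikit-learn). The wavelet, depth and energy-based feature are illustrative assumptions.

    ```python
    import numpy as np
    import pywt
    from sklearn.feature_selection import mutual_info_classif

    def wp_energies(signal, wavelet="db4", level=4):
        """Energy of each terminal wavelet packet node: one feature per sub-band."""
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        nodes = wp.get_level(level, order="freq")
        return np.array([np.sum(np.asarray(n.data) ** 2) for n in nodes])

    # X: one row of sub-band energies per AE signal; y: corrosion type (0, 1, 2)
    # X = np.vstack([wp_energies(s) for s in signals])
    # scores = mutual_info_classif(X, y)      # MI between each feature and the class
    # top = np.argsort(scores)[::-1][:8]      # keep the most informative sub-bands
    ```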

  9. Information transfer and information modification to identify the structure of cardiovascular and cardiorespiratory networks.

    PubMed

    Faes, Luca; Nollo, Giandomenico; Krohova, Jana; Czippelova, Barbora; Turianikova, Zuzana; Javorka, Michal

    2017-07-01

    To fully elucidate the complex physiological mechanisms underlying the short-term autonomic regulation of heart period (H), systolic and diastolic arterial pressure (S, D) and respiratory (R) variability, the joint dynamics of these variables need to be explored using multivariate time series analysis. This study proposes the use of information-theoretic measures to quantify causal interactions between nodes of the cardiovascular/cardiorespiratory network and to assess the nature (synergistic or redundant) of these directed interactions. Indexes of information transfer and information modification are extracted from the H, S, D and R series measured from healthy subjects in a resting state and during postural stress. Computations are performed in the framework of multivariate linear regression, using bootstrap techniques to assess, on a single-subject basis, the statistical significance of each measure and of its transitions across conditions. We find patterns of information transfer and modification which are related to specific cardiovascular and cardiorespiratory mechanisms in resting conditions and to their modification induced by orthostatic stress.
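
    The linear-regression framework lends itself to a compact illustration: under Gaussian assumptions, the information transferred from a source to a target series reduces to half the log ratio of residual variances of two nested regressions. The sketch below is a minimal single-pair version under that assumption; the series names and model order are illustrative, and the multivariate conditioning and bootstrap tests of the study are omitted.

    ```python
    import numpy as np

    def lagged(x, p):
        """Columns of x at lags 1..p, aligned with x[p:]."""
        return np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])

    def transfer_entropy(source, target, p=2):
        """Linear-Gaussian TE(source -> target), in nats."""
        y = target[p:]
        past_t = lagged(target, p)                         # restricted: target past only
        past_st = np.hstack([past_t, lagged(source, p)])   # full: plus source past

        def resid_var(X):
            X1 = np.column_stack([np.ones(len(y)), X])
            beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
            return np.var(y - X1 @ beta)

        return 0.5 * np.log(resid_var(past_t) / resid_var(past_st))

    # Example: information transferred from respiration R to heart period H
    # te_rh = transfer_entropy(R, H, p=2)
    ```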

  10. Antinociceptive and anti-inflammatory activity of the ethanolic extract of Cymbidium aloifolium (L.).

    PubMed

    Howlader, Md Amran; Alam, Mahmudul; Ahmed, Kh Tanvir; Khatun, Farjana; Apu, Apurba Sarker

    2011-10-01

    The ethanol leaf extract of Cymbidium aloifolium (L.) was evaluated for its analgesic and anti-inflammatory activities. At doses of 200 and 400 mg kg(-1) body weight, analgesic activity was assessed by counting abdominal constrictions, and anti-inflammatory activity was assessed against carrageenan-induced paw edema in mice by measuring paw volume. The ethanolic extract of Cymbidium aloifolium (L.) showed statistically significant (p < 0.05) reductions in writhing of 33.57% and 61.31% at oral doses of 200 and 400 mg kg(-1), respectively, compared with the negative control. The ethanolic plant extract also showed a significant (p < 0.05), dose-dependent reduction in the mean increase of paw edema formation. The results of the experiment and their statistical analysis showed that the ethanolic plant extract exerted significant (p < 0.05), dose-dependent analgesic and anti-inflammatory activities compared with the control.

  11. GenePublisher: Automated analysis of DNA microarray data.

    PubMed

    Knudsen, Steen; Workman, Christopher; Sicheritz-Ponten, Thomas; Friis, Carsten

    2003-07-01

    GenePublisher, a system for automatic analysis of data from DNA microarray experiments, has been implemented with a web interface at http://www.cbs.dtu.dk/services/GenePublisher. Raw data are uploaded to the server together with a specification of the data. The server performs normalization, statistical analysis and visualization of the data. The results are run against databases of signal transduction pathways, metabolic pathways and promoter sequences in order to extract more information. The results of the entire analysis are summarized in report form and returned to the user.

  12. Applications of geostatistics and Markov models for logo recognition

    NASA Astrophysics Data System (ADS)

    Pham, Tuan

    2003-01-01

    Spatial covariances based on geostatistics are extracted as representative features of logo or trademark images. These spatial covariances differ from other statistical features for image analysis in that the structural information of an image is independent of pixel locations and is represented in terms of spatial series. We then design a classifier based on hidden Markov models to make use of these geostatistical sequential data to recognize the logos. High recognition rates are obtained when testing the method against a public-domain logo database.

  13. Extraction of information from major element chemical analyses of lunar basalts

    NASA Technical Reports Server (NTRS)

    Butler, J. C.

    1985-01-01

    Major element chemical analyses often form the framework within which similarities and differences of analyzed specimens are noted and used to propose or devise models. When percentages are formed, the ratios of pairs of components are preserved, whereas many familiar statistical and geometrical descriptors are likely to exhibit major changes. This ratio-preserving aspect forms the basis for the proposed framework. An analysis of compositional variability within a data set of 42 major element analyses of lunar reference samples was selected to investigate this proposal.

  14. A pilot study of NMR-based sensory prediction of roasted coffee bean extracts.

    PubMed

    Wei, Feifei; Furihata, Kazuo; Miyakawa, Takuya; Tanokura, Masaru

    2014-01-01

    Nuclear magnetic resonance (NMR) spectroscopy can be considered a kind of "magnetic tongue" for the characterisation and prediction of the tastes of foods, since it provides a wealth of information in a nondestructive and nontargeted manner. In the present study, the chemical substances in roasted coffee bean extracts that can distinguish and predict the different sensations of coffee taste were identified by combining NMR-based metabolomics with a human sensory test and applying the multivariate projection method of orthogonal projection to latent structures (OPLS). In addition, the tastes of commercial coffee beans were successfully predicted from their NMR metabolite profiles using our OPLS model, suggesting that NMR-based metabolomics accompanied by multivariate statistical models is a convenient, fast and accurate approach to the sensory evaluation of coffee. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cavanaugh, J.E.; McQuarrie, A.D.; Shumway, R.H.

    Conventional methods for discriminating between earthquakes and explosions at regional distances have concentrated on extracting specific features such as amplitude and spectral ratios from the waveforms of the P and S phases. We consider here an optimum nonparametric classification procedure derived from the classical approach to discriminating between two Gaussian processes with unequal spectra. Two robust variations based on the minimum discrimination information statistic and Renyi's entropy are also considered. We compare the optimum classification procedure with various amplitude and spectral ratio discriminants and show that its performance is superior when applied to a small population of 8 land-based earthquakes and 8 mining explosions recorded in Scandinavia. Several parametric characterizations of the notion of complexity, based on modeling earthquakes and explosions as autoregressive or modulated autoregressive processes, are also proposed and their performance compared with the nonparametric and feature extraction approaches.
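
    A minimal sketch, not the paper's implementation, of the core idea of the optimum Gaussian-process discriminant: compare the Whittle log-likelihood of a recorded waveform under the two class spectra. The class spectra s_quake and s_blast are assumed to be estimated elsewhere (for instance as averaged periodograms of labelled events), strictly positive, and of length len(x)//2 + 1 to match the rFFT grid.

    ```python
    import numpy as np

    def whittle_loglik(x, spectrum):
        """Whittle log-likelihood of waveform x under a class spectrum (> 0)."""
        pgram = np.abs(np.fft.rfft(x)) ** 2 / len(x)   # periodogram on the rFFT grid
        return -np.sum(np.log(spectrum) + pgram / spectrum)

    def classify(x, s_quake, s_blast):
        lq, lb = whittle_loglik(x, s_quake), whittle_loglik(x, s_blast)
        return "earthquake" if lq > lb else "explosion"
    ```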

  16. The influence of the Tribulus terrestris extract on the parameters of the functional preparedness and athletes' organism homeostasis.

    PubMed

    Milasius, K; Dadeliene, R; Skernevicius, Ju

    2009-01-01

    The influence of Tribulus terrestris extract on the parameters of functional preparedness and organism homeostasis of athletes was investigated. A positive impact of the dietary supplement "Tribulus" (Optimum Nutrition, U.S.A.), taken as 1 capsule 3 times a day for 20 days, was established on athletes' physical power in various energy-producing zones: anaerobic alactic muscular power and anaerobic glycolytic power increased in a statistically reliable manner. After 20 days of consumption, Tribulus terrestris extract had no essential effect on erythrocyte, haemoglobin or thrombocyte indices. During the experimental period, a statistically significant increase in the percentage of granulocytes and decrease in the percentage of leucocytes indicated a negative impact of this food supplement on the leucocyte formula of athletes' blood. Creatine kinase concentration in athletes' blood increased statistically significantly, and the creatinine level tended to decline over the 20-day period of consuming the extract. Declining tendencies in urea, cholesterol and bilirubin concentrations also appeared. The concentration of blood testosterone increased statistically reliably during the first half (10 days) of the experiment; it did not increase further during the next 10 days while Tribulus consumption continued.

  17. No-reference image quality assessment based on natural scene statistics and gradient magnitude similarity

    NASA Astrophysics Data System (ADS)

    Jia, Huizhen; Sun, Quansen; Ji, Zexuan; Wang, Tonghan; Chen, Qiang

    2014-11-01

    The goal of no-reference/blind image quality assessment (NR-IQA) is to devise a perceptual model that can accurately predict the quality of a distorted image in agreement with human opinions, in which feature extraction is an important issue. However, the features used in state-of-the-art "general purpose" NR-IQA algorithms are usually either natural scene statistics (NSS) based or perceptually relevant; therefore, the performance of these models is limited. To further improve NR-IQA performance, we propose a general-purpose NR-IQA algorithm which combines NSS-based features with perceptually relevant features. The new method extracts features in both the spatial and gradient domains. In the spatial domain, we extract point-wise statistics of single pixel values, characterized by a generalized Gaussian distribution model, to form the underlying features. In the gradient domain, statistical features based on neighboring gradient-magnitude similarity are extracted. A mapping is then learned to predict quality scores using support vector regression. Experimental results on benchmark image databases demonstrate that the proposed algorithm correlates highly with human judgments of quality and yields significant performance improvements over state-of-the-art methods.
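
    A minimal sketch of the two feature families, under assumed names: a generalized Gaussian fit to mean-subtracted pixels for the spatial domain, a simplified gradient-magnitude similarity (computed against the global mean magnitude, standing in for the paper's neighbouring-pixel similarity) for the gradient domain, and a support vector regression mapping features to opinion scores.

    ```python
    import numpy as np
    from scipy.stats import gennorm
    from scipy.ndimage import sobel
    from sklearn.svm import SVR

    def nr_iqa_features(img):
        x = img.astype(float) - img.mean()
        beta, _, scale = gennorm.fit(x.ravel(), floc=0)   # spatial GGD shape and scale
        gx, gy = sobel(x, axis=0), sobel(x, axis=1)
        gm = np.hypot(gx, gy)                             # gradient magnitude
        m = gm.mean()
        sim = (2 * gm * m + 1e-6) / (gm ** 2 + m ** 2 + 1e-6)   # similarity in [0, 1]
        return [beta, scale, sim.mean(), sim.std()]

    # X = np.array([nr_iqa_features(im) for im in images]); y = human_opinion_scores
    # model = SVR(kernel="rbf").fit(X, y)   # learned mapping from features to quality
    ```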

  18. Systematization, condensed description, and prediction of sets of anion exchange extraction constants on the basis of their statistical treatment by computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mezhov, E.A.; Reimarov, G.A.; Rubisov, V.N.

    1987-05-01

    On the basis of a statistical treatment of the entire set of published data on anion exchange extraction constants, the authors have refined and expanded the scale of hydration parameters ΔG(hydr) for the anions (the effective free energies of hydration of the anions). The authors have estimated the parameters ΔG for 93 anions and the corresponding coefficients for 94 series of extraction systems, which are distinguished within each series only by the nature of the exchanging anions. The series are distinguished from one another by the nature of the cation extraction agent and the diluent.

  19. Chemical name extraction based on automatic training data generation and rich feature set.

    PubMed

    Yan, Su; Spangler, W Scott; Chen, Ying

    2013-01-01

    The automation of extracting chemical names from text has significant value for biomedical and life science research. A major barrier in this task is the difficulty of obtaining a sizable, good-quality dataset for training a reliable entity extraction model. Another difficulty is the selection of informative features of chemical names, since comprehensive domain knowledge of chemistry nomenclature is required. Leveraging random text generation techniques, we explore the idea of automatically creating training sets for the task of chemical name extraction. Assuming the availability of an incomplete list of chemical names, called a dictionary, we are able to generate well-controlled, random, yet realistic chemical-like training documents. We statistically analyze the construction of chemical names based on the incomplete dictionary, and propose a series of new features, without relying on any domain knowledge. Compared to state-of-the-art models learned from manually labeled data and domain knowledge, our solution shows better or comparable results in annotating real-world data with less human effort. Moreover, we report an interesting observation about the language of chemical names: both the structural and semantic components of chemical names follow a Zipfian distribution, which resembles many natural languages.
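
    The Zipfian observation is easy to probe on any name dictionary. The sketch below, a rough illustration rather than the authors' analysis, ranks name components by frequency and fits the slope of the log-log rank-frequency curve; the tokenizer is a crude assumption.

    ```python
    import re
    from collections import Counter
    import numpy as np

    def zipf_slope(names):
        """Slope of the log-log rank-frequency curve of name components."""
        tokens = []
        for name in names:
            tokens += re.split(r"[\s,\-\(\)\[\]]+", name.lower())  # crude split
        counts = sorted(Counter(t for t in tokens if t).values(), reverse=True)
        ranks = np.arange(1, len(counts) + 1)
        slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
        return slope   # a value near -1 indicates Zipf-like behaviour

    # zipf_slope(["2-amino-3-methylbutanoic acid", "sodium chloride", ...])
    ```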

  20. Local binary pattern variants-based adaptive texture features analysis for posed and nonposed facial expression recognition

    NASA Astrophysics Data System (ADS)

    Sultana, Maryam; Bhatti, Naeem; Javed, Sajid; Jung, Soon Ki

    2017-09-01

    Facial expression recognition (FER) is an important task for various computer vision applications. The task becomes challenging when it requires the detection and encoding of macro- and micropatterns of facial expressions. We present a two-stage texture feature extraction framework based on local binary pattern (LBP) variants and evaluate its significance in recognizing posed and nonposed facial expressions. We focus on the parametric limitations of the LBP variants and investigate their effects on optimal FER. The size of the local neighborhood is an important parameter of the LBP technique. To make the LBP adaptive, we exploit the granulometric information of the facial images to find the local neighborhood size for the extraction of center-symmetric LBP (CS-LBP) features. Our two-stage texture representations consist of an LBP variant and the adaptive CS-LBP features. Among the presented two-stage texture feature extractions, the combination of binarized statistical image features and adaptive CS-LBP features was found to give the highest FER rates. Evaluation of the adaptive texture features shows performance that is competitive with or higher than the nonadaptive features and other state-of-the-art approaches.
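
    A minimal sketch of the first-stage descriptor, using scikit-image's LBP implementation; the radius and point count are illustrative, and the granulometry-driven choice of neighbourhood size described above is assumed to happen elsewhere.

    ```python
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(face_gray, points=8, radius=1):
        """Normalized histogram of uniform LBP codes for one face region."""
        codes = local_binary_pattern(face_gray, P=points, R=radius, method="uniform")
        n_bins = points + 2            # P+1 uniform codes plus one non-uniform bin
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        return hist
    ```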

  1. Modelling spatiotemporal change using multidimensional arrays

    NASA Astrophysics Data System (ADS)

    Lu, Meng; Appel, Marius; Pebesma, Edzer

    2017-04-01

    The large variety of remote sensors, model simulations, and in-situ records provides great opportunities to model environmental change. The massive amount of high-dimensional data calls for methods to integrate data from various sources and to analyse spatiotemporal and thematic information jointly. An array is a collection of elements ordered and indexed in arbitrary dimensions, which naturally represents spatiotemporal phenomena identified by their geographic locations and recording times. In addition, array regridding (e.g., resampling, down-/up-scaling), dimension reduction, and spatiotemporal statistical algorithms are readily applicable to arrays. However, the role of arrays in big geoscientific data analysis has not been systematically studied: How can arrays discretise continuous spatiotemporal phenomena? How can arrays facilitate the extraction of multidimensional information? How can arrays provide a clean, scalable and reproducible change modelling process that is communicable between mathematicians, computer scientists, Earth system scientists and stakeholders? This study focuses on detecting spatiotemporal change using satellite image time series. Current change detection methods using satellite image time series commonly analyse data in separate steps: 1) forming a vegetation index, 2) conducting time series analysis on each pixel, and 3) post-processing and mapping time series analysis results; this does not consider spatiotemporal correlations and ignores much of the spectral information. Multidimensional information can be better extracted by jointly considering spatial, spectral, and temporal information. To approach this goal, we use principal component analysis to extract multispectral information and spatial autoregressive models to account for spatial correlation in residual-based time series structural change modelling. We also discuss the potential of multivariate non-parametric time series structural change methods, hierarchical modelling, and extreme event detection methods to model spatiotemporal change. We show how array operations can facilitate expressing these methods, and how the open-source array data management and analytics software SciDB and R can be used to scale the process and make it easily reproducible.

  2. Modelling the Effects of Land-Use Changes on Climate: a Case Study on Yamula DAM

    NASA Astrophysics Data System (ADS)

    Köylü, Ü.; Geymen, A.

    2016-10-01

    Dams block the flow of rivers and create artificial water reservoirs which affect the climate and the land use characteristics of the river basin. In this research, the effect of the large water body impounded by Yamula Dam in the Kızılırmak Basin on the land use and climate of the surrounding area is analysed. The Mann-Kendall non-parametric statistical test, the Theil-Sen slope method, Inverse Distance Weighting (IDW) and the Soil Conservation Service-Curve Number (SCS-CN) method are integrated for spatial and temporal analysis of the research area. Humidity, temperature, wind speed and precipitation observations collected at 16 weather stations near the Kızılırmak Basin are analysed, and the resulting statistics are combined with GIS data over the years. An application for the GIS analysis was developed in the Python programming language and integrated with ArcGIS software; the statistical analyses were computed in the R Project for Statistical Computing and integrated with the developed application. According to the statistical analysis of the extracted meteorological time series, statistically significant spatiotemporal trends are observed in climate and land use characteristics. In this study, we demonstrate the effect of large dams on the local climate around the semi-arid Yamula Dam.
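
    A minimal sketch of the two trend tests named above, applied to one station's annual series; the variable names are illustrative, and the test is the Kendall's-tau-against-time form of the Mann-Kendall test rather than the study's full workflow.

    ```python
    import numpy as np
    from scipy import stats

    def station_trend(years, values):
        """Mann-Kendall style trend test plus Theil-Sen slope for one station."""
        tau, p_value = stats.kendalltau(years, values)   # monotonic trend vs time
        slope, intercept, lo, hi = stats.theilslopes(values, years)
        return {"tau": tau, "p": p_value, "slope_per_year": slope, "ci": (lo, hi)}

    # Example: annual mean temperature series at one of the 16 stations
    # print(station_trend(np.arange(1990, 2016), temps))
    ```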

  3. A compilation of safety impact information for extractables associated with materials used in pharmaceutical packaging, delivery, administration, and manufacturing systems.

    PubMed

    Jenke, Dennis; Carlson, Tage

    2014-01-01

    Demonstrating suitability for intended use is necessary to register packaging, delivery/administration, or manufacturing systems for pharmaceutical products. During their use, such systems may interact with the pharmaceutical product, potentially adding extraneous entities to those products. These extraneous entities, termed leachables, have the potential to affect the product's performance and/or safety. To establish the potential safety impact, drug products and their packaging, delivery, or manufacturing systems are tested for leachables or extractables, respectively. This generally involves testing a sample (either the extract or the drug product) by a means that produces a test method response and then correlating the test method response with the identity and concentration of the entity causing the response. Oftentimes, analytical tests produce responses that cannot readily establish the associated entity's identity. Entities associated with un-interpretable responses are termed unknowns. Scientifically justifiable thresholds are used to establish those individual unknowns that represent an acceptable patient safety risk and thus do not require further identification and, conversely, those unknowns whose potential safety impact requires that they be identified. Such thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article documents toxicological information for over 540 extractables identified in laboratory testing of polymeric materials used in pharmaceutical applications. Relevant toxicological endpoints, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others were collated for these extractables or their structurally similar surrogates and were systematically assessed to produce a risk index, which represents a daily intake value for life-long intravenous administration. This systematic approach uses four uncertainty factors, each assigned a factor of 10, which consider the quality and relevance of the data, differences in route of administration, non-human species to human extrapolations, and inter-individual variation among humans. In addition to the risk index values, all extractables and most of their surrogates were classified for structural safety alerts using Cramer rules and for mutagenicity alerts using an in silico approach (Benigni/Bossa rule base for mutagenicity via Toxtree). Lastly, in vitro mutagenicity data (Ames Salmonella typhimurium and Mouse Lymphoma tests) were collected from available databases (Chemical Carcinogenesis Research Information and Carcinogenic Potency Database). The frequency distributions of the resulting data were established; in general, risk index values were normally distributed around a band ranging from 5 to 20 mg/day. The risk index associated with the 95% level of the cumulative distribution plot was approximately 0.1 mg/day. Thirteen extractables in the dataset had individual risk index values less than 0.1 mg/day, although four of these had additional risk indices, based on multiple different toxicological endpoints, above 0.1 mg/day. Additionally, approximately 50% of the extractables were classified in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (no basis to assume safety). Lastly, roughly 20% of the extractables triggered either an in vitro or in silico alert for mutagenicity.
When Cramer classifications and the mutagenicity alerts were compared to the risk indices, extractables with safety alerts generally had lower risk index values, although the differences in the risk index distributions between extractables with and without alerts were small and subtle. Leachables from packaging systems, manufacturing systems, or delivery devices can accumulate in drug products and potentially affect those products. Although drug products can be analyzed for leachables (and material extracts can be analyzed for extractables), not all leachables or extractables can be fully identified. Safety thresholds can be used to establish whether the unidentified substances can be deemed safe or whether additional analytical efforts need to be made to secure the identities. These thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article contains safety data for over 500 extractables that were identified in laboratory characterizations of polymers used in pharmaceutical applications. The safety data consist of structural toxicity classifications of the extractables as well as calculated risk indices, where the risk indices were obtained by subjecting toxicological safety data, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others to a systematic evaluation process using appropriate uncertainty factors. Thus the risk index values represent daily exposures for the lifetime intravenous administration of drugs. The frequency distributions of the risk indices and Cramer classifications were examined. The risk index values were normally distributed around a range of 5 to 20 mg/day, and the risk index associated with the 95% level of the cumulative frequency plot was 0.1 mg/day. Approximately 50% of the extractables were in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (high risk of toxicity). Approximately 20% of the extractables produced an in vitro or in silico mutagenicity alert. In general, the distribution of risk index values was not strongly correlated with either the extractables' Cramer classification or their mutagenicity alerts. However, extractables with either in vitro or in silico alerts were somewhat more likely to have low risk index values. © PDA, Inc. 2014.
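
    The risk-index arithmetic described above is simple enough to illustrate directly: a toxicological point of departure is divided by four uncertainty factors of 10 each (data quality/relevance, route of administration, interspecies extrapolation, inter-individual variation). The example value below is illustrative, not from the article's dataset.

    ```python
    def risk_index(point_of_departure_mg_day, factors=(10, 10, 10, 10)):
        """Divide a toxicological point of departure by the uncertainty factors."""
        divisor = 1
        for f in factors:
            divisor *= f
        return point_of_departure_mg_day / divisor   # daily life-long IV intake

    # An illustrative NOAEL of 100 mg/day gives 100 / 10**4 = 0.01 mg/day.
    print(risk_index(100.0))   # 0.01
    ```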

  4. Immediate vs non-immediate loading post-extractive implants: a comparative study of implant stability quotient (ISQ)

    PubMed Central

    MILILLO, L.; FIANDACA, C.; GIANNOULIS, F.; OTTRIA, L.; LUCCHESE, A.; SILVESTRE, F.; PETRUZZI, M.

    2016-01-01

    SUMMARY Purpose This study aims to evaluate differences in implant stability between immediately loaded and non-immediately loaded post-extractive implants by resonance frequency analysis (RFA). Materials and methods Patients were grouped into two categories. In Group A, 10 patients received an immediate post-extractive implant on which a provisional acrylic-resin crown was placed (immediate loading). In Group B (control group), 10 patients received an immediate post-extractive implant only. Both upper and lower premolars were chosen as post-extractive sites. The Implant Stability Quotient (ISQ) was measured by RFA (Osstell®). Measurements were taken immediately after surgery (T0) and every four weeks until five months after implant placement (T1, T2, T3, T4, T5). A statistical analysis by means of Student's t-test for independent samples (significance set at p<0.05) was carried out to compare Groups A and B. Results The ISQ value between the two groups showed a statistically significant difference (p<0.02) at T1. No statistically significant difference in ISQ was found at T0, T2, T3, T4 and T5. Conclusions After clinical assessment, it is possible to confirm that provisional and immediate prosthetic loading of post-extraction sites with cone-shaped implants, platform-switching abutments and a bioactive surface can facilitate osseointegration, reducing healing time. PMID:28042440

  5. Query-oriented evidence extraction to support evidence-based medicine practice.

    PubMed

    Sarker, Abeed; Mollá, Diego; Paris, Cecile

    2016-02-01

    Evidence-based medicine practice requires medical practitioners to rely on the best available evidence, in addition to their expertise, when making clinical decisions. The medical domain boasts a large amount of published medical research data, indexed in various medical databases such as MEDLINE. As the size of this data grows, practitioners increasingly face the problem of information overload, and past research has established the time-associated obstacles faced by evidence-based medicine practitioners. In this paper, we focus on the problem of automatic text summarisation to help practitioners quickly find query-focused information from relevant documents. We utilise an annotated corpus that is specialised for the task of evidence-based summarisation of text. In contrast to past summarisation approaches, which mostly rely on surface level features to identify salient pieces of texts that form the summaries, our approach focuses on the use of corpus-based statistics, and domain-specific lexical knowledge for the identification of summary contents. We also apply a target-sentence-specific summarisation technique that reduces the problem of underfitting that persists in generic summarisation models. In automatic evaluations run over a large number of annotated summaries, our extractive summarisation technique statistically outperforms various baseline and benchmark summarisation models with a percentile rank of 96.8%. A manual evaluation shows that our extractive summarisation approach is capable of selecting content with high recall and precision, and may thus be used to generate bottom-line answers to practitioners' queries. Our research shows that the incorporation of specialised data and domain-specific knowledge can significantly improve text summarisation performance in the medical domain. Due to the vast amounts of medical text available, and the high growth of this form of data, we suspect that such summarisation techniques will address the time-related obstacles associated with evidence-based medicine. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Presentation approaches for enhancing interpretability of patient-reported outcomes (PROs) in meta-analysis: a protocol for a systematic survey of Cochrane reviews.

    PubMed

    Devji, Tahira; Johnston, Bradley C; Patrick, Donald L; Bhandari, Mohit; Thabane, Lehana; Guyatt, Gordon H

    2017-09-27

    Meta-analyses of clinical trials often provide sufficient information for decision-makers to evaluate whether chance can explain apparent differences between interventions. Interpretation of the magnitude and importance of treatment effects beyond statistical significance can, however, be challenging, particularly for patient-reported outcomes (PROs) measured using questionnaires with which clinicians have limited familiarity. The objectives of our study are to systematically evaluate Cochrane systematic review authors' approaches to calculation, reporting and interpretation of pooled estimates of patient-reported outcome measures (PROMs) in meta-analyses. We will conduct a methodological survey of a random sample of Cochrane systematic reviews published from 1 January 2015 to 1 April 2017 that report at least one statistically significant pooled result for at least one PRO in the abstract. Author pairs will independently review all titles, abstracts and full texts identified by the literature search, and they will extract data using a standardised data extraction form. We will extract the following: year of publication, number of included trials, number of included participants, clinical area, type of intervention(s) and control(s), type of meta-analysis and use of the Grading of Recommendations, Assessment, Development and Evaluation approach to rate the quality of evidence, as well as information regarding the characteristics of PROMs, calculation and presentation of PROM effect estimates and interpretation of PROM effect estimates. We will document and summarise the methods used for the analysis, reporting and interpretation of each summary effect measure. We will summarise categorical variables with frequencies and percentages and continuous outcomes as means and/or medians and associated measures of dispersion. Ethics approval for this study is not required. We will disseminate the results of this review in peer-reviewed publications and conference presentations. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  7. Evaluation of a low-cost commercially available extraction device for assessing lead bioaccessibility in contaminated soils.

    PubMed

    Nelson, Clay M; Gilmore, Thomas M; Harrington, M; Scheckel, Kirk G; Miller, Bradley W; Bradham, Karen D

    2013-03-01

    The U.S. EPA's in vitro bioaccessibility (IVBA) method 9200.1-86 defines a validated analytical procedure for the determination of lead bioaccessibility in contaminated soils. The method requires the use of a custom-fabricated extraction device that uses a heated water bath for sample incubation. In an effort to improve ease of use, increase sample throughput, and reduce equipment acquisition and maintenance costs, an alternative low-cost, commercially available extraction device capable of sample incubation via heated air and end-over-end rotation was evaluated. An intra-laboratory study was conducted to compare lead bioaccessibility values derived using the two extraction devices. IVBA values were not statistically different (α = 0.05) between the two extraction devices for any of the soils (n = 6) evaluated in this study, with an average difference in mean lead IVBA of 0.8% (s.d. = 0.5%). The commercially available extraction device was able to generate accurate lead IVBA data as compared to the U.S. EPA's expected value for a National Institute of Standards and Technology standard reference material soil. The relative percent differences between high and low IVBA values for each soil, a measure of instrument precision, were also not statistically different (α = 0.05) between the two extraction devices. The statistical agreement of lead IVBA values observed using the two extraction devices supports the use of a low-cost, commercially available extraction device as a reliable alternative to a custom-fabricated device as required by EPA method 9200.1-86.

  8. Extracting temporal and spatial information from remotely sensed data for mapping wildlife habitat: Tucson

    USGS Publications Warehouse

    Wallace, Cynthia S.A.; Advised by Marsh, Stuart E.

    2002-01-01

    The research accomplished in this dissertation used both mathematical and statistical techniques to extract and evaluate measures of landscape temporal dynamics and spatial structure from remotely sensed data for the purpose of mapping wildlife habitat. By coupling the landscape measures gleaned from the remotely sensed data with various sets of animal sightings and population data, effective models of habitat preference were created. Measures of temporal dynamics of vegetation greenness as measured by the National Oceanic and Atmospheric Administration's Advanced Very High Resolution Radiometer (AVHRR) satellite were used to effectively characterize and map season-specific habitat of the Sonoran pronghorn antelope, as well as produce preliminary models of potential yellow-billed cuckoo habitat in Arizona. Various measures that capture different aspects of the temporal dynamics of the landscape were derived from AVHRR Normalized Difference Vegetation Index composite data using three main classes of calculations: basic statistics, standardized principal components analysis, and Fourier analysis. Pronghorn habitat models based on the AVHRR measures correspond visually and statistically to GIS-based models produced using data that represent detailed knowledge of ground conditions. Measures of temporal dynamics also revealed statistically significant correlations with annual estimates of elk population in selected Arizona Game Management Units, suggesting elk respond to regional environmental changes that can be measured using satellite data. Such relationships, once verified and established, can be used to help indirectly monitor the population. Measures of landscape spatial structure derived from IKONOS high spatial resolution (1-m) satellite data using geostatistics effectively map details of Sonoran pronghorn antelope habitat. Local estimates of the nugget, sill, and range variogram parameters calculated within 25 x 25-meter image windows describe the spatial autocorrelation of the image, permitting classification of all pixels into coherent units whose signature graphs exhibit a classic variogram shape. The variogram parameters captured in these signatures have been shown in previous studies to discriminate between different species-specific vegetation associations. The synoptic view of the landscape provided by satellite data can inform resource management efforts. The ability to characterize the spatial structure and temporal dynamics of habitat using repeatable remote sensing data allows closer monitoring of the relationship between a species and its landscape.
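
    A minimal sketch, in the spirit of the windowed variogram analysis described above, of estimating local nugget, sill and range inside a small image window; the plateau read-offs are crude heuristics rather than a fitted variogram model, and the window size and lag count are illustrative.

    ```python
    import numpy as np

    def window_variogram(window, max_lag=12):
        """Row-wise empirical semivariance and crude nugget/sill/range read-offs."""
        w = window.astype(float)
        gamma = np.array([0.5 * np.mean((w[:, h:] - w[:, :-h]) ** 2)
                          for h in range(1, max_lag + 1)])
        sill = gamma.max()
        nugget = gamma[0]                               # semivariance at first lag
        rng = int(np.argmax(gamma >= 0.95 * sill)) + 1  # first lag near the plateau
        return nugget, sill, rng

    # nugget, sill, rng = window_variogram(ikonos_band[r:r + 25, c:c + 25])
    ```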

  9. A benchmark for statistical microarray data analysis that preserves actual biological and technical variance.

    PubMed

    De Hertogh, Benoît; De Meulder, Bertrand; Berger, Fabrice; Pierre, Michael; Bareke, Eric; Gaigneaux, Anthoula; Depiereux, Eric

    2010-01-11

    Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically-relevant data to evaluate the performance of statistical methods. Our novel method ranks the probesets from a dataset composed of publicly-available biological microarray data and extracts subset matrices with precise information/noise ratios. Our method can be used to determine the capability of different methods to better estimate variance for a given number of replicates. The mean-variance and mean-fold change relationships of the matrices revealed a closer approximation of biological reality. Performance analysis refined the results from benchmarks published previously. We show that the Shrinkage t test (close to Limma) was the best of the methods tested, except when two replicates were examined, where the Regularized t test and the Window t test performed slightly better. The R scripts used for the analysis are available at http://urbm-cluster.urbm.fundp.ac.be/~bdemeulder/.

  10. Spatial statistical analysis of tree deaths using airborne digital imagery

    NASA Astrophysics Data System (ADS)

    Chang, Ya-Mei; Baddeley, Adrian; Wallace, Jeremy; Canci, Michael

    2013-04-01

    High resolution digital airborne imagery offers unprecedented opportunities for observation and monitoring of vegetation, providing the potential to identify, locate and track individual vegetation objects over time. Analytical tools are required to quantify relevant information. In this paper, locations of trees over a large area of native woodland vegetation were identified using morphological image analysis techniques. Methods of spatial point process statistics were then applied to estimate the spatially-varying tree death risk, and to show that it is significantly non-uniform. [Tree deaths over the area were detected in our previous work (Wallace et al., 2008).] The study area is a major source of ground water for the city of Perth, and the work was motivated by the need to understand and quantify vegetation changes in the context of water extraction and drying climate. The influence of hydrological variables on tree death risk was investigated using spatial statistics (graphical exploratory methods, spatial point pattern modelling and diagnostics).

  11. [Application of regular expression in extracting key information from Chinese medicine literatures about re-evaluation of post-marketing surveillance].

    PubMed

    Wang, Zhifei; Xie, Yanming; Wang, Yongyan

    2011-10-01

    Computerized information extraction from the Chinese medicine literature is more convenient than hand searching: it simplifies the search process and improves accuracy. Among the many automated extraction methods in increasing use, regular expressions are particularly efficient for extracting useful information in this research setting. This article focuses on applying regular expressions to extract information from the Chinese medicine literature. Two practical examples are reported, using regular expressions to extract "case number (non-terminology)" and "efficacy rate (subgroups for related information identification)", which illustrate how to extract information from the Chinese medicine literature by this method.
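
    A minimal sketch of the idea: regular expressions that pull a case count and an efficacy rate out of free-text abstracts. The patterns and the example sentence are illustrative assumptions (and English stands in for the Chinese source text), not the authors' actual expressions.

    ```python
    import re

    CASE_NUMBER = re.compile(r"(\d+)\s*(?:cases|patients|subjects)")
    EFFICACY_RATE = re.compile(
        r"(?:total\s+)?effective(?:ness)?\s+rate\s*(?:was|of)?\s*([\d.]+)\s*%")

    text = "A total of 120 patients were enrolled; the total effective rate was 85.8%."
    cases = CASE_NUMBER.search(text)
    rate = EFFICACY_RATE.search(text)
    print(cases.group(1), rate.group(1))   # 120 85.8
    ```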

  12. Infrared maritime target detection using the high order statistic filtering in fractional Fourier domain

    NASA Astrophysics Data System (ADS)

    Zhou, Anran; Xie, Weixin; Pei, Jihong

    2018-06-01

    Accurate detection of maritime targets in infrared imagery under various sea clutter conditions is a challenging task. The fractional Fourier transform (FRFT) extends the Fourier transform to fractional orders and carries richer joint spatial-frequency information. By combining it with high-order statistic filtering, a new ship detection method is proposed. First, the proper range of the angle parameter is determined so that the ship components and background are more easily separated. Second, a new high-order statistic curve (HOSC) at each fractional frequency point is designed. It is shown that the maximal peak interval in the HOSC reflects the target information, while points outside the interval reflect the background, and that the HOSC value for ships is much larger than for sea clutter. The curve's maximal target peak interval is then located and extracted by bandpass filtering in the fractional Fourier domain; since the HOSC value outside the peak interval decreases rapidly to 0, the background is effectively suppressed. Finally, the detection result is obtained by double-threshold segmentation and a target region selection method. The results show that the proposed method performs well for detecting maritime targets under heavy clutter.

  13. Statistical software applications used in health services research: analysis of published studies in the U.S

    PubMed Central

    2011-01-01

    Background This study aims to identify the statistical software applications most commonly employed for data analysis in health services research (HSR) studies in the U.S. The study also examines the extent to which information describing the specific analytical software utilized is provided in published articles reporting on HSR studies. Methods Data were extracted from a sample of 1,139 articles (including 877 original research articles) published between 2007 and 2009 in three U.S. HSR journals that were considered representative of the field based upon a set of selection criteria. Descriptive analyses were conducted to categorize patterns in statistical software usage in those articles. The data were stratified by calendar year to detect trends in software use over time. Results Only 61.0% of original research articles in prominent U.S. HSR journals identified the particular statistical software application used for data analysis. Stata and SAS were overwhelmingly the most commonly employed applications (used in 46.0% and 42.6% of articles, respectively). However, SAS use grew considerably during the study period compared to other applications. Stratification of the data revealed that the type of statistical software used varied considerably by whether authors were from the U.S. or from other countries. Conclusions The findings highlight a need for HSR investigators to identify more consistently the specific analytical software used in their studies. That information can be important, because different software packages may produce varying results owing to differences in their underlying estimation methods. PMID:21977990

  14. [Effects of the first premolar extraction on the third molar angulation].

    PubMed

    He, Yu-hong; Duan, Yin-zhong; Pan, Ji-jun; Xi, Lan-lan

    2008-08-01

    To analyze the effects of first premolar extraction on the inclinations of the second and third molars in patients treated with or without premolar extraction. Fifty-six adolescents were divided into first premolar extraction and non-extraction groups (30 and 26 patients, respectively). Pre-treatment and post-treatment panoramic radiographs were taken. The angles between the long axis of the third molar and the occlusal plane (likewise for the second molar), and between the long axes of the second and third molars, were measured and evaluated. The maxillary and mandibular third molar angulations were all improved after treatment in both groups. Compared with the non-extraction group, the average change in the angle between the long axis of the third molar and the occlusal plane increased significantly in both maxilla and mandible (P < 0.05). The average change in the angle between the long axes of the second and third molars decreased, a statistically significant difference (P < 0.05). The change in the angle between the long axis of the mandibular second molar and the occlusal plane was statistically significant (P < 0.05), but there was no statistically significant difference for the maxillary second molar (P > 0.05). First premolar extraction in orthodontic treatment can improve third molar angulation and may promote eruption of the third molar.

  15. [Information management in multicenter studies: the Brazilian longitudinal study for adult health].

    PubMed

    Duncan, Bruce Bartholow; Vigo, Álvaro; Hernandez, Émerson; Luft, Vivian Cristine; Ahlert, Hubert; Bergmann, Kaiser; Mota, Eduardo

    2013-06-01

    Information management in large multicenter studies requires a specialized approach. The Estudo Longitudinal da Saúde do Adulto (ELSA-Brasil - Brazilian Longitudinal Study for Adult Health) has created a Datacenter to enter and manage its data system. The aim of this paper is to describe the steps involved, including the information entry, transmission and management methods. A web system was developed in order to allow, in a safe and confidential way, online data entry, checking and editing, as well as the incorporation of data collected on paper. Additionally, a Picture Archiving and Communication System was implemented and customized for echocardiography and retinography. It stores the images received from the Investigation Centers and makes them available at the Reading Centers. Finally, data extraction and cleaning processes were developed to create databases in formats that enable analyses in multiple statistical packages.

  16. Use of keyword hierarchies to interpret gene expression patterns.

    PubMed

    Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J

    2001-04-01

    High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.

  17. LiDAR DTMs and anthropogenic feature extraction: testing the feasibility of geomorphometric parameters in floodplains

    NASA Astrophysics Data System (ADS)

    Sofia, G.; Tarolli, P.; Dalla Fontana, G.

    2012-04-01

    In floodplains, massive investments in land reclamation have always played an important role in flood protection. In these contexts, human alteration is reflected by artificial ('anthropogenic') features, such as banks, levees or road scarps, that constantly increase and change in response to the rapid growth of human populations. For these areas, various existing and emerging applications require up-to-date, accurate and sufficiently attributed digital data, but such information is usually lacking, especially for large-scale applications. More recently, national and local mapping agencies in Europe have been moving towards the generation of digital topographic information that conforms to reality and is highly reliable and up to date. LiDAR Digital Terrain Models (DTMs) covering large areas are readily available to public authorities, and there is widespread interest among agencies responsible for land management in the development of automated methods for solving geomorphological and hydrological problems using such information. Automatic feature recognition based upon DTMs can offer, for large-scale applications, a quick and accurate method that helps to improve topographic databases and that can overcome some of the problems associated with traditional field-based geomorphological mapping, such as restrictions on access and constraints of time or cost. Although anthropogenic features such as levees and road scarps are artificial structures that do not strictly belong to the bare ground surface, they are implicitly embedded in DTMs. Automatic feature recognition based upon DTMs, therefore, can offer a quick and accurate method that requires no additional data and that can help improve flood defense asset information, flood modeling and other applications. In natural contexts, morphological indicators derived from high resolution topography have proven reliable in practical applications, and the use of statistical operators as thresholds for these geomorphic parameters has shown high reliability for feature extraction in mountainous environments. The goal of this research is to test whether these morphological indicators and objective thresholds are also effective in floodplains, where features assume different characteristics and other artificial disturbances may be present. In this work, three different geomorphic parameters are tested and applied at different scales on a LiDAR DTM of a typical alluvial plain in north-east Italy. The box-plot is applied to identify the threshold for feature extraction, and a filtering procedure is proposed to improve the quality of the final results. The effectiveness of the different geomorphic parameters is analyzed by comparing automatically derived features with surveyed ones. The results highlight the capability of high resolution topography, geomorphic indicators and statistical thresholds for anthropogenic feature extraction and characterization in a floodplain context.
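
    A minimal sketch of the box-plot thresholding step described above: DTM cells whose geomorphic parameter (here a generic curvature array, assumed to be derived from the LiDAR DTM elsewhere) exceeds the upper whisker are flagged as candidate anthropogenic features.

    ```python
    import numpy as np

    def boxplot_threshold(values):
        """Upper box-plot whisker: values above it are flagged as features."""
        q1, q3 = np.nanpercentile(values, [25, 75])
        return q3 + 1.5 * (q3 - q1)

    # feature_mask = curvature > boxplot_threshold(curvature)
    # A symmetric rule (Q1 - 1.5 * IQR) can flag negative forms such as ditches.
    ```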

  18. Reconstructing Information in Large-Scale Structure via Logarithmic Mapping

    NASA Astrophysics Data System (ADS)

    Szapudi, Istvan

    We propose to develop a new method to extract information from large-scale structure data combining two-point statistics and non-linear transformations; before, this information was available only with substantially more complex higher-order statistical methods. Initially, most of the cosmological information in large-scale structure lies in two-point statistics. With non-linear evolution, some of that useful information leaks into higher-order statistics. The PI and group have shown in a series of theoretical investigations how that leakage occurs, and explained the Fisher information plateau at smaller scales. This plateau means that even as more modes are added to the measurement of the power spectrum, the total cumulative information (loosely speaking the inverse errorbar) is not increasing. Recently we have shown in Neyrinck et al. (2009, 2010) that a logarithmic (and a related Gaussianization or Box-Cox) transformation on the non-linear Dark Matter or galaxy field reconstructs a surprisingly large fraction of this missing Fisher information of the initial conditions. This was predicted by the earlier wave mechanical formulation of gravitational dynamics by Szapudi & Kaiser (2003). The present proposal is focused on working out the theoretical underpinning of the method to a point that it can be used in practice to analyze data. In particular, one needs to deal with the usual real-life issues of galaxy surveys, such as complex geometry, discrete sampling (Poisson or sub-Poisson noise), bias (linear or non-linear, deterministic or stochastic), redshift distortions, projection effects for 2D samples, and the effects of photometric redshift errors. We will develop methods for weak lensing and Sunyaev-Zeldovich power spectra as well, the latter specifically targeting Planck. In addition, we plan to investigate the question of residual higher-order information after the non-linear mapping, and possible applications for cosmology. Our aim will be to work out practical methods, with the ultimate goal of cosmological parameter estimation. We will quantify with standard MCMC and Fisher methods (including the DETF Figure of Merit when applicable) the efficiency of our estimators, comparing with the conventional method that uses the un-transformed field. Preliminary results indicate that the increase for NASA's WFIRST in the DETF Figure of Merit would be 1.5-4.2 using a range of pessimistic to optimistic assumptions, respectively.
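
    The mechanics of the mapping are simple to demonstrate: measure the power spectrum of the overdensity field before and after the log transform log(1 + delta). In the sketch below a lognormal mock stands in for an evolved density field, so the demo only shows the machinery, not the Fisher information gain reported above.

    ```python
    import numpy as np

    def power_spectrum(field):
        """Unnormalized 3-D power spectrum |FFT|^2 / N."""
        return np.abs(np.fft.fftn(field)) ** 2 / field.size

    rng = np.random.default_rng(0)
    # Lognormal mock overdensity field; guarantees delta > -1 so log1p is defined.
    delta = rng.lognormal(mean=0.0, sigma=1.0, size=(64, 64, 64)) - 1.0
    p_raw = power_spectrum(delta)            # two-point statistics of the raw field
    p_log = power_spectrum(np.log1p(delta))  # same statistics after log(1 + delta)
    ```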

  19. Real-time movement detection and analysis for video surveillance applications

    NASA Astrophysics Data System (ADS)

    Hueber, Nicolas; Hennequin, Christophe; Raymond, Pierre; Moeglin, Jean-Pierre

    2014-06-01

    Pedestrian movement along critical infrastructure such as pipelines, railways and highways, as well as pedestrian behavior in urban environments, is of major interest in surveillance applications. The goal is to anticipate illicit or dangerous human activities. For this purpose, we propose an all-in-one small autonomous system that delivers high-level statistics and reports alerts in specific cases. This situational-awareness project requires efficient management of the scene through movement analysis. A dynamic background-extraction algorithm is developed to achieve robustness against natural and urban environmental perturbations and to meet the constraints of an embedded implementation. When changes are detected in the scene, specific patterns are applied to detect and highlight relevant movements. Depending on the application, specific descriptors can be extracted and fused to reach a high level of interpretation. In this paper, our approach is applied to two operational use cases: pedestrian urban statistics and railway surveillance. In the first case, a grid of prototypes is deployed over a city centre to collect pedestrian movement statistics up to a macroscopic level of analysis. The results demonstrate the relevance of the delivered information; in particular, the flow density map highlights pedestrians' preferred paths along the streets. In the second case, one prototype is set next to high-speed train tracks to secure the area. The results exhibit a low false-alarm rate and support our approach of a large sensor network delivering a precise operational picture without overwhelming a supervisor.
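
    A minimal sketch of one common form of dynamic background extraction, a running-average model: the background adapts slowly to illumination change, and large frame-to-background differences mark moving objects. The learning rate and threshold are illustrative assumptions, not the authors' embedded algorithm.

    ```python
    import numpy as np

    def update_background(background, frame, alpha=0.02):
        """Exponential running average of grayscale frames (float arrays)."""
        return (1.0 - alpha) * background + alpha * frame

    def moving_mask(background, frame, threshold=25.0):
        """True where the frame departs strongly from the learned background."""
        return np.abs(frame - background) > threshold

    # background = first_frame.astype(float)
    # for frame in frames:
    #     mask = moving_mask(background, frame.astype(float))
    #     background = update_background(background, frame.astype(float))
    ```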

  20. I-Maculaweb: A Tool to Support Data Reuse in Ophthalmology

    PubMed Central

    Bonetto, Monica; Nicolò, Massimo; Gazzarata, Roberta; Fraccaro, Paolo; Rosa, Raffaella; Musetti, Donatella; Musolino, Maria; Traverso, Carlo E.

    2016-01-01

    This paper presents a Web-based application to collect and manage clinical data and clinical trials together in a single tool. I-maculaweb is a user-friendly Web application designed to manage, share, and analyze clinical data from patients affected by degenerative and vascular diseases of the macula. The unique and innovative scientific and technological element of this project is the integration of individual and population data relevant to degenerative and vascular diseases of the macula. Clinical records can also be extracted for statistical purposes and used in clinical decision support systems. I-maculaweb is based on an existing multilevel and multiscale data management model, which includes general principles suitable for several different clinical domains. The database structure has been specifically built to respect laterality, a key aspect in ophthalmology. Users can add and manage patient records, follow-up visits, treatments, diagnoses, and clinical history. There are two different modalities for extracting records: one for the patient's own center, in which personal details are shown, and one for statistical purposes, in which anonymized data from all centers are visible. The Web platform allows effective management, sharing, and reuse of information within primary care and clinical research. Clear and precise clinical data will improve understanding of the real-life management of degenerative and vascular diseases of the macula and will yield more precise epidemiologic and statistical data. Furthermore, this Web-based application can easily be employed as an electronic clinical research file in clinical studies. PMID:27170913

  1. Optimal Prediction in the Retina and Natural Motion Statistics

    NASA Astrophysics Data System (ADS)

    Salisbury, Jared M.; Palmer, Stephanie E.

    2016-03-01

    Almost all behaviors involve making predictions. Whether an organism is trying to catch prey, avoid predators, or simply move through a complex environment, it uses the data collected through its senses to guide its actions by extracting from these data information about the future state of the world. A key aspect of the prediction problem is that not all features of the past sensory input have predictive power, and representing all features of the external sensory world is prohibitively costly due to both space and metabolic constraints. This leads to the hypothesis that neural systems are optimized for prediction. Here we describe theoretical and computational efforts to define and quantify the efficient representation of predictive information by the brain. Another important feature of the prediction problem is that the physics of the world is diverse enough to contain a wide range of possible statistical ensembles, yet not all inputs are probable. Thus, the brain might not be a generalized predictive machine; it might have evolved to solve specifically the prediction problems most common in the natural environment. This paper summarizes recent results on predictive coding and optimal predictive information in the retina and suggests approaches for quantifying prediction in response to natural motion. Basic statistics of natural movies reveal that general patterns of spatiotemporal correlation are present across a wide range of scenes, though individual differences in motion type may be important for optimal processing of motion in a given ecological niche.

  2. Antibacterial activity of Limonium brasiliense (Baicuru) against multidrug-resistant bacteria using a statistical mixture design.

    PubMed

    Blainski, Andressa; Gionco, Barbara; Oliveira, Admilton G; Andrade, Galdino; Scarminio, Ieda S; Silva, Denise B; Lopes, Norberto P; Mello, João C P

    2017-02-23

    Limonium brasiliense (Boiss.) Kuntze (Plumbaginaceae) is commonly known as "baicuru" or "guaicuru", and preparations of its dried rhizomes have been used popularly in the treatment of premenstrual syndrome and menstrual disorders, and as an antiseptic in genito-urinary infections. This study evaluated the potential antibacterial activity of rhizome extracts against multidrug-resistant bacterial strains using a statistical mixture design. The statistical design of four components (water, methanol, acetone, and ethanol) produced 15 different extracts, plus a confirmatory experiment performed using water:acetone (3:7, v/v). The crude extracts and their ethyl-acetate fractions were tested against vancomycin-resistant Enterococcus faecium (VREfm), methicillin-resistant Staphylococcus aureus (MRSA) and Klebsiella pneumoniae carbapenemase (KPC)-producing K. pneumoniae, all of which have been implicated in hospital- and community-acquired infections. The dry residue, total polyphenol, gallocatechin, and epigallocatechin contents of the extracts were also determined, and statistical analysis was applied in order to define fitted models predicting the value of each parameter for any mixture of components. Principal component and hierarchical clustering analyses (PCA and HCA) of chromatographic data, as well as mass spectrometry (MS) analysis, were performed to determine the main compounds present in the extracts. The Gram-positive bacteria were susceptible to growth inhibition, especially by the ethyl-acetate fractions of the ternary extracts from water:acetone:ethanol and methanol:acetone:ethanol against, respectively, VREfm (MIC=19 µg/mL) and MRSA (MIC=39 µg/mL). In contrast, only moderate activity of the ethyl-acetate fractions from the primary (except water), secondary, and ternary extracts (MIC=625 µg/mL) was noted against KPC. The quadratic and special cubic models were significant for the polyphenol and gallocatechin contents, respectively. Fitting models to the dry residue and epigallocatechin contents was not possible. PCA and HCA of the chromatographic fingerprints were disturbed by retention-time shifts of some peaks, but the ultraviolet spectra indicated the homogeneous presence of flavan-3-ols characteristic of tannins. MS confirmed the presence of gallic acid, gallocatechin, and epigallocatechin in the extracts, and suggested the presence of monomers and dimers of B- and A-type prodelphinidin gallates, as well as a methyl gallate. Our results show the antibacterial potential of L. brasiliense extracts against multidrug-resistant Gram-positive bacteria such as VREfm and MRSA. The statistical design was an important tool for evaluating the biological activity in an optimized manner. The presence of several phenolic compounds in the extracts was also demonstrated. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
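
    In a standard four-component mixture design, the fitted models referred to here are Scheffé polynomials fitted to responses measured on mixtures whose proportions sum to one. A minimal sketch of fitting the Scheffé quadratic model by ordinary least squares follows; the Dirichlet-sampled design points and the toy response are stand-ins for the 15 real extracts and their measured contents.

        import numpy as np
        from itertools import combinations

        def scheffe_quadratic_design(X):
            # Scheffé quadratic mixture model (no intercept):
            #   y = sum_i b_i x_i + sum_{i<j} b_ij x_i x_j
            p = X.shape[1]
            cols = [X[:, i] for i in range(p)]
            cols += [X[:, i] * X[:, j] for i, j in combinations(range(p), 2)]
            return np.column_stack(cols)

        rng = np.random.default_rng(1)
        X = rng.dirichlet(np.ones(4), size=15)   # 15 mixtures of 4 solvents
        y = X @ np.array([5.0, 3.0, 8.0, 4.0]) + rng.normal(0, 0.2, size=15)
        beta, *_ = np.linalg.lstsq(scheffe_quadratic_design(X), y, rcond=None)
        # Predict the response for any new mixture, e.g. acetone:ethanol 7:3.
        pred = scheffe_quadratic_design(np.array([[0.0, 0.0, 0.7, 0.3]])) @ beta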

  3. Identifying city PV roof resource based on Gabor filter

    NASA Astrophysics Data System (ADS)

    Ruhang, Xu; Zhilin, Liu; Yong, Huang; Xiaoyu, Zhang

    2017-06-01

    To identify a city's PV roof resources, the area and ownership distribution of residential buildings in an urban district should be assessed. For this assessment, analysing remote sensing data is a promising approach. Urban building roof area estimation is a major topic in remote sensing image information extraction. There are normally three ways to approach this problem. The first is pixel-based analysis, which is based on mathematical morphology or statistical methods; the second is object-based analysis, which is able to combine semantic information and expert knowledge; the third is a signal-processing approach. This paper presents a Gabor-filter-based method. Results show that the method is fast and reasonably accurate.
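
    The Gabor step can be sketched with OpenCV's built-in kernel generator. The kernel size, wavelength, and percentile threshold below are illustrative guesses rather than the paper's actual parameters, and "scene.png" is a placeholder file name.

        import cv2
        import numpy as np

        def gabor_response(gray, n_orientations=6):
            # Maximum response over a small bank of oriented Gabor filters;
            # strongly oriented texture such as roof edges scores high.
            responses = []
            for k in range(n_orientations):
                theta = k * np.pi / n_orientations
                kern = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0,
                                          ktype=cv2.CV_32F)
                responses.append(cv2.filter2D(gray, cv2.CV_32F, kern))
            return np.max(responses, axis=0)

        img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
        resp = gabor_response(img)
        roof_candidates = resp > np.percentile(resp, 90)  # crude threshold step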

  4. SERAPHIM: studying environmental rasters and phylogenetically informed movements.

    PubMed

    Dellicour, Simon; Rose, Rebecca; Faria, Nuno R; Lemey, Philippe; Pybus, Oliver G

    2016-10-15

    SERAPHIM ("Studying Environmental Rasters and PHylogenetically Informed Movements") is a suite of computational methods developed to study phylogenetic reconstructions of spatial movement in an environmental context. SERAPHIM extracts the spatio-temporal information contained in estimated phylogenetic trees and uses this information to calculate summary statistics of spatial spread and to visualize dispersal history. Most importantly, SERAPHIM enables users to study the impact of customized environmental variables on the spread of the study organism. Specifically, given an environmental raster, SERAPHIM computes environmental "weights" for each phylogeny branch, which represent the degree to which the environmental variable impedes (or facilitates) lineage movement. Correlations between movement duration and these environmental weights are then assessed, and the statistical significances of these correlations are evaluated using null distributions generated by a randomization procedure. SERAPHIM can be applied to any phylogeny whose nodes are annotated with spatial and temporal information. At present, such phylogenies are most often found in the field of emerging infectious diseases, but will become increasingly common in other biological disciplines as population genomic data grows. SERAPHIM 1.0 is freely available from http://evolve.zoo.ox.ac.uk/ R package, source code, example files, tutorials and a manual are also available from this website. simon.dellicour@kuleuven.be or oliver.pybus@zoo.ox.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. Challenges in Managing Information Extraction

    ERIC Educational Resources Information Center

    Shen, Warren H.

    2009-01-01

    This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…

  6. Proton radius from electron scattering data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent

    Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon, and Stanford. Methods: We make use of stepwise regression techniques using the F-test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data, and to provide multivariate error estimates. Results: Starting with the precise, low four-momentum transfer (Q^2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F-test as well as the Akaike information criterion justifies using a linear extrapolation, which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on G_E from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius, but it produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q^2 data on G_E to select functions which extrapolate to high Q^2, we find that a Padé (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, G_E(Q^2) = (1 + Q^2/0.66 GeV^2)^-2. Conclusions: Rigorous application of stepwise regression techniques and multivariate error estimates results in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm, either from linear extrapolation of the extreme low-Q^2 data or by use of the Padé approximant for extrapolation over a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering result and the muonic hydrogen result are consistent. Lastly, it is the atomic hydrogen results that are the outliers.

  7. Proton radius from electron scattering data

    DOE PAGES

    Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; ...

    2016-05-31

    Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon, and Stanford. Methods: We make use of stepwise regression techniques using the F-test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data, and to provide multivariate error estimates. Results: Starting with the precise, low four-momentum transfer (Q^2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F-test as well as the Akaike information criterion justifies using a linear extrapolation, which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on G_E from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius, but it produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q^2 data on G_E to select functions which extrapolate to high Q^2, we find that a Padé (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, G_E(Q^2) = (1 + Q^2/0.66 GeV^2)^-2. Conclusions: Rigorous application of stepwise regression techniques and multivariate error estimates results in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm, either from linear extrapolation of the extreme low-Q^2 data or by use of the Padé approximant for extrapolation over a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering result and the muonic hydrogen result are consistent. Lastly, it is the atomic hydrogen results that are the outliers.
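
    The low-Q^2 extraction logic reduces to a first-order Maclaurin fit of G_E and the relation r^2 = -6 dG_E/dQ^2 at Q^2 = 0. The toy sketch below uses the dipole form quoted in the abstract to generate synthetic data; the Q^2 grid is an arbitrary choice.

        import numpy as np

        HBARC = 0.19733  # GeV*fm conversion constant

        def radius_from_linear_fit(q2, ge):
            # First-order Maclaurin fit G_E = a + b*Q^2 at low Q^2; the
            # charge radius follows from r^2 = -6 dG_E/dQ^2 at Q^2 = 0.
            b, a = np.polyfit(q2, ge, 1)
            return np.sqrt(-6.0 * b) * HBARC  # in fm, for Q^2 in GeV^2

        def dipole_ge(q2):
            # Dipole form quoted in the abstract, corresponding to ~0.84 fm.
            return (1.0 + q2 / 0.66) ** -2

        q2 = np.linspace(0.005, 0.02, 10)     # very low Q^2 region (GeV^2)
        print(radius_from_linear_fit(q2, dipole_ge(q2)))  # ~0.84 fm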

  8. The architecture of the management system of complex steganographic information

    NASA Astrophysics Data System (ADS)

    Evsutin, O. O.; Meshcheryakov, R. V.; Kozlova, A. S.; Solovyev, T. M.

    2017-01-01

    The aim of this study is to create a wide-area information system that allows one to control the processes of generation, embedding, extraction, and detection of steganographic information. In this paper, the following problems are considered: the definition of the system scope and the development of its architecture. For the algorithmic underpinnings of the system, classic methods of steganography are used to embed information, while methods of mathematical statistics and computational intelligence are used to identify the embedded information. The main result of the paper is the development of the architecture of the management system for complex steganographic information. The suggested architecture utilizes cloud technology to provide service via a web service over the Internet. It is designed to process streams of multimedia data originating from many sources of different types. The information system built in accordance with the proposed architecture will be used in the following areas: hidden transfer of documents protected by medical secrecy in telemedicine systems; copyright protection of online content in public networks; and prevention of information leakage caused by insiders.
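
    The abstract refers to "classic methods of steganography" without naming them; the textbook example is least-significant-bit (LSB) embedding, sketched below as a stand-in. Keys, payload framing, and robustness measures are all omitted.

        import numpy as np

        def lsb_embed(cover, bits):
            # Overwrite the least significant bit of the first len(bits) pixels.
            stego = cover.ravel().copy()
            stego[:bits.size] = (stego[:bits.size] & 0xFE) | bits
            return stego.reshape(cover.shape)

        def lsb_extract(stego, n_bits):
            return stego.ravel()[:n_bits] & 1

        cover = np.random.default_rng(2).integers(0, 256, (64, 64), dtype=np.uint8)
        bits = np.unpackbits(np.frombuffer(b"secret", dtype=np.uint8))
        stego = lsb_embed(cover, bits)
        assert np.array_equal(lsb_extract(stego, bits.size), bits)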

  9. Developing a Research Instrument to Document Awareness, Knowledge, and Attitudes Regarding Breast Cancer and Early Detection Techniques for Pakistani Women: The Breast Cancer Inventory (BCI).

    PubMed

    Naqvi, Atta Abbas; Zehra, Fatima; Ahmad, Rizwan; Ahmad, Niyaz

    2016-12-09

    There is a general hesitation among Pakistani women to participate in surveys related to breast cancer, which may be due to the associated stigma and conservatism in society. We felt that no existing research instrument was able to extract information from respondents to the extent needed for the successful execution of our study. The need to develop a research instrument tailored for Pakistani women was based on the fact that most Pakistani women come from a conservative background, sometimes view this topic as provocative, and consider discussing it publicly to be inappropriate. Our literature review revealed a number of weaknesses in existing research instruments; using them might therefore fail to extract the needed information. A dedicated research instrument was thus developed. It was named the "breast cancer inventory" (BCI) by a panel of experts, for use in a study aimed at documenting awareness, knowledge, and attitudes of Pakistani women regarding breast cancer and early detection techniques. The study is still in the data collection phase. The statistical analysis involved the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test for sampling adequacy. In addition, reliability analysis and exploratory factor analysis (EFA) were also employed. This concept paper focuses on the development, piloting, and validation of the BCI. It is the first research instrument that has high acceptability among Pakistani women and is able to extract adequate information from respondents without causing embarrassment or unease.

  10. Developing a Research Instrument to Document Awareness, Knowledge, and Attitudes Regarding Breast Cancer and Early Detection Techniques for Pakistani Women: The Breast Cancer Inventory (BCI)

    PubMed Central

    Naqvi, Atta Abbas; Zehra, Fatima; Ahmad, Rizwan; Ahmad, Niyaz

    2016-01-01

    There is a general hesitation among Pakistani women to participate in surveys related to breast cancer, which may be due to the associated stigma and conservatism in society. We felt that no existing research instrument was able to extract information from respondents to the extent needed for the successful execution of our study. The need to develop a research instrument tailored for Pakistani women was based on the fact that most Pakistani women come from a conservative background, sometimes view this topic as provocative, and consider discussing it publicly to be inappropriate. Our literature review revealed a number of weaknesses in existing research instruments; using them might therefore fail to extract the needed information. A dedicated research instrument was thus developed. It was named the "breast cancer inventory" (BCI) by a panel of experts, for use in a study aimed at documenting awareness, knowledge, and attitudes of Pakistani women regarding breast cancer and early detection techniques. The study is still in the data collection phase. The statistical analysis involved the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test for sampling adequacy. In addition, reliability analysis and exploratory factor analysis (EFA) were also employed. This concept paper focuses on the development, piloting, and validation of the BCI. It is the first research instrument that has high acceptability among Pakistani women and is able to extract adequate information from respondents without causing embarrassment or unease. PMID:28933416
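
    The adequacy checks and EFA named here have direct counterparts in the third-party Python package factor_analyzer; using it is an assumption of convenience, since the authors' own toolchain is not stated. A sketch on synthetic Likert-type responses:

        import numpy as np
        import pandas as pd
        from factor_analyzer import FactorAnalyzer
        from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                                     calculate_kmo)

        # Illustrative data: 200 respondents answering 12 five-point items.
        rng = np.random.default_rng(3)
        items = pd.DataFrame(rng.integers(1, 6, size=(200, 12)),
                             columns=[f"q{i}" for i in range(1, 13)])

        chi2, p = calculate_bartlett_sphericity(items)  # sphericity test
        kmo_per_item, kmo_total = calculate_kmo(items)  # sampling adequacy

        fa = FactorAnalyzer(n_factors=3, rotation="varimax")  # exploratory FA
        fa.fit(items)
        loadings = fa.loadings_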

  11. Antidiabetic activity and phytochemical screening of extracts of the leaves of Ajuga remota Benth on alloxan-induced diabetic mice.

    PubMed

    Tafesse, Tadesse Bekele; Hymete, Ariaya; Mekonnen, Yalemtsehay; Tadesse, Mekuria

    2017-05-02

    Ajuga remota Benth is traditionally used in Ethiopia for the management of diabetes mellitus. Since this claim had not been investigated scientifically, the aim of this study was to evaluate the antidiabetic effect and phytochemical profile of the aqueous and 70% ethanol leaf extracts in alloxan-induced diabetic mice. After an acute toxicity test, Swiss albino mice were induced with alloxan to obtain experimentally diabetic animals. Fasting mean blood glucose levels were measured before and after two weeks of treatment in normal mice, untreated diabetic mice, and diabetic mice treated with the aqueous and 70% ethanol extracts. Data were statistically evaluated using the Statistical Package for the Social Sciences software, version 20. A P-value <0.05 was considered statistically significant. The median lethal doses (LD50) of both extracts were higher than 5000 mg/kg, indicating that the extracts are not toxic under the conditions observed. Aqueous extracts of A. remota (300 mg/kg and 500 mg/kg body weight) reduced elevated blood glucose levels by 27.83 ± 2.96% and 38.98 ± 0.67% (P < 0.0001), respectively, while the 70% ethanol extract caused reductions of 27.94 ± 1.92% (300 mg/kg) and 28.26 ± 1.82% (500 mg/kg). Treatment with the antidiabetic drug glibenclamide (10 mg/kg body weight) lowered blood glucose levels by 51.06% (P < 0.05). Phytochemical screening of both extracts indicated the presence of phenolic compounds, flavonoids, saponins, tannins, and steroids, which might contribute to the antidiabetic activity. The extracts, however, did not contain alkaloids or anthraquinones. The aqueous extract (500 mg/kg) showed the highest percentage reduction in blood glucose levels; the ability of A. remota extracts to reduce blood glucose levels is presumably due to the presence of antioxidant constituents such as flavonoids. These effects support the traditional claim for the plant.

  12. Utilization of electrical impedance imaging for estimation of in-vivo tissue resistivities

    NASA Astrophysics Data System (ADS)

    Eyuboglu, B. Murat; Pilkington, Theo C.

    1993-08-01

    In order to determine the in vivo resistivity of tissues in the thorax, this study assessed the possibility of combining electrical impedance imaging (EII) techniques with (1) anatomical data extracted from high-resolution images, (2) a priori knowledge of tissue resistivities, and (3) a priori noise information. A Least Square Error Estimator (LSEE) and a statistically constrained Minimum Mean Square Error Estimator (MiMSEE) were implemented to estimate regional electrical resistivities from potential measurements made on the body surface. A two-dimensional boundary element model of the human thorax, consisting of four different conductivity regions (skeletal muscle, heart, right lung, and left lung), was adopted to simulate the measured EII torso potentials. The calculated potentials were then perturbed by simulated instrumentation noise. The signal information used to form the statistical constraint for the MiMSEE was obtained from a priori knowledge of the physiological range of tissue resistivities. The noise constraint was determined from a priori knowledge of errors due to linearization of the forward problem and to the instrumentation noise.
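
    For a linearized forward model y = Hx + n, both estimators have closed forms. The sketch below shows them with toy dimensions (16 surface electrodes, 4 tissue regions) and placeholder prior statistics in place of the study's physiological ranges.

        import numpy as np

        def lsee(H, y):
            # Unconstrained least-square-error estimate of the resistivities.
            return np.linalg.lstsq(H, y, rcond=None)[0]

        def mimsee(H, y, mu_x, C_x, C_n):
            # Statistically constrained MMSE estimator:
            #   x_hat = mu_x + C_x H^T (H C_x H^T + C_n)^{-1} (y - H mu_x)
            # C_x encodes the physiological range of tissue resistivities,
            # C_n the linearization-plus-instrumentation noise.
            K = C_x @ H.T @ np.linalg.inv(H @ C_x @ H.T + C_n)
            return mu_x + K @ (y - H @ mu_x)

        rng = np.random.default_rng(4)
        H = rng.normal(size=(16, 4))                # toy linearized forward model
        x_true = np.array([2.0, 4.5, 11.0, 12.0])   # illustrative resistivities
        y = H @ x_true + rng.normal(0, 0.5, size=16)
        x_hat = mimsee(H, y, mu_x=np.array([2.5, 5.0, 10.0, 10.0]),
                       C_x=np.diag([1.0, 2.0, 4.0, 4.0]), C_n=0.25 * np.eye(16))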

  13. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    NASA Astrophysics Data System (ADS)

    Jesse, S.; Chi, M.; Belianinov, A.; Beekman, C.; Kalinin, S. V.; Borisevich, A. Y.; Lupini, A. R.

    2016-05-01

    Electron microscopy is undergoing a transition: from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. Here, we discuss the application of so-called "big-data" methods to high-dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, the example demonstrates the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of such results may still require human intervention. However, the open nature of this algorithm and its wide availability enable the broad collaboration and exploratory work necessary for efficient data analysis in electron microscopy.
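
    The k-means step can be illustrated in a few lines: cluster local observations with no physical prior and check whether the labels recover the domains. Plain image patches stand in here for the per-probe-position diffraction patterns of the real ptychographic data set.

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_patches(image, patch=8, n_clusters=2):
            # Split the image into non-overlapping patches and cluster them
            # purely statistically, with no information about the material.
            h, w = image.shape
            rows, coords = [], []
            for i in range(0, h - patch + 1, patch):
                for j in range(0, w - patch + 1, patch):
                    rows.append(image[i:i + patch, j:j + patch].ravel())
                    coords.append((i, j))
            labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.array(rows))
            return coords, labels

        rng = np.random.default_rng(5)
        img = np.concatenate([rng.normal(0, 1, (64, 64)),
                              rng.normal(3, 1, (64, 64))], axis=1)  # two toy domains
        coords, labels = cluster_patches(img)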

  14. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective: To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design: We used a subset of 570,272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both 'balanced' (50% HIT) and 'stratified' (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements: κ statistic, F1 score, precision and recall. Results: Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions: Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
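
    A compact stand-in for the classifier described here, TF-IDF features feeding L2-regularized logistic regression in scikit-learn, is shown below on a toy corpus; the study's stemming, feature selection, and iterative error analysis are omitted.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score
        from sklearn.pipeline import make_pipeline

        # Tiny illustrative narratives; the study used >570,000 MAUDE reports.
        reports = ["interface froze and order entry data was lost",
                   "software update corrupted the patient record display",
                   "catheter tip fractured during insertion",
                   "pump battery failed during infusion"]
        is_hit = [1, 1, 0, 0]  # 1 = health information technology incident

        clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                            LogisticRegression(C=1.0))  # L2-regularized
        clf.fit(reports, is_hit)
        print(f1_score(is_hit, clf.predict(reports)))   # resubstitution F1
        print(clf.predict(["screen froze while charting medications"]))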

  15. Quantum Quench Dynamics

    NASA Astrophysics Data System (ADS)

    Mitra, Aditi

    2018-03-01

    Quench dynamics is an active area of study encompassing condensed matter physics and quantum information, with applications to cold-atomic gases and pump-probe spectroscopy of materials. Recent theoretical progress in studying quantum quenches is reviewed. Quenches in interacting one-dimensional systems as well as systems in higher spatial dimensions are covered. The appearance of nontrivial steady states following a quench in exactly solvable models is discussed, and the stability of these states to perturbations is described. Proper conserving approximations needed to capture the onset of thermalization at long times are outlined. The appearance of universal scaling for quenches near critical points and the role of the renormalization group in capturing the transient regime are reviewed. Finally, the effect of quenches near critical points on the dynamics of entanglement entropy and entanglement statistics is discussed. The extraction of critical exponents from the entanglement statistics is outlined.

  16. Factors Influencing Dental Patient Participation in Biobanking and Biomedical Research.

    PubMed

    Hassona, Yazan; Ahram, Mamoun; Odeh, Noorah; Abu Gosh, Mais; Scully, Crispian

    To study the willingness of dental patients to donate biospecimens for research purposes and to examine factors that may influence such a decision. A face-to-face interview was conducted using a pretested structured survey instrument on 408 adult dental patients attending a university hospital for dental care. Descriptive statistics were generated, and the χ2 test was used to examine differences between groups. P values ≤0.05 were considered statistically significant. Of the 408 participants, only 71 (17.4%) had heard of the terms biobanking/biospecimens, but 293 (71.9%) approved of the idea of using biospecimens for biomedical research, and 228 (55.9%) were willing to donate biospecimens and give personal information for research purposes. Among participants who were unwilling to participate in biobanking, fear of information leakage was the most frequently reported reason, while among participants who were willing to donate biospecimens, the potential to provide more effective and less costly treatments was the most frequently reported reason. The preferences of the 228 participants who were willing to donate biospecimens were as follows: give a sample of removed oral tissues including extracted teeth (n = 105, 46.1%), donate a blood sample (n = 52, 23%), donate a sample of saliva (n = 43, 18.6%), and give a urine sample (n = 28, 12.3%). Dental patients had a generally positive attitude towards biomedical research and biobanking. The most preferred types of biospecimens to donate in a dental setting were removed tissues, including extracted teeth, and blood samples. © 2016 S. Karger AG, Basel.

  17. A Pilot Study on Developing a Standardized and Sensitive School Violence Risk Assessment with Manual Annotation.

    PubMed

    Barzman, Drew H; Ni, Yizhao; Griffey, Marcus; Patel, Bianca; Warren, Ashaki; Latessa, Edward; Sorter, Michael

    2017-09-01

    School violence has increased over the past decade, and innovative, sensitive, and standardized approaches to assessing school violence risk are needed. In our current feasibility study, we initiated a standardized, sensitive, and rapid school violence risk approach with manual annotation. Manual annotation is the process of analyzing a student's transcribed interview to extract relevant information (e.g., key words) related to school violence risk levels and associated with students' behaviors, attitudes, feelings, use of technology (social media and video games), and other activities. In this feasibility study, we first implemented school violence risk assessments to evaluate risk levels by interviewing the student and a parent separately at the school or the hospital to complete our novel school safety scales. We completed 25 risk assessments, resulting in 25 transcribed interviews of 12- to 18-year-olds from 15 schools in Ohio and Kentucky. We then analyzed structured professional judgments, language, and patterns associated with school violence risk levels by using manual annotation and statistical methodology. To analyze the student interviews, we began developing an annotation guideline to extract key information associated with students' behaviors, attitudes, feelings, use of technology, and other activities. Statistical analysis was applied to associate the significant categories with students' risk levels in order to identify key factors, which will help with developing action steps to reduce risk. In a future study, we plan to recruit more subjects in order to fully develop the manual annotation, which will result in a more standardized and sensitive approach to school violence assessments.

  18. Shape information from glucose curves: Functional data analysis compared with traditional summary measures

    PubMed Central

    2013-01-01

    Background: Plasma glucose levels are important measures in medical care and research, and are often obtained from oral glucose tolerance tests (OGTT) with repeated measurements over 2–3 hours. It is common practice to use simple summary measures of OGTT curves. However, different OGTT curves can yield similar summary measures, and information of physiological or clinical interest may be lost. Our main aim was to extract information inherent in the shape of OGTT glucose curves, compare it with the information from simple summary measures, and explore the clinical usefulness of such information. Methods: OGTTs with five glucose measurements over two hours were recorded for 974 healthy pregnant women in their first trimester. For each woman, the five measurements were transformed into smooth OGTT glucose curves by functional data analysis (FDA), a collection of statistical methods developed specifically to analyse curve data. The essential modes of temporal variation between OGTT glucose curves were extracted by functional principal component analysis. The resultant functional principal component (FPC) scores were compared with commonly used simple summary measures: fasting and two-hour (2-h) values, area under the curve (AUC) and simple shape indices (2-h minus 90-min values, or 90-min minus 60-min values). The clinical usefulness of FDA was explored by regression analyses of glucose tolerance later in pregnancy. Results: Over 99% of the variation between individually fitted curves was expressed in the first three FPCs, interpreted physiologically as “general level” (FPC1), “time to peak” (FPC2) and “oscillations” (FPC3). FPC1 scores correlated strongly with AUC (r=0.999), but less with the other simple summary measures (−0.42≤r≤0.79). FPC2 scores gave shape information not captured by simple summary measures (−0.12≤r≤0.40). FPC2 scores, but neither FPC1 nor the simple summary measures, discriminated between women who did and did not develop gestational diabetes later in pregnancy. Conclusions: FDA of OGTT glucose curves in early pregnancy extracted shape information that was not identified by commonly used simple summary measures. This information discriminated between women with and without gestational diabetes later in pregnancy. PMID:23327294

  19. Shape information from glucose curves: functional data analysis compared with traditional summary measures.

    PubMed

    Frøslie, Kathrine Frey; Røislien, Jo; Qvigstad, Elisabeth; Godang, Kristin; Bollerslev, Jens; Voldner, Nanna; Henriksen, Tore; Veierød, Marit B

    2013-01-17

    Plasma glucose levels are important measures in medical care and research, and are often obtained from oral glucose tolerance tests (OGTT) with repeated measurements over 2-3 hours. It is common practice to use simple summary measures of OGTT curves. However, different OGTT curves can yield similar summary measures, and information of physiological or clinical interest may be lost. Our main aim was to extract information inherent in the shape of OGTT glucose curves, compare it with the information from simple summary measures, and explore the clinical usefulness of such information. OGTTs with five glucose measurements over two hours were recorded for 974 healthy pregnant women in their first trimester. For each woman, the five measurements were transformed into smooth OGTT glucose curves by functional data analysis (FDA), a collection of statistical methods developed specifically to analyse curve data. The essential modes of temporal variation between OGTT glucose curves were extracted by functional principal component analysis. The resultant functional principal component (FPC) scores were compared with commonly used simple summary measures: fasting and two-hour (2-h) values, area under the curve (AUC) and simple shape indices (2-h minus 90-min values, or 90-min minus 60-min values). The clinical usefulness of FDA was explored by regression analyses of glucose tolerance later in pregnancy. Over 99% of the variation between individually fitted curves was expressed in the first three FPCs, interpreted physiologically as "general level" (FPC1), "time to peak" (FPC2) and "oscillations" (FPC3). FPC1 scores correlated strongly with AUC (r=0.999), but less with the other simple summary measures (-0.42≤r≤0.79). FPC2 scores gave shape information not captured by simple summary measures (-0.12≤r≤0.40). FPC2 scores, but neither FPC1 nor the simple summary measures, discriminated between women who did and did not develop gestational diabetes later in pregnancy. FDA of OGTT glucose curves in early pregnancy extracted shape information that was not identified by commonly used simple summary measures. This information discriminated between women with and without gestational diabetes later in pregnancy.
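
    For evenly sampled curves, the functional PCA used here can be approximated by an ordinary SVD of the centered curve matrix; the smoothing step of true FDA is skipped, and the simulated OGTT curves below are purely illustrative.

        import numpy as np

        t = np.array([0, 30, 60, 90, 120])    # OGTT sampling times (minutes)
        rng = np.random.default_rng(6)
        n = 974
        base = 4.6 + 0.3 * rng.normal(size=(n, 1))
        shift = 60 + 15 * rng.normal(size=(n, 1))           # time-to-peak varies
        curves = (base + 2.5 * np.exp(-((t - shift) / 45.0) ** 2)
                  + 0.1 * rng.normal(size=(n, t.size)))     # toy glucose curves

        mean_curve = curves.mean(axis=0)
        centered = curves - mean_curve
        U, s, Vt = np.linalg.svd(centered, full_matrices=False)
        fpc_scores = centered @ Vt.T        # per-woman scores on each component
        explained = s**2 / np.sum(s**2)     # first few components dominate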

  20. Information extraction system

    DOEpatents

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations, as well as relationships and events, from text documents is described herein.

  1. Model of experts for decision support in the diagnosis of leukemia patients.

    PubMed

    Corchado, Juan M; De Paz, Juan F; Rodríguez, Sara; Bajo, Javier

    2009-07-01

    Recent advances in the field of biomedicine, specifically in the field of genomics, have led to an increase in the information available for conducting expression analysis. Expression analysis is a technique used in transcriptomics, a branch of genomics that deals with the study of messenger ribonucleic acid (mRNA) and the extraction of the information contained in the genes. This increase in information is reflected in exon arrays, which require new techniques to extract the information. The purpose of this study is to provide a tool based on a mixture-of-experts model that allows the analysis of the information contained in exon arrays, from which automatic classifications for decision support in the diagnosis of leukemia patients can be made. The proposed model integrates several cooperative algorithms characterized by their efficiency in data processing, filtering, classification, and knowledge extraction. The Cancer Institute of the University of Salamanca is making an effort to develop tools to automate the evaluation of data and to facilitate the analysis of information. This proposal is a step forward in that direction and the first step toward the development of a mixture-of-experts tool that integrates different cognitive and statistical approaches to deal with the analysis of exon arrays. The mixture-of-experts model presented within this work provides great capacity for learning and adaptation to the characteristics of the problem under consideration, using novel algorithms in each stage of the analysis process that can be easily configured and combined, and provides results that notably improve on those provided by existing methods for exon array analysis. The material used consists of data from exon arrays provided by the Cancer Institute that contain samples from leukemia patients. The methodology consists of a system based on a mixture of experts. Each expert incorporates novel artificial intelligence techniques that improve the execution of tasks such as pre-processing, filtering, classification, and extraction of knowledge. This article details the manner in which the individual experts are combined so that together they generate a system capable of extracting knowledge, thus permitting patients to be classified in a manner that is automatic, efficient, and also comprehensible for medical personnel. The system has been tested in a real setting and has been used for classifying patients who suffer from different forms of leukemia at various stages. Personnel from the Cancer Institute supervised and participated throughout the testing period. Preliminary results are promising, notably improving on the results obtained with previously used tools. The medical staff of the Cancer Institute consider the tools that have been developed to be positive and very useful in supporting their daily tasks. Additionally, the mixture of experts supplies a tool for extracting the information necessary to explain the associations that have been made in simple terms; that is, it permits the extraction of knowledge for each classification made, generalized so that it can be used in subsequent classifications. This allows for a large amount of learning and adaptation within the proposed system.

  2. Exploratory factor analysis in Rehabilitation Psychology: a content analysis.

    PubMed

    Roberson, Richard B; Elliott, Timothy R; Chang, Jessica E; Hill, Jessica N

    2014-11-01

    Our objective was to examine the use and quality of exploratory factor analysis (EFA) in articles published in Rehabilitation Psychology. Trained raters examined 66 separate exploratory factor analyses in 47 articles published between 1999 and April 2014. The raters recorded the aim of the EFAs, the distributional statistics, sample size, factor retention method(s), extraction and rotation method(s), and whether the pattern coefficients, structure coefficients, and the matrix of association were reported. The primary use of the EFAs was scale development, but the most widely used extraction and rotation method was principal component analysis with varimax rotation. When determining how many factors to retain, multiple methods (e.g., scree plot, parallel analysis) were used most often. Many articles did not report enough information to allow for the duplication of their results. EFA relies on authors' choices (e.g., factor retention rules, extraction and rotation methods), and few articles adhered to all of the best practices. The current findings are compared to other empirical investigations into the use of EFA in published research. Recommendations for improving EFA reporting practices in rehabilitation psychology research are provided.
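
    Of the factor-retention methods mentioned, parallel analysis is easy to state precisely: retain factors whose eigenvalues exceed those of comparable random data. A numpy sketch follows; the 95th-percentile criterion is one common convention among several.

        import numpy as np

        def parallel_analysis(data, n_iter=100, seed=0):
            # Horn's parallel analysis on the correlation-matrix eigenvalues.
            rng = np.random.default_rng(seed)
            n, p = data.shape
            obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
            rand = np.array([
                np.linalg.eigvalsh(
                    np.corrcoef(rng.normal(size=(n, p)), rowvar=False))[::-1]
                for _ in range(n_iter)
            ])
            return int(np.sum(obs > np.percentile(rand, 95, axis=0)))

        n_factors = parallel_analysis(np.random.default_rng(1).normal(size=(300, 10)))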

  3. Extracting valley-ridge lines from point-cloud-based 3D fingerprint models.

    PubMed

    Pang, Xufang; Song, Zhan; Xie, Wuyuan

    2013-01-01

    3D fingerprinting is an emerging technology with the distinct advantage of touchless operation. More importantly, 3D fingerprint models contain more biometric information than traditional 2D fingerprint images. However, current approaches to fingerprint feature detection usually must transform the 3D models to a 2D space through unwrapping or other methods, which can introduce distortions. A new approach directly extracts valley-ridge features from point-cloud-based 3D fingerprint models. It first applies the moving least-squares method to fit a local paraboloid surface and represent the local point-cloud area. It then computes the local surface's curvatures and curvature tensors to facilitate detection of the potential valley and ridge points. The approach projects those points onto the most likely valley-ridge lines, using statistical means such as covariance analysis and cross-correlation. To finally extract the valley-ridge lines, it grows polylines that approximate the projected feature points and removes the perturbations between the sampled points. Experiments with different 3D fingerprint models demonstrate the approach's feasibility and performance.
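
    The curvature computation reduces to fitting a quadratic surface to each neighborhood and reading principal curvatures off its Hessian. The sketch below uses plain least squares in place of the paper's moving least-squares weighting and assumes each patch has already been rotated into a frame aligned with the estimated surface normal.

        import numpy as np

        def local_curvatures(patch):
            # Least-squares paraboloid fit z = ax^2 + bxy + cy^2 + dx + ey + f,
            # then principal curvatures from the Hessian at the origin
            # (a good approximation when the fitted gradient is small).
            x, y, z = patch.T
            A = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
            a, b, c, d, e, f = np.linalg.lstsq(A, z, rcond=None)[0]
            return np.linalg.eigvalsh(np.array([[2 * a, b], [b, 2 * c]]))

        u = np.linspace(-1, 1, 9)
        X, Y = np.meshgrid(u, u)
        Z = 0.5 * X**2 - 0.2 * Y**2          # synthetic saddle-shaped patch
        k1, k2 = local_curvatures(np.column_stack([X.ravel(), Y.ravel(), Z.ravel()]))
        # Valley/ridge candidates: one curvature large in magnitude, the other small.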

  4. New Optical Transforms For Statistical Image Recognition

    NASA Astrophysics Data System (ADS)

    Lee, Sing H.

    1983-12-01

    In the optical implementation of statistical image recognition, new optical transforms on large images for real-time recognition are of special interest. Several important linear transformations frequently used in statistical pattern recognition have now been optically implemented, including the Karhunen-Loeve transform (KLT), the Fukunaga-Koontz transform (FKT) and the least-squares linear mapping technique (LSLMT) [1-3]. The KLT performs principal components analysis on one class of patterns for feature extraction. The FKT performs feature extraction for separating two classes of patterns. The LSLMT separates multiple classes of patterns by maximizing the interclass differences and minimizing the intraclass variations.

  5. Data Fusion for Enhanced Aircraft Engine Prognostics and Health Management

    NASA Technical Reports Server (NTRS)

    Volponi, Al

    2005-01-01

    Aircraft gas-turbine engine data are available from a variety of sources, including on-board sensor measurements, maintenance histories, and component models. An ultimate goal of Propulsion Health Management (PHM) is to maximize the amount of meaningful information that can be extracted from disparate data sources to obtain comprehensive diagnostic and prognostic knowledge regarding the health of the engine. Data fusion is the integration of data or information from multiple sources to achieve improved accuracy and more specific inferences than can be obtained from the use of a single sensor alone. The basic tenet underlying the data/information fusion concept is to leverage all available information to enhance diagnostic visibility, increase diagnostic reliability, and reduce the number of diagnostic false alarms. This report describes a basic PHM data fusion architecture being developed in alignment with the NASA C-17 PHM Flight Test program. Addressing the challenge of how to maximize the meaningful information extracted from disparate data sources to obtain enhanced diagnostic and prognostic information regarding the health and condition of the engine is the primary goal of this endeavor. To address this challenge, NASA Glenn Research Center, NASA Dryden Flight Research Center, and Pratt & Whitney have formed a team with several small innovative technology companies to plan and conduct a research project in the area of data fusion as it applies to PHM. The methodologies being developed and evaluated have been drawn from a wide range of areas, including artificial intelligence, pattern recognition, statistical estimation, and fuzzy logic. This report provides a chronology and summary of the work accomplished under this research contract.

  6. The contribution of foveal and peripheral visual information to ensemble representation of face race.

    PubMed

    Jung, Wonmo; Bülthoff, Isabelle; Armann, Regine G M

    2017-11-01

    The brain can only attend to a fraction of all the information that is entering the visual system at any given moment. One way of overcoming the so-called bottleneck of selective attention (e.g., J. M. Wolfe, Võ, Evans, & Greene, 2011) is to make use of redundant visual information and extract summarized statistical information of the whole visual scene. Such ensemble representation occurs for low-level features of textures or simple objects, but it has also been reported for complex high-level properties. While the visual system has, for example, been shown to compute summary representations of facial expression, gender, or identity, it is less clear whether perceptual input from all parts of the visual field contributes equally to the ensemble percept. Here we extend the line of ensemble-representation research into the realm of race and look at the possibility that ensemble perception relies on weighting visual information differently depending on its origin from either the fovea or the visual periphery. We find that observers can judge the mean race of a set of faces, similar to judgments of mean emotion from faces and ensemble representations in low-level domains of visual processing. We also find that while peripheral faces seem to be taken into account for the ensemble percept, far more weight is given to stimuli presented foveally than peripherally. Whether this precision weighting of information stems from differences in the accuracy with which the visual system processes information across the visual field or from statistical inferences about the world needs to be determined by further research.

  7. Effect of water extract of Psidium guajava leaves on alloxan-induced diabetic rats.

    PubMed

    Mukhtar, H M; Ansari, S H; Ali, M; Naved, T; Bhat, Z A

    2004-09-01

    A water extract of Psidium guajava leaves was screened for hypoglycemic activity on alloxan-induced diabetic rats. In both acute and sub-acute tests, the water extract, at an oral dose of 250 mg/kg, showed statistically significant hypoglycemic activity.

  8. Intrinsic Bayesian Active Contours for Extraction of Object Boundaries in Images

    PubMed Central

    Srivastava, Anuj

    2010-01-01

    We present a framework for incorporating prior information about high-probability shapes in the process of contour extraction and object recognition in images. Here one studies shapes as elements of an infinite-dimensional, non-linear quotient space, and statistics of shapes are defined and computed intrinsically using differential geometry of this shape space. Prior models on shapes are constructed using probability distributions on tangent bundles of shape spaces. Similar to the past work on active contours, where curves are driven by vector fields based on image gradients and roughness penalties, we incorporate the prior shape knowledge in the form of vector fields on curves. Through experimental results, we demonstrate the use of prior shape models in the estimation of object boundaries, and their success in handling partial obscuration and missing data. Furthermore, we describe the use of this framework in shape-based object recognition or classification. PMID:21076692

  9. Classifying patents based on their semantic content.

    PubMed

    Bergeaud, Antonin; Potiron, Yoann; Raimbault, Juste

    2017-01-01

    In this paper, we extend some standard classification techniques using a large-scale data-mining and network approach. This methodology, which in particular is designed to be suitable for big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine the full abstract and extract the relevant keywords accordingly. We refer to this classification as the semantic approach, in contrast with the more common technological approach, which relies on the topology of the US Patent Office technological classes. Moreover, we document that the two approaches have markedly different topological measures, with strong statistical evidence that they follow different models. This suggests that our method is a useful tool for extracting endogenous information.

  10. Business intelligence from social media: a study from the VAST Box Office Challenge.

    PubMed

    Lu, Yafeng; Wang, Feng; Maciejewski, Ross

    2014-01-01

    With over 16 million tweets per hour, 600 new blog posts per minute, and 400 million active users on Facebook, businesses have begun searching for ways to turn real-time consumer-based posts into actionable intelligence. The goal is to extract information from this noisy, unstructured data and use it for trend analysis and prediction. Current practices support the idea that visual analytics (VA) can help enable the effective analysis of such data. However, empirical evidence demonstrating the effectiveness of a VA solution is still lacking. A proposed VA toolkit extracts data from Bitly and Twitter to predict movie revenue and ratings. Results from the 2013 VAST Box Office Challenge demonstrate the benefit of an interactive environment for predictive analysis, compared to a purely statistical modeling approach. The VA approach used by the toolkit is generalizable to other domains involving social media data, such as sales forecasting and advertisement analysis.

  11. A signal processing framework for simultaneous detection of multiple environmental contaminants

    NASA Astrophysics Data System (ADS)

    Chakraborty, Subhadeep; Manahan, Michael P.; Mench, Matthew M.

    2013-11-01

    The possibility of large-scale attacks using chemical warfare agents (CWAs) has exposed the critical need for fundamental research enabling the reliable, unambiguous and early detection of trace CWAs and toxic industrial chemicals. This paper presents a unique approach for the identification and classification of simultaneously present multiple environmental contaminants by perturbing an electrochemical (EC) sensor with an oscillating potential for the extraction of statistically rich information from the current response. The dynamic response, being a function of the degree and mechanism of contamination, is then processed with a symbolic dynamic filter for the extraction of representative patterns, which are then classified using a trained neural network. The approach presented in this paper promises to extend the sensing power and sensitivity of these EC sensors by augmenting and complementing sensor technology with state-of-the-art embedded real-time signal processing capabilities.
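
    Symbolic dynamic filtering, in its simplest form, quantizes the sensor's dynamic response into a symbol sequence and uses the normalized symbol-transition histogram as the pattern vector handed to the classifier. A numpy sketch follows; the equal-probability partitioning and alphabet size are illustrative choices.

        import numpy as np

        def symbolic_feature_vector(signal, n_symbols=8):
            # Equal-probability partitioning of the current response into
            # symbols, then the normalized symbol-pair (transition) histogram.
            edges = np.quantile(signal, np.linspace(0, 1, n_symbols + 1)[1:-1])
            symbols = np.digitize(signal, edges)
            pairs = symbols[:-1] * n_symbols + symbols[1:]
            hist = np.bincount(pairs, minlength=n_symbols**2).astype(float)
            return hist / hist.sum()

        # Feature vectors extracted from responses under different contaminants
        # would then be fed to a trained neural-network classifier.
        rng = np.random.default_rng(7)
        x = np.sin(np.linspace(0, 60, 2000)) + 0.1 * rng.normal(size=2000)
        features = symbolic_feature_vector(x)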

  12. First measurement of the muon neutrino charged current quasielastic double differential cross section

    NASA Astrophysics Data System (ADS)

    Aguilar-Arevalo, A. A.; Anderson, C. E.; Bazarko, A. O.; Brice, S. J.; Brown, B. C.; Bugel, L.; Cao, J.; Coney, L.; Conrad, J. M.; Cox, D. C.; Curioni, A.; Djurcic, Z.; Finley, D. A.; Fleming, B. T.; Ford, R.; Garcia, F. G.; Garvey, G. T.; Grange, J.; Green, C.; Green, J. A.; Hart, T. L.; Hawker, E.; Imlay, R.; Johnson, R. A.; Karagiorgi, G.; Kasper, P.; Katori, T.; Kobilarcik, T.; Kourbanis, I.; Koutsoliotas, S.; Laird, E. M.; Linden, S. K.; Link, J. M.; Liu, Y.; Liu, Y.; Louis, W. C.; Mahn, K. B. M.; Marsh, W.; Mauger, C.; McGary, V. T.; McGregor, G.; Metcalf, W.; Meyers, P. D.; Mills, F.; Mills, G. B.; Monroe, J.; Moore, C. D.; Mousseau, J.; Nelson, R. H.; Nienaber, P.; Nowak, J. A.; Osmanov, B.; Ouedraogo, S.; Patterson, R. B.; Pavlovic, Z.; Perevalov, D.; Polly, C. C.; Prebys, E.; Raaf, J. L.; Ray, H.; Roe, B. P.; Russell, A. D.; Sandberg, V.; Schirato, R.; Schmitz, D.; Shaevitz, M. H.; Shoemaker, F. C.; Smith, D.; Soderberg, M.; Sorel, M.; Spentzouris, P.; Spitz, J.; Stancu, I.; Stefanski, R. J.; Sung, M.; Tanaka, H. A.; Tayloe, R.; Tzanov, M.; van de Water, R. G.; Wascko, M. O.; White, D. H.; Wilking, M. J.; Yang, H. J.; Zeller, G. P.; Zimmerman, E. D.; MiniBooNE Collaboration

    2010-05-01

    A high-statistics sample of charged-current muon neutrino scattering events collected with the MiniBooNE experiment is analyzed to extract the first measurement of the double differential cross section d²σ/(dTμ dcosθμ) for charged-current quasielastic (CCQE) scattering on carbon. This result features minimal model dependence and provides the most complete information on this process to date. With the assumption of CCQE scattering, the absolute cross section as a function of neutrino energy, σ(Eν), and the single differential cross section dσ/dQ² are extracted to facilitate comparison with previous measurements. These quantities may be used to characterize an effective axial-vector form factor of the nucleon and to improve the modeling of low-energy neutrino interactions on nuclear targets. The results are relevant for experiments searching for neutrino oscillations.

  13. Classifying patents based on their semantic content

    PubMed Central

    2017-01-01

    In this paper, we extend some standard classification techniques using a large-scale data-mining and network approach. This methodology, which in particular is designed to be suitable for big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine the full abstract and extract the relevant keywords accordingly. We refer to this classification as the semantic approach, in contrast with the more common technological approach, which relies on the topology of the US Patent Office technological classes. Moreover, we document that the two approaches have markedly different topological measures, with strong statistical evidence that they follow different models. This suggests that our method is a useful tool for extracting endogenous information. PMID:28445550

  14. NASA's online machine aided indexing system

    NASA Technical Reports Server (NTRS)

    Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.

    1993-01-01

This report describes the NASA Lexical Dictionary (NLD), a machine aided indexing system used online at the National Aeronautics and Space Administration's Center for Aerospace Information (CASI). The system comprises a text processor based on computational, non-syntactic analysis of input text, and an extensive 'knowledge base' that serves to recognize and translate text-extracted concepts. The structure and function of the various NLD system components are described in detail, and the methods used to develop the knowledge base are discussed. Particular attention is given to a statistically based text analysis program that provides the knowledge base developer with a list of concept-specific phrases extracted from large textual corpora. Production and quality benefits resulting from the integration of machine aided indexing at CASI are discussed, along with a number of secondary applications of NLD-derived systems, including online spell checking and machine aided lexicography.

  15. Decoding 2D-PAGE complex maps: relevance to proteomics.

    PubMed

    Pietrogrande, Maria Chiara; Marchetti, Nicola; Dondi, Francesco; Righetti, Pier Giorgio

    2006-03-20

This review describes two mathematical approaches useful for decoding the complex signal of 2D-PAGE maps of protein mixtures. These methods are helpful for interpreting the large amount of data in each 2D-PAGE map by recovering the analytical information hidden by spot overlapping. Here the basic theory and application to 2D-PAGE maps are reviewed: the means for extracting information from the experimental data and their relevance to proteomics are discussed. One method is based on the quantitative theory of the statistical model of peak overlapping (SMO), using the spot experimental data (intensity and spatial coordinates). The second method is based on the study of the 2D-autocovariance function (2D-ACVF) computed on the experimental digitized map. They are two independent methods that extract equal and complementary information from the 2D-PAGE map. Both methods make it possible to obtain fundamental information on sample complexity and separation performance and to single out ordered patterns in spot positions: the availability of two independent procedures to compute the same separation parameters is a powerful tool for estimating the reliability of the obtained results. The SMO procedure is a unique tool for quantitatively estimating the degree of spot overlapping present in the map, while the 2D-ACVF method is particularly powerful at singling out order in spot positions, i.e., spot trains, from the complexity of the whole 2D map. The procedures were validated by extensive numerical computation on computer-generated maps describing experimental 2D-PAGE gels of protein mixtures. Their applicability to real samples was tested on reference maps obtained from literature sources. The review describes the most relevant information for proteomics: sample complexity, separation performance, overlapping extent, and identification of spot trains related to post-translational modifications (PTMs).
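
    The 2D-ACVF step lends itself to a compact sketch: by the Wiener-Khinchin relation, the autocovariance of the digitized map can be computed via the FFT. This is a generic illustration (with periodic boundary conditions for brevity), not the authors' implementation.

```python
import numpy as np

def acvf_2d(map_2d):
    """Sketch: 2D autocovariance function (2D-ACVF) of a digitized
    2D-PAGE map via FFT (Wiener-Khinchin). Off-origin peaks of the
    ACVF reveal ordered spot spacings such as spot trains.
    Note: circular (periodic) boundaries are assumed for brevity."""
    x = map_2d - map_2d.mean()
    spec = np.abs(np.fft.fft2(x)) ** 2          # power spectrum
    acvf = np.real(np.fft.ifft2(spec)) / x.size  # inverse -> autocovariance
    return np.fft.fftshift(acvf)                 # zero lag at the center
```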

  16. Legal Medicine Information System using CDISC ODM.

    PubMed

    Kiuchi, Takahiro; Yoshida, Ken-ichi; Kotani, Hirokazu; Tamaki, Keiji; Nagai, Hisashi; Harada, Kazuki; Ishikawa, Hirono

    2013-11-01

    We have developed a new database system for forensic autopsies, called the Legal Medicine Information System, using the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM). This system comprises two subsystems, namely the Institutional Database System (IDS) located in each institute and containing personal information, and the Central Anonymous Database System (CADS) located in the University Hospital Medical Information Network Center containing only anonymous information. CDISC ODM is used as the data transfer protocol between the two subsystems. Using the IDS, forensic pathologists and other staff can register and search for institutional autopsy information, print death certificates, and extract data for statistical analysis. They can also submit anonymous autopsy information to the CADS semi-automatically. This reduces the burden of double data entry, the time-lag of central data collection, and anxiety regarding legal and ethical issues. Using the CADS, various studies on the causes of death can be conducted quickly and easily, and the results can be used to prevent similar accidents, diseases, and abuse. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  17. The Australian Pharmaceutical Benefits Scheme data collection: a practical guide for researchers.

    PubMed

    Mellish, Leigh; Karanges, Emily A; Litchfield, Melisa J; Schaffer, Andrea L; Blanch, Bianca; Daniels, Benjamin J; Segrave, Alicia; Pearson, Sallie-Anne

    2015-11-02

    The Pharmaceutical Benefits Scheme (PBS) is Australia's national drug subsidy program. This paper provides a practical guide to researchers using PBS data to examine prescribed medicine use. Excerpts of the PBS data collection are available in a variety of formats. We describe the core components of four publicly available extracts (the Australian Statistics on Medicines, PBS statistics online, section 85 extract, under co-payment extract). We also detail common analytical challenges and key issues regarding the interpretation of utilisation using the PBS collection and its various extracts. Research using routinely collected data is increasing internationally. PBS data are a valuable resource for Australian pharmacoepidemiological and pharmaceutical policy research. A detailed knowledge of the PBS, the nuances of data capture, and the extracts available for research purposes are necessary to ensure robust methodology, interpretation, and translation of study findings into policy and practice.

  18. Automatic extraction of property norm-like data from large text corpora.

    PubMed

    Kelly, Colin; Devereux, Barry; Korhonen, Anna

    2014-01-01

    Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
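
    The reweighting step admits a simple sketch: each candidate triple's log frequency and four association statistics are standardized and combined linearly. All names, the z-scoring, and the weight vector below are illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

def score_triples(freq, metrics, weights=(0.4, 0.2, 0.2, 0.1, 0.1)):
    """Sketch of triple reweighting: combine each candidate triple's
    corpus frequency with four statistical metrics by a linear
    combination; the weights here are purely illustrative and would
    in practice be tuned against human-generated norm data."""
    # freq: (n,) counts; metrics: (n, 4) matrix of per-triple statistics.
    z = lambda v: (v - v.mean()) / v.std()       # put scores on one scale
    features = np.column_stack([z(np.log1p(freq))] +
                               [z(metrics[:, j]) for j in range(4)])
    return features @ np.asarray(weights)        # higher = more plausible
```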

  19. The influence of platelet-rich plasma on the healing of extraction sockets: an explorative randomised clinical trial.

    PubMed

    Alissa, Rami; Esposito, Marco; Horner, Keith; Oliver, Richard

    2010-01-01

    To investigate the effect of platelet-rich plasma (PRP) on the healing of hard and soft tissues of extraction sockets with a pilot study. Patients undergoing tooth extraction under intravenous sedation were asked to participate in the trial. Autologous platelet concentrates were prepared from the patients' blood and autologous thrombin was produced. Outcome measures were: pain level, analgesic consumption, oral function (ability to eat food, swallowing, mouth opening and speech), general activity, swelling, bruising, bleeding, bad taste or halitosis, food stagnation, patient satisfaction, healing complications, soft tissue healing, trabecular pattern of newly formed bone in extraction sockets, trabecular bone volume, trabecular separation, trabecular length, trabecular width, and trabecular number. Patients were followed up to 3 months post-extraction. Twelve patients (15 sockets) were randomly allocated to the PRP group and 11 patients (14 sockets) to the control group. Two patients from the control group did not attend any of the scheduled appointments following tooth extraction, and were considered dropouts. Additionally, one more patient from the control group and four patients from the PRP group did not attend their 3-month radiographic assessment appointments. Statistically significantly more pain was recorded in the control group for the first (P=0.02), second (P=0.02) and third (P=0.04) post-operative days for Visual Analogue Scale scores, whereas no differences were observed for the fourth (P=0.17), fifth (P=0.38), sixth (P=0.75) and seventh (P=0.75) post-operative days. There was a statistically significantly higher analgesic consumption for the first (P=0.03) and second (P=0.02) post-operative days in the control group and no differences thereafter. Differences in patients' responses in the health-related quality of life questionnaire were statistically significant in favour of PRP treatment only for the presence of bad taste or bad smell in the mouth (P=0.03), and food stagnation in the operation area (P=0.03). The difference between groups was not statistically significant for patient satisfaction with the treatment (P=0.31). Regarding complications, two dry sockets and one acutely inflamed alveolus occurred in patients of the control group, which determined a borderline statistically significant difference in favour of the PRP group (P=0.06). Soft tissue healing was significantly better in patients treated with PRP (P=0.03). Radiographic evaluation carried out by the two blinded examiners revealed a statistically significant difference (P=0.01) for sockets with dense homogeneous trabecular pattern, a borderline statistically significant difference in the trabecular pattern for bone volume (P=0.06) favouring PRP use, and no significant differences for trabecular separation (P=0.66), trabecular length (P=0.16), trabecular width (P=0.16) and trabecular number (P=0.38). PRP may have some benefits in reducing complications such as alveolar osteitis and improving healing of soft tissue of extraction sockets. There were insufficient data to support the use of PRP to promote bone healing or to enhance the quality of life of patients following tooth extraction, although the sample size was too small to detect statistically significant differences.

  20. Investigation of the Impact of Extracting and Exchanging Health Information by Using Internet and Social Networks.

    PubMed

    Pistolis, John; Zimeras, Stelios; Chardalias, Kostas; Roupa, Zoe; Fildisis, George; Diomidous, Marianna

    2016-06-01

    Social networks (1) have been embedded in our daily life for a long time. They constitute a powerful tool used nowadays for both searching and exchanging information on different issues by using Internet searching engines (Google, Bing, etc.) and Social Networks (Facebook, Twitter etc.). In this paper, are presented the results of a research based on the frequency and the type of the usage of the Internet and the Social Networks by the general public and the health professionals. The objectives of the research were focused on the investigation of the frequency of seeking and meticulously searching for health information in the social media by both individuals and health practitioners. The exchanging of information is a procedure that involves the issues of reliability and quality of information. In this research, by using advanced statistical techniques an effort is made to investigate the participant's profile in using social networks for searching and exchanging information on health issues. Based on the answers 93 % of the people, use the Internet to find information on health-subjects. Considering principal component analysis, the most important health subjects were nutrition (0.719 %), respiratory issues (0.79 %), cardiological issues (0.777%), psychological issues (0.667%) and total (73.8%). The research results, based on different statistical techniques revealed that the 61.2% of the males and 56.4% of the females intended to use the social networks for searching medical information. Based on the principal components analysis, the most important sources that the participants mentioned, were the use of the Internet and social networks for exchanging information on health issues. These sources proved to be of paramount importance to the participants of the study. The same holds for nursing, medical and administrative staff in hospitals.

  1. Statistical Feature Extraction for Artifact Removal from Concurrent fMRI-EEG Recordings

    PubMed Central

    Liu, Zhongming; de Zwart, Jacco A.; van Gelderen, Peter; Kuo, Li-Wei; Duyn, Jeff H.

    2011-01-01

We propose a set of algorithms for sequentially removing artifacts related to MRI gradient switching and cardiac pulsations from electroencephalography (EEG) data recorded during functional magnetic resonance imaging (fMRI). Special emphasis is directed upon the use of statistical metrics and methods for the extraction and selection of features that characterize gradient and pulse artifacts. To remove gradient artifacts, we use channel-wise filtering based on singular value decomposition (SVD). To remove pulse artifacts, we first decompose data into temporally independent components and then select a compact cluster of components that possess sustained high mutual information with the electrocardiogram (ECG). After the removal of these components, the time courses of remaining components are filtered by SVD to remove the temporal patterns phase-locked to the cardiac markers derived from the ECG. The filtered component time courses are then inversely transformed into multi-channel EEG time series free of pulse artifacts. Evaluation based on a large set of simultaneous EEG-fMRI data obtained during a variety of behavioral tasks, sensory stimulations and resting conditions showed excellent data quality and robust performance attainable by the proposed methods. These algorithms have been implemented as a Matlab-based toolbox made freely available for public access and research use. PMID:22036675

  2. Statistical feature extraction for artifact removal from concurrent fMRI-EEG recordings.

    PubMed

    Liu, Zhongming; de Zwart, Jacco A; van Gelderen, Peter; Kuo, Li-Wei; Duyn, Jeff H

    2012-02-01

    We propose a set of algorithms for sequentially removing artifacts related to MRI gradient switching and cardiac pulsations from electroencephalography (EEG) data recorded during functional magnetic resonance imaging (fMRI). Special emphasis is directed upon the use of statistical metrics and methods for the extraction and selection of features that characterize gradient and pulse artifacts. To remove gradient artifacts, we use channel-wise filtering based on singular value decomposition (SVD). To remove pulse artifacts, we first decompose data into temporally independent components and then select a compact cluster of components that possess sustained high mutual information with the electrocardiogram (ECG). After the removal of these components, the time courses of remaining components are filtered by SVD to remove the temporal patterns phase-locked to the cardiac timing markers derived from the ECG. The filtered component time courses are then inversely transformed into multi-channel EEG time series free of pulse artifacts. Evaluation based on a large set of simultaneous EEG-fMRI data obtained during a variety of behavioral tasks, sensory stimulations and resting conditions showed excellent data quality and robust performance attainable with the proposed methods. These algorithms have been implemented as a Matlab-based toolbox made freely available for public access and research use. Published by Elsevier Inc.
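
    The channel-wise SVD filtering common to both records above can be sketched as follows: epochs time-locked to the artifact are stacked, and the subspace spanned by the dominant singular vectors (the repeating artifact waveform) is subtracted. The epoch bookkeeping and the number of removed components are illustrative, not the published algorithm's exact parameters.

```python
import numpy as np

def svd_filter_epochs(channel, onsets, epoch_len, n_remove=3):
    """Sketch of channel-wise SVD filtering: stack artifact-locked
    epochs of one EEG channel, subtract the subspace spanned by the
    dominant singular vectors (the phase-locked artifact), and write
    the residual back. n_remove is an illustrative choice."""
    data = channel.copy()
    # Assumes every onset t satisfies t + epoch_len <= len(data).
    epochs = np.stack([data[t:t + epoch_len] for t in onsets])
    U, s, Vt = np.linalg.svd(epochs, full_matrices=False)
    # Reconstruct only the leading components = the repeating artifact.
    artifact = (U[:, :n_remove] * s[:n_remove]) @ Vt[:n_remove]
    for i, t in enumerate(onsets):
        data[t:t + epoch_len] -= artifact[i]
    return data
```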

  3. A New Method for Automated Identification and Morphometry of Myelinated Fibers Through Light Microscopy Image Analysis.

    PubMed

    Novas, Romulo Bourget; Fazan, Valeria Paula Sassoli; Felipe, Joaquim Cezar

    2016-02-01

Nerve morphometry is known to produce relevant information for the evaluation of several phenomena, such as nerve repair, regeneration, implant, transplant, aging, and different human neuropathies. Manual morphometry is laborious, tedious, time consuming, and subject to many sources of error. Therefore, in this paper, we propose a new method for the automated morphometry of myelinated fibers in cross-section light microscopy images. Images from the recurrent laryngeal nerve of adult rats and the vestibulocochlear nerve of adult guinea pigs were used herein. The proposed pipeline for fiber segmentation is based on the techniques of competitive clustering and concavity analysis. The proposed segmentation was evaluated by comparing the automatic segmentation with the manual segmentation. To further evaluate the method, the distributions of morphometric features extracted from the segmented images were tested for statistically significant differences. The method achieved high overall sensitivity and very low false-positive rates per image. We detected no statistically significant difference between the distributions of the features extracted from the manual and the pipeline segmentations. The method presented good overall performance, showing potential in experimental and clinical settings by allowing large-scale image analysis and thus leading to more reliable results.

  4. Extracting morphologies from third harmonic generation images of structurally normal human brain tissue.

    PubMed

    Zhang, Zhiqing; Kuzmin, Nikolay V; Groot, Marie Louise; de Munck, Jan C

    2017-06-01

The morphologies contained in 3D third harmonic generation (THG) images of human brain tissue can report on the pathological state of the tissue. However, the complexity of THG brain images makes it challenging to extract this information with modern image processing tools, especially those for image filtering, segmentation, and validation. We developed a salient-edge-enhancing model of anisotropic diffusion for image filtering, based on higher order statistics. We split the intrinsic 3-phase segmentation problem into two 2-phase segmentation problems, each of which we solved with a dedicated model, an active contour weighted by prior extreme. We applied the proposed algorithms to THG images of structurally normal ex-vivo human brain tissue, revealing key tissue components (brain cells, microvessels, and neuropil) and enabling statistical characterization of these components. Comprehensive comparison to manually delineated ground truth validated the proposed algorithms, and quantitative comparison to second harmonic generation/auto-fluorescence images, acquired simultaneously from the same tissue area, confirmed the correctness of the main THG features detected. The software and test datasets are available from the authors (z.zhang@vu.nl). Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
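
    For orientation, the classic Perona-Malik scheme below illustrates the core idea of edge-preserving anisotropic diffusion; the paper's salient-edge-enhancing model based on higher-order statistics is more elaborate, so treat this only as a baseline sketch with illustrative parameters.

```python
import numpy as np

def perona_malik(img, n_iter=50, kappa=0.1, lam=0.2):
    """Classic Perona-Malik anisotropic diffusion (a simpler relative
    of the paper's model): smooth within regions while preserving
    strong edges, whose large gradients suppress diffusion."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite-difference gradients toward the four neighbors.
        dn = np.roll(u, 1, axis=0) - u
        ds = np.roll(u, -1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping function: small gradient -> strong diffusion.
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```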

  5. Radial gradient and radial deviation radiomic features from pre-surgical CT scans are associated with survival among lung adenocarcinoma patients

    PubMed Central

    Tunali, Ilke; Stringfield, Olya; Guvenis, Albert; Wang, Hua; Liu, Ying; Balagurunathan, Yoganand; Lambin, Philippe; Gillies, Robert J.; Schabath, Matthew B.

    2017-01-01

    The goal of this study was to extract features from radial deviation and radial gradient maps which were derived from thoracic CT scans of patients diagnosed with lung adenocarcinoma and assess whether these features are associated with overall survival. We used two independent cohorts from different institutions for training (n= 61) and test (n= 47) and focused our analyses on features that were non-redundant and highly reproducible. To reduce the number of features and covariates into a single parsimonious model, a backward elimination approach was applied. Out of 48 features that were extracted, 31 were eliminated because they were not reproducible or were redundant. We considered 17 features for statistical analysis and identified a final model containing the two most highly informative features that were associated with lung cancer survival. One of the two features, radial deviation outside-border separation standard deviation, was replicated in a test cohort exhibiting a statistically significant association with lung cancer survival (multivariable hazard ratio = 0.40; 95% confidence interval 0.17-0.97). Additionally, we explored the biological underpinnings of these features and found radial gradient and radial deviation image features were significantly associated with semantic radiological features. PMID:29221183

  6. Nondeterministic data base for computerized visual perception

    NASA Technical Reports Server (NTRS)

    Yakimovsky, Y.

    1976-01-01

    A description is given of the knowledge representation data base in the perception subsystem of the Mars robot vehicle prototype. Two types of information are stored. The first is generic information that represents general rules that are conformed to by structures in the expected environments. The second kind of information is a specific description of a structure, i.e., the properties and relations of objects in the specific case being analyzed. The generic knowledge is represented so that it can be applied to extract and infer the description of specific structures. The generic model of the rules is substantially a Bayesian representation of the statistics of the environment, which means it is geared to representation of nondeterministic rules relating properties of, and relations between, objects. The description of a specific structure is also nondeterministic in the sense that all properties and relations may take a range of values with an associated probability distribution.

  7. Classification of endoscopic capsule images by using color wavelet features, higher order statistics and radial basis functions.

    PubMed

    Lima, C S; Barbosa, D; Ramos, J; Tavares, A; Monteiro, L; Carvalho, L

    2008-01-01

This paper presents a system to support medical diagnosis and detection of abnormal lesions by processing capsule endoscopic images. Endoscopic images possess rich information expressed by texture, and texture information can be efficiently extracted from medium scales of the wavelet transform. The set of features proposed in this paper to code textural information is named color wavelet covariance (CWC). CWC coefficients are based on the covariances of second-order textural measures; an optimum subset of them is proposed. Third- and fourth-order moments are added to cope with distributions that tend to become non-Gaussian, especially in some pathological cases. The proposed approach is supported by a radial basis function (RBF) classifier for the characterization of image regions along the video frames. The whole methodology was applied to real data comprising 6 full endoscopic exams and reached 95% specificity and 93% sensitivity.
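
    A sketch of the higher-order-moment part of the feature set: per-subband variance, skewness, and kurtosis computed on a wavelet decomposition (PyWavelets assumed available). The color/covariance machinery of the full CWC descriptor is omitted; names and the choice of wavelet are illustrative.

```python
import numpy as np
import pywt  # PyWavelets

def texture_moments(image, wavelet="db2", level=2):
    """Sketch of higher-order texture features: for each detail subband
    of a wavelet decomposition, collect the variance plus the third and
    fourth standardized moments (skewness, kurtosis), which capture the
    non-Gaussian statistics noted in the paper."""
    feats = []
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    for detail in coeffs[1:]:            # skip the approximation band
        for band in detail:              # horizontal, vertical, diagonal
            c = band.ravel()
            mu, sd = c.mean(), c.std()
            feats += [sd**2,
                      ((c - mu)**3).mean() / sd**3,   # skewness
                      ((c - mu)**4).mean() / sd**4]   # kurtosis
    return np.array(feats)
```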

  8. Cancer Statistics Animator

    Cancer.gov

This tool allows users to animate cancer trends over time by cancer site and cause of death, race, and sex, and provides access to incidence, mortality, and survival statistics. Users select the type of statistic, variables, and format, and then extract the statistics in a delimited format for further analyses.

  9. Spatial-temporal clustering of companion animal enteric syndrome: detection and investigation through the use of electronic medical records from participating private practices.

    PubMed

    Anholt, R M; Berezowski, J; Robertson, C; Stephen, C

    2015-09-01

    There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.

  10. Statistical mixture design selective extraction of compounds with antioxidant activity and total polyphenol content from Trichilia catigua.

    PubMed

    Lonni, Audrey Alesandra Stinghen Garcia; Longhini, Renata; Lopes, Gisely Cristiny; de Mello, João Carlos Palazzo; Scarminio, Ieda Spacino

    2012-03-16

Statistical design mixtures of water, methanol, acetone and ethanol were used to extract material from Trichilia catigua (Meliaceae) barks to study the effects of different solvents and their mixtures on its yield, total polyphenol content and antioxidant activity. The experimental results and their response surface models showed that quaternary mixtures with approximately equal proportions of all four solvents provided the highest yields, total polyphenol contents and antioxidant activities of the crude extracts, followed by the ternary design mixtures. Principal component and hierarchical clustering analyses of the HPLC-DAD spectra of the chromatographic peaks of the 1:1:1:1 water-methanol-acetone-ethanol mixture extracts indicate the presence of cinchonains, gallic acid derivatives, natural polyphenols, flavonoids, catechins, and epicatechins. Copyright © 2011 Elsevier B.V. All rights reserved.

  11. DB4US: A Decision Support System for Laboratory Information Management

    PubMed Central

    Hortas, Maria Luisa; Baena-García, Manuel; Lana-Linati, Jorge; González, Carlos; Redondo, Maximino; Morales-Bueno, Rafael

    2012-01-01

Background Until recently, laboratory automation has focused primarily on improving hardware. Future advances will concentrate on intelligent software, since laboratories performing clinical diagnostic testing require improved information systems to address their data processing needs. In this paper, we propose DB4US, an application that automates the handling of laboratory quality indicator information. Currently, there is a lack of ready-to-use management quality measures. This application addresses this deficiency through the extraction, consolidation, statistical analysis, and visualization of data related to demographics, reagents, and turn-around times. The design and implementation issues, as well as the technologies used for the implementation of this system, are discussed in this paper. Objective To develop a general methodology that integrates the computation of ready-to-use management quality measures and a dashboard to easily analyze the overall performance of a laboratory, as well as automatically detect anomalies or errors. The novelty of our approach lies in the application of integrated web-based dashboards as an information management system in hospital laboratories. Methods We propose a new methodology for laboratory information management based on the extraction, consolidation, statistical analysis, and visualization of data related to demographics, reagents, and turn-around times, offering a dashboard-like web interface to the laboratory manager. The methodology comprises a unified data warehouse that stores and consolidates multidimensional data from different data sources. The methodology is illustrated through the implementation and validation of DB4US, a novel web application based on this methodology that constructs an interface to obtain ready-to-use indicators and offers the possibility to drill down from high-level metrics to more detailed summaries. The offered indicators are calculated beforehand so that they are ready to use when the user needs them. The design is based on a set of parallel processes that precalculate the indicators. The application displays information related to tests, requests, samples, and turn-around times. The dashboard is designed to show the set of indicators on a single screen. Results DB4US was deployed for the first time in the Hospital Costa del Sol in 2008. Our evaluation shows the positive impact of this methodology for laboratory professionals: the use of the application has reduced the time needed to produce the different statistical indicators and has provided information used to optimize the usage of laboratory resources through the discovery of anomalies in the indicators. DB4US users benefit from Internet-based communication of results, since this information is available from any computer without having to install additional software. Conclusions The proposed methodology and the accompanying web application, DB4US, automate the processing of information related to laboratory quality indicators and offer a novel approach for managing laboratory-related information, benefiting from an Internet-based communication mechanism. The application of this methodology has been shown to improve the usage of time, as well as other laboratory resources. PMID:23608745

  12. Robust hypothesis tests for detecting statistical evidence of two-dimensional and three-dimensional interactions in single-molecule measurements

    NASA Astrophysics Data System (ADS)

    Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.

    2014-05-01

    Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x ,y (or x ,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
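
    A simplified (non-robust) version of the idea can be sketched as a Gaussian likelihood-ratio test of zero correlation between x and y increments; the authors' actual test is designed to remain valid under misspecified noise, so this is only an illustration.

```python
import numpy as np
from scipy import stats

def test_xy_independence(x, y):
    """Sketch (not the authors' exact robust test): likelihood-ratio
    test of whether x and y displacement increments are uncorrelated,
    i.e. diagonal vs. full 2x2 covariance. Rejection suggests a model
    with 2D coupling should be considered."""
    dx, dy = np.diff(x), np.diff(y)
    n = dx.size
    r = np.corrcoef(dx, dy)[0, 1]
    # For Gaussian increments, -2 log LR reduces to -n*log(1 - r^2),
    # asymptotically chi-square with 1 degree of freedom.
    lr = -n * np.log(1.0 - r**2)
    p_value = stats.chi2.sf(lr, df=1)
    return lr, p_value
```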

  13. Information extraction from multi-institutional radiology reports.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
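
    A minimal sketch of the discriminative-sequence-classifier step, assuming the third-party sklearn-crfsuite package and toy BIO labels; the feature map, label set, and hyperparameters are illustrative stand-ins for the paper's information-model-driven setup.

```python
import sklearn_crfsuite  # assumed available; any CRF toolkit would do

def token_features(sent, i):
    """Toy feature map for one token of a tokenized radiology report."""
    w = sent[i]
    return {"lower": w.lower(), "is_digit": w.isdigit(),
            "prev": sent[i - 1].lower() if i else "<s>"}

def train_ner(sentences, labels):
    """sentences: list of token lists; labels: matching BIO tag lists,
    e.g. 'B-Anatomy', 'I-Finding', 'O' (label names are hypothetical)."""
    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X, labels)
    return crf  # crf.predict(X_new) yields per-token entity tags
```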

  14. Assessment of antifungal activity of herbal and conventional toothpastes against clinical isolates of Candida albicans.

    PubMed

    Adwan, Ghaleb; Salameh, Yousef; Adwan, Kamel; Barakat, Ali

    2012-05-01

To detect the anticandidal activity of nine toothpastes containing sodium fluoride, sodium monofluorophosphate and herbal extracts as active ingredients against 45 oral and non-oral Candida albicans (C. albicans) isolates. The antifungal activity of these toothpaste formulations was determined using a standard agar well diffusion method. Statistical analysis was performed using a statistical package, SPSS windows version 15, by applying mean values using one-way ANOVA with the post-hoc least square differences (LSD) method. A P value of less than 0.05 was considered significant. All toothpastes studied in our experiments were effective in inhibiting the growth of all C. albicans isolates. The highest anticandidal activity was obtained from the toothpaste containing both herbal extracts and sodium fluoride as active ingredients, while the lowest activity was obtained from the toothpaste containing sodium monofluorophosphate as its active ingredient. Antifungal activity of Parodontax toothpaste showed a significant difference (P<0.001) against C. albicans isolates compared to toothpastes containing sodium fluoride or herbal products. In the present study, it has been demonstrated that toothpaste containing both herbal extracts and sodium fluoride as active ingredients is more effective in controlling C. albicans, while toothpaste containing monofluorophosphate as its active ingredient is less effective against C. albicans. Some herbal toothpaste formulations studied in our experiments appear to be as effective as the fluoride dental formulations and can be used as an alternative to conventional formulations for individuals who have an interest in naturally based products. Our results may provide invaluable information for dental professionals.

  15. Assessment of antifungal activity of herbal and conventional toothpastes against clinical isolates of Candida albicans

    PubMed Central

    Adwan, Ghaleb; Salameh, Yousef; Adwan, Kamel; Barakat, Ali

    2012-01-01

Objective To detect the anticandidal activity of nine toothpastes containing sodium fluoride, sodium monofluorophosphate and herbal extracts as active ingredients against 45 oral and non-oral Candida albicans (C. albicans) isolates. Methods The antifungal activity of these toothpaste formulations was determined using a standard agar well diffusion method. Statistical analysis was performed using a statistical package, SPSS windows version 15, by applying mean values using one-way ANOVA with the post-hoc least square differences (LSD) method. A P value of less than 0.05 was considered significant. Results All toothpastes studied in our experiments were effective in inhibiting the growth of all C. albicans isolates. The highest anticandidal activity was obtained from the toothpaste containing both herbal extracts and sodium fluoride as active ingredients, while the lowest activity was obtained from the toothpaste containing sodium monofluorophosphate as its active ingredient. Antifungal activity of Parodontax toothpaste showed a significant difference (P<0.001) against C. albicans isolates compared to toothpastes containing sodium fluoride or herbal products. Conclusions In the present study, it has been demonstrated that toothpaste containing both herbal extracts and sodium fluoride as active ingredients is more effective in controlling C. albicans, while toothpaste containing monofluorophosphate as its active ingredient is less effective against C. albicans. Some herbal toothpaste formulations studied in our experiments appear to be as effective as the fluoride dental formulations and can be used as an alternative to conventional formulations for individuals who have an interest in naturally based products. Our results may provide invaluable information for dental professionals. PMID:23569933

  16. Score-level fusion of two-dimensional and three-dimensional palmprint for personal recognition systems

    NASA Astrophysics Data System (ADS)

    Chaa, Mourad; Boukezzoula, Naceur-Eddine; Attia, Abdelouahab

    2017-01-01

Two types of scores extracted from two-dimensional (2-D) and three-dimensional (3-D) palmprints are merged for personal recognition, introducing a local image descriptor for 2-D palmprint-based recognition named bank of binarized statistical image features (B-BSIF). The main idea of B-BSIF is that the histograms extracted from the binarized statistical image features (BSIF) code images (obtained by applying BSIF descriptors of different filter sizes, each with bit length 12) are concatenated into one large feature vector. The 3-D palmprint contains the depth information of the palm surface. The self-quotient image (SQI) algorithm is applied to reconstruct illumination-invariant 3-D palmprint images, and Gabor wavelets are defined and used to extract discriminative features from the SQI images. Because dimensionality reduction methods have shown their value in biometric systems, a principal component analysis (PCA) + linear discriminant analysis (LDA) step is employed, and the cosine Mahalanobis distance is applied in the matching process. Extensive experiments were conducted on a 2-D and 3-D palmprint database with 10,400 range images from 260 individuals, and the proposed algorithm was compared with other existing methods in the literature. Results clearly show that the proposed framework provides a higher correct recognition rate. The best results were obtained by merging the score of the B-BSIF descriptor with the score of the SQI + Gabor wavelets + PCA + LDA method, yielding an equal error rate of 0.00% and a rank-1 recognition rate of 100.00%.
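
    A sketch of the B-BSIF construction: each filter bank yields a 12-bit code image whose histogram is one descriptor, and histograms from banks of several filter sizes are concatenated. BSIF normally uses ICA-learned filters; here the filter stacks are taken as given inputs, and all names are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_histogram(image, filters):
    """One BSIF code histogram: 'filters' is a stack of 12 (k, k)
    linear filters (BSIF uses ICA-learned ones). Each filter response
    is binarized into one bit of a 12-bit code; the normalized code
    histogram is the descriptor."""
    code = np.zeros(image.shape, dtype=np.int32)
    for bit, f in enumerate(filters):
        resp = convolve2d(image, f, mode="same")
        code |= (resp > 0).astype(np.int32) << bit
    hist, _ = np.histogram(code, bins=2**len(filters),
                           range=(0, 2**len(filters)))
    return hist / hist.sum()

def b_bsif(image, filter_banks):
    """B-BSIF: concatenate histograms from banks of several filter sizes."""
    return np.concatenate([bsif_histogram(image, fb) for fb in filter_banks])
```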

  17. Joint principal trend analysis for longitudinal high-dimensional data.

    PubMed

    Zhang, Yuping; Ouyang, Zhengqing

    2018-06-01

    We consider a research scenario motivated by integrating multiple sources of information for better knowledge discovery in diverse dynamic biological processes. Given two longitudinal high-dimensional datasets for a group of subjects, we want to extract shared latent trends and identify relevant features. To solve this problem, we present a new statistical method named as joint principal trend analysis (JPTA). We demonstrate the utility of JPTA through simulations and applications to gene expression data of the mammalian cell cycle and longitudinal transcriptional profiling data in response to influenza viral infections. © 2017, The International Biometric Society.

  18. Reflectance of vegetation, soil, and water

    NASA Technical Reports Server (NTRS)

    Wiegand, C. L. (Principal Investigator)

    1973-01-01

There are no author-identified significant results in this report. This report deals with the selection of the best channels from the 24-channel aircraft data to represent crop and soil conditions. A three-step procedure has been developed that involves using univariate statistics and an F-ratio test to indicate the best 14 channels. From the 14, the 10 best channels are selected by a multivariate stochastic process. The third step involves the pattern recognition procedures developed in the data analysis plan. Indications are that the procedures in use are satisfactory and will extract the desired information from the data.
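
    The univariate screening step can be sketched as a per-channel one-way ANOVA F-ratio, with channels ranked by score; the class structure, names, and ranking-only output are illustrative simplifications of the report's three-step procedure.

```python
import numpy as np

def f_ratio_rank(X, y):
    """Sketch of univariate channel screening: a one-way ANOVA F-ratio
    (between-class vs. within-class variance) per channel; the highest
    scoring channels are kept for the later multivariate selection.
    X: (samples, channels) measurements; y: (samples,) class labels."""
    classes = np.unique(y)
    grand = X.mean(axis=0)
    between = sum(X[y == c].shape[0] * (X[y == c].mean(axis=0) - grand)**2
                  for c in classes) / (len(classes) - 1)
    within = sum(((X[y == c] - X[y == c].mean(axis=0))**2).sum(axis=0)
                 for c in classes) / (X.shape[0] - len(classes))
    f = between / within
    return np.argsort(f)[::-1]        # channel indices, best first
```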

  19. Crop identification and area estimation over large geographic areas using LANDSAT MSS data

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator)

    1977-01-01

    The author has identified the following significant results. LANDSAT MSS data was adequate to accurately identify wheat in Kansas; corn and soybean estimates in Indiana were less accurate. Computer-aided analysis techniques were effectively used to extract crop identification information from LANDSAT data. Systematic sampling of entire counties made possible by computer classification methods resulted in very precise area estimates at county, district, and state levels. Training statistics were successfully extended from one county to other counties having similar crops and soils if the training areas sampled the total variation of the area to be classified.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liscom, W.L.

    This book presents a complete graphic and statistical portrait of the dramatic shifts in global energy flows during the 1970s and the resultant transfer of economic and political power from the industrial nations to the oil-producing states. The information was extracted from government-source documents and compiled in a computer data base. Computer graphics were combined with the data base to produce over 400 full-color graphs. The energy commodities covered are oil, natural gas, coal, nuclear, and conventional electric-power generation. Also included are data on hydroelectric and geothermal power, oil shale, tar sands, and other alternative energy sources. 72 references.

  1. Review of Extracting Information From the Social Web for Health Personalization

    PubMed Central

    Karlsen, Randi; Bonander, Jason

    2011-01-01

    In recent years the Web has come into its own as a social platform where health consumers are actively creating and consuming Web content. Moreover, as the Web matures, consumers are gaining access to personalized applications adapted to their health needs and interests. The creation of personalized Web applications relies on extracted information about the users and the content to personalize. The Social Web itself provides many sources of information that can be used to extract information for personalization apart from traditional Web forms and questionnaires. This paper provides a review of different approaches for extracting information from the Social Web for health personalization. We reviewed research literature across different fields addressing the disclosure of health information in the Social Web, techniques to extract that information, and examples of personalized health applications. In addition, the paper includes a discussion of technical and socioethical challenges related to the extraction of information for health personalization. PMID:21278049

  2. Carrier statistics and quantum capacitance effects on mobility extraction in two-dimensional crystal semiconductor field-effect transistors

    NASA Astrophysics Data System (ADS)

    Ma, Nan; Jena, Debdeep

    2015-03-01

    In this work, the consequence of the high band-edge density of states on the carrier statistics and quantum capacitance in transition metal dichalcogenide two-dimensional semiconductor devices is explored. The study questions the validity of commonly used expressions for extracting carrier densities and field-effect mobilities from the transfer characteristics of transistors with such channel materials. By comparison to experimental data, a new method for the accurate extraction of carrier densities and mobilities is outlined. The work thus highlights a fundamental difference between these materials and traditional semiconductors that must be considered in future experimental measurements.
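
    For context, the standard relations involved read as follows (textbook forms under 2D degenerate statistics, not necessarily the authors' exact model): the sheet density follows the 2D density of states, the quantum capacitance acts in series with the oxide capacitance, and the field-effect mobility must be extracted with the total gate capacitance rather than the oxide capacitance alone.

```latex
% Sheet carrier density with 2D degenerate (Fermi-Dirac) statistics
n_s = D_{2D}\, k_B T \,\ln\!\left[1 + \exp\!\left(\frac{E_F - E_c}{k_B T}\right)\right],
\qquad D_{2D} = \frac{g_s g_v m^{*}}{2\pi \hbar^{2}}

% Quantum capacitance, in series with the oxide capacitance
C_q = q^{2} \frac{\partial n_s}{\partial E_F},
\qquad \frac{1}{C_g} = \frac{1}{C_{ox}} + \frac{1}{C_q}

% Field-effect mobility extracted with the total gate capacitance C_g
\mu_{FE} = \frac{L}{W\, C_g\, V_{ds}}\,\frac{\partial I_d}{\partial V_{gs}}
```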

  3. Ultrasonic-assisted extraction and in-vitro antioxidant activity of polysaccharide from Hibiscus leaf.

    PubMed

    Afshari, Kasra; Samavati, Vahid; Shahidi, Seyed-Ahmad

    2015-03-01

The effects of ultrasonic power, extraction time, extraction temperature, and the water-to-raw-material ratio on the extraction yield of crude polysaccharide from the leaf of Hibiscus rosa-sinensis (HRLP) were optimized using response surface methodology (RSM) with a Box-Behnken design (BBD). The experimental data obtained were fitted to a second-order polynomial equation using multiple regression analysis and analyzed by appropriate statistical methods (ANOVA). Analysis of the results showed that the linear and quadratic terms of the four variables had significant effects. The optimal conditions for the highest extraction yield of HRLP were: ultrasonic power, 93.59 W; extraction time, 25.71 min; extraction temperature, 93.18°C; and water-to-raw-material ratio, 24.3 mL/g. Under these conditions, the experimental yield was 9.66±0.18%, in close agreement with the value predicted by the model (9.526%). The results demonstrated that HRLP had strong in vitro scavenging activities on DPPH and hydroxyl radicals. Copyright © 2014 Elsevier B.V. All rights reserved.
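
    The RSM fitting step can be sketched as an ordinary-least-squares fit of the second-order polynomial over the Box-Behnken design points; the function below is generic (any number of factors) and its names are illustrative.

```python
import numpy as np
from itertools import combinations

def fit_quadratic_rsm(X, y):
    """Sketch of the RSM step: fit the second-order polynomial
    y = b0 + sum(bi*xi) + sum(bii*xi^2) + sum(bij*xi*xj)
    to design points by ordinary least squares. X: (runs, factors)
    coded factor levels; y: (runs,) measured yields."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]                   # linear terms
    cols += [X[:, i]**2 for i in range(k)]                # quadratic terms
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta  # maximize the fitted surface to locate optimal conditions
```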

  4. Computational methods to extract meaning from text and advance theories of human cognition.

    PubMed

    McNamara, Danielle S

    2011-01-01

    Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researchers have developed and tested alternative statistical methods designed to detect and analyze meaning in text corpora. This research exemplifies how statistical models of semantics play an important role in our understanding of cognition and contribute to the field of cognitive science. Importantly, these models afford large-scale representations of human knowledge and allow researchers to explore various questions regarding knowledge, discourse processing, text comprehension, and language. This topic includes the latest progress by the leading researchers in the endeavor to go beyond LSA. Copyright © 2010 Cognitive Science Society, Inc.
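
    LSA itself reduces to a small amount of code: a term-document matrix compressed by truncated SVD so that texts sharing latent topics become close in the reduced space. The toy corpus and component count below are illustrative (scikit-learn assumed available).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "dogs chase cats",
        "stock markets fell sharply", "investors sold shares"]

# LSA in miniature: term-document statistics compressed by SVD so that
# documents sharing latent topics land near each other.
tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2).fit_transform(tfidf)
print(cosine_similarity(lsa))  # finance docs pair up, pet docs pair up
```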

  5. Rapid Statistical Learning Supporting Word Extraction From Continuous Speech.

    PubMed

    Batterink, Laura J

    2017-07-01

    The identification of words in continuous speech, known as speech segmentation, is a critical early step in language acquisition. This process is partially supported by statistical learning, the ability to extract patterns from the environment. Given that speech segmentation represents a potential bottleneck for language acquisition, patterns in speech may be extracted very rapidly, without extensive exposure. This hypothesis was examined by exposing participants to continuous speech streams composed of novel repeating nonsense words. Learning was measured on-line using a reaction time task. After merely one exposure to an embedded novel word, learners demonstrated significant learning effects, as revealed by faster responses to predictable than to unpredictable syllables. These results demonstrate that learners gained sensitivity to the statistical structure of unfamiliar speech on a very rapid timescale. This ability may play an essential role in early stages of language acquisition, allowing learners to rapidly identify word candidates and "break in" to an unfamiliar language.
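
    The statistic at the heart of such segmentation studies is the transitional probability P(B|A) = freq(AB)/freq(A); a toy computation over a syllable stream (made-up syllables) shows within-word transitions scoring higher than boundary-spanning ones.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Sketch: P(B|A) = freq(AB) / freq(A) over a syllable stream.
    Within-word transitions (e.g. 'bi'->'da' in 'bidaku') stay high;
    across word boundaries they drop, marking candidate word edges."""
    pair = Counter(zip(syllables[:-1], syllables[1:]))
    first = Counter(syllables[:-1])
    return {(a, b): c / first[a] for (a, b), c in pair.items()}

stream = "bi da ku pa do ti bi da ku go la bu pa do ti go la bu bi da ku".split()
tps = transitional_probabilities(stream)
print(tps[("bi", "da")], tps[("ti", "bi")])  # 1.0 (within word) vs. 0.5
```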

  6. Analysis of glycosylated flavonoids extracted from sweet-cherry stems, as antibacterial agents against pathogenic Escherichia coli isolates.

    PubMed

    Aires, Alfredo; Dias, Carla; Carvalho, Rosa; Saavedra, Maria José

    2017-01-01

The aim of this study was to evaluate the bioactivity of flavonoids extracted from sweet-cherry stems, which are often used in traditional medicine to treat gastro-intestinal and urinary tract infections but lack consistent scientific support; information about the class of phenolics present, their content, and the potential bioactivity of such material is also very scarce. In this context, we designed a study in which we evaluated the profile and content of phenolics extracted from sweet-cherry stems through conventional (70°C, 20 min) and ultrasound-assisted (40 kHz, room temperature, 20 min) extraction. The extracts were phytochemically characterized using an HPLC-DAD-UV/VIS system and assayed with an in vitro minimum inhibitory concentration (MIC) bioassay against Escherichia coli isolates. Simultaneously, the total antioxidant activities were measured using the 2,2'-azinobis-3-ethylbenzothiazoline-6-sulphonate (ABTS•+) radical cation assay. Our results indicate that sweet-cherry stems have a high content of sakuranetin, ferulic acid, p-coumaric acid, p-coumaroylquinic acid, chlorogenic acid, and its isomer neochlorogenic acid. Their average levels were strongly affected by the extraction method used (p<0.001). The same trend was observed for total antioxidant activity and MIC values: the extracts produced with ultrasound presented both higher total antioxidant activity and lower minimum inhibitory concentrations. Statistical analyses showed a significant correlation (p<0.01) of total antioxidant activity and minimum inhibitory concentration with the phenolics present in the extracts studied. Thus, we conclude that cherry stems can be further exploited to purify compounds and produce co-products with enhanced biological added value for the pharmaceutical industry.

  7. 22 CFR 92.80 - Obtaining American vital statistics records.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Obtaining American vital statistics records. 92... statistics records. Individuals who inquire as to means of obtaining copies of or extracts from American... Vital Statistics Office at the place where the record is kept, which is usually in the capital city of...

  8. 22 CFR 92.80 - Obtaining American vital statistics records.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Obtaining American vital statistics records. 92... statistics records. Individuals who inquire as to means of obtaining copies of or extracts from American... Vital Statistics Office at the place where the record is kept, which is usually in the capital city of...

  9. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

Constructing a virtual international strategy environment requires many kinds of information, covering the economy, politics, the military, diplomacy, culture, science, and more. It is therefore important to build an efficient management system for automatic information extraction, classification, recombination, and analysis as the foundation and a component of the military strategy hall. This paper first uses an improved boosting algorithm to classify the collected information, and then applies a strategy-intelligence extraction algorithm to derive strategic intelligence from the initial information, helping strategists analyze it.

  10. Development of an Information Fusion System for Engine Diagnostics and Health Management

    NASA Technical Reports Server (NTRS)

    Volponi, Allan J.; Brotherton, Tom; Luppold, Robert; Simon, Donald L.

    2004-01-01

    Aircraft gas-turbine engine data are available from a variety of sources including on-board sensor measurements, maintenance histories, and component models. An ultimate goal of Propulsion Health Management (PHM) is to maximize the amount of meaningful information that can be extracted from disparate data sources to obtain comprehensive diagnostic and prognostic knowledge regarding the health of the engine. Data Fusion is the integration of data or information from multiple sources, to achieve improved accuracy and more specific inferences than can be obtained from the use of a single sensor alone. The basic tenet underlying the data/information fusion concept is to leverage all available information to enhance diagnostic visibility, increase diagnostic reliability and reduce the number of diagnostic false alarms. This paper describes a basic PHM Data Fusion architecture being developed in alignment with the NASA C17 Propulsion Health Management (PHM) Flight Test program. The challenge of how to maximize the meaningful information extracted from disparate data sources to obtain enhanced diagnostic and prognostic information regarding the health and condition of the engine is the primary goal of this endeavor. To address this challenge, NASA Glenn Research Center (GRC), NASA Dryden Flight Research Center (DFRC) and Pratt & Whitney (P&W) have formed a team with several small innovative technology companies to plan and conduct a research project in the area of data fusion as applied to PHM. Methodologies being developed and evaluated have been drawn from a wide range of areas including artificial intelligence, pattern recognition, statistical estimation, and fuzzy logic. This paper will provide a broad overview of this work, discuss some of the methodologies employed and give some illustrative examples.

  11. Analysis of entropy extraction efficiencies in random number generation systems

    NASA Astrophysics Data System (ADS)

    Wang, Chao; Wang, Shuang; Chen, Wei; Yin, Zhen-Qiang; Han, Zheng-Fu

    2016-05-01

    Random numbers (RNs) have applications in many areas: lottery games, gambling, computer simulation, and, most importantly, cryptography [N. Gisin et al., Rev. Mod. Phys. 74 (2002) 145]. In cryptography theory, the theoretical security of the system calls for high quality RNs. Therefore, developing methods for producing unpredictable RNs with adequate speed is an attractive topic. Early on, despite the lack of theoretical support, pseudo RNs generated by algorithmic methods performed well and satisfied reasonable statistical requirements. However, as implemented, those pseudorandom sequences were completely determined by mathematical formulas and initial seeds, which cannot introduce extra entropy or information. In these cases, “random” bits are generated that are not at all random. Physical random number generators (RNGs), which, in contrast to algorithmic methods, are based on unpredictable physical random phenomena, have attracted considerable research interest. However, the way that we extract random bits from those physical entropy sources has a large influence on the efficiency and performance of the system. In this manuscript, we will review and discuss several randomness extraction schemes that are based on radiation or photon arrival times. We analyze the robustness, post-processing requirements and, in particular, the extraction efficiency of those methods to aid in the construction of efficient, compact and robust physical RNG systems.
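
    One arrival-time extraction scheme of the kind reviewed above compares successive photon inter-arrival intervals pairwise. Below is a minimal Python sketch, with a simulated exponential entropy source standing in for a real detector; the scheme and parameters are illustrative, not the paper's specific proposals.

```python
import numpy as np

def bits_from_arrival_times(intervals):
    """Comparison extractor: pair up successive inter-arrival intervals
    and emit 0 if the first is shorter, 1 if it is longer, discarding
    ties. Unbiased for any i.i.d. continuous interval distribution."""
    t1, t2 = intervals[0::2], intervals[1::2]
    n = min(len(t1), len(t2))
    t1, t2 = t1[:n], t2[:n]
    keep = t1 != t2
    return (t1[keep] > t2[keep]).astype(np.uint8)

# Simulated exponential entropy source standing in for photon arrivals.
rng = np.random.default_rng(1)
intervals = rng.exponential(scale=1.0, size=100_000)
bits = bits_from_arrival_times(intervals)
print(len(bits), bits.mean())   # ~50_000 bits (0.5 bit/event), mean ~0.5
```

    The comparison extractor yields unbiased bits at 0.5 bits per event; the schemes analyzed in the paper trade off such extraction efficiency against robustness and post-processing cost.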

  12. High-efficient Extraction of Drainage Networks from Digital Elevation Model Data Constrained by Enhanced Flow Enforcement from Known River Map

    NASA Astrophysics Data System (ADS)

    Wu, T.; Li, T.; Li, J.; Wang, G.

    2017-12-01

    Improved drainage network extraction can be achieved by flow enforcement, whereby information from known river maps is imposed on the flow-path modeling process. However, the common elevation-based stream burning method can sometimes cause unintended topological errors and misinterpret the overall drainage pattern. We present an enhanced flow enforcement method that enables accurate and efficient drainage network extraction. Both the topology of the mapped hydrography and the initial landscape of the DEM are preserved and fully utilized in the proposed method. An improved stream rasterization is achieved, yielding a continuous, unambiguous and stream-collision-free raster equivalent of the stream vectors for flow enforcement. By imposing priority-based enforcement with a complementary flow-direction enhancement procedure, the drainage patterns of the mapped hydrography are fully represented in the derived results. The proposed method was tested over the Rogue River Basin, using DEMs at various resolutions. As indicated by visual and statistical analyses, the proposed method has three major advantages: (1) it significantly reduces the occurrence of topological errors, yielding very accurate watershed partition and channel delineation; (2) it ensures scale-consistent performance for DEMs at various resolutions; and (3) the entire extraction process is well designed to achieve high computational efficiency.
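
    For contrast with the priority-based enforcement described above, the conventional elevation-based stream burning it improves on can be sketched in a few lines; the toy DEM and drop value are illustrative only.

```python
import numpy as np

def burn_streams(dem, stream_mask, drop=10.0):
    """Lower DEM cells along the rasterized stream network so that flow
    routing follows the mapped hydrography. This is the conventional
    baseline; the paper's method instead enforces flow directions with
    priorities, avoiding the topological artifacts burning can cause."""
    burned = dem.astype(float).copy()
    burned[stream_mask] -= drop
    return burned

dem = np.random.default_rng(0).uniform(100.0, 200.0, size=(5, 5))
streams = np.zeros((5, 5), dtype=bool)
streams[2, :] = True                     # a toy east-west channel
print(burn_streams(dem, streams)[2])     # row 2 is now 10 m lower
```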

  13. Self-Supervised Chinese Ontology Learning from Online Encyclopedias

    PubMed Central

    Shao, Zhiqing; Ruan, Tong

    2014-01-01

    Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built by self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in the encyclopedias and to enrich the learnt ontology, we also apply some machine learning based methods. First, we prove, statistically and experimentally, that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy) and train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation shows that the ontology has excellent precision, and comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage; the experimental results also show that the self-supervised models clearly enrich SSCO. PMID:24715819

  14. Contextual Classification of Point Cloud Data by Exploiting Individual 3D Neighbourhoods

    NASA Astrophysics Data System (ADS)

    Weinmann, M.; Schmidt, A.; Mallet, C.; Hinz, S.; Rottensteiner, F.; Jutzi, B.

    2015-03-01

    The fully automated analysis of 3D point clouds is of great importance in photogrammetry, remote sensing and computer vision. For reliably extracting objects such as buildings, road inventory or vegetation, many approaches rely on the results of a point cloud classification, where each 3D point is assigned a respective semantic class label. Such an assignment, in turn, typically involves statistical methods for feature extraction and machine learning. Whereas the different components in the processing workflow have extensively, but separately been investigated in recent years, the respective connection by sharing the results of crucial tasks across all components has not yet been addressed. This connection not only encapsulates the interrelated issues of neighborhood selection and feature extraction, but also the issue of how to involve spatial context in the classification step. In this paper, we present a novel and generic approach for 3D scene analysis which relies on (i) individually optimized 3D neighborhoods for (ii) the extraction of distinctive geometric features and (iii) the contextual classification of point cloud data. For a labeled benchmark dataset, we demonstrate the beneficial impact of involving contextual information in the classification process and that using individual 3D neighborhoods of optimal size significantly increases the quality of the results for both pointwise and contextual classification.
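
    The neighborhood-dependent geometric features at the core of such pipelines are typically derived from the eigenvalues of the local 3D structure tensor, and the eigenentropy computed below is one common criterion for selecting the optimal neighborhood size. This is a sketch on synthetic points, not the authors' exact feature set.

```python
import numpy as np

def eigen_features(neigh, eps=1e-12):
    """Features from the eigenvalues of the local 3D structure tensor
    (points as rows). Eigenentropy is a common criterion for choosing
    the optimal neighborhood size: pick the size that minimizes it."""
    ev = np.sort(np.linalg.eigvalsh(np.cov(neigh.T)))[::-1]   # l1 >= l2 >= l3
    ev = ev / ev.sum()
    l1, l2, l3 = ev
    return {
        "linearity":    (l1 - l2) / (l1 + eps),
        "planarity":    (l2 - l3) / (l1 + eps),
        "sphericity":   l3 / (l1 + eps),
        "eigenentropy": float(-np.sum(ev * np.log(ev + eps))),
    }

pts = np.random.default_rng(0).normal(size=(50, 3)) * [5.0, 4.0, 0.1]
print(eigen_features(pts))   # quasi-planar neighborhood: high planarity
```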

  15. Self-supervised Chinese ontology learning from online encyclopedias.

    PubMed

    Hu, Fanghuai; Shao, Zhiqing; Ruan, Tong

    2014-01-01

    Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built by self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in the encyclopedias and to enrich the learnt ontology, we also apply some machine learning based methods. First, we prove, statistically and experimentally, that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy) and train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation shows that the ontology has excellent precision, and comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage; the experimental results also show that the self-supervised models clearly enrich SSCO.

  16. The use of experimental structures to model protein dynamics.

    PubMed

    Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L

    2015-01-01

    The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENMs). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimer by using a set of 329 PDB structures of this protein. We describe, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs, or references to software tools, to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
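
    The PCA step described in the chapter amounts to diagonalizing the covariance of the aligned coordinate ensemble. A minimal sketch on a synthetic ensemble, assuming the structures are already superposed:

```python
import numpy as np

def pca_of_ensemble(coords):
    """PCA of an aligned ensemble: coords has shape
    (n_structures, n_atoms, 3) and is assumed already superposed.
    Returns per-mode variances and the principal components."""
    X = coords.reshape(coords.shape[0], -1)    # flatten to 3N dimensions
    Xc = X - X.mean(axis=0)                    # deviations from mean structure
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

# Toy "ensemble": 20 conformations of a 10-atom chain with one soft mode.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 3))
mode = rng.normal(size=(10, 3)); mode /= np.linalg.norm(mode)
ens = np.array([base + a * mode for a in rng.normal(scale=2.0, size=20)])
evals, _ = pca_of_ensemble(ens)
print(evals[:3] / evals.sum())   # the planted mode dominates the variance
```

    The ENM modes would be obtained analogously by diagonalizing the Hessian of a harmonic network built on one structure; congruency between the two mode sets is then measured by eigenvector overlaps.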

  17. Multi-scale image segmentation method with visual saliency constraints and its application

    NASA Astrophysics Data System (ADS)

    Chen, Yan; Yu, Jie; Sun, Kaimin

    2018-03-01

    Object-based image analysis has many advantages over pixel-based methods, so it is one of the current research hotspots. Obtaining image objects through multi-scale image segmentation is essential for object-based image analysis. The currently popular image segmentation methods mainly share the bottom-up segmentation principle, which is simple to realize and yields accurate object boundaries. However, the macro statistical characteristics of the image areas are difficult to take into account, and fragmented (over-segmented) results are difficult to avoid. In addition, when it comes to information extraction, target recognition and other applications, image targets are not equally important: some specific targets or target groups with particular features warrant more attention than the others. To avoid the problem of over-segmentation and highlight the targets of interest, this paper proposes a multi-scale image segmentation method with visual saliency constraints. Visual saliency theory and a typical feature extraction method are adopted to obtain the visual saliency information, especially the macroscopic information to be analyzed. The visual saliency information is used as a distribution map of homogeneity weight, where each pixel is given a weight. This weight acts as one of the merging constraints in the multi-scale image segmentation. As a result, pixels that macroscopically belong to the same object but are locally different are more likely to be assigned to the same object. In addition, due to the constraint of the visual saliency model, the balance between local and macroscopic characteristics can be well controlled during the segmentation process for different objects. These controls improve the completeness of visually salient areas in the segmentation results while diluting the controlling effect for non-salient background areas. Experiments show that this method works better for texture image segmentation than traditional multi-scale image segmentation methods and gives priority control to the saliency objects of interest. This method has been applied to image quality evaluation, scattered residential area extraction, sparse forest extraction and other applications to verify its validity. All applications showed good results.

  18. The Use of Experimental Structures to Model Protein Dynamics

    PubMed Central

    Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.

    2014-01-01

    The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENMs). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimer by using a set of 329 PDB structures of this protein. We describe, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs, or references to software tools, to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965

  19. Spatiotemporal modelling of groundwater extraction in semi-arid central Queensland, Australia

    NASA Astrophysics Data System (ADS)

    Keir, Greg; Bulovic, Nevenka; McIntyre, Neil

    2016-04-01

    The semi-arid Surat Basin in central Queensland, Australia, forms part of the Great Artesian Basin, a groundwater resource of national significance. While this area relies heavily on groundwater supply bores to sustain agricultural industries and rural life in general, measurement of groundwater extraction rates is very limited. Consequently, regional groundwater extraction rates are not well known, which may have implications for regional numerical groundwater modelling. However, flows from a small number of bores are metered, and less precise anecdotal estimates of extraction are increasingly available. There is also an increasing number of other spatiotemporal datasets which may help predict extraction rates (e.g. rainfall, temperature, soils, stocking rates etc.). These can be used to construct spatial multivariate regression models to estimate extraction. The data exhibit complicated statistical features, such as zero-valued observations, non-Gaussianity, and non-stationarity, which limit the use of many classical estimation techniques, such as kriging. As well, water extraction histories may exhibit temporal autocorrelation. To account for these features, we employ a separable space-time model to predict bore extraction rates using the R-INLA package for computationally efficient Bayesian inference. A joint approach is used to model both the probability (using a binomial likelihood) and magnitude (using a gamma likelihood) of extraction. The correlation between extraction rates in space and time is modelled using a Gaussian Markov Random Field (GMRF) with a Matérn spatial covariance function which can evolve over time according to an autoregressive model. To reduce computational burden, we allow the GMRF to be evaluated at a relatively coarse temporal resolution, while still allowing predictions to be made at arbitrarily small time scales. We describe the process of model selection and inference using an information criterion approach, and present some preliminary results from the study area. We conclude by discussing issues related to upscaling of the modelling approach to the entire basin, including the merging of extraction rate observations with different precision, temporal resolution, and even potentially different likelihoods.

  20. Kinetic quantitation of cerebral PET-FDG studies without concurrent blood sampling: statistical recovery of the arterial input function.

    PubMed

    O'Sullivan, F; Kirrane, J; Muzi, M; O'Sullivan, J N; Spence, A M; Mankoff, D A; Krohn, K A

    2010-03-01

    Kinetic quantitation of dynamic positron emission tomography (PET) studies via compartmental modeling usually requires the time-course of the radio-tracer concentration in the arterial blood as an arterial input function (AIF). For human and animal imaging applications, significant practical difficulties are associated with direct arterial sampling, and as a result there is substantial interest in alternative methods that require no blood sampling at the time of the study. A fixed population template input function derived from prior experience with directly sampled arterial curves is one possibility. Image-based extraction, including requisite adjustment for spillover and recovery, is another approach. The present work considers a hybrid statistical approach based on a penalty formulation in which the information derived from a priori studies is combined in a Bayesian manner with information contained in the sampled image data in order to obtain an input function estimate. The absolute scaling of the input is achieved by an empirical calibration equation involving the injected dose together with the subject's weight, height and gender. The technique is illustrated in the context of (18)F-Fluorodeoxyglucose (FDG) PET studies in humans. A collection of 79 arterially sampled FDG blood curves is used as a basis for a priori characterization of input function variability, including scaling characteristics. Data from a series of 12 dynamic cerebral FDG PET studies in normal subjects are used to evaluate the performance of the penalty-based AIF estimation technique. The focus of the evaluations is on quantitation of FDG kinetics over a set of 10 regional brain structures. As well as the new method, a fixed population template AIF and a direct AIF estimate based on segmentation are also considered. Kinetics analyses resulting from these three AIFs are compared with those resulting from arterially sampled AIFs. The proposed penalty-based AIF extraction method is found to achieve significant improvements over the fixed template and the segmentation methods. As well as achieving acceptable kinetic parameter accuracy, the quality of fit of the region of interest (ROI) time-course data based on the extracted AIF matches results based on arterially sampled AIFs. In comparison, significant deviation in the estimation of FDG flux and degradation in ROI data fit are found with the template and segmentation methods. The proposed AIF extraction method is recommended for practical use.

  1. High-Resolution Remote Sensing Image Building Extraction Based on Markov Model

    NASA Astrophysics Data System (ADS)

    Zhao, W.; Yan, L.; Chang, Y.; Gong, L.

    2018-04-01

    With the increase of resolution, remote sensing images have the characteristics of increased information load, increased noise, and more complex feature geometry and texture information, which makes the extraction of building information more difficult. To solve this problem, this paper designs a high-resolution remote sensing image building extraction method based on a Markov model. This method introduces Contourlet-domain map clustering and a Markov model, captures and enhances the contour and texture information of high-resolution remote sensing image features in multiple directions, and further designs a spectral feature index that can characterize "pseudo-buildings" in the building area. Through multi-scale segmentation and extraction of image features, fine extraction from the building area down to individual buildings is realized. Experiments show that this method can restrain the noise of high-resolution remote sensing images, reduce the interference of non-target ground texture information, and remove shadows, vegetation and other pseudo-building information; compared with traditional pixel-level image information extraction, it performs better in building extraction precision, accuracy and completeness.

  2. Using electronic patient records to discover disease correlations and stratify patient cohorts.

    PubMed

    Roque, Francisco S; Jensen, Peter B; Schmock, Henriette; Dalgaard, Marlene; Andreatta, Massimo; Hansen, Thomas; Søeby, Karen; Bredkjær, Søren; Juul, Anders; Werge, Thomas; Jensen, Lars J; Brunak, Søren

    2011-08-01

    Electronic patient records remain a rather unexplored, but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort dependent manner. By extracting phenotype information from the free-text in such records we demonstrate that we can extend the information contained in the structured record data, and use it for producing fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Disease ontology and is therefore in principle language independent. As a use case we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which subsequently can be mapped to systems biology frameworks.
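
    The dictionary-based extraction step can be illustrated in a few lines of Python; the mini-lexicon below is invented for illustration and is far smaller than an ICD-based dictionary.

```python
import re

# Invented mini-dictionary mapping terms to ICD codes, for illustration
# only; a real system would use the full ICD ontology.
ICD_DICT = {
    "schizophrenia": "F20",
    "depressive episode": "F32",
    "diabetes mellitus": "E10-E14",
}

def extract_codes(note, dictionary=ICD_DICT):
    """Return the set of ICD codes whose dictionary terms occur in the note."""
    found = set()
    for term, code in dictionary.items():
        if re.search(r"\b" + re.escape(term) + r"\b", note, flags=re.I):
            found.add(code)
    return found

note = "Patient with long-standing Schizophrenia; rule out depressive episode."
print(extract_codes(note))   # {'F20', 'F32'}
```

    Because matching is term-based rather than grammar-based, the approach is in principle language independent, as the abstract notes: only the dictionary changes.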

  3. Weak-value amplification as an optimal metrological protocol

    NASA Astrophysics Data System (ADS)

    Alves, G. Bié; Escher, B. M.; de Matos Filho, R. L.; Zagury, N.; Davidovich, L.

    2015-06-01

    The implementation of weak-value amplification requires the pre- and postselection of states of a quantum system, followed by the observation of the response of the meter, which interacts weakly with the system. Data acquisition from the meter is conditioned to successful postselection events. Here we derive an optimal postselection procedure for estimating the coupling constant between system and meter and show that it leads both to weak-value amplification and to the saturation of the quantum Fisher information, under conditions fulfilled by all previously reported experiments on the amplification of weak signals. For most of the preselected states, full information on the coupling constant can be extracted from the meter data set alone, while for a small fraction of the space of preselected states, it must be obtained from the postselection statistics.

  4. Inflammation response and cytotoxic effects in human THP-1 cells of size-fractionated PM10 extracts in a polluted urban site.

    PubMed

    Schilirò, T; Alessandria, L; Bonetta, S; Carraro, E; Gilli, G

    2016-02-01

    To contribute to a greater characterization of airborne particulate matter toxicity, size-fractionated PM10 was sampled during different seasons at a polluted urban site in Torino, a northern Italian city. Extracts (organic and aqueous) of three main size fractions (PM10-3 μm, PM3-0.95 μm, and PM<0.95 μm) were assayed with THP-1 cells to evaluate their effects on cell proliferation, LDH activity, and TNFα, IL-8 and CYP1A1 expression. The mean PM10 concentrations were statistically different between summer and winter, and the finest fraction PM<0.95 was always higher than the others. Size-fractionated PM10 extracts, sampled at an urban traffic meteorological-chemical station, produced size-related toxicological effects depending on season and particle extraction. The PM summer extracts induced a significant release of LDH compared to winter and produced a size-related effect, with higher values measured with PM10-3. Exposure to size-fractionated PM10 extracts did not induce significant expression of TNFα. IL-8 expression was influenced by exposure to size-fractionated PM10 extracts, and statistically significant differences were found between the kinds of extract for both seasons. The mean fold increases in CYP1A1 expression were statistically different between summer and winter; winter fraction extracts produced a size-related effect, in particular for organic samples, with higher values measured with PM<0.95 extracts. Our results confirm that the measurement of PM alone can be misleading for the assessment of air quality; moreover, we support efforts toward identifying potential effect-based tools (e.g., in vitro tests) that could be used in the context of the different monitoring programs.

  5. Standardized Patients versus Volunteer Patients for Physical Therapy Students' Interviewing Practice: A Pilot Study.

    PubMed

    Murphy, Sue; Imam, Bita; MacIntyre, Donna L

    2015-01-01

    To compare the use of standardized patients (SPs) and volunteer patients (VPs) for physical therapy students' interviewing practice in terms of students' perception and overall costs. Students in the Master of Physical Therapy programme (n=80) at a Canadian university were divided into 20 groups of 4 and were randomly assigned to interview either an SP (10 groups) or a VP (10 groups). Students completed a survey about their perception of the usefulness of the activity and the ease and depth of information extraction. Survey responses as well as costs of the interview exercise were compared between SP and VP groups. No statistically significant between-groups difference was found for the majority of survey items. The cost of using an SP was $148, versus $50 for a VP. Students' perceptions of the usefulness of the activity in helping them to develop their interview skills and of the ease and depth of extracting information were similar for both SPs and VPs. Because the cost of using an SP is about three times that of using a VP, using VPs seems to be a more cost-effective option.

  6. Space station integrated wall damage and penetration damage control. Task 5: Space debris measurement, mapping and characterization system

    NASA Technical Reports Server (NTRS)

    Lempriere, B. M.

    1987-01-01

    The procedures and results of a study of a conceptual system for measuring the debris environment on the space station is discussed. The study was conducted in two phases: the first consisted of experiments aimed at evaluating location of impact through panel response data collected from acoustic emission sensors; the second analyzed the available statistical description of the environment to determine the probability of the measurement system producing useful data, and analyzed the results of the previous tests to evaluate the accuracy of location and the feasibility of extracting impactor characteristics from the panel response. The conclusions were that for one panel the system would not be exposed to any event, but that the entire Logistics Module would provide a modest amount of data. The use of sensors with higher sensitivity than those used in the tests could be advantageous. The impact location could be found with sufficient accuracy from panel response data. The waveforms of the response were shown to contain information on the impact characteristics, but the data set did not span a sufficient range of the variables necessary to evaluate the feasibility of extracting the information.

  7. Derivation of groundwater flow-paths based on semi-automatic extraction of lineaments from remote sensing data

    NASA Astrophysics Data System (ADS)

    Mallast, U.; Gloaguen, R.; Geyer, S.; Rödiger, T.; Siebert, C.

    2011-08-01

    In this paper we present a semi-automatic method to infer groundwater flow-paths based on the extraction of lineaments from digital elevation models. This method is especially adequate in remote and inaccessible areas where in-situ data are scarce. The combined method of linear filtering and object-based classification provides a lineament map with a high degree of accuracy. Subsequently, lineaments are differentiated into geological and morphological lineaments using auxiliary information and finally evaluated in terms of hydro-geological significance. Using the example of the western catchment of the Dead Sea (Israel/Palestine), the orientation and location of the differentiated lineaments are compared to characteristics of known structural features. We demonstrate that a strong correlation between lineaments and structural features exists. Using Euclidean distances between lineaments and wells provides an assessment criterion to evaluate the hydraulic significance of detected lineaments. Based on this analysis, we suggest that the statistical analysis of lineaments allows a delineation of flow-paths and thus significant information on groundwater movements. To validate the flow-paths we compare them to existing results of groundwater models that are based on well data.

  8. The 2D analytic signal for envelope detection and feature extraction on ultrasound images.

    PubMed

    Wachinger, Christian; Klein, Tassilo; Navab, Nassir

    2012-08-01

    The fundamental property of the analytic signal is the split of identity, meaning the separation of qualitative and quantitative information in the form of the local phase and the local amplitude, respectively. Especially the structural representation of the local phase, independent of brightness and contrast, is interesting for numerous image processing tasks. Recently, the extension of the analytic signal from 1D to 2D, covering also intrinsic 2D structures, was proposed. We show the advantages of this improved concept on ultrasound RF and B-mode images. Precisely, we use the 2D analytic signal for the envelope detection of RF data. This leads to advantages for the extraction of the information-bearing signal from the modulated carrier wave. We illustrate this, first, by visual assessment of the images, and second, by performing goodness-of-fit tests to a Nakagami distribution, indicating a clear improvement of statistical properties. The evaluation is performed for multiple window sizes and parameter estimation techniques. Finally, we show that the 2D analytic signal allows for an improved estimation of local features on B-mode images.
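
    The 2D analytic signal referred to here is commonly constructed as the monogenic signal, i.e., the image paired with its Riesz transform. A minimal frequency-domain sketch of the resulting local amplitude (envelope) and local phase, on a synthetic carrier image:

```python
import numpy as np

def monogenic(image):
    """Monogenic (2D analytic) signal via the Riesz transform in the
    frequency domain: returns local amplitude and local phase, the
    'split of identity' into quantitative and qualitative parts."""
    f = np.fft.fft2(image)
    u = np.fft.fftfreq(image.shape[0])[:, None]
    v = np.fft.fftfreq(image.shape[1])[None, :]
    r = np.sqrt(u**2 + v**2); r[0, 0] = 1.0           # avoid divide-by-zero at DC
    r1 = np.real(np.fft.ifft2(f * (-1j * u / r)))     # Riesz component, axis 0
    r2 = np.real(np.fft.ifft2(f * (-1j * v / r)))     # Riesz component, axis 1
    amplitude = np.sqrt(image**2 + r1**2 + r2**2)     # local envelope
    phase = np.arctan2(np.hypot(r1, r2), image)       # local phase
    return amplitude, phase

# Unit-amplitude sinusoidal "carrier" standing in for RF data.
img = np.sin(np.linspace(0, 8 * np.pi, 128))[None, :] * np.ones((128, 1))
amp, ph = monogenic(img)
print(round(amp.mean(), 2))   # ~1.0: the envelope of the unit carrier
```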

  9. GIS-based analysis and modelling with empirical and remotely-sensed data on coastline advance and retreat

    NASA Astrophysics Data System (ADS)

    Ahmad, Sajid Rashid

    With the understanding that far more research remains to be done on the development and use of innovative and functional geospatial techniques and procedures to investigate coastline changes, this thesis focussed on the integration of remote sensing, geographical information systems (GIS) and modelling techniques to provide meaningful insights on the spatial and temporal dynamics of coastline changes. One of the unique strengths of this research was the parameterization of the GIS with long-term empirical and remote sensing data. Annual empirical data from 1941-2007 were analyzed by the GIS, and then modelled with statistical techniques. Data were also extracted from Landsat TM and ETM+ images. The band ratio method was used to extract the coastlines. Topographic maps were also used to extract digital map data. All data incorporated into ArcGIS 9.2 were analyzed with various modules, including Spatial Analyst, 3D Analyst, and Triangulated Irregular Networks. The Digital Shoreline Analysis System was used to analyze and predict rates of coastline change. GIS results showed the spatial locations along the coast that will either advance or retreat over time. The linear regression results highlighted temporal changes which are likely to occur along the coastline. Box-Jenkins modelling procedures were utilized to determine statistical models which best described the time series (1941-2007) of coastline change data. After several iterations and goodness-of-fit tests, second-order spatial cyclic autoregressive models, first-order autoregressive models and autoregressive moving average models were identified as being appropriate for describing the deterministic and random processes operating in Guyana's coastal system. The models highlighted not only cyclical patterns in advance and retreat of the coastline, but also the existence of short and long-term memory processes. Long-term memory processes could be associated with mudshoal propagation and stabilization while short-term memory processes were indicative of transitory hydrodynamic and other processes. An innovative framework for a spatio-temporal information-based system (STIBS) was developed. STIBS incorporated diverse datasets within a GIS, dynamic computer-based simulation models, and a spatial information query and graphical subsystem. Tests of the STIBS proved that it could be used to simulate and visualize temporal variability in shifting morphological states of the coastline.
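
    The band ratio method mentioned above exploits the fact that water reflects in the green band but absorbs strongly in the near-infrared. A toy sketch, with illustrative band values and threshold:

```python
import numpy as np

def water_mask_band_ratio(green, nir, threshold=1.0):
    """Band-ratio water extraction: water reflects in the green band but
    absorbs strongly in the NIR, so green/NIR > threshold flags water
    pixels. The coastline is then the boundary of this mask."""
    ratio = green / np.maximum(nir, 1e-6)   # guard against division by zero
    return ratio > threshold

green = np.array([[0.08, 0.07], [0.20, 0.22]])   # toy TM band 2 reflectances
nir   = np.array([[0.30, 0.28], [0.02, 0.03]])   # toy TM band 4 reflectances
print(water_mask_band_ratio(green, nir))         # bottom (water) row is True
```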

  10. European and Mexican vs US diagnostic extracts of Bermuda grass and cat in skin testing.

    PubMed

    Larenas-Linnemann, Désirée; Cruz, Alfredo Arias; Gutierrez, Isabel Rojo; Rodriguez, Pablo; Shah-Hosseini, Kijawasch; Michels, Alexandra; Mösges, Ralph

    2011-05-01

    Laboratory testing of various diagnostic extracts has shown lower potencies for several European and Mexican extracts relative to the US Food and Drug Administration (FDA) reference (10,000 BAU/mL). Quantitative skin prick testing (QSPT) with Dermatophagoides pteronyssinus extracts has previously shown a similar picture. To compare European and Mexican Bermuda grass (BG) and cat diagnostic extracts against an FDA-validated extract using QSPT. Six diagnostic BG and cat extracts (1 reference FDA extract, 3 European extracts, 1 imported nonstandardized extract from the United States, and 1 Mexican extract) were tested with quadruplicate QSPT, as a concentrate and as 2 serial 2-fold dilutions, in cat and BG allergic individuals. BG showed a good dose response in wheal size for the concentrate (1:2-1:4 dilutions; steep part of the curve). Cat showed a poorer dose response. The Wilcoxon test for paired samples was used to investigate whether the distribution of the reference differed from each of the test extracts to a statistically significant degree (2-sided asymptotic significance, α = .05). All BG and 2 cat extracts were statistically less potent than the 10,000 BAU/mL US reference. European BG extracts were 7,700, 4,100, and 1,600 BAU/mL, and cat extracts were 12,500, 4,400, and 5,100 BAU/mL. The potency of some diagnostic extracts of BG and cat used in Europe, Mexico, and the United States differs, with the US extracts being generally more potent. On the basis of provocation tests, optimal diagnostic concentrations should be determined. Similar comparisons using other manufacturers and therapeutic extracts might be interesting.

  11. Effects of preprocessing Landsat MSS data on derived features

    NASA Technical Reports Server (NTRS)

    Parris, T. M.; Cicone, R. C.

    1983-01-01

    Important to the use of multitemporal Landsat MSS data for earth resources monitoring, such as agricultural inventories, is the ability to minimize the effects of varying atmospheric and satellite viewing conditions while extracting physically meaningful features from the data. In general, approaches to the preprocessing problem have been derived from either physical or statistical models. This paper compares three proposed algorithms: XSTAR haze correction, Color Normalization, and Multiple Acquisition Mean Level Adjustment. These techniques represent physical, statistical, and hybrid physical-statistical models, respectively. The comparisons are made in the context of three feature extraction techniques: the Tasseled Cap, the Cate Color Cube, and the Normalized Difference.

  12. Antiviral activity of Quercus persica L.: High efficacy and low toxicity

    PubMed Central

    Karimi, Ali; Moradi, Mohammad-Taghi; Saeedi, Mojtaba; Asgari, Sedigheh; Rafieian-kopaei, Mahmoud

    2013-01-01

    Background: Drug-resistant strains of Herpes simplex virus type 1 (HSV-1) have increased interest in the use of natural substances. Aims: This study aimed to determine the minimum inhibitory concentration of a hydroalcoholic extract of a traditionally used herbal plant, Quercus persica L., on HSV-1 replication in baby hamster kidney (BHK) cells. Setting: The study was conducted at Shahrekord University of Medical Sciences, Iran. Design: This was an experimental study. Materials and Methods: BHK cells were grown in monolayer culture with Dulbecco's modified Eagle's medium (DMEM) supplemented with 5% fetal calf serum and plated onto 48-well culture plates. The 50% cytotoxic concentration (CC50) of Q. persica L. on BHK cells was determined. Subsequently, the 50% inhibitory concentration (IC50) of the extract on replication of HSV-1, in both intracellular and extracellular cases, was assessed. Statistical Analysis: The Probit model was used for statistical analysis. The dose-dependent antiviral activity of the extracts was determined by linear regression. Results: Q. persica L. had no cytotoxic effect on this cell line. There was a significant relationship between the concentration of the extract and cell death (P<0.01). The IC50 values of Q. persica L. on HSV-1, before and after attachment to BHK cells, were 1.02 and 0.257 μg/mL, respectively. There was a significant relationship between the concentration of this extract and inhibition of the cytopathic effect (CPE) (P<0.05). The antioxidant capacity of the extract was 67.5%. Conclusions: The hydroalcoholic extract of Q. persica L. is potentially an appropriate and promising antiherpetic herbal medicine. PMID:24516836
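
    The Probit dose-response analysis reported above can be reproduced in outline with statsmodels; the doses and response counts below are invented toy data, and the IC50 follows from the fitted line crossing probit(0.5) = 0.

```python
import numpy as np
import statsmodels.api as sm

# Toy dose-response data: 7 extract concentrations, 20 wells per dose,
# counting wells with inhibited cytopathic effect (all values invented).
log_dose = np.log10(np.array([0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0]))
n = 20                                             # replicates per dose
inhibited = np.array([1, 3, 8, 11, 15, 18, 20])    # wells with inhibited CPE

y = np.concatenate([[1] * k + [0] * (n - k) for k in inhibited])
x = sm.add_constant(np.repeat(log_dose, n))
fit = sm.Probit(y, x).fit(disp=0)

b0, b1 = fit.params
ic50 = 10 ** (-b0 / b1)            # probit(0.5)=0  =>  b0 + b1*log10(d) = 0
print(f"IC50 ~ {ic50:.3f} ug/mL")
```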

  13. Multi-level emulation of complex climate model responses to boundary forcing data

    NASA Astrophysics Data System (ADS)

    Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter

    2018-04-01

    Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling or to assess the statistical relationship between such sets of inputs and outputs, for example, uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1 was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.
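
    The combination of dimensionality reduction with statistical emulation can be sketched as PCA on the output fields followed by one Gaussian-process emulator per principal component. The inputs and outputs below are synthetic, and the study's multi-level structure, which also conditions on the fast model's output, is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

# Synthetic "model runs": 60 input settings, each producing a 500-point field.
rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 2))                           # boundary-forcing inputs
grid = np.linspace(0, 1, 500)
Y = np.sin(6 * grid * X[:, :1]) + X[:, 1:] * grid       # (60, 500) output fields

pca = PCA(n_components=3).fit(Y)                        # reduce output dimension
scores = pca.transform(Y)                               # per-run PC coefficients
gps = [GaussianProcessRegressor().fit(X, scores[:, k]) for k in range(3)]

def emulate(x_new):
    """Predict a full output field for new inputs via the PC emulators."""
    s = np.column_stack([gp.predict(x_new) for gp in gps])
    return pca.inverse_transform(s)

print(emulate(np.array([[0.3, 0.7]])).shape)            # (1, 500) emulated field
```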

  14. A novelty detection diagnostic methodology for gearboxes operating under fluctuating operating conditions using probabilistic techniques

    NASA Astrophysics Data System (ADS)

    Schmidt, S.; Heyns, P. S.; de Villiers, J. P.

    2018-02-01

    In this paper, a fault diagnostic methodology is developed which is able to detect, locate and trend gear faults under fluctuating operating conditions when only vibration data from a single transducer, measured on a healthy gearbox are available. A two-phase feature extraction and modelling process is proposed to infer the operating condition and based on the operating condition, to detect changes in the machine condition. Information from optimised machine and operating condition hidden Markov models are statistically combined to generate a discrepancy signal which is post-processed to infer the condition of the gearbox. The discrepancy signal is processed and combined with statistical methods for automatic fault detection and localisation and to perform fault trending over time. The proposed methodology is validated on experimental data and a tacholess order tracking methodology is used to enhance the cost-effectiveness of the diagnostic methodology.
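
    The healthy-data-only discrepancy idea can be sketched with a single hidden Markov model: train on healthy features, then use the negative log-likelihood of new windows as the discrepancy signal. The features are synthetic, and the paper's separate machine- and operating-condition models are collapsed into one here.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # pip install hmmlearn

# Train an HMM on healthy-condition features only.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(2000, 3))          # healthy feature vectors
model = GaussianHMM(n_components=3, covariance_type="diag").fit(healthy)

def discrepancy(window):
    """Negative average log-likelihood of a feature window under the
    healthy model; rising values indicate a change in machine condition."""
    return -model.score(window) / len(window)

faulty = rng.normal(1.5, 1.5, size=(200, 3))            # simulated fault features
print(discrepancy(healthy[:200]), discrepancy(faulty))  # fault scores higher
```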

  15. A combinatorial framework to quantify peak/pit asymmetries in complex dynamics.

    PubMed

    Hasson, Uri; Iacovacci, Jacopo; Davis, Ben; Flanagan, Ryan; Tagliazucchi, Enzo; Laufs, Helmut; Lacasa, Lucas

    2018-02-23

    We explore a combinatorial framework which efficiently quantifies the asymmetries between minima and maxima in the local fluctuations of time series. We first showcase its performance by applying it to a battery of synthetic cases. We find rigorous results for some canonical dynamical models (stochastic processes with and without correlations, chaotic processes), complemented by extensive numerical simulations for a range of processes, which indicate that the methodology correctly distinguishes different complex dynamics and outperforms state-of-the-art metrics in several cases. Subsequently, we apply this methodology to real-world problems emerging across several disciplines, including cases in neurobiology, finance and climate science. We conclude that differences between the statistics of local maxima and local minima in time series are highly informative of the complex underlying dynamics, and that a graph-theoretic extraction procedure allows these features to be used for statistical learning purposes.
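
    A count-based illustration of the peak/pit asymmetry idea: compare the statistics of local maxima and local minima of a series. The paper's graph-theoretic extraction is more elaborate; this sketch only conveys the underlying intuition.

```python
import numpy as np

def peak_pit_asymmetry(x):
    """Difference between mean peak height above the median and mean pit
    depth below it; ~0 for statistically time-symmetric fluctuations."""
    dx = np.diff(x)
    peaks = np.where((dx[:-1] > 0) & (dx[1:] < 0))[0] + 1   # local maxima
    pits  = np.where((dx[:-1] < 0) & (dx[1:] > 0))[0] + 1   # local minima
    med = np.median(x)
    return np.mean(x[peaks] - med) - np.mean(med - x[pits])

rng = np.random.default_rng(0)
sym = rng.normal(size=10_000)               # symmetric fluctuations: ~0
skewed = np.exp(rng.normal(size=10_000))    # asymmetric fluctuations: > 0
print(peak_pit_asymmetry(sym), peak_pit_asymmetry(skewed))
```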

  16. Inflated Uncertainty in Multimodel-Based Regional Climate Projections.

    PubMed

    Madsen, Marianne Sloth; Langen, Peter L; Boberg, Fredrik; Christensen, Jens Hesselbjerg

    2017-11-28

    Multimodel ensembles are widely analyzed to estimate the range of future regional climate change projections. For an ensemble of climate models, the result is often portrayed by showing maps of the geographical distribution of the multimodel mean results and associated uncertainties represented by model spread at the grid point scale. Here we use a set of CMIP5 models to show that presenting statistics this way results in an overestimation of the projected range leading to physically implausible patterns of change on global but also on regional scales. We point out that similar inconsistencies occur in impact analyses relying on multimodel information extracted using statistics at the regional scale, for example, when a subset of CMIP models is selected to represent regional model spread. Consequently, the risk of unwanted impacts may be overestimated at larger scales as climate change impacts will never be realized as the worst (or best) case everywhere.

  17. Statistical mapping of zones of focused groundwater/surface-water exchange using fiber-optic distributed temperature sensing

    USGS Publications Warehouse

    Mwakanyamale, Kisa; Day-Lewis, Frederick D.; Slater, Lee D.

    2013-01-01

    Fiber-optic distributed temperature sensing (FO-DTS) increasingly is used to map zones of focused groundwater/surface-water exchange (GWSWE). Previous studies of GWSWE using FO-DTS involved identification of zones of focused GWSWE based on arbitrary cutoffs of FO-DTS time-series statistics (e.g., variance, cross-correlation between temperature and stage, or spectral power). New approaches are needed to extract more quantitative information from large, complex FO-DTS data sets while concurrently providing an assessment of the uncertainty associated with mapping zones of focused GWSWE. Toward this end, we present a strategy combining discriminant analysis (DA) and spectral analysis (SA). We demonstrate the approach using field experimental data from a reach of the Columbia River adjacent to the Hanford 300 Area site. Results of the combined SA/DA approach are shown to be superior to previous results from qualitative interpretation of FO-DTS spectra alone.
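
    The SA/DA combination can be sketched as spectral power near the diurnal frequency as the feature, followed by linear discriminant analysis on labelled traces. All data below are synthetic, and the band edges and sampling rate are illustrative.

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
fs = 24.0                                   # hourly samples: 24 per day
t = np.arange(30 * 24) / fs                 # 30 days of temperature readings

def diurnal_power(trace):
    """Log spectral power near 1 cycle/day (the SA feature)."""
    f, p = welch(trace, fs=fs, nperseg=256)
    band = (f > 0.8) & (f < 1.2)
    return np.log(p[band].sum())

# Groundwater discharge damps the diurnal temperature signal.
exchange   = [0.2 * np.sin(2 * np.pi * t) + rng.normal(0, .3, t.size) for _ in range(40)]
background = [1.0 * np.sin(2 * np.pi * t) + rng.normal(0, .3, t.size) for _ in range(40)]
Xf = np.array([[diurnal_power(s)] for s in exchange + background])
y = np.array([1] * 40 + [0] * 40)           # 1 = focused-exchange zone

lda = LinearDiscriminantAnalysis().fit(Xf, y)
print(lda.score(Xf, y))                     # training accuracy of the DA rule
```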

  18. Automated breast tissue density assessment using high order regional texture descriptors in mammography

    NASA Astrophysics Data System (ADS)

    Law, Yan Nei; Lieng, Monica Keiko; Li, Jingmei; Khoo, David Aik-Aun

    2014-03-01

    Breast cancer is the most common cancer and second leading cause of cancer death among women in the US. The relative survival rate is lower among women with a more advanced stage at diagnosis. Early detection through screening is vital. Mammography is the most widely used and only proven screening method for reliably and effectively detecting abnormal breast tissues. In particular, mammographic density is one of the strongest breast cancer risk factors, after age and gender, and can be used to assess the future risk of disease before individuals become symptomatic. A reliable method for automatic density assessment would be beneficial and could assist radiologists in the evaluation of mammograms. To address this problem, we propose a density classification method which uses statistical features from different parts of the breast. Our method is composed of three parts: breast region identification, feature extraction and building ensemble classifiers for density assessment. It explores the potential of the features extracted from second and higher order statistical information for mammographic density classification. We further investigate the registration of bilateral pairs and time-series of mammograms. The experimental results on 322 mammograms demonstrate that (1) a classifier using features from dense regions has higher discriminative power than a classifier using only features from the whole breast region; (2) these high-order features can be effectively combined to boost the classification accuracy; (3) a classifier using these statistical features from dense regions achieves 75% accuracy, which is a significant improvement over the 70% accuracy obtained by existing approaches.
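
    Second-order statistical texture of the kind the classifier uses is commonly computed from gray-level co-occurrence matrices (GLCMs). A sketch with scikit-image; the quantization level and property set are illustrative, not the authors' exact descriptors.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops   # scikit-image >= 0.19

def glcm_features(region, levels=16):
    """Second-order (GLCM) texture statistics for one breast region.
    `region` is a 2-D intensity patch; values are quantized to `levels`."""
    q = (region / region.max() * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}

patch = np.random.default_rng(0).integers(0, 255, size=(64, 64))
print(glcm_features(patch))   # feature vector for one (toy) dense region
```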

  19. Extraction of maxillary canines: Esthetic perceptions of patient smiles among dental professionals and laypeople.

    PubMed

    Thiruvenkatachari, Badri; Javidi, Hanieh; Griffiths, Sarah Elizabeth; Shah, Anwar A; Sandler, Jonathan

    2017-10-01

    Maxillary canines are generally considered important both cosmetically and functionally. Most claims on the importance of maxillary canines, however, have been based on expert opinions and clinician-based studies. There are no scientific studies in the literature reporting on their cosmetic importance or how laypeople perceive a smile treated by maxillary canine extractions. Our objective was to investigate whether there is any difference in the perceptions of patients' smiles treated by extracting either maxillary canines or first premolars, as judged by orthodontists, dentists, and laypeople. This retrospective study included 24 participants who had unilateral or bilateral extraction of maxillary permanent canines and fixed appliances in the maxillary and mandibular arches to comprehensively correct the malocclusion, selected from orthodontic patients treated at Chesterfield Royal Hospital NHS trust in the United Kingdom over the last 20 years. The control group of patients had extraction of maxillary first premolars followed by fixed appliances and finished to an extremely high standard judged by the requirement that they had been submitted for the Membership in Orthodontics examination. The finished Peer Assessment Rating scores for this group were less than 5. The end-of-treatment frontal extraoral smiling and frontal intraoral views were presented for both groups. The photographs were blinded for extraction choice and standardized for size and brightness using computer software (Adobe Photoshop CC version 14.0; Adobe Systems, San Jose, Calif). The work file was converted to an editable pdf file and e-mailed to the assessors. The assessor panel consisted of 30 members (10 orthodontists, 10 dentists, and 10 laypeople), who were purposely selected. The measures were rated on a 10-point Likert scale. The attractiveness ratings were not statistically significantly different between the canine extraction and premolar extraction groups, with a mean difference of 0.33 (SD, 0.29) points. A 1-way repeated-measures analysis of variance to test the difference in scores among the laypeople, orthodontists, and dentists (n = 30) showed no statistically significant difference (Wilks lambda = 0.835; P = 0.138), and the Bonferroni test indicated that no pair-wise difference was statistically significant. No statistically significant difference was found in the smile attractiveness between canine extraction and premolar extraction patients as assessed by general dentists, laypeople, and orthodontists. Further high-quality studies are required to evaluate the effect of canine extraction and premolar substitution on functional occlusion.

  20. Mapping model behaviour using Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Herbst, M.; Gupta, H. V.; Casper, M. C.

    2009-03-01

    Hydrological model evaluation and identification essentially involves extracting and processing information from model time series. However, the type of information extracted by statistical measures has only very limited meaning because it does not relate to the hydrological context of the data. To overcome this inadequacy we exploit the diagnostic evaluation concept of Signature Indices, in which model performance is measured using theoretically relevant characteristics of system behaviour. In our study, a Self-Organizing Map (SOM) is used to process the Signatures extracted from Monte-Carlo simulations generated by the distributed conceptual watershed model NASIM. The SOM creates a hydrologically interpretable mapping of overall model behaviour, which immediately reveals deficits and trade-offs in the ability of the model to represent the different functional behaviours of the watershed. Further, it facilitates interpretation of the hydrological functions of the model parameters and provides preliminary information regarding their sensitivities. Most notably, we use this mapping to identify the set of model realizations (among the Monte-Carlo data) that most closely approximate the observed discharge time series in terms of the hydrologically relevant characteristics, and to confine the parameter space accordingly. Our results suggest that Signature Index based SOMs could potentially serve as tools for decision makers inasmuch as model realizations with specific Signature properties can be selected according to the purpose of the model application. Moreover, given that the approach helps to represent and analyze multi-dimensional distributions, it could be used to form the basis of an optimization framework that uses SOMs to characterize the model performance response surface. As such it provides a powerful and useful way to conduct model identification and model uncertainty analyses.
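
    The mapping step can be sketched with a small SOM over signature vectors. The four signature indices below are invented stand-ins (e.g., runoff ratio, peak timing error, baseflow index, flow-duration slope), and the minisom package is used for brevity.

```python
import numpy as np
from minisom import MiniSom   # pip install minisom

# Each Monte-Carlo realization is described by a vector of signature
# indices; the SOM arranges realizations on a 2-D grid so that similar
# hydrological behaviours land in neighbouring cells.
rng = np.random.default_rng(0)
signatures = rng.normal(size=(500, 4))          # 500 model realizations
som = MiniSom(10, 10, input_len=4, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(signatures, num_iteration=5000)

# Realizations sharing the map cell of the "observed" signature vector
# are the behaviourally closest parameter sets.
observed = rng.normal(size=4)
cell = som.winner(observed)
matches = [i for i, s in enumerate(signatures) if som.winner(s) == cell]
print(cell, len(matches))
```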

  1. Mapping model behaviour using Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Herbst, M.; Gupta, H. V.; Casper, M. C.

    2008-12-01

    Hydrological model evaluation and identification essentially depends on the extraction of information from model time series and its processing. However, the type of information extracted by statistical measures has only very limited meaning because it does not relate to the hydrological context of the data. To overcome this inadequacy we exploit the diagnostic evaluation concept of Signature Indices, in which model performance is measured using theoretically relevant characteristics of system behaviour. In our study, a Self-Organizing Map (SOM) is used to process the Signatures extracted from Monte-Carlo simulations generated by a distributed conceptual watershed model. The SOM creates a hydrologically interpretable mapping of overall model behaviour, which immediately reveals deficits and trade-offs in the ability of the model to represent the different functional behaviours of the watershed. Further, it facilitates interpretation of the hydrological functions of the model parameters and provides preliminary information regarding their sensitivities. Most notably, we use this mapping to identify the set of model realizations (among the Monte-Carlo data) that most closely approximate the observed discharge time series in terms of the hydrologically relevant characteristics, and to confine the parameter space accordingly. Our results suggest that Signature Index based SOMs could potentially serve as tools for decision makers inasmuch as model realizations with specific Signature properties can be selected according to the purpose of the model application. Moreover, given that the approach helps to represent and analyze multi-dimensional distributions, it could be used to form the basis of an optimization framework that uses SOMs to characterize the model performance response surface. As such it provides a powerful and useful way to conduct model identification and model uncertainty analyses.

  2. The value of necropsy reports for animal health surveillance.

    PubMed

    Küker, Susanne; Faverjon, Celine; Furrer, Lenz; Berezowski, John; Posthaus, Horst; Rinaldi, Fabio; Vial, Flavie

    2018-06-18

    Animal health data recorded in free text, such as in necropsy reports, can contain valuable information for national surveillance systems. However, these data are rarely utilized because the text format requires labor-intensive classification of records before they can be analyzed using statistical or other software. In a previous study, we designed a text-mining tool to extract data from text in necropsy reports. In the current study, we used the tool to extract data from the reports from pig and cattle necropsies performed between 2000 and 2011 at the Institute of Animal Pathology (ITPA), University of Bern, Switzerland. We evaluated data quality in terms of credibility, completeness and representativeness of the Swiss pig and cattle populations. Data was easily extracted from necropsy reports. Data quality in terms of completeness and validity varied a lot depending on the type of data reported. Diseases of the gastrointestinal system were reported most frequently (54.6% of pig submissions and 40.8% of cattle submissions). Diseases affecting serous membranes were reported in 16.0% of necropsied pigs and 27.6% of cattle. Respiratory diseases were reported in 18.3% of pigs and 21.6% of cattle submissions. This study suggests that extracting data from necropsy reports can provide information of value for animal health surveillance. This data has potential value for monitoring endemic disease syndromes in different age and production groups, or for early detection of emerging or re-emerging diseases. The study identified data entry and other errors that could be corrected to improve the quality and validity of the data. Submissions to veterinary diagnostic laboratories have selection biases and these should be considered when designing surveillance systems that include necropsy reports.

  3. Temporal and peripheral extraction of contextual cues from scenes during visual search.

    PubMed

    Koehler, Kathryn; Eckstein, Miguel P

    2017-02-01

    Scene context is known to facilitate object recognition and guide visual search, but little work has focused on isolating image-based cues and evaluating their contributions to eye movement guidance and search performance. Here, we explore three types of contextual cues (a co-occurring object, the configuration of other objects, and the superordinate category of background elements) and assess their joint contributions to search performance in the framework of cue-combination and the temporal unfolding of their extraction. We also assess whether observers' ability to extract each contextual cue in the visual periphery is a bottleneck that determines the utilization and contribution of each cue to search guidance and decision accuracy. We find that during the first four fixations of a visual search task observers first utilize the configuration of objects for coarse eye movement guidance and later use co-occurring object information for finer guidance. In the absence of contextual cues, observers were suboptimally biased to report the target object as being absent. The presence of the co-occurring object was the only contextual cue that had a significant effect in reducing decision bias. The early influence of object-based cues on eye movements is corroborated by a clear demonstration of observers' ability to extract object cues up to 16° into the visual periphery. The joint contributions of the cues to search decision accuracy approximate that expected from the combination of statistically independent cues and optimal cue combination. Finally, the lack of utilization and contribution of the background-based contextual cue to search guidance cannot be explained by the availability of the contextual cue in the visual periphery; instead it is related to background cues providing the least inherent information about the precise location of the target in the scene.
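    The benchmark of "optimal cue combination" referred to above is inverse-variance weighting of independent cues. A minimal worked sketch with made-up cue reliabilities:

```python
# Optimal (inverse-variance weighted) combination of statistically
# independent cues; estimates and variances are fabricated for illustration.
import numpy as np

estimates = np.array([1.8, 2.4, 2.1])     # target-location estimate per cue
variances = np.array([0.4, 0.9, 0.6])     # cue reliability (lower = better)

w = (1 / variances) / (1 / variances).sum()   # weights sum to 1
combined = (w * estimates).sum()
combined_var = 1 / (1 / variances).sum()
print(f"combined estimate {combined:.2f}, variance {combined_var:.2f}")
# The combined variance is smaller than that of the best single cue.
```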

  4. Information Extraction for Clinical Data Mining: A Mammography Case Study

    PubMed Central

    Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David

    2013-01-01

    Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts’ input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level. PMID:23765123
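    A minimal sketch of the three-stage pipeline (sentence splitting, concept finding, negation detection); the toy lexicon and negation cues below stand in for the full BI-RADS semantic grammar and the authors' algorithm:

```python
# Sentence-level concept finding with a simple negation check.
import re

LEXICON = {"mass", "calcification", "asymmetry", "architectural distortion"}
NEGATION_CUES = ("no ", "without ", "absence of ")

def extract_concepts(report):
    findings = []
    for sentence in re.split(r"(?<=[.!?])\s+", report.lower()):
        for concept in LEXICON:
            if concept in sentence:
                negated = any(cue + concept in sentence for cue in NEGATION_CUES)
                findings.append((concept, "absent" if negated else "present"))
    return findings

print(extract_concepts("There is a spiculated mass. No calcification is seen."))
# [('mass', 'present'), ('calcification', 'absent')]
```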

  5. Information Extraction for Clinical Data Mining: A Mammography Case Study.

    PubMed

    Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David

    2009-01-01

    Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.

  6. A rapid extraction of landslide disaster information research based on GF-1 image

    NASA Astrophysics Data System (ADS)

    Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na

    2015-08-01

    In recent years, landslide disasters have occurred frequently as a result of seismic activity, bringing great harm to people's lives and drawing close attention from the state and extensive concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing imagery, with its rich texture and geometric information, can effectively improve the accuracy of information extraction. It is therefore feasible to extract information on earthquake-triggered landslides with severe surface damage and large extent. Taking Wenchuan County as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the estimation of scale parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution imagery and selecting spectral, texture, geometric and landform features of the image, extraction rules are established to extract landslide disaster information. The extraction results identify 20 landslides with a total area of 521279.31; compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information based on high-resolution remote sensing, and it provides important technical support for post-disaster emergency investigation and disaster assessment.

  7. Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

    ERIC Educational Resources Information Center

    Sun, Chong

    2012-01-01

    More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…

  8. Inactivation disinfection property of Moringa Oleifera seed extract: optimization and kinetic studies

    NASA Astrophysics Data System (ADS)

    Idris, M. A.; Jami, M. S.; Hammed, A. M.

    2017-05-01

    This paper presents a statistical optimization study of the disinfection inactivation parameters of defatted Moringa oleifera seed extract acting on Pseudomonas aeruginosa bacterial cells. A three-level factorial design was used to estimate the optimum range, and the kinetics of the inactivation process was also studied. The inactivation data were compared against different disinfection models: the Chick-Watson, Collins-Selleck and Hom models. Analysis of variance (ANOVA) of the statistical optimization revealed that only contact time was significant. The optimum disinfection conditions for the seed extract were 125 mg/L, 30 minutes and 120 rpm agitation. At the optimum dose, the inactivation kinetics followed the Collins-Selleck model with a coefficient of determination (R2) of 0.6320. This study is the first of its kind to determine the inactivation kinetics of Pseudomonas aeruginosa using the defatted seed extract.
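    The kinetic-model comparison can be reproduced in outline by least-squares fitting of each candidate model. The sketch below fits the Chick-Watson log-linear form to fabricated survival data; the Collins-Selleck and Hom fits would follow the same pattern with their own equations:

```python
# Least-squares fit of the Chick-Watson model to (fabricated) survival data.
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 5, 10, 15, 20, 30], float)                   # contact time, min
log_survival = np.array([0.0, -0.4, -0.9, -1.2, -1.7, -2.4])  # log10(N/N0)

def chick_watson(t, k):
    # log10(N/N0) = -k * C^n * t; the dose term C^n is folded into k
    # because the dose is fixed in this sketch.
    return -k * t

(k_hat,), _ = curve_fit(chick_watson, t, log_survival)
pred = chick_watson(t, k_hat)
ss_res = ((log_survival - pred) ** 2).sum()
ss_tot = ((log_survival - log_survival.mean()) ** 2).sum()
print(f"k = {k_hat:.3f} /min, R^2 = {1 - ss_res / ss_tot:.3f}")
```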

  9. Quantitative knowledge acquisition for expert systems

    NASA Technical Reports Server (NTRS)

    Belkin, Brenda L.; Stengel, Robert F.

    1991-01-01

    A common problem in the design of expert systems is the definition of rules from data obtained in system operation or simulation. While it is relatively easy to collect data and to log the comments of human operators engaged in experiments, generalizing such information to a set of rules has not previously been a direct task. A statistical method is presented for generating rule bases from numerical data, motivated by an example based on aircraft navigation with multiple sensors. The specific objective is to design an expert system that selects a satisfactory suite of measurements from a dissimilar, redundant set, given an arbitrary navigation geometry and possible sensor failures. The systematic development of a Navigation Sensor Management (NSM) expert system from Kalman filter covariance data is described. The method invokes two statistical techniques: Analysis of Variance (ANOVA) and the ID3 algorithm. The ANOVA technique indicates whether variations of problem parameters give statistically different covariance results, and the ID3 algorithm identifies the relationships between the problem parameters using probabilistic knowledge extracted from a simulation example set. Both are detailed.
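    A minimal sketch of the rule-generation idea, using scikit-learn's entropy-based decision tree as a stand-in for ID3; the navigation features and labels are synthetic placeholders, not the NSM covariance data:

```python
# Induce if/then rules from numerical simulation data with an
# entropy-criterion decision tree (ID3-like).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Hypothetical problem parameters: [geometry_angle_deg, n_sensors_failed]
X = np.column_stack([rng.uniform(0, 90, 200), rng.integers(0, 3, 200)])
# Hypothetical label: measurement suite judged satisfactory (1) or not (0)
y = ((X[:, 0] > 30) & (X[:, 1] < 2)).astype(int)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)

# export_text renders the induced tree as readable if/then rules,
# which is the rule-base form an expert system needs.
print(export_text(tree, feature_names=["geometry_angle_deg", "n_sensors_failed"]))
```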

  10. Robust kernel representation with statistical local features for face recognition.

    PubMed

    Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David

    2013-06-01

    Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.

  11. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra.

    PubMed

    Shin, S M; Kim, Y-I; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. The sample included 24 female and 19 male patients with hand-wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index.
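    A minimal sketch of the shape-analysis pipeline (Procrustes superimposition followed by principal components of the aligned landmarks); random landmark sets stand in for the vertebral data, and scipy's pairwise procrustes stands in for a full generalized (iterative) Procrustes analysis:

```python
# Align landmark configurations, then extract shape-space PCs.
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
shapes = rng.random((43, 10, 2))        # 43 subjects, 10 landmarks, 2-D

reference = shapes[0]
aligned = []
for s in shapes:
    _, mtx2, _ = procrustes(reference, s)   # scale/rotate/translate s onto reference
    aligned.append(mtx2.ravel())

pca = PCA(n_components=5).fit(aligned)
scores = pca.transform(aligned)             # "meaningful PCs" per vertebral shape
print(pca.explained_variance_ratio_.round(3))
```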

  12. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra

    PubMed Central

    Shin, S M; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    Objectives: To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. Methods: The sample included 24 female and 19 male patients with hand–wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Results: Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Conclusions: Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index. PMID:25411713

  13. Ensembles of radial basis function networks for spectroscopic detection of cervical precancer

    NASA Technical Reports Server (NTRS)

    Tumer, K.; Ramanujam, N.; Ghosh, J.; Richards-Kortum, R.

    1998-01-01

    The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. A multivariate statistical algorithm was used to extract clinically useful information from tissue spectra acquired from 361 cervical sites from 95 patients at 337-, 380-, and 460-nm excitation wavelengths. The multivariate statistical analysis was also employed to reduce the number of fluorescence excitation-emission wavelength pairs required to discriminate healthy tissue samples from precancerous tissue samples. The use of connectionist methods such as multilayered perceptrons, radial basis function (RBF) networks, and ensembles of such networks was investigated. RBF ensemble algorithms based on fluorescence spectra potentially provide automated and near real-time implementation of precancer detection in the hands of nonexperts. The results are more reliable, direct, and accurate than those achieved by either human experts or multivariate statistical algorithms.

  14. Statistical estimation of femur micro-architecture using optimal shape and density predictors.

    PubMed

    Lekadir, Karim; Hazrati-Marangalou, Javad; Hoogendoorn, Corné; Taylor, Zeike; van Rietbergen, Bert; Frangi, Alejandro F

    2015-02-26

    The personalization of trabecular micro-architecture has recently been shown to be important in patient-specific biomechanical models of the femur. However, high-resolution in vivo imaging of bone micro-architecture using existing modalities is still infeasible in practice due to the associated acquisition times, costs, and X-ray radiation exposure. In this study, we describe a statistical approach for the prediction of the femur micro-architecture based on the more easily extracted subject-specific bone shape and mineral density information. To this end, a training sample of ex vivo micro-CT images is used to learn the existing statistical relationships within the low and high resolution image data. More specifically, optimal bone shape and mineral density features are selected based on their predictive power and used within a partial least squares regression model to estimate the unknown trabecular micro-architecture within the anatomical models of new subjects. The experimental results demonstrate the accuracy of the proposed approach, with average errors of 0.07 for both the degree of anisotropy and tensor norms. Copyright © 2015 Elsevier Ltd. All rights reserved.
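    A minimal sketch of the regression step, assuming synthetic feature and target matrices in place of the study's shape/density predictors and micro-architectural descriptors:

```python
# Predict micro-architectural descriptors from shape/density features
# with partial least squares regression.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.random((50, 40))    # per-subject shape + mineral-density features
Y = rng.random((50, 6))     # trabecular descriptors, e.g. anisotropy tensor terms

pls = PLSRegression(n_components=8).fit(X, Y)
Y_new = pls.predict(X[:1])  # estimate micro-architecture for a new subject
print(Y_new.shape)          # (1, 6)
```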

  15. Extracting Useful Semantic Information from Large Scale Corpora of Text

    ERIC Educational Resources Information Center

    Mendoza, Ray Padilla, Jr.

    2012-01-01

    Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…

  16. Screening antimicrobial activity of various extracts of Urtica dioica.

    PubMed

    Modarresi-Chahardehi, Amir; Ibrahim, Darah; Fariza-Sulaiman, Shaida; Mousavi, Leila

    2012-12-01

    Urtica dioica or stinging nettle is traditionally used as an herbal medicine in Western Asia. The current study represents the investigation of antimicrobial activity of U. dioica from nine crude extracts that were prepared using different organic solvents, obtained from two extraction methods: the Soxhlet extractor (Method I), which included the use of four solvents with ethyl acetate and hexane, or the sequential partitions (Method II) with a five solvent system (butanol). The antibacterial and antifungal activities of crude extracts were tested against 28 bacteria, three yeast strains and seven fungal isolates by the disc diffusion and broth dilution methods. Amoxicillin was used as positive control for bacteria strains, vancomycin for Streptococcus sp., miconazole nitrate (30 microg/mL) as positive control for fungi and yeast, and pure methanol (v/v) as negative control. The disc diffusion assay was used to determine the sensitivity of the samples, whilst the broth dilution method was used for the determination of the minimal inhibition concentration (MIC). The ethyl acetate and hexane extract from extraction method I (EA I and HE I) exhibited highest inhibition against some pathogenic bacteria such as Bacillus cereus, MRSA and Vibrio parahaemolyticus. A selection of extracts that showed some activity was further tested for the MIC and minimal bactericidal concentrations (MBC). MIC values of Bacillus subtilis and Methicillin-resistant Staphylococcus aureus (MRSA) using butanol extract of extraction method II (BE II) were 8.33 and 16.33mg/mL, respectively; while the MIC value using ethyl acetate extract of extraction method II (EAE II) for Vibrio parahaemolyticus was 0.13mg/mL. Our study showed that 47.06% of extracts inhibited Gram-negative (8 out of 17), and 63.63% of extracts also inhibited Gram-positive bacteria (7 out of 11); besides, statistically the frequency of antimicrobial activity was 13.45% (35 out of 342) which in this among 21.71% belongs to antimicrobial activity extracts from extraction method I (33 out of 152 of crude extracts) and 6.82% from extraction method II (13 out of 190 of crude extracts). However, crude extracts from method I exhibited better antimicrobial activity against the Gram-positive bacteria than the Gram-negative bacteria. The positive results on medicinal plants screening for antibacterial activity constitutes primary information for further phytochemical and pharmacological studies. Therefore, the extracts could be suitable as antimicrobial agents in pharmaceutical and food industry.

  17. Bird sound spectrogram decomposition through Non-Negative Matrix Factorization for the acoustic classification of bird species.

    PubMed

    Ludeña-Choez, Jimmy; Quispe-Soncco, Raisa; Gallardo-Antolín, Ascensión

    2017-01-01

    Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds, and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorporates this specific information through the non-negative decomposition of bird sound spectrograms. It consists of the following two different stages: short-time feature extraction and temporal feature integration. In the first stage, which aims at providing a better spectral representation of bird sounds on a frame-by-frame basis, two methods are evaluated. In the first method, cepstral-like features (NMF_CC) are extracted by using a filter bank that is automatically learned by means of the application of Non-Negative Matrix Factorization (NMF) on bird audio spectrograms. In the second method, the features are directly derived from the activation coefficients of the spectrogram decomposition as performed through NMF (H_CC). The second stage summarizes the most relevant information contained in the short-time features by computing several statistical measures over long segments. The experiments show that the use of NMF_CC and H_CC in conjunction with temporal integration significantly improves the performance of a Support Vector Machine (SVM)-based ABSC system with respect to conventional MFCC.
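    A minimal sketch of the NMF front-end: decompose a magnitude spectrogram into a learned spectral basis and activation coefficients, then integrate the activations over time. The synthetic chirp and all parameter values are illustrative, not the paper's configuration:

```python
# Non-negative decomposition of a spectrogram plus temporal integration.
import numpy as np
from scipy.signal import spectrogram, chirp
from sklearn.decomposition import NMF

fs = 22050
t = np.linspace(0, 2, 2 * fs, endpoint=False)
audio = chirp(t, f0=2000, f1=6000, t1=2.0)          # placeholder "bird" sound

f, frames, S = spectrogram(audio, fs=fs, nperseg=512)
S = np.maximum(S, 1e-10)                            # keep strictly positive for NMF

nmf = NMF(n_components=16, init="nndsvd", max_iter=400)
W = nmf.fit_transform(S)        # (freq, 16) learned spectral filter bank
H = nmf.components_             # (16, time) activation coefficients

# Temporal integration: simple statistics over long segments (here, the clip)
features = np.hstack([H.mean(axis=1), H.std(axis=1)])
print(features.shape)           # (32,)
```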

  18. Batch process fault detection and identification based on discriminant global preserving kernel slow feature analysis.

    PubMed

    Zhang, Hanyuan; Tian, Xuemin; Deng, Xiaogang; Cao, Yuping

    2018-05-16

    As an attractive nonlinear dynamic data analysis tool, global preserving kernel slow feature analysis (GKSFA) has achieved great success in extracting the high nonlinearity and inherently time-varying dynamics of batch processes. However, GKSFA is an unsupervised feature extraction method and lacks the ability to utilize batch process class label information, which may not offer the most effective means for dealing with batch process monitoring. To overcome this problem, we propose a novel batch process monitoring method based on the modified GKSFA, referred to as discriminant global preserving kernel slow feature analysis (DGKSFA), by closely integrating discriminant analysis and GKSFA. The proposed DGKSFA method can extract discriminant features of a batch process as well as preserve global and local geometrical structure information of observed data. For the purpose of fault detection, a monitoring statistic is constructed based on the distance between the optimal kernel feature vectors of test data and normal data. To tackle the challenging issue of nonlinear fault variable identification, a new nonlinear contribution plot method is also developed to help identify the fault variable after a fault is detected, which is derived from the idea of variable pseudo-sample trajectory projection in the DGKSFA nonlinear biplot. Simulation results conducted on a numerical nonlinear dynamic system and the benchmark fed-batch penicillin fermentation process demonstrate that the proposed process monitoring and fault diagnosis approach can effectively detect faults and distinguish fault variables from normal variables. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.

  19. Bird sound spectrogram decomposition through Non-Negative Matrix Factorization for the acoustic classification of bird species

    PubMed Central

    Quispe-Soncco, Raisa

    2017-01-01

    Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds, and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorporates this specific information through the non-negative decomposition of bird sound spectrograms. It consists of the following two different stages: short-time feature extraction and temporal feature integration. In the first stage, which aims at providing a better spectral representation of bird sounds on a frame-by-frame basis, two methods are evaluated. In the first method, cepstral-like features (NMF_CC) are extracted by using a filter bank that is automatically learned by means of the application of Non-Negative Matrix Factorization (NMF) on bird audio spectrograms. In the second method, the features are directly derived from the activation coefficients of the spectrogram decomposition as performed through NMF (H_CC). The second stage summarizes the most relevant information contained in the short-time features by computing several statistical measures over long segments. The experiments show that the use of NMF_CC and H_CC in conjunction with temporal integration significantly improves the performance of a Support Vector Machine (SVM)-based ABSC system with respect to conventional MFCC. PMID:28628630

  20. Brain vascular image segmentation based on fuzzy local information C-means clustering

    NASA Astrophysics Data System (ADS)

    Hu, Chaoen; Liu, Xia; Liang, Xiao; Hui, Hui; Yang, Xin; Tian, Jie

    2017-02-01

    Light sheet fluorescence microscopy (LSFM) is a powerful optical-resolution fluorescence microscopy technique which enables observation of the mouse brain vascular network at cellular resolution. However, micro-vessel structures show intensity inhomogeneity in LSFM images, which makes extracting line structures inconvenient. In this work, we developed a vascular image segmentation method that enhances vessel details, which should be useful for estimating statistics like micro-vessel density. Since the eigenvalues of the Hessian matrix and their signs describe different geometric structures in images, enabling the construction of a vascular similarity function and the enhancement of line signals, the main idea of our method is to cluster the pixel values of the enhanced image. Our method contains three steps: 1) calculate the multiscale gradients and the differences between the eigenvalues of the Hessian matrix; 2) to generate the enhanced micro-vessel structures, train a feed-forward neural network on 2.26 million pixels to model the correlations between the multi-scale gradients and the differences between the eigenvalues; 3) use fuzzy local information c-means clustering (FLICM) to cluster the pixel values of the enhanced line signals. To verify the feasibility and effectiveness of this method, mouse brain vascular images were acquired with a commercial light-sheet microscope in our lab. The segmentation experiment showed that the Dice similarity coefficient can reach up to 85%. The results illustrate that our approach to extracting line structures of blood vessels dramatically improves the vascular image and enables accurate extraction of blood vessels in LSFM images.
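    A minimal sketch of the enhance-then-cluster idea, using skimage's Frangi (Hessian-based) vesselness filter and KMeans as a simple stand-in for FLICM, which additionally exploits local spatial information; the image is a synthetic placeholder:

```python
# Hessian-based vessel enhancement followed by clustering of pixel values.
import numpy as np
from skimage.filters import frangi
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
image = rng.random((128, 128))
image[60:68, :] += 2.0                           # a bright horizontal "vessel"

vesselness = frangi(image, black_ridges=False)   # high response on tube-like structures

# Cluster enhanced pixel values into vessel / background
labels = KMeans(n_clusters=2, n_init=10).fit_predict(vesselness.reshape(-1, 1))
mask = labels.reshape(image.shape)
# Make label 1 the (brighter) vessel cluster
if vesselness[mask == 1].mean() < vesselness[mask == 0].mean():
    mask = 1 - mask
print("vessel pixels:", int(mask.sum()))
```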

  1. Validation and extraction of molecular-geometry information from small-molecule databases.

    PubMed

    Long, Fei; Nicholls, Robert A; Emsley, Paul; Gražulis, Saulius; Merkys, Andrius; Vaitkus, Antanas; Murshudov, Garib N

    2017-02-01

    A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.
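    One of the sanity checks described above, acceptable bond-length deviation from expected values, can be sketched directly; the coordinates, bond list, expected lengths and tolerance are illustrative assumptions, not COD-derived classes:

```python
# Flag bonds whose lengths deviate too far from an expected value.
import numpy as np

coords = {"C1": np.array([0.00, 0.00, 0.00]),
          "C2": np.array([1.54, 0.00, 0.00]),
          "O1": np.array([2.10, 1.90, 0.00])}
bonds = [("C1", "C2", 1.54), ("C2", "O1", 1.43)]   # (atom, atom, expected length in Å)

TOLERANCE = 0.10  # Å; the acceptance window is an assumption for this sketch
for a, b, expected in bonds:
    length = np.linalg.norm(coords[a] - coords[b])
    flag = "ok" if abs(length - expected) <= TOLERANCE else "SUSPICIOUS"
    print(f"{a}-{b}: {length:.2f} Å (expected {expected:.2f}) -> {flag}")
```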

  2. Statistical assessment of DNA extraction reagent lot variability in real-time quantitative PCR

    USGS Publications Warehouse

    Bushon, R.N.; Kephart, C.M.; Koltun, G.F.; Francy, D.S.; Schaefer, F. W.; Lindquist, H.D. Alan

    2010-01-01

    Aims: The aim of this study was to evaluate the variability in lots of a DNA extraction kit using real-time PCR assays for Bacillus anthracis, Francisella tularensis and Vibrio cholerae. Methods and Results: Replicate aliquots of three bacteria were processed in duplicate with three different lots of a commercial DNA extraction kit. This experiment was repeated in triplicate. Results showed that cycle threshold values were statistically different among the different lots. Conclusions: Differences in DNA extraction reagent lots were found to be a significant source of variability for qPCR results. Steps should be taken to ensure the quality and consistency of reagents. Minimally, we propose that standard curves should be constructed for each new lot of extraction reagents, so that lot-to-lot variation is accounted for in data interpretation. Significance and Impact of the Study: This study highlights the importance of evaluating variability in DNA extraction procedures, especially when different reagent lots are used. Consideration of this variability in data interpretation should be an integral part of studies investigating environmental samples with unknown concentrations of organisms. © 2010 The Society for Applied Microbiology.
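    A minimal sketch of the lot-comparison test: a one-way ANOVA on cycle threshold (Ct) values grouped by extraction-kit lot, with fabricated Ct values:

```python
# One-way ANOVA across reagent lots; Ct values are fabricated.
from scipy.stats import f_oneway

lot_a = [24.1, 24.3, 24.0, 24.2, 24.4]
lot_b = [25.0, 25.2, 24.9, 25.1, 25.3]
lot_c = [24.2, 24.1, 24.3, 24.0, 24.2]

stat, p = f_oneway(lot_a, lot_b, lot_c)
print(f"F = {stat:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Ct values differ among reagent lots; rebuild the standard curve per lot.")
```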

  3. Extraction of multi-scale landslide morphological features based on local Gi* using airborne LiDAR-derived DEM

    NASA Astrophysics Data System (ADS)

    Shi, Wenzhong; Deng, Susu; Xu, Wenbing

    2018-02-01

    For automatic landslide detection, landslide morphological features should be quantitatively expressed and extracted. High-resolution Digital Elevation Models (DEMs) derived from airborne Light Detection and Ranging (LiDAR) data allow fine-scale morphological features to be extracted, but noise in DEMs influences morphological feature extraction, and the multi-scale nature of landslide features should be considered. This paper proposes a method to extract landslide morphological features characterized by homogeneous spatial patterns. Both profile and tangential curvature are utilized to quantify land surface morphology, and a local Gi* statistic is calculated for each cell to identify significant patterns of clustering of similar morphometric values. The method was tested on both synthetic surfaces simulating natural terrain and airborne LiDAR data acquired over an area dominated by shallow debris slides and flows. The test results of the synthetic data indicate that the concave and convex morphologies of the simulated terrain features at different scales and distinctness could be recognized using the proposed method, even when random noise was added to the synthetic data. In the test area, cells with large local Gi* values were extracted at a specified significance level from the profile and the tangential curvature image generated from the LiDAR-derived 1-m DEM. The morphologies of landslide main scarps, source areas and trails were clearly indicated, and the morphological features were represented by clusters of extracted cells. A comparison with the morphological feature extraction method based on curvature thresholds proved the proposed method's robustness to DEM noise. When verified against a landslide inventory, the morphological features of almost all recent (< 5 years) landslides and approximately 35% of historical (> 10 years) landslides were extracted. This finding indicates that the proposed method can facilitate landslide detection, although the cell clusters extracted from curvature images should be filtered using a filtering strategy based on supplementary information provided by expert knowledge or other data sources.
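    A minimal sketch of a local Gi* computation on a raster of curvature values with a 3x3 binary neighbourhood (self included); edge cells are handled only approximately here, unlike dedicated spatial-statistics packages:

```python
# Local Gi* z-scores on a raster, binary 3x3 weights including the cell itself.
import numpy as np
from scipy.ndimage import uniform_filter

def local_gi_star(raster, size=3):
    n = raster.size
    xbar = raster.mean()
    s = np.sqrt((raster ** 2).mean() - xbar ** 2)
    w = size * size                                   # sum of binary weights
    local_sum = uniform_filter(raster, size=size, mode="nearest") * w
    denom = s * np.sqrt((n * w - w ** 2) / (n - 1))
    return (local_sum - xbar * w) / denom             # z-scores

rng = np.random.default_rng(5)
curvature = rng.normal(size=(200, 200))
curvature[80:90, 80:90] += 3.0                        # a concave "scarp" patch

z = local_gi_star(curvature)
feature_cells = np.abs(z) > 2.58                      # ~99% significance level
print("cells in significant clusters:", int(feature_cells.sum()))
```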

  4. Evaluation of the reliability, usability, and applicability of AMSTAR, AMSTAR 2, and ROBIS: protocol for a descriptive analytic study.

    PubMed

    Gates, Allison; Gates, Michelle; Duarte, Gonçalo; Cary, Maria; Becker, Monika; Prediger, Barbara; Vandermeer, Ben; Fernandes, Ricardo M; Pieper, Dawid; Hartling, Lisa

    2018-06-13

    Systematic reviews (SRs) of randomised controlled trials (RCTs) can provide the best evidence to inform decision-making, but their methodological and reporting quality varies. Tools exist to guide the critical appraisal of quality and risk of bias in SRs, but evaluations of their measurement properties are limited. We will investigate the interrater reliability (IRR), usability, and applicability of A MeaSurement Tool to Assess systematic Reviews (AMSTAR), AMSTAR 2, and Risk Of Bias In Systematic reviews (ROBIS) for SRs in the fields of biomedicine and public health. An international team of researchers at three collaborating centres will undertake the study. We will use a random sample of 30 SRs of RCTs investigating therapeutic interventions indexed in MEDLINE in February 2014. Two reviewers at each centre will appraise the quality and risk of bias in each SR using AMSTAR, AMSTAR 2, and ROBIS. We will record the time to complete each assessment and for the two reviewers to reach consensus for each SR. We will extract the descriptive characteristics of each SR, the included studies, participants, interventions, and comparators. We will also extract the direction and strength of the results and conclusions for the primary outcome. We will summarise the descriptive characteristics of the SRs using means and standard deviations, or frequencies and proportions. To test for interrater reliability between reviewers and between the consensus agreements of reviewer pairs, we will use Gwet's AC1 statistic. For comparability to previous evaluations, we will also calculate weighted Cohen's kappa and Fleiss' kappa statistics. To estimate usability, we will calculate the mean time to complete the appraisal and to reach consensus for each tool. To inform applications of the tools, we will test for statistical associations between quality scores and risk of bias judgments, and the results and conclusions of the SRs. Appraising the methodological and reporting quality of SRs is necessary to determine the trustworthiness of their conclusions. Which tool may be most reliably applied and how the appraisals should be used is uncertain; the usability of newly developed tools is unknown. This investigation of common (AMSTAR) and newly developed (AMSTAR 2, ROBIS) tools will provide empirical data to inform their application, interpretation, and refinement.
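    The planned two-rater agreement statistics can be sketched as follows: Cohen's kappa via scikit-learn, plus a simple two-rater Gwet's AC1 (one reading of Gwet's formula; verify against the original reference before relying on it):

```python
# Two-rater agreement: Cohen's kappa and a simple Gwet's AC1.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater1 = np.array(["low", "low", "high", "high", "low", "high"])
rater2 = np.array(["low", "low", "high", "low",  "low", "high"])

print("Cohen's kappa:", round(cohen_kappa_score(rater1, rater2), 3))

def gwet_ac1(r1, r2):
    cats = np.unique(np.concatenate([r1, r2]))
    pa = np.mean(r1 == r2)                            # observed agreement
    pi = np.array([((r1 == c).mean() + (r2 == c).mean()) / 2 for c in cats])
    pe = (pi * (1 - pi)).sum() / (len(cats) - 1)      # chance agreement
    return (pa - pe) / (1 - pe)

print("Gwet's AC1:", round(gwet_ac1(rater1, rater2), 3))
```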

  5. The Impact of mHealth Interventions on Breast Cancer Awareness and Screening: Systematic Review Protocol.

    PubMed

    Tokosi, Temitope O; Fortuin, Jill; Douglas, Tania S

    2017-12-21

    Mobile health (mHealth) is the use of mobile communication technologies to promote health by supporting health care practices (eg, health data collection, delivery of health care information). mHealth technologies (such as mobile phones) can be used effectively by health care practitioners in the distribution of health information and have the potential to improve access to and quality of health care, as well as reduce the cost of health services. Current literature shows limited scientific evidence related to the benefits of mHealth interventions for breast cancer, which is the leading cause of cancer deaths in women worldwide and contributes a large proportion of all cancer deaths, especially in developing countries. Women, especially in low- and middle-income countries (LMICs), are faced with low odds of surviving breast cancer. This finding is likely due to multiple factors related to health systems: low priority of women's health and cancer on national health agendas; lack of awareness that breast cancer can be effectively treated if detected early; and societal, cultural, and religious factors that are prevalent in LMICs. The proposed systematic review will examine the impact of mHealth interventions on breast cancer awareness and screening among women aged 18 years and older. The objectives of this study are to identify and describe the various mHealth intervention strategies that are used for breast cancer, and assess the impact of mHealth strategies on breast cancer awareness and screening. Literature from various databases such as MEDLINE, EMBASE, PsycINFO, CINAHL, and Cochrane Central Register of Controlled Trials will be examined. Trial registers, reports, and unpublished theses will also be included. All mobile technologies such as cell phones, personal digital assistants, and tablets that have short message service, multimedia message service, video, and audio capabilities will be included. mHealth is the primary intervention. The search strategy will include keywords such as "mHealth," "breast cancer," "awareness," and "screening," among other medical subject heading terms. Articles published from January 1, 1964 to December 31, 2016 will be eligible for inclusion. Two authors will independently screen and select studies, extract data, and assess the risk of bias, with discrepancies resolved by dialogue involving a third author. We will assess statistical heterogeneity by examining the types of participants, interventions, study designs, and outcomes in each study, and pool studies judged to be statistically homogeneous. In the assessment of heterogeneity, a sensitivity analysis will be considered to explore statistical heterogeneity. Statistical heterogeneity will be investigated using the Chi-square test of homogeneity on Cochran's Q statistic and quantified using the I-squared statistic. The search strategy will be refined with the assistance of an information specialist from November 1, 2017 to January 31, 2018. Literature searches will take place from February 2018 to April 2018. Data extraction and capturing in Review Manager (RevMan, Version 5.3) will take place from May 1, 2018 to July 31, 2018. The final stages will include analyses and writing, which are anticipated to occur between August 2018 and October 2018. The knowledge derived from this study will inform health care stakeholders, including researchers, policy makers, investors, health professionals, technologists, and engineers, on the impact of mHealth interventions on breast cancer screening and awareness. PROSPERO registration number CRD42016050202. ©Temitope O Tokosi, Jill Fortuin, Tania S Douglas. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 21.12.2017.
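    The heterogeneity assessment named above (Cochran's Q and the I-squared statistic) reduces to a few lines; the per-study effect sizes and variances below are fabricated for illustration:

```python
# Cochran's Q on fixed-effect weights and the derived I-squared statistic.
import numpy as np

effects   = np.array([0.30, 0.45, 0.10, 0.60])    # per-study effect estimates
variances = np.array([0.02, 0.03, 0.01, 0.05])

w = 1.0 / variances                               # fixed-effect weights
pooled = (w * effects).sum() / w.sum()
Q = (w * (effects - pooled) ** 2).sum()           # Cochran's Q
df = len(effects) - 1
I2 = max(0.0, (Q - df) / Q) * 100                 # I-squared, in percent

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.1f}%")
```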

  6. Fusion of Remote Sensing Methods, UAV Photogrammetry and LiDAR Scanning products for monitoring fluvial dynamics

    NASA Astrophysics Data System (ADS)

    Lendzioch, Theodora; Langhammer, Jakub; Hartvich, Filip

    2015-04-01

    Fusion of remote sensing data is a common and rapidly developing discipline which combines data from multiple sources with different spatial and spectral resolution, from satellite sensors, aircraft and ground platforms. Fused data contain more detailed information than any single source, enhance the interpretation performance and accuracy of the source data, and produce a high-quality visualisation of the final data. Especially in fluvial geomorphology, it is essential to obtain imagery at sub-meter resolution in order to derive high-quality 2D and 3D information for detailed identification, extraction and description of channel features of different river regimes and to perform rapid mapping of changes in river topography. In order to design, test and evaluate a new approach for detecting river morphology, we combine different research techniques, from remote sensing products to drone-based photogrammetry and LiDAR products (aerial LiDAR scanning and TLS). Topographic information (e.g. changes in river channel morphology, surface roughness, evaluation of floodplain inundation, mapping of gravel bars and slope characteristics) will be extracted either from a single layer or from combined layers in order to detect fluvial topographic changes before and after flood events. Besides statistical approaches for predictive geomorphological mapping and the determination of errors and uncertainties in the data, we will also provide 3D modelling of small fluvial features.

  7. Short tandem repeat analysis in Japanese population.

    PubMed

    Hashiyada, M

    2000-01-01

    Short tandem repeats (STRs), known as microsatellites, are one of the most informative genetic markers for characterizing biological materials. Because of the relatively small size of STR alleles (generally 100-350 nucleotides), amplification by polymerase chain reaction (PCR) is relatively easy, affording a high sensitivity of detection. In addition, STR loci can be amplified simultaneously in a multiplex PCR. Thus, substantial information can be obtained in a single analysis with the benefits of using less template DNA, reducing labor, and reducing contamination. We investigated 14 STR loci in a Japanese population living in Sendai using three multiplex PCR kits: the GenePrint PowerPlex 1.1 and 2.2 Fluorescent STR Systems (Promega, Madison, WI, USA) and AmpFlSTR Profiler (Perkin-Elmer, Norwalk, CT, USA). Genomic DNA was extracted using sodium dodecyl sulfate (SDS)/proteinase K or Chelex 100 treatment followed by phenol/chloroform extraction. PCR was performed according to the manufacturers' protocols. Electrophoresis was carried out on an ABI 377 sequencer and the alleles were determined with GeneScan 2.0.2 software (Perkin-Elmer). For the 14 STR loci, statistical parameters indicated relatively high values, and no significant deviation from Hardy-Weinberg equilibrium was detected. We apply this STR system to paternity testing and forensic casework, e.g., personal identification in rape cases. The system is an effective tool in the forensic sciences for obtaining information on individual identification.

  8. Robust real-time extraction of respiratory signals from PET list-mode data.

    PubMed

    Salomon, Andre; Zhang, Bin; Olivier, Patrick; Goedicke, Andreas

    2018-05-01

    Respiratory motion, which typically cannot simply be suspended during PET image acquisition, affects lesions' detection and quantitative accuracy inside or in close vicinity to the lungs. Some motion compensation techniques address this issue via pre-sorting ("binning") of the acquired PET data into a set of temporal gates, where each gate is assumed to be minimally affected by respiratory motion. Tracking respiratory motion is typically realized using dedicated hardware (e.g. using respiratory belts and digital cameras). Extracting respiratory signals directly from the acquired PET data simplifies the clinical workflow as it avoids handling additional signal measurement equipment. We introduce a new data-driven method "Combined Local Motion Detection" (CLMD). It uses the Time-of-Flight (TOF) information provided by state-of-the-art PET scanners in order to enable real-time respiratory signal extraction without additional hardware resources. CLMD applies center-of-mass detection in overlapping regions based on simple back-positioned TOF event sets acquired in short time frames. Following a signal filtering and quality-based pre-selection step, the remaining extracted individual position information over time is then combined to generate a global respiratory signal. The method is evaluated using 7 measured FDG studies from single and multiple scan positions of the thorax region, and it is compared to other software-based methods regarding quantitative accuracy and statistical noise stability. Correlation coefficients around 90% between the reference and the extracted signal have been found for those PET scans where motion-affected features such as tumors or hot regions were present in the PET field-of-view. For PET scans with a quarter of the typically applied radiotracer dose, the CLMD method still provides similarly high correlation coefficients, which indicates its robustness to noise. Each CLMD processing run needed less than 0.4 s in total on a standard multi-core CPU and thus provides a robust and accurate approach enabling real-time processing capabilities using standard PC hardware. © 2018 Institute of Physics and Engineering in Medicine.
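    The center-of-mass idea behind CLMD can be sketched on synthetic list-mode events: bin event positions into short time frames and track the axial center of mass as a surrogate respiratory trace. The real method additionally uses TOF back-positioning, overlapping regions, filtering and quality-based pre-selection:

```python
# Per-frame axial center of mass of synthetic list-mode events.
import numpy as np

rng = np.random.default_rng(6)
n_events = 200_000
t = np.sort(rng.uniform(0, 60, n_events))                      # event times, s
z = rng.normal(0, 5, n_events) + np.sin(2 * np.pi * 0.25 * t)  # axial position, cm

frame = 0.5                                            # s per time frame
edges = np.arange(0, 60 + frame, frame)
idx = np.digitize(t, edges) - 1
counts = np.bincount(idx, minlength=len(edges) - 1)
z_sum = np.bincount(idx, weights=z, minlength=len(edges) - 1)
trace = z_sum / np.maximum(counts, 1)                  # center of mass per frame

# Crude detrending stands in for the paper's signal filtering step
trace -= np.convolve(trace, np.ones(21) / 21, mode="same")
print(trace.shape)   # one respiratory-signal sample per 0.5 s frame
```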

  9. Robust real-time extraction of respiratory signals from PET list-mode data

    NASA Astrophysics Data System (ADS)

    Salomon, André; Zhang, Bin; Olivier, Patrick; Goedicke, Andreas

    2018-06-01

    Respiratory motion, which typically cannot simply be suspended during PET image acquisition, affects lesions’ detection and quantitative accuracy inside or in close vicinity to the lungs. Some motion compensation techniques address this issue via pre-sorting (‘binning’) of the acquired PET data into a set of temporal gates, where each gate is assumed to be minimally affected by respiratory motion. Tracking respiratory motion is typically realized using dedicated hardware (e.g. using respiratory belts and digital cameras). Extracting respiratory signals directly from the acquired PET data simplifies the clinical workflow as it avoids handling additional signal measurement equipment. We introduce a new data-driven method ‘combined local motion detection’ (CLMD). It uses the time-of-flight (TOF) information provided by state-of-the-art PET scanners in order to enable real-time respiratory signal extraction without additional hardware resources. CLMD applies center-of-mass detection in overlapping regions based on simple back-positioned TOF event sets acquired in short time frames. Following a signal filtering and quality-based pre-selection step, the remaining extracted individual position information over time is then combined to generate a global respiratory signal. The method is evaluated using seven measured FDG studies from single and multiple scan positions of the thorax region, and it is compared to other software-based methods regarding quantitative accuracy and statistical noise stability. Correlation coefficients around 90% between the reference and the extracted signal have been found for those PET scans where motion affected features such as tumors or hot regions were present in the PET field-of-view. For PET scans with a quarter of typically applied radiotracer doses, the CLMD method still provides similar high correlation coefficients which indicates its robustness to noise. Each CLMD processing needed less than 0.4 s in total on a standard multi-core CPU and thus provides a robust and accurate approach enabling real-time processing capabilities using standard PC hardware.

  10. Dendritic tree extraction from noisy maximum intensity projection images in C. elegans.

    PubMed

    Greenblum, Ayala; Sznitman, Raphael; Fua, Pascal; Arratia, Paulo E; Oren, Meital; Podbilewicz, Benjamin; Sznitman, Josué

    2014-06-12

    Maximum Intensity Projections (MIP) of neuronal dendritic trees obtained from confocal microscopy are frequently used to study the relationship between tree morphology and mechanosensory function in the model organism C. elegans. Extracting dendritic trees from noisy images remains, however, a strenuous process that has traditionally relied on manual approaches. Here, we focus on automated and reliable 2D segmentations of dendritic trees following a statistical learning framework. Our dendritic tree extraction (DTE) method uses small amounts of labelled training data on MIPs to learn noise models of texture-based features from the responses of tree structures and image background. Our strategy lies in evaluating statistical models of noise that account for both the variability generated by the imaging process and the aggregation of information in the MIP images. These noise models are then used within a probabilistic, or Bayesian, framework to provide a coarse 2D dendritic tree segmentation. Finally, some post-processing is applied to refine the segmentations and provide skeletonized trees using a morphological thinning process. Following a Leave-One-Out Cross Validation (LOOCV) method on an MIP database with available "ground truth" images, we demonstrate that our approach provides significant improvements in tree-structure segmentations over traditional intensity-based methods. Improvements for MIPs under various imaging conditions are both qualitative and quantitative, as measured from Receiver Operating Characteristic (ROC) curves and the yield and error rates in the final segmentations. In a final step, we demonstrate our DTE approach on previously unseen MIP samples, including the extraction of skeletonized structures, and compare our method to a state-of-the-art dendritic tree tracing software. Overall, our DTE method allows for robust dendritic tree segmentations in noisy MIPs, outperforming traditional intensity-based methods. Such an approach provides a usable segmentation framework, ultimately delivering a speed-up for dendritic tree identification on the user end and a reliable first step towards further morphological characterizations of tree arborization.
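    A minimal sketch of the learn-then-segment-then-thin pipeline, with a Gaussian naive Bayes pixel classifier as a simple stand-in for the paper's texture-based noise models, followed by morphological thinning; the image and labels are synthetic:

```python
# Probabilistic pixel classification followed by skeletonization.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from skimage.morphology import skeletonize

rng = np.random.default_rng(7)
mip = rng.normal(0.2, 0.05, (64, 64))
mip[32, 5:60] = rng.normal(0.8, 0.05, 55)          # one bright "dendrite"

# Labelled training pixels: feature = intensity (texture features in the paper)
X = mip.reshape(-1, 1)
y_true = np.zeros_like(mip, dtype=int)
y_true[32, 5:60] = 1

clf = GaussianNB().fit(X, y_true.ravel())
posterior = clf.predict_proba(X)[:, 1].reshape(mip.shape)

segmentation = posterior > 0.5                     # coarse Bayesian segmentation
skeleton = skeletonize(segmentation)               # thinned dendritic tree
print("skeleton pixels:", int(skeleton.sum()))
```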

  11. Optimization of the omega-3 extraction as a functional food from flaxseed.

    PubMed

    Hassan-Zadeh, A; Sahari, M A; Barzegar, M

    2008-09-01

    The fatty acid content, total lipid, refractive index, peroxide, iodine, acid and saponification values of Iranian linseed oil (Linum usitatissimum) were studied. To optimize the extraction conditions, the oil was extracted with solvents (petroleum benzene and methanol-water-petroleum benzene) in 1:2, 1:3 and 1:4 ratios for 2, 5 and 8 h, and its fatty acid content, omega-3 content and extraction yield were determined. According to the statistical analysis, petroleum benzene in a ratio of 1:3 for 5 h was chosen for its higher fatty acid content, extraction yield, and economic feasibility. For preservation of the omega-3 ingredients, oil with the specified characteristics, containing 46.8% omega-3, was kept under a nitrogen atmosphere at -30 degrees C for 0, 7, 30, 60 and 90 days and its peroxide value was determined. Statistical analysis showed a significant difference in the average peroxide value only over the first 7 days of storage, and its increase (8.30%) conformed to the international standard.

  12. Signal extraction and wave field separation in tunnel seismic prediction by independent component analysis

    NASA Astrophysics Data System (ADS)

    Yue, Y.; Jiang, T.; Zhou, Q.

    2017-12-01

    In order to ensure the rationality and safety of tunnel excavation, advanced geological prediction has become an indispensable step in tunneling. However, the extraction of signals and the separation of P and S waves directly influence the accuracy of geological prediction. Generally, the raw data collected by the TSP system are of low quality because of the numerous interference factors in tunnel projects, such as power interference and machine vibration interference. It is difficult for traditional methods (band-pass filtering) to remove interference effectively while causing little loss to the signal. The power interference, the machine vibration interference and the signal are treated as source variables, with the x, y, z components as observed signals; each observed component is a linear combination of the source variables, which satisfies the applicability conditions of independent component analysis (ICA). We performed finite-difference simulations of elastic wave propagation to synthesize a tunnel seismic reflection record. The ICA method was adopted to process the three-component data, and the results show that the extracted signal estimates and the true signals are highly correlated (the correlation coefficient is above 0.93). In addition, the interference estimates separated by ICA and the true interference signals are also highly correlated, with correlation coefficients above 0.99. Thus, the simulation results show that ICA is an ideal method for extracting high-quality data from mixed signals. For the separation of P and S waves, conventional separation techniques are based on the physical characteristics of wave propagation, which require knowledge of the near-surface P- and S-wave velocities and density, whereas the ICA approach is based entirely on statistical differences between P and S waves and does not require a priori information. The concrete results of the wave-field separation will be presented at the meeting. In summary, we can safely conclude that ICA can not only extract high-quality data from mixed signals but also separate P and S waves effectively.
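    A minimal sketch of the separation step: three observed components modelled as linear mixtures of a reflection wavelet plus two interference sources, then unmixed with FastICA; the sources and mixing matrix are synthetic:

```python
# Blind separation of a seismic wavelet from interference via FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(8)
t = np.linspace(0, 1, 2000)
signal    = np.exp(-((t - 0.3) ** 2) / 1e-4) * np.sin(2 * np.pi * 80 * t)  # reflection wavelet
power     = np.sin(2 * np.pi * 50 * t)                                     # mains hum
vibration = rng.normal(size=t.size) * (np.sin(2 * np.pi * 3 * t) > 0)      # machine bursts

S = np.c_[signal, power, vibration]
A = rng.uniform(0.5, 1.5, (3, 3))            # unknown mixing into x, y, z components
X = S @ A.T

ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X)                 # estimated sources, up to order/scale

# Correlate each estimate with the true signal to find the recovered wavelet
corr = [abs(np.corrcoef(S_est[:, k], signal)[0, 1]) for k in range(3)]
print("best |correlation| with true signal:", round(max(corr), 3))
```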

  13. Digital immunohistochemistry platform for the staining variation monitoring based on integration of image and statistical analyses with laboratory information system.

    PubMed

    Laurinaviciene, Aida; Plancoulaine, Benoit; Baltrusaityte, Indra; Meskauskas, Raimundas; Besusparis, Justinas; Lesciute-Krilaviciene, Daiva; Raudeliunas, Darius; Iqbal, Yasir; Herlin, Paulette; Laurinavicius, Arvydas

    2014-01-01

    Digital immunohistochemistry (IHC) is one of the most promising applications brought by new generation image analysis (IA). While conventional IHC staining quality is monitored by semi-quantitative visual evaluation of tissue controls, IA may require more sensitive measurement. We designed an automated system to digitally monitor IHC multi-tissue controls, based on SQL-level integration of the laboratory information system with image and statistical analysis tools. Consecutive sections of a TMA containing 10 cores of breast cancer tissue were used as tissue controls in routine Ki67 IHC testing. The Ventana slide label barcode ID was sent to the LIS to register the serial section sequence. The slides were stained and scanned (Aperio ScanScope XT), and IA was performed with the Aperio/Leica Colocalization and Genie Classifier/Nuclear algorithms. SQL-based integration ensured automated statistical analysis of the IA data by the SAS Enterprise Guide project. Factor analysis and plot visualizations were performed to explore slide-to-slide variation of the Ki67 IHC staining results in the control tissue. Slide-to-slide intra-core IHC staining analysis revealed rather significant variation of the variables reflecting the sample size, while Brown and Blue Intensity were relatively stable. To further investigate this variation, the IA results from the 10 cores were aggregated to minimize tissue-related variance. Factor analysis revealed an association between the variables reflecting the sample size detected by IA and Blue Intensity. Since the main feature to be extracted from the tissue controls was staining intensity, we further explored the variation of the intensity variables in the individual cores. MeanBrownBlue Intensity ((Brown+Blue)/2) and DiffBrownBlue Intensity (Brown-Blue) were introduced to better contrast the absolute intensity and the colour balance variation in each core; relevant factor scores were extracted. Finally, tissue-related factors of IHC staining variance were explored in the individual tissue cores. Our solution enabled monitoring of the staining of IHC multi-tissue controls by means of IA, followed by automated statistical analysis, integrated into the laboratory workflow. We found that, even in consecutive serial tissue sections, tissue-related factors affected the IHC IA results; meanwhile, less intense blue counterstain was associated with a smaller amount of tissue detected by the IA tools.

  14. Digital immunohistochemistry platform for the staining variation monitoring based on integration of image and statistical analyses with laboratory information system

    PubMed Central

    2014-01-01

    Background Digital immunohistochemistry (IHC) is one of the most promising applications brought by new-generation image analysis (IA). While conventional IHC staining quality is monitored by semi-quantitative visual evaluation of tissue controls, IA may require more sensitive measurement. We designed an automated system to digitally monitor IHC multi-tissue controls, based on SQL-level integration of the laboratory information system (LIS) with image and statistical analysis tools. Methods Consecutive sections of a TMA containing 10 cores of breast cancer tissue were used as tissue controls in routine Ki67 IHC testing. The Ventana slide label barcode ID was sent to the LIS to register the serial section sequence. The slides were stained and scanned (Aperio ScanScope XT), and IA was performed with the Aperio/Leica Colocalization and Genie Classifier/Nuclear algorithms. SQL-based integration ensured automated statistical analysis of the IA data by a SAS Enterprise Guide project. Factor analysis and plot visualizations were performed to explore slide-to-slide variation of the Ki67 IHC staining results in the control tissue. Results Slide-to-slide intra-core IHC staining analysis revealed considerable variation in the variables reflecting sample size, while Brown and Blue Intensity were relatively stable. To investigate this variation further, the IA results from the 10 cores were aggregated to minimize tissue-related variance. Factor analysis revealed an association between the variables reflecting the sample size detected by IA and Blue Intensity. Since the main feature to be extracted from the tissue controls was staining intensity, we further explored the variation of the intensity variables in the individual cores. MeanBrownBlue Intensity ((Brown+Blue)/2) and DiffBrownBlue Intensity (Brown-Blue) were introduced to better contrast the absolute intensity and the colour-balance variation in each core; relevant factor scores were extracted. Finally, tissue-related factors of IHC staining variance were explored in the individual tissue cores. Conclusions Our solution enabled monitoring of IHC multi-tissue control staining by means of IA, followed by automated statistical analysis, integrated into the laboratory workflow. We found that, even in consecutive serial tissue sections, tissue-related factors affected the IHC IA results; meanwhile, a less intense blue counterstain was associated with a smaller amount of tissue detected by the IA tools. PMID:25565007

  15. SEER Cancer Query Systems (CanQues)

    Cancer.gov

    These applications provide access to cancer statistics including incidence, mortality, survival, prevalence, and probability of developing or dying from cancer. Users can display reports of the statistics or extract them for additional analyses.

  16. A novel probabilistic framework for event-based speech recognition

    NASA Astrophysics Data System (ADS)

    Juneja, Amit; Espy-Wilson, Carol

    2003-10-01

    One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary-valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features (syllabic, sonorant, and continuant) and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner-class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.

  17. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jesse, S.; Chi, M.; Belianinov, A.

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. In this paper, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. Finally, however, the open nature of this algorithm and its wide availability enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy.
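
    A minimal sketch of unsupervised clustering of high-dimensional microscopy data, in the spirit of the k-means domain differentiation described above. The arrays below are random stand-ins for per-position ptychographic feature vectors, not actual STEM measurements.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Pretend each scan position yields a flattened diffraction-pattern
# feature vector; two synthetic "domains" differ in mean response.
domain_a = rng.normal(0.0, 1.0, size=(500, 64))
domain_b = rng.normal(0.8, 1.0, size=(500, 64))
features = np.vstack([domain_a, domain_b])

# Purely statistical: no prior information about the material is used.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("cluster sizes:", np.bincount(labels))
```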

  18. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    DOE PAGES

    Jesse, S.; Chi, M.; Belianinov, A.; ...

    2016-05-23

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. In this paper, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. Finally, however, the open nature of this algorithm and its wide availability enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy.

  19. Big Data Analytics for Scanning Transmission Electron Microscopy Ptychography

    PubMed Central

    Jesse, S.; Chi, M.; Belianinov, A.; Beekman, C.; Kalinin, S. V.; Borisevich, A. Y.; Lupini, A. R.

    2016-01-01

    Electron microscopy is undergoing a transition; from the model of producing only a few micrographs, through the current state where many images and spectra can be digitally recorded, to a new mode where very large volumes of data (movies, ptychographic and multi-dimensional series) can be rapidly obtained. Here, we discuss the application of so-called “big-data” methods to high dimensional microscopy data, using unsupervised multivariate statistical techniques, in order to explore salient image features in a specific example of BiFeO3 domains. Remarkably, k-means clustering reveals domain differentiation despite the fact that the algorithm is purely statistical in nature and does not require any prior information regarding the material, any coexisting phases, or any differentiating structures. While this is a somewhat trivial case, this example signifies the extraction of useful physical and structural information without any prior bias regarding the sample or the instrumental modality. Further interpretation of these types of results may still require human intervention. However, the open nature of this algorithm and its wide availability, enable broad collaborations and exploratory work necessary to enable efficient data analysis in electron microscopy. PMID:27211523

  20. An extraction method of mountainous area settlement place information from GF-1 high resolution optical remote sensing image under semantic constraints

    NASA Astrophysics Data System (ADS)

    Guo, H., II

    2016-12-01

    Spatial distribution information on settlement places in mountainous areas is of great significance to earthquake emergency work, because most of the key earthquake-hazard areas of China are located in mountainous terrain. Remote sensing has the advantages of large coverage and low cost, making it an important way to obtain the spatial distribution of mountainous settlement places. Fully considering geometric, spectral, and texture information, most studies to date have applied object-oriented methods to extract settlement place information; in this article, semantic constraints are added on top of an object-oriented method. The experimental data are one scene from the domestic high-resolution satellite GF-1, with a resolution of 2 meters. The processing consists of three steps: first, pretreatment, including orthorectification and image fusion; second, object-oriented information extraction, including image segmentation and information extraction; and last, removal of erroneous elements under semantic constraints. To formulate these semantic constraints, the distribution characteristics of mountainous settlement places were analyzed and the spatial-logic relations between settlement places and other objects were considered. The accuracy assessment shows that the extraction accuracy of the object-oriented method is 49% and rises to 86% after the semantic constraints are applied, so the method under semantic constraints effectively improves the accuracy of mountainous settlement place extraction. The results show that it is feasible to extract mountainous settlement place information from GF-1 imagery, demonstrating the practicality of domestic high-resolution optical remote sensing imagery for earthquake emergency preparedness.
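
    A minimal sketch of the post-classification semantic filtering step described above: candidate settlement objects from an object-oriented segmentation are rejected when they violate simple spatial-logic rules. The rule thresholds and object attributes are hypothetical examples, not the ones used in the paper.

```python
from dataclasses import dataclass

@dataclass
class CandidateObject:
    area_m2: float         # object area from segmentation
    mean_slope_deg: float  # terrain slope under the object
    dist_to_road_m: float  # distance to the nearest road

def passes_semantic_constraints(obj: CandidateObject) -> bool:
    """Keep objects that are plausibly settlements in mountainous terrain."""
    if obj.area_m2 < 100:           # too small to be a settlement
        return False
    if obj.mean_slope_deg > 25:     # settlements rarely sit on steep slopes
        return False
    if obj.dist_to_road_m > 2000:   # settlements tend to lie near roads
        return False
    return True

candidates = [
    CandidateObject(450, 8, 120),   # plausible settlement
    CandidateObject(60, 5, 80),     # too small: likely noise
    CandidateObject(900, 35, 300),  # steep slope: likely bare rock
]
kept = [c for c in candidates if passes_semantic_constraints(c)]
print(f"kept {len(kept)} of {len(candidates)} candidates")
```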

  1. Functional Differences between Statistical Learning with and without Explicit Training

    ERIC Educational Resources Information Center

    Batterink, Laura J.; Reber, Paul J.; Paller, Ken A.

    2015-01-01

    Humans are capable of rapidly extracting regularities from environmental input, a process known as statistical learning. This type of learning typically occurs automatically, through passive exposure to environmental input. The presumed function of statistical learning is to optimize processing, allowing the brain to more accurately predict and…

  2. De-blending deep Herschel surveys: A multi-wavelength approach

    NASA Astrophysics Data System (ADS)

    Pearson, W. J.; Wang, L.; van der Tak, F. F. S.; Hurley, P. D.; Burgarella, D.; Oliver, S. J.

    2017-07-01

    Aims: Cosmological surveys in the far-infrared are known to suffer from confusion. The Bayesian de-blending tool XID+ currently provides one of the best ways to de-confuse deep Herschel SPIRE images, using a flat flux density prior. This work demonstrates that existing multi-wavelength data sets can be exploited to improve XID+ by providing an informed prior, resulting in more accurate and precise extracted flux densities. Methods: Photometric data for galaxies in the COSMOS field were used to constrain spectral energy distributions (SEDs) using the fitting tool CIGALE. These SEDs were used to create Gaussian prior estimates in the SPIRE bands for XID+. The multi-wavelength photometry and the extracted SPIRE flux densities were run through CIGALE again to allow us to compare the performance of the two priors. Inferred ALMA flux densities (F_infer,ALMA), at 870 μm and 1250 μm, from the best-fitting SEDs of the second CIGALE run were compared with measured ALMA flux densities (F_meas,ALMA) as an independent performance validation. Similar validations were conducted with the SED modelling and fitting tool MAGPHYS and with modified black-body functions to test for model dependency. Results: We demonstrate a clear improvement in agreement between the flux densities extracted with XID+ and existing data at other wavelengths when using the new informed Gaussian prior over the original uninformed prior. The residuals between F_meas,ALMA and F_infer,ALMA were calculated. Expressed as multiples of the ALMA error (σ), these residuals have a smaller standard deviation for the Gaussian prior (7.95σ) than for the flat prior (12.21σ), a reduced mean (1.83σ compared to 3.44σ), and a reduced skew towards positive values (7.97 compared to 11.50). These results were determined not to be significantly model dependent. This yields statistically more reliable SPIRE flux densities and hence statistically more reliable infrared luminosity estimates. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
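
    A minimal sketch of the validation statistic used above: residuals between measured and inferred ALMA flux densities, expressed in units of the ALMA measurement error. All arrays are synthetic placeholders, not COSMOS data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300
f_meas = rng.lognormal(mean=0.5, sigma=0.6, size=n)   # mJy, placeholder
sigma_alma = 0.05 + 0.02 * rng.random(n)              # per-source error
f_infer = f_meas + rng.normal(0.0, 3.0 * sigma_alma)  # imperfect inference

residuals = (f_meas - f_infer) / sigma_alma           # in units of sigma
print(f"std  = {residuals.std():.2f} sigma")
print(f"mean = {residuals.mean():.2f} sigma")
print(f"skew = {stats.skew(residuals):.2f}")
```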

  3. Terminology extraction from medical texts in Polish

    PubMed Central

    2014-01-01

    Background Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results Using a combination of linguistic and statistical methods for processing over 1200 children's hospital discharge records, we obtained a list of single- and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect, while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain-related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts. PMID:24976943
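
    A minimal sketch of ranking a candidate term by frequency and context variety, in the spirit of the linguistic/statistical pipeline above. The scoring formula is an illustrative simplification, not the exact measure used in the paper, and the toy documents are placeholders.

```python
from collections import Counter
import math

docs = [
    "badanie usg jamy brzusznej wykazalo zmiany",
    "usg jamy brzusznej bez zmian",
    "kontrolne usg jamy brzusznej zalecane",
]

phrase = "usg jamy brzusznej"
freq = 0
left_contexts = set()
for doc in docs:
    tokens = doc.split()
    for i in range(len(tokens) - 2):
        if " ".join(tokens[i:i + 3]) == phrase:
            freq += 1
            left_contexts.add(tokens[i - 1] if i > 0 else "<BOS>")

# Frequency weighted by the variety of contexts the phrase appears in.
score = freq * math.log1p(len(left_contexts))
print(f"{phrase!r}: freq={freq}, contexts={len(left_contexts)}, score={score:.2f}")
```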

  4. Terminology extraction from medical texts in Polish.

    PubMed

    Marciniak, Małgorzata; Mykowiecka, Agnieszka

    2014-01-01

    Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1200 children's hospital discharge records, we obtained a list of single- and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect, while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain-related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.

  5. Relationship between post-extraction pain and acute pulpitis: a randomised trial using third molars.

    PubMed

    Zhang, Wei; Dai, Yong-Bo; Wan, Peng-Cheng; Xu, Dong-Dong; Guo, Yi; Li, Zhi

    2016-12-01

    The aim of the present study was to examine the relationship between post-extraction pain and acute pulpitis in third molars. This study was a randomised controlled trial. Sixty patients requiring removal of a single maxillary third molar with acute pulpitis were included and randomly divided into two groups: group A (n = 30); and group B (n = 30). In group A, third molars were directly extracted, and group B received endodontic therapy (pulp chamber opening and drainage) and underwent extraction 24 hours later, aiming to eliminate the acute inflammation. Another 30 patients requiring removal of a single maxillary third molar and with the same inclusion criteria but without caries or acute pulpitis were recruited into group C, in which the maxillary third molars were also directly extracted. The level of postoperative pain reported each day among the three groups was statistically evaluated. On the first, second and third days after surgery, there was a statistically significant difference between group A and group B and between group A and group C, but there was no statistically significant difference between group B and group C. The results of the present study indicate that there is more pain when third molars with acute pulpitis are directly removed compared with the pain level of the removal of third molars without acute pulpitis. © 2016 FDI World Dental Federation.

  6. Autism, Context/Noncontext Information Processing, and Atypical Development

    PubMed Central

    Skoyles, John R.

    2011-01-01

    Autism has been attributed to a deficit in contextual information processing. Attempts to understand autism in terms of such a deficit, however, do not incorporate more recent computational work on context. This work has identified that context information processing depends upon the extraction and use of the information hidden in higher-order (or indirect) associations. Higher-order associations underlie the cognition of context rather than that of situations. This paper starts by examining the differences between higher-order and first-order (or direct) associations. Higher-order associations link entities not directly (as with first-order ones) but indirectly, through all the connections they have via other entities. Extracting this information requires the processing of past episodes as a totality. As a result, this extraction depends upon specialised extraction processes separate from cognition. This information is then consolidated. Due to this difference, the extraction/consolidation of higher-order information can be impaired whilst cognition remains intact. Although not directly impaired, cognition will be indirectly impaired by knock-on effects, such as cognition compensating for absent higher-order information with information extracted from first-order associations. This paper discusses the implications of this for the inflexible, literal/immediate, and inappropriate information processing of autistic individuals. PMID:22937255

  7. Intensity information extraction in Geiger mode detector array based three-dimensional imaging applications

    NASA Astrophysics Data System (ADS)

    Wang, Fei

    2013-09-01

    Geiger-mode detectors have single-photon sensitivity and picosecond timing resolution, which makes them good candidates for low-light-level ranging applications, especially flash three-dimensional imaging, where the received laser power is extremely limited. Another advantage of Geiger-mode APDs is their large output current, which can drive CMOS timing circuits directly, so larger-format focal plane arrays (FPAs) can be fabricated easily using mature CMOS technology. However, Geiger-mode-detector-based FPAs can measure only the range information of a scene, not its reflectivity, and reflectivity is a major characteristic that can help target classification and identification. Because detection is governed by Poisson statistics, the detection probability is tightly connected to the number of incident photons. Employing this relation, a signal intensity estimation method based on probability inversion is proposed: instead of measuring intensity directly, several detections are conducted, the detection probability is obtained, and the intensity is estimated from it. The relation between the estimator's accuracy, the measuring range, and the number of detections is discussed on the basis of statistical theory. Finally, a Monte Carlo simulation is conducted to verify the correctness of this theory. Using 100 detections, signal intensities of up to 4.6 photons per detection can be measured with this method. With slight modification of the measuring strategy, intensity information can be obtained using current Geiger-mode-detector-based FPAs, which enriches the acquired information and broadens the application field of the current technology.
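
    A minimal sketch of the probability-inversion idea described above. For Poisson-distributed photon arrivals with mean n photons per detection window, a Geiger-mode pixel fires with probability P = 1 - exp(-n); counting firings over N trials and inverting this relation estimates n. Dark counts and dead time are ignored here, and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def estimate_intensity(true_mean_photons: float, n_trials: int) -> float:
    """Estimate mean photons/trial from the observed detection frequency."""
    photons = rng.poisson(true_mean_photons, size=n_trials)
    p_hat = np.mean(photons > 0)              # observed detection probability
    p_hat = min(p_hat, 1.0 - 1.0 / n_trials)  # avoid log(0) at saturation
    return -np.log(1.0 - p_hat)               # invert P = 1 - exp(-n)

for true_n in (0.5, 2.0, 4.6):
    est = estimate_intensity(true_n, n_trials=100)
    print(f"true = {true_n:.1f} photons, estimate = {est:.2f}")
```

    Note how the saturation clamp caps the measurable intensity near -ln(1/N); with N = 100 trials this is about 4.6 photons per detection, consistent with the figure quoted in the abstract.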

  8. Statistical coding and decoding of heartbeat intervals.

    PubMed

    Lucena, Fausto; Barros, Allan Kardec; Príncipe, José C; Ohnishi, Noboru

    2011-01-01

    The heart integrates neuroregulatory messages into specific bands of frequency, such that the overall amplitude spectrum of the cardiac output reflects the variations of the autonomic nervous system. This modulatory mechanism seems to be well adjusted to the unpredictability of the cardiac demand, maintaining a proper cardiac regulation. A longstanding theory holds that biological organisms facing an ever-changing environment are likely to evolve adaptive mechanisms to extract essential features in order to adjust their behavior. The key question, however, has been to understand how the neural circuitry self-organizes these feature detectors to select behaviorally relevant information. Previous studies in computational perception suggest that a neural population enhances information that is important for survival by minimizing the statistical redundancy of the stimuli. Herein we investigate whether the cardiac system makes use of a redundancy reduction strategy to regulate the cardiac rhythm. Based on a network of neural filters optimized to code heartbeat intervals, we learn a population code that maximizes the information across the neural ensemble. The emerging population code displays filter tuning properties whose characteristics explain diverse aspects of the autonomic cardiac regulation, such as the compromise between fast and slow cardiac responses. We show that the filters yield responses that are quantitatively similar to observed heart rate responses during direct sympathetic or parasympathetic nerve stimulation. Our findings suggest that the heart decodes autonomic stimuli according to information theory principles analogous to how perceptual cues are encoded by sensory systems.

  9. Conditional Disease Development extracted from Longitudinal Health Care Cohort Data using Layered Network Construction

    PubMed Central

    Kannan, Venkateshan; Swartz, Fredrik; Kiani, Narsis A.; Silberberg, Gilad; Tsipras, Giorgos; Gomez-Cabrero, David; Alexanderson, Kristina; Tegnèr, Jesper

    2016-01-01

    Health care data holds great promise for use in clinical decision support systems. However, frequent near-synonymous diagnoses recorded separately, as well as the sheer magnitude and complexity of the disease data, make it challenging to extract non-trivial conclusions beyond confirmatory associations from such a web of interactions. Here we present a systematic methodology to derive statistically valid conditional development of diseases. To this end we utilize a cohort of 5,512,469 individuals followed over 13 years at inpatient care, including data on disability pension and cause of death. By introducing a causal information fraction measure and taking advantage of the composite structure in the ICD codes, we extract an effective directed lower-dimensional network representation (100 nodes and 130 edges) of our cohort. Unpacking composite nodes into bipartite graphs retrieves, for example, that individuals with behavioral disorders are more likely to be followed by prescription drug poisoning episodes, whereas women with leiomyoma were more likely to subsequently experience endometriosis. The conditional disease developments represent putative causal relations, indicating possible novel clinical relationships and pathophysiological associations that have not yet been explored. PMID:27211115

  10. A Study of Feature Extraction Using Divergence Analysis of Texture Features

    NASA Technical Reports Server (NTRS)

    Hallada, W. A.; Bly, B. G.; Boyd, R. K.; Cox, S.

    1982-01-01

    An empirical study of texture analysis for feature extraction and classification of high spatial resolution remotely sensed imagery (10 meters) is presented in terms of specific land cover types. The principal method examined is the use of spatial gray tone dependence (SGTD). The SGTD method reduces the gray levels within a moving window into a two-dimensional spatial gray tone dependence matrix, which can be interpreted as a probability matrix of gray tone pairs. Haralick et al. (1973) used a number of information theory measures to extract texture features from these matrices, including angular second moment (inertia), correlation, entropy, homogeneity, and energy. The derivation of the SGTD matrix is a function of: (1) the number of gray tones in an image; (2) the angle along which the frequency of SGTD is calculated; (3) the size of the moving window; and (4) the distance between gray tone pairs. The first three parameters were varied and tested on a 10 meter resolution panchromatic image of Maryville, Tennessee using the five SGTD measures. A transformed divergence measure was used to determine the statistical separability between four land cover categories: forest, new residential, old residential, and industrial, for each variation in texture parameters.
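
    A minimal sketch of spatial gray tone dependence (co-occurrence) texture features, as described above, using scikit-image. The tiny random array stands in for a window of a 10 m panchromatic scene; a real workflow would slide a window across the image and classify each window's feature vector.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(5)
window = rng.integers(0, 8, size=(32, 32), dtype=np.uint8)  # 8 gray tones

# Co-occurrence matrix: distance 1, four angles, symmetric and normalized.
glcm = graycomatrix(window, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=8, symmetric=True, normed=True)

# Average each Haralick-style measure over the four angles.
for prop in ("contrast", "correlation", "homogeneity", "energy"):
    print(prop, graycoprops(glcm, prop).mean())
```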

  11. Conditional Disease Development extracted from Longitudinal Health Care Cohort Data using Layered Network Construction.

    PubMed

    Kannan, Venkateshan; Swartz, Fredrik; Kiani, Narsis A; Silberberg, Gilad; Tsipras, Giorgos; Gomez-Cabrero, David; Alexanderson, Kristina; Tegnèr, Jesper

    2016-05-23

    Health care data holds great promise for use in clinical decision support systems. However, frequent near-synonymous diagnoses recorded separately, as well as the sheer magnitude and complexity of the disease data, make it challenging to extract non-trivial conclusions beyond confirmatory associations from such a web of interactions. Here we present a systematic methodology to derive statistically valid conditional development of diseases. To this end we utilize a cohort of 5,512,469 individuals followed over 13 years at inpatient care, including data on disability pension and cause of death. By introducing a causal information fraction measure and taking advantage of the composite structure in the ICD codes, we extract an effective directed lower-dimensional network representation (100 nodes and 130 edges) of our cohort. Unpacking composite nodes into bipartite graphs retrieves, for example, that individuals with behavioral disorders are more likely to be followed by prescription drug poisoning episodes, whereas women with leiomyoma were more likely to subsequently experience endometriosis. The conditional disease developments represent putative causal relations, indicating possible novel clinical relationships and pathophysiological associations that have not yet been explored.

  12. Unified anomaly suppression and boundary extraction in laser radar range imagery based on a joint curve-evolution and expectation-maximization algorithm.

    PubMed

    Feng, Haihua; Karl, William Clem; Castañon, David A

    2008-05-01

    In this paper, we develop a new unified approach for laser radar range anomaly suppression, range profiling, and segmentation. This approach combines an object-based hybrid scene model for representing the range distribution of the field and a statistical mixture model for the range data measurement noise. The image segmentation problem is formulated as a minimization problem which jointly estimates the target boundary together with the target region range variation and background range variation directly from the noisy and anomaly-filled range data. This formulation allows direct incorporation of prior information concerning the target boundary, target ranges, and background ranges into an optimal reconstruction process. Curve evolution techniques and a generalized expectation-maximization algorithm are jointly employed as an efficient solver for minimizing the objective energy, resulting in a coupled pair of object and intensity optimization tasks. The method directly and optimally extracts the target boundary, avoiding a suboptimal two-step process involving image smoothing followed by boundary extraction. Experiments are presented demonstrating that the proposed approach is robust to anomalous pixels (missing data) and capable of producing accurate estimation of the target boundary and range values from noisy data.

  13. High-grade glioma diffusive modeling using statistical tissue information and diffusion tensors extracted from atlases.

    PubMed

    Roniotis, Alexandros; Manikis, Georgios C; Sakkalis, Vangelis; Zervakis, Michalis E; Karatzanis, Ioannis; Marias, Kostas

    2012-03-01

    Glioma, especially glioblastoma, is a leading cause of brain cancer fatality involving highly invasive and neoplastic growth. Diffusive models of glioma growth use variations of the diffusion-reaction equation in order to simulate the invasive patterns of glioma cells by approximating the spatiotemporal change of glioma cell concentration. The most advanced diffusive models take into consideration the heterogeneous velocity of glioma in gray and white matter, by using two different discrete diffusion coefficients in these areas. Moreover, by using diffusion tensor imaging (DTI), they simulate the anisotropic migration of glioma cells, which is facilitated along white fibers, assuming diffusion tensors with different diffusion coefficients along each candidate direction of growth. Our study extends this concept by fully exploiting the proportions of white and gray matter extracted by normal brain atlases, rather than discretizing diffusion coefficients. Moreover, the proportions of white and gray matter, as well as the diffusion tensors, are extracted by the respective atlases; thus, no DTI processing is needed. Finally, we applied this novel glioma growth model on real data and the results indicate that prognostication rates can be improved. © 2012 IEEE
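
    A minimal sketch of the diffusion-reaction glioma growth model referred to above, dc/dt = div(D grad c) + rho c (1 - c), solved with a simple 1-D explicit finite-difference step. The spatially varying diffusion coefficient mimics faster migration in white matter than in gray matter; all parameter values are illustrative, and the paper's atlas-derived tissue proportions and tensors are not modeled here.

```python
import numpy as np

nx, dx, dt = 200, 0.1, 0.001   # grid points, cm, years (illustrative)
rho = 1.2                      # proliferation rate (1/year)
D = np.where(np.arange(nx) < nx // 2, 0.13, 0.65)  # gray vs white matter

c = np.zeros(nx)
c[nx // 2] = 1.0               # initial tumour seed at the tissue interface

for _ in range(5000):          # 5 simulated years
    # Flux-conservative divergence of D * grad(c) on interior points.
    flux = 0.5 * (D[1:] + D[:-1]) * (c[1:] - c[:-1]) / dx
    div = np.zeros(nx)
    div[1:-1] = (flux[1:] - flux[:-1]) / dx
    c = c + dt * (div + rho * c * (1.0 - c))

print(f"tumour extent above c=0.1: {np.sum(c > 0.1) * dx:.1f} cm")
```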

  14. Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

    ERIC Educational Resources Information Center

    North, Jamie S.; Williams, A. Mark

    2008-01-01

    The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…

  15. Can we replace curation with information extraction software?

    PubMed

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. Database URL. © The Author(s) 2016. Published by Oxford University Press.

  16. Microarray data and gene expression statistics for Saccharomyces cerevisiae exposed to simulated asbestos mine drainage.

    PubMed

    Driscoll, Heather E; Murray, Janet M; English, Erika L; Hunter, Timothy C; Pivarski, Kara; Dolci, Elizabeth D

    2017-08-01

    Here we describe microarray expression data (raw and normalized), experimental metadata, and gene-level data with expression statistics from Saccharomyces cerevisiae exposed to simulated asbestos mine drainage from the Vermont Asbestos Group (VAG) Mine on Belvidere Mountain in northern Vermont, USA. For nearly 100 years (between the late 1890s and 1993), chrysotile asbestos fibers were extracted from serpentinized ultramafic rock at the VAG Mine for use in construction and manufacturing industries. Studies have shown that water courses and streambeds nearby have become contaminated with asbestos mine tailings runoff, including elevated levels of magnesium, nickel, chromium, and arsenic, elevated pH, and chrysotile asbestos-laden mine tailings, due to leaching and gradual erosion of massive piles of mine waste covering approximately 9 km². We exposed yeast to simulated VAG Mine tailings leachate to help gain insight on how eukaryotic cells exposed to VAG Mine drainage may respond in the mine environment. Affymetrix GeneChip® Yeast Genome 2.0 Arrays were utilized to assess gene expression after 24-h exposure to simulated VAG Mine tailings runoff. The chemistry of mine-tailings leachate, mine-tailings leachate plus yeast extract peptone dextrose media, and control yeast extract peptone dextrose media is also reported. To our knowledge this is the first dataset to assess global gene expression patterns in a eukaryotic model system simulating asbestos mine tailings runoff exposure. Raw and normalized gene expression data are accessible through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) Database Series GSE89875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89875).

  17. Semi-automatic medical image segmentation with adaptive local statistics in Conditional Random Fields framework.

    PubMed

    Hu, Yu-Chi J; Grossberg, Michael D; Mageras, Gikas S

    2008-01-01

    Planning radiotherapy and surgical procedures usually requires onerous manual segmentation of anatomical structures from medical images. In this paper we present a semi-automatic and accurate segmentation method to dramatically reduce the time and effort required of expert users. This is accomplished by giving the user an intuitive graphical interface to indicate samples of target and non-target tissue by loosely drawing a few brush strokes on the image. We use these brush strokes to provide the statistical input for a Conditional Random Field (CRF) based segmentation. Since we extract purely statistical information from the user input, we eliminate the need for assumptions about boundary contrast previously used by many other methods. A new feature of our method is that the statistics on one image can be reused on related images without registration. To demonstrate this, we show that boundary statistics provided on a few 2D slices of volumetric medical data can be propagated through the entire 3D stack of images without using the geometric correspondence between images. In addition, the image segmentation from the CRF can be formulated as a minimum s-t graph cut problem, which has a solution that is both globally optimal and fast. The combination of fast segmentation and minimal, reusable user input makes this a powerful technique for the segmentation of medical images.
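
    A minimal sketch of the statistical input step described above: brush strokes supply intensity samples for target and background, Gaussian models are fitted to them, and per-pixel negative log-likelihoods form the unary terms of the CRF. The pairwise smoothness terms and the minimum s-t graph cut that solves the full model are omitted, and the synthetic image and stroke locations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
image = np.concatenate([rng.normal(60, 8, (64, 32)),
                        rng.normal(120, 8, (64, 32))], axis=1)

# Hypothetical brush-stroke samples drawn from each region.
target_samples = image[:, 40:50].ravel()     # user marked target tissue
background_samples = image[:, 5:15].ravel()  # user marked background

def neg_log_likelihood(pixels, samples):
    """Gaussian negative log-likelihood of pixels under the stroke model."""
    mu, sigma = samples.mean(), samples.std() + 1e-6
    return 0.5 * ((pixels - mu) / sigma) ** 2 + np.log(sigma)

unary_target = neg_log_likelihood(image, target_samples)
unary_background = neg_log_likelihood(image, background_samples)

# Unary-only labeling (what the graph cut would refine with smoothness).
label = unary_target < unary_background
print(f"fraction labeled target: {label.mean():.2f}")
```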

  18. Computer image analysis in obtaining characteristics of images: greenhouse tomatoes in the process of generating learning sets of artificial neural networks

    NASA Astrophysics Data System (ADS)

    Zaborowicz, M.; Przybył, J.; Koszela, K.; Boniecki, P.; Mueller, W.; Raba, B.; Lewicki, A.; Przybył, K.

    2014-04-01

    The aim of the project was to develop software that extracts the characteristics of a greenhouse tomato from its image. Data gathered during image analysis and processing were used to build learning sets for artificial neural networks. The program processes pictures in JPEG format, acquires statistical information about each picture, and exports it to an external file. The software is intended for batch analysis of the collected research material, with the resulting information saved as a CSV file. The program computes 33 independent parameters describing each analysed image. The application is dedicated to the processing and image analysis of greenhouse tomatoes, but it can also be used to analyse other fruits and vegetables of a spherical shape.
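
    A minimal sketch of batch image-feature extraction to CSV, in the spirit of the tomato-imaging tool described above. The handful of statistical features, the "images" directory, and the output filename are illustrative; the actual tool computes 33 parameters per image.

```python
import csv
from pathlib import Path

import numpy as np
from PIL import Image

def extract_features(path: Path) -> dict:
    """Compute a few simple statistical descriptors of one image."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=float)
    gray = rgb.mean(axis=2)
    return {
        "file": path.name,
        "mean_gray": gray.mean(),
        "std_gray": gray.std(),
        "mean_red": rgb[..., 0].mean(),
        "mean_green": rgb[..., 1].mean(),
        "mean_blue": rgb[..., 2].mean(),
    }

rows = [extract_features(p) for p in sorted(Path("images").glob("*.jpg"))]
if rows:
    with open("features.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```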

  19. Marginal regression analysis of recurrent events with coarsened censoring times.

    PubMed

    Hu, X Joan; Rosychuk, Rhonda J

    2016-12-01

    Motivated by an ongoing pediatric mental health care (PMHC) study, this article presents weakly structured methods for analyzing doubly censored recurrent event data where only coarsened information on censoring is available. The study extracted administrative records of emergency department visits from provincial health administrative databases. The available information of each individual subject is limited to a subject-specific time window determined up to concealed data. To evaluate time-dependent effect of exposures, we adapt the local linear estimation with right censored survival times under the Cox regression model with time-varying coefficients (cf. Cai and Sun, Scandinavian Journal of Statistics 2003, 30, 93-111). We establish the pointwise consistency and asymptotic normality of the regression parameter estimator, and examine its performance by simulation. The PMHC study illustrates the proposed approach throughout the article. © 2016, The International Biometric Society.

  20. Neonatal heart rate prediction.

    PubMed

    Abdel-Rahman, Yumna; Jeremic, Aleksander; Tan, Kenneth

    2009-01-01

    Technological advances have caused a decrease in the number of infant deaths, and pre-term infants now have a substantially increased chance of survival. One of the mechanisms vital to saving the lives of these infants is continuous monitoring and early diagnosis. Continuous monitoring collects huge amounts of data with much information embedded in them; statistical analysis can extract this information and use it to aid diagnosis and to understand development. In this study we have a large dataset containing over 180 pre-term infants whose heart rates were recorded over the length of their stay in the Neonatal Intensive Care Unit (NICU). We test two types of models, empirical Bayesian and autoregressive moving average, and then attempt to predict future values. The autoregressive moving average model showed better results but required more computation.
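
    A minimal sketch of fitting an autoregressive moving average model to a heart-rate series and forecasting ahead, in the spirit of the comparison above. The synthetic series stands in for NICU heart-rate recordings, and the model order is an assumption.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)

# Synthetic heart rate: baseline 160 bpm with AR(1)-like fluctuations.
n = 500
hr = np.empty(n)
hr[0] = 160.0
for t in range(1, n):
    hr[t] = 160.0 + 0.9 * (hr[t - 1] - 160.0) + rng.normal(0.0, 2.0)

# ARMA(2,1) is ARIMA with no differencing: order = (p, d, q) = (2, 0, 1).
model = ARIMA(hr, order=(2, 0, 1)).fit()
forecast = model.forecast(steps=10)
print("next 10 predicted values:", np.round(forecast, 1))
```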
