NASA Astrophysics Data System (ADS)
Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua
2017-04-01
Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition WPD, and a mixed-domain feature set consisted of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and WPD energy spectrum is extracted to comprehensively characterize the properties of machinery running state(s). Then, the mixed-dimension feature set is inputted into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information of both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated within the dimension reduction process in a semi-supervised manner to improve sensitivity of the extracted fusion features. Lastly, the extracted fusion features are inputted into a pattern recognition algorithm to achieve the running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification.
Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction mention extraction.
Gupta, Shashank; Pawar, Sachin; Ramrakhiyani, Nitin; Palshikar, Girish Keshav; Varma, Vasudeva
2018-06-13
Social media is a useful platform to share health-related information due to its vast reach. This makes it a good candidate for public-health monitoring tasks, specifically for pharmacovigilance. We study the problem of extraction of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from Twitter. Medical information extraction from social media is challenging, mainly due to short and highly informal nature of text, as compared to more technical and formal medical reports. Current methods in ADR mention extraction rely on supervised learning methods, which suffer from labeled data scarcity problem. The state-of-the-art method uses deep neural networks, specifically a class of Recurrent Neural Network (RNN) which is Long-Short-Term-Memory network (LSTM). Deep neural networks, due to their large number of free parameters rely heavily on large annotated corpora for learning the end task. But in the real-world, it is hard to get large labeled data, mainly due to the heavy cost associated with the manual annotation. To this end, we propose a novel semi-supervised learning based RNN model, which can leverage unlabeled data also present in abundance on social media. Through experiments we demonstrate the effectiveness of our method, achieving state-of-the-art performance in ADR mention extraction. In this study, we tackle the problem of labeled data scarcity for Adverse Drug Reaction mention extraction from social media and propose a novel semi-supervised learning based method which can leverage large unlabeled corpus available in abundance on the web. Through empirical study, we demonstrate that our proposed method outperforms fully supervised learning based baseline which relies on large manually annotated corpus for a good performance.
Rapid Training of Information Extraction with Local and Global Data Views
2012-05-01
relation type extension system based on active learning a relation type extension system based on semi-supervised learning, and a crossdomain...bootstrapping system for domain adaptive named entity extraction. The active learning procedure adopts features extracted at the sentence level as the local
Coupled Semi-Supervised Learning
2010-05-01
later in the thesis, in Chapter 5. CPL as a Case Study of Coupled Semi-Supervised Learning The results presented above demonstrate that coupling...EXTRACTION PATTERNS Our answer to the question posed above, then, is that our results with CPL serve as a case study of coupled semi-supervised learning of...that are incompatible with the coupling constraints. Thus, we argue that our results with CPL serve as a case study of coupled semi-supervised
Rapid Training of Information Extraction with Local and Global Data Views
2012-05-01
56 xiii 4.1 An example of words and their bit string representations. Bold ones are transliterated Arabic words...Natural Language Processing ( NLP ) community faces new tasks and new domains all the time. Without enough labeled data of a new task or a new domain to...conduct supervised learning, semi-supervised learning is particularly attractive to NLP researchers since it only requires a handful of labeled examples
A semi-supervised learning framework for biomedical event extraction based on hidden topics.
Zhou, Deyu; Zhong, Dayou
2015-05-01
Scientists have devoted decades of efforts to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based systems from accessing. Therefore, biomedical event extraction, automatically acquiring knowledge of molecular events in research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate models' parameters. However, it is usually difficult to obtain in practice. Therefore, employing un-annotated data based on semi-supervised learning for biomedical event extraction is a feasible solution and attracts more interests. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned with event annotations based on their distances to these sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a golden standard corpus. Experimental results show that more than 2.2% improvement on F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and the similarity between sentences might be precisely described by hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
Song, Min; Yu, Hwanjo; Han, Wook-Shin
2011-11-24
Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
Ortega-Martorell, Sandra; Ruiz, Héctor; Vellido, Alfredo; Olier, Iván; Romero, Enrique; Julià-Sapé, Margarida; Martín, José D.; Jarman, Ian H.; Arús, Carles; Lisboa, Paulo J. G.
2013-01-01
Background The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyzes single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Principal Findings Non-negative matrix factorization techniques have recently shown their potential for the identification of meaningful sources from brain tissue spectroscopy data. In this study, we use a convex variant of these methods that is capable of handling negatively-valued data and generating sources that can be interpreted as tumor class prototypes. A novel approach to convex non-negative matrix factorization is proposed, in which prior knowledge about class information is utilized in model optimization. Class-specific information is integrated into this semi-supervised process by setting the metric of a latent variable space where the matrix factorization is carried out. The reported experimental study comprises 196 cases from different tumor types drawn from two international, multi-center databases. The results indicate that the proposed approach outperforms a purely unsupervised process by achieving near perfect correlation of the extracted sources with the mean spectra of the tumor types. It also improves tissue type classification. Conclusions/Significance We show that source extraction by unsupervised matrix factorization benefits from the integration of the available class information, so operating in a semi-supervised learning manner, for discriminative source identification and brain tumor labeling from single-voxel spectroscopy data. We are confident that the proposed methodology has wider applicability for biomedical signal processing. PMID:24376744
Liu, Jing; Zhao, Songzheng; Wang, Gang
2018-01-01
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which is a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction - distinguishing ADE relationship from other relation types - necessary. However, conducting ADE relation extraction in social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high-dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most of existing ADE relation extraction methods The SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems. Copyright © 2017 Elsevier B.V. All rights reserved.
Label Information Guided Graph Construction for Semi-Supervised Learning.
Zhuang, Liansheng; Zhou, Zihan; Gao, Shenghua; Yin, Jingwen; Lin, Zhouchen; Ma, Yi
2017-09-01
In the literature, most existing graph-based semi-supervised learning methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph. In this paper, we argue that it is beneficial to consider the label information in the graph learning stage. Specifically, by enforcing the weight of edges between labeled samples of different classes to be zero, we explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the low-rank representation (LRR), and propose a novel semi-supervised graph learning method called semi-supervised low-rank representation. This results in a convex optimization problem with linear constraints, which can be solved by the linearized alternating direction method. Though we take LRR as an example, our proposed method is in fact very general and can be applied to any self-representation graph learning methods. Experiment results on both synthetic and real data sets demonstrate that the proposed graph learning method can better capture the global geometric structure of the data, and therefore is more effective for semi-supervised learning tasks.
Tan, Lirong; Holland, Scott K; Deshpande, Aniruddha K; Chen, Ye; Choo, Daniel I; Lu, Long J
2015-12-01
We developed a machine learning model to predict whether or not a cochlear implant (CI) candidate will develop effective language skills within 2 years after the CI surgery by using the pre-implant brain fMRI data from the candidate. The language performance was measured 2 years after the CI surgery by the Clinical Evaluation of Language Fundamentals-Preschool, Second Edition (CELF-P2). Based on the CELF-P2 scores, the CI recipients were designated as either effective or ineffective CI users. For feature extraction from the fMRI data, we constructed contrast maps using the general linear model, and then utilized the Bag-of-Words (BoW) approach that we previously published to convert the contrast maps into feature vectors. We trained both supervised models and semi-supervised models to classify CI users as effective or ineffective. Compared with the conventional feature extraction approach, which used each single voxel as a feature, our BoW approach gave rise to much better performance for the classification of effective versus ineffective CI users. The semi-supervised model with the feature set extracted by the BoW approach from the contrast of speech versus silence achieved a leave-one-out cross-validation AUC as high as 0.97. Recursive feature elimination unexpectedly revealed that two features were sufficient to provide highly accurate classification of effective versus ineffective CI users based on our current dataset. We have validated the hypothesis that pre-implant cortical activation patterns revealed by fMRI during infancy correlate with language performance 2 years after cochlear implantation. The two brain regions highlighted by our classifier are potential biomarkers for the prediction of CI outcomes. Our study also demonstrated the superiority of the semi-supervised model over the supervised model. It is always worthwhile to try a semi-supervised model when unlabeled data are available.
Active link selection for efficient semi-supervised community detection
NASA Astrophysics Data System (ADS)
Yang, Liang; Jin, Di; Wang, Xiao; Cao, Xiaochun
2015-03-01
Several semi-supervised community detection algorithms have been proposed recently to improve the performance of traditional topology-based methods. However, most of them focus on how to integrate supervised information with topology information; few of them pay attention to which information is critical for performance improvement. This leads to large amounts of demand for supervised information, which is expensive or difficult to obtain in most fields. For this problem we propose an active link selection framework, that is we actively select the most uncertain and informative links for human labeling for the efficient utilization of the supervised information. We also disconnect the most likely inter-community edges to further improve the efficiency. Our main idea is that, by connecting uncertain nodes to their community hubs and disconnecting the inter-community edges, one can sharpen the block structure of adjacency matrix more efficiently than randomly labeling links as the existing methods did. Experiments on both synthetic and real networks demonstrate that our new approach significantly outperforms the existing methods in terms of the efficiency of using supervised information. It needs ~13% of the supervised information to achieve a performance similar to that of the original semi-supervised approaches.
Active link selection for efficient semi-supervised community detection
Yang, Liang; Jin, Di; Wang, Xiao; Cao, Xiaochun
2015-01-01
Several semi-supervised community detection algorithms have been proposed recently to improve the performance of traditional topology-based methods. However, most of them focus on how to integrate supervised information with topology information; few of them pay attention to which information is critical for performance improvement. This leads to large amounts of demand for supervised information, which is expensive or difficult to obtain in most fields. For this problem we propose an active link selection framework, that is we actively select the most uncertain and informative links for human labeling for the efficient utilization of the supervised information. We also disconnect the most likely inter-community edges to further improve the efficiency. Our main idea is that, by connecting uncertain nodes to their community hubs and disconnecting the inter-community edges, one can sharpen the block structure of adjacency matrix more efficiently than randomly labeling links as the existing methods did. Experiments on both synthetic and real networks demonstrate that our new approach significantly outperforms the existing methods in terms of the efficiency of using supervised information. It needs ~13% of the supervised information to achieve a performance similar to that of the original semi-supervised approaches. PMID:25761385
Active Semi-Supervised Community Detection Based on Must-Link and Cannot-Link Constraints
Cheng, Jianjun; Leng, Mingwei; Li, Longjie; Zhou, Hanhai; Chen, Xiaoyun
2014-01-01
Community structure detection is of great importance because it can help in discovering the relationship between the function and the topology structure of a network. Many community detection algorithms have been proposed, but how to incorporate the prior knowledge in the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm, which makes full utilization of the must-link and cannot-link constraints to guide the process of community detection and thereby extracts high-quality community structures from networks. To acquire the high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which actively selects nodes with maximum utility for the proposed semi-supervised community detection algorithm step by step, and then generates the must-link and cannot-link constraints by accessing a noiseless oracle. Extensive experiments were carried out, and the experimental results show that the introduction of active learning into the problem of community detection makes a success. Our proposed method can extract high-quality community structures from networks, and significantly outperforms other comparison methods. PMID:25329660
Active semi-supervised learning method with hybrid deep belief networks.
Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong
2014-01-01
In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD), to address the semi-supervised sentiment classification problem with deep learning. First, we construct the previous several hidden layers using restricted Boltzmann machines (RBM), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBM), which can abstract the information of reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent based supervised learning with an exponential loss function. Finally, active learning method is combined based on the proposed deep architecture. We did several experiments on five sentiment classification datasets, and show that AHD is competitive with previous semi-supervised learning algorithm. Experiments are also conducted to verify the effectiveness of our proposed method with different number of labeled reviews and unlabeled reviews respectively.
An Active Learning Framework for Hyperspectral Image Classification Using Hierarchical Segmentation
NASA Technical Reports Server (NTRS)
Zhang, Zhou; Pasolli, Edoardo; Crawford, Melba M.; Tilton, James C.
2015-01-01
Augmenting spectral data with spatial information for image classification has recently gained significant attention, as classification accuracy can often be improved by extracting spatial information from neighboring pixels. In this paper, we propose a new framework in which active learning (AL) and hierarchical segmentation (HSeg) are combined for spectral-spatial classification of hyperspectral images. The spatial information is extracted from a best segmentation obtained by pruning the HSeg tree using a new supervised strategy. The best segmentation is updated at each iteration of the AL process, thus taking advantage of informative labeled samples provided by the user. The proposed strategy incorporates spatial information in two ways: 1) concatenating the extracted spatial features and the original spectral features into a stacked vector and 2) extending the training set using a self-learning-based semi-supervised learning (SSL) approach. Finally, the two strategies are combined within an AL framework. The proposed framework is validated with two benchmark hyperspectral datasets. Higher classification accuracies are obtained by the proposed framework with respect to five other state-of-the-art spectral-spatial classification approaches. Moreover, the effectiveness of the proposed pruning strategy is also demonstrated relative to the approaches based on a fixed segmentation.
Cross-Domain Semi-Supervised Learning Using Feature Formulation.
Xingquan Zhu
2011-12-01
Semi-Supervised Learning (SSL) traditionally makes use of unlabeled samples by including them into the training set through an automated labeling process. Such a primitive Semi-Supervised Learning (pSSL) approach suffers from a number of disadvantages including false labeling and incapable of utilizing out-of-domain samples. In this paper, we propose a formative Semi-Supervised Learning (fSSL) framework which explores hidden features between labeled and unlabeled samples to achieve semi-supervised learning. fSSL regards that both labeled and unlabeled samples are generated from some hidden concepts with labeling information partially observable for some samples. The key of the fSSL is to recover the hidden concepts, and take them as new features to link labeled and unlabeled samples for semi-supervised learning. Because unlabeled samples are only used to generate new features, but not to be explicitly included in the training set like pSSL does, fSSL overcomes the inherent disadvantages of the traditional pSSL methods, especially for samples not within the same domain as the labeled instances. Experimental results and comparisons demonstrate that fSSL significantly outperforms pSSL-based methods for both within-domain and cross-domain semi-supervised learning.
Zhang, Zhao; Zhao, Mingbo; Chow, Tommy W S
2012-12-01
In this work, sub-manifold projections based semi-supervised dimensionality reduction (DR) problem learning from partial constrained data is discussed. Two semi-supervised DR algorithms termed Marginal Semi-Supervised Sub-Manifold Projections (MS³MP) and orthogonal MS³MP (OMS³MP) are proposed. MS³MP in the singular case is also discussed. We also present the weighted least squares view of MS³MP. Based on specifying the types of neighborhoods with pairwise constraints (PC) and the defined manifold scatters, our methods can preserve the local properties of all points and discriminant structures embedded in the localized PC. The sub-manifolds of different classes can also be separated. In PC guided methods, exploring and selecting the informative constraints is challenging and random constraint subsets significantly affect the performance of algorithms. This paper also introduces an effective technique to select the informative constraints for DR with consistent constraints. The analytic form of the projection axes can be obtained by eigen-decomposition. The connections between this work and other related work are also elaborated. The validity of the proposed constraint selection approach and DR algorithms are evaluated by benchmark problems. Extensive simulations show that our algorithms can deliver promising results over some widely used state-of-the-art semi-supervised DR techniques. Copyright © 2012 Elsevier Ltd. All rights reserved.
Human semi-supervised learning.
Gibson, Bryan R; Rogers, Timothy T; Zhu, Xiaojin
2013-01-01
Most empirical work in human categorization has studied learning in either fully supervised or fully unsupervised scenarios. Most real-world learning scenarios, however, are semi-supervised: Learners receive a great deal of unlabeled information from the world, coupled with occasional experiences in which items are directly labeled by a knowledgeable source. A large body of work in machine learning has investigated how learning can exploit both labeled and unlabeled data provided to a learner. Using equivalences between models found in human categorization and machine learning research, we explain how these semi-supervised techniques can be applied to human learning. A series of experiments are described which show that semi-supervised learning models prove useful for explaining human behavior when exposed to both labeled and unlabeled data. We then discuss some machine learning models that do not have familiar human categorization counterparts. Finally, we discuss some challenges yet to be addressed in the use of semi-supervised models for modeling human categorization. Copyright © 2013 Cognitive Science Society, Inc.
Klabjan, Diego; Jonnalagadda, Siddhartha Reddy
2016-01-01
Background Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health communities. Objective In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand information embedded within Web-based health content that are good features in identifying valid answers. Methods Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. To rank these candidates, we implemented a semi-supervised leaning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared against a rule-based string similarity baseline. Results On our dataset, the semi-supervised learning algorithm has an accuracy of 86.2%. Unified medical language system–based (health related) features used in the model enhance the algorithm’s performance by proximately 8%. A reasonably high rate of accuracy is obtained given that the data are considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, number of stop words contained in a test question, a distance between the test question and other questions in the corpus, and a number of overlapping health-related terms between questions. Conclusions Overall, our automated QA system based on historical QA pairs is shown to be effective according to the dataset in this case study. It is developed for general use in the health care domain, which can also be applied to other CQA sites. PMID:27485666
Towards a Relation Extraction Framework for Cyber-Security Concepts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Corinne L; Bridges, Robert A; Huffer, Kelly M
In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised NLP and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting the desired relations. Preliminary testing on a smallmore » corpus shows promising results, obtaining precision of .82.« less
NASA Astrophysics Data System (ADS)
Jiang, Li; Xuan, Jianping; Shi, Tielin
2013-12-01
Generally, the vibration signals of faulty machinery are non-stationary and nonlinear under complicated operating conditions. Therefore, it is a big challenge for machinery fault diagnosis to extract optimal features for improving classification accuracy. This paper proposes semi-supervised kernel Marginal Fisher analysis (SSKMFA) for feature extraction, which can discover the intrinsic manifold structure of dataset, and simultaneously consider the intra-class compactness and the inter-class separability. Based on SSKMFA, a novel approach to fault diagnosis is put forward and applied to fault recognition of rolling bearings. SSKMFA directly extracts the low-dimensional characteristics from the raw high-dimensional vibration signals, by exploiting the inherent manifold structure of both labeled and unlabeled samples. Subsequently, the optimal low-dimensional features are fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories and severities of bearings. The experimental results demonstrate that the proposed approach improves the fault recognition performance and outperforms the other four feature extraction methods.
Semi-supervised SVM for individual tree crown species classification
NASA Astrophysics Data System (ADS)
Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik
2015-12-01
In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.
Active learning for semi-supervised clustering based on locally linear propagation reconstruction.
Chang, Chin-Chun; Lin, Po-Yi
2015-03-01
The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.
Patient-specific semi-supervised learning for postoperative brain tumor segmentation.
Meier, Raphael; Bauer, Stefan; Slotboom, Johannes; Wiest, Roland; Reyes, Mauricio
2014-01-01
In contrast to preoperative brain tumor segmentation, the problem of postoperative brain tumor segmentation has been rarely approached so far. We present a fully-automatic segmentation method using multimodal magnetic resonance image data and patient-specific semi-supervised learning. The idea behind our semi-supervised approach is to effectively fuse information from both pre- and postoperative image data of the same patient to improve segmentation of the postoperative image. We pose image segmentation as a classification problem and solve it by adopting a semi-supervised decision forest. The method is evaluated on a cohort of 10 high-grade glioma patients, with segmentation performance and computation time comparable or superior to a state-of-the-art brain tumor segmentation method. Moreover, our results confirm that the inclusion of preoperative MR images lead to a better performance regarding postoperative brain tumor segmentation.
Adaptive Sensing and Fusion of Multi-Sensor Data and Historical Information
2009-11-06
integrate MTL and semi-supervised learning into a single framework , thereby exploiting two forms of contextual information. A key new objective of the...this report we integrate MTL and semi-supervised learning into a single framework , thereby exploiting two forms of contextual information. A key new...process [8], denoted as X ∼ BeP (B), where B is a measure on Ω. If B is continuous, X is a Poisson process with intensity B and can be constructed as X = N
Current trends in geomorphological mapping
NASA Astrophysics Data System (ADS)
Seijmonsbergen, A. C.
2012-04-01
Geomorphological mapping is a world currently in motion, driven by technological advances and the availability of new high resolution data. As a consequence, classic (paper) geomorphological maps which were the standard for more than 50 years are rapidly being replaced by digital geomorphological information layers. This is witnessed by the following developments: 1. the conversion of classic paper maps into digital information layers, mainly performed in a digital mapping environment such as a Geographical Information System, 2. updating the location precision and the content of the converted maps, by adding more geomorphological details, taken from high resolution elevation data and/or high resolution image data, 3. (semi) automated extraction and classification of geomorphological features from digital elevation models, broadly separated into unsupervised and supervised classification techniques and 4. New digital visualization / cartographic techniques and reading interfaces. Newly digital geomorphological information layers can be based on manual digitization of polygons using DEMs and/or aerial photographs, or prepared through (semi) automated extraction and delineation of geomorphological features. DEMs are often used as basis to derive Land Surface Parameter information which is used as input for (un) supervised classification techniques. Especially when using high-res data, object-based classification is used as an alternative to traditional pixel-based classifications, to cluster grid cells into homogeneous objects, which can be classified as geomorphological features. Classic map content can also be used as training material for the supervised classification of geomorphological features. In the classification process, rule-based protocols, including expert-knowledge input, are used to map specific geomorphological features or entire landscapes. Current (semi) automated classification techniques are increasingly able to extract morphometric, hydrological, and in the near future also morphogenetic information. As a result, these new opportunities have changed the workflows for geomorphological mapmaking, and their focus have shifted from field-based techniques to using more computer-based techniques: for example, traditional pre-field air-photo based maps are now replaced by maps prepared in a digital mapping environment, and designated field visits using mobile GIS / digital mapping devices now focus on gathering location information and attribute inventories and are strongly time efficient. The resulting 'modern geomorphological maps' are digital collections of geomorphological information layers consisting of georeferenced vector, raster and tabular data which are stored in a digital environment such as a GIS geodatabase, and are easily visualized as e.g. 'birds' eye' views, as animated 3D displays, on virtual globes, or stored as GeoPDF maps in which georeferenced attribute information can be easily exchanged over the internet. Digital geomorphological information layers are increasingly accessed via web-based services distributed through remote servers. Information can be consulted - or even build using remote geoprocessing servers - by the end user. Therefore, it will not only be the geomorphologist anymore, but also the professional end user that dictates the applied use of digital geomorphological information layers.
Liu, Jinping; Tang, Zhaohui; Xu, Pengfei; Liu, Wenzhong; Zhang, Jin; Zhu, Jianyong
2016-06-29
The topic of online product quality inspection (OPQI) with smart visual sensors is attracting increasing interest in both the academic and industrial communities on account of the natural connection between the visual appearance of products with their underlying qualities. Visual images captured from granulated products (GPs), e.g., cereal products, fabric textiles, are comprised of a large number of independent particles or stochastically stacking locally homogeneous fragments, whose analysis and understanding remains challenging. A method of image statistical modeling-based OPQI for GP quality grading and monitoring by a Weibull distribution(WD) model with a semi-supervised learning classifier is presented. WD-model parameters (WD-MPs) of GP images' spatial structures, obtained with omnidirectional Gaussian derivative filtering (OGDF), which were demonstrated theoretically to obey a specific WD model of integral form, were extracted as the visual features. Then, a co-training-style semi-supervised classifier algorithm, named COSC-Boosting, was exploited for semi-supervised GP quality grading, by integrating two independent classifiers with complementary nature in the face of scarce labeled samples. Effectiveness of the proposed OPQI method was verified and compared in the field of automated rice quality grading with commonly-used methods and showed superior performance, which lays a foundation for the quality control of GP on assembly lines.
Visual texture perception via graph-based semi-supervised learning
NASA Astrophysics Data System (ADS)
Zhang, Qin; Dong, Junyu; Zhong, Guoqiang
2018-04-01
Perceptual features, for example direction, contrast and repetitiveness, are important visual factors for human to perceive a texture. However, it needs to perform psychophysical experiment to quantify these perceptual features' scale, which requires a large amount of human labor and time. This paper focuses on the task of obtaining perceptual features' scale of textures by small number of textures with perceptual scales through a rating psychophysical experiment (what we call labeled textures) and a mass of unlabeled textures. This is the scenario that the semi-supervised learning is naturally suitable for. This is meaningful for texture perception research, and really helpful for the perceptual texture database expansion. A graph-based semi-supervised learning method called random multi-graphs, RMG for short, is proposed to deal with this task. We evaluate different kinds of features including LBP, Gabor, and a kind of unsupervised deep features extracted by a PCA-based deep network. The experimental results show that our method can achieve satisfactory effects no matter what kind of texture features are used.
2014-01-01
Classification confidence, or informative content of the subsets, is quantified by the Information Divergence. Our approach relates to active learning , semi-supervised learning, mixed generative/discriminative learning.
Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions.
Chen, Ke; Wang, Shihai
2011-01-01
Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes all three semi-supervised assumptions, i.e., smoothness, cluster, and manifold assumptions, together into account during boosting learning. In this paper, we propose a novel cost functional consisting of the margin cost on labeled data and the regularization penalty on unlabeled data based on three fundamental semi-supervised assumptions. Thus, minimizing our proposed cost functional with a greedy yet stagewise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. Extensive experiments demonstrate that our algorithm yields favorite results for benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms. Finally, we discuss relevant issues and relate our algorithm to the previous work.
Liu, Jinping; Tang, Zhaohui; Xu, Pengfei; Liu, Wenzhong; Zhang, Jin; Zhu, Jianyong
2016-01-01
The topic of online product quality inspection (OPQI) with smart visual sensors is attracting increasing interest in both the academic and industrial communities on account of the natural connection between the visual appearance of products with their underlying qualities. Visual images captured from granulated products (GPs), e.g., cereal products, fabric textiles, are comprised of a large number of independent particles or stochastically stacking locally homogeneous fragments, whose analysis and understanding remains challenging. A method of image statistical modeling-based OPQI for GP quality grading and monitoring by a Weibull distribution(WD) model with a semi-supervised learning classifier is presented. WD-model parameters (WD-MPs) of GP images’ spatial structures, obtained with omnidirectional Gaussian derivative filtering (OGDF), which were demonstrated theoretically to obey a specific WD model of integral form, were extracted as the visual features. Then, a co-training-style semi-supervised classifier algorithm, named COSC-Boosting, was exploited for semi-supervised GP quality grading, by integrating two independent classifiers with complementary nature in the face of scarce labeled samples. Effectiveness of the proposed OPQI method was verified and compared in the field of automated rice quality grading with commonly-used methods and showed superior performance, which lays a foundation for the quality control of GP on assembly lines. PMID:27367703
SemiBoost: boosting for semi-supervised learning.
Mallapragada, Pavan Kumar; Jin, Rong; Jain, Anil K; Liu, Yi
2009-11-01
Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this as the Semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a metasemi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed as SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both manifold and cluster assumption in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.
Hou, Bin; Wang, Yunhong; Liu, Qingjie
2016-01-01
Characterizations of up to date information of the Earth’s surface are an important application providing insights to urban planning, resources monitoring and environmental studies. A large number of change detection (CD) methods have been developed to solve them by utilizing remote sensing (RS) images. The advent of high resolution (HR) remote sensing images further provides challenges to traditional CD methods and opportunities to object-based CD methods. While several kinds of geospatial objects are recognized, this manuscript mainly focuses on buildings. Specifically, we propose a novel automatic approach combining pixel-based strategies with object-based ones for detecting building changes with HR remote sensing images. A multiresolution contextual morphological transformation called extended morphological attribute profiles (EMAPs) allows the extraction of geometrical features related to the structures within the scene at different scales. Pixel-based post-classification is executed on EMAPs using hierarchical fuzzy clustering. Subsequently, the hierarchical fuzzy frequency vector histograms are formed based on the image-objects acquired by simple linear iterative clustering (SLIC) segmentation. Then, saliency and morphological building index (MBI) extracted on difference images are used to generate a pseudo training set. Ultimately, object-based semi-supervised classification is implemented on this training set by applying random forest (RF). Most of the important changes are detected by the proposed method in our experiments. This study was checked for effectiveness using visual evaluation and numerical evaluation. PMID:27618903
Hou, Bin; Wang, Yunhong; Liu, Qingjie
2016-08-27
Characterizations of up to date information of the Earth's surface are an important application providing insights to urban planning, resources monitoring and environmental studies. A large number of change detection (CD) methods have been developed to solve them by utilizing remote sensing (RS) images. The advent of high resolution (HR) remote sensing images further provides challenges to traditional CD methods and opportunities to object-based CD methods. While several kinds of geospatial objects are recognized, this manuscript mainly focuses on buildings. Specifically, we propose a novel automatic approach combining pixel-based strategies with object-based ones for detecting building changes with HR remote sensing images. A multiresolution contextual morphological transformation called extended morphological attribute profiles (EMAPs) allows the extraction of geometrical features related to the structures within the scene at different scales. Pixel-based post-classification is executed on EMAPs using hierarchical fuzzy clustering. Subsequently, the hierarchical fuzzy frequency vector histograms are formed based on the image-objects acquired by simple linear iterative clustering (SLIC) segmentation. Then, saliency and morphological building index (MBI) extracted on difference images are used to generate a pseudo training set. Ultimately, object-based semi-supervised classification is implemented on this training set by applying random forest (RF). Most of the important changes are detected by the proposed method in our experiments. This study was checked for effectiveness using visual evaluation and numerical evaluation.
Sparks, Rachel; Madabhushi, Anant
2016-01-01
Content-based image retrieval (CBIR) retrieves database images most similar to the query image by (1) extracting quantitative image descriptors and (2) calculating similarity between database and query image descriptors. Recently, manifold learning (ML) has been used to perform CBIR in a low dimensional representation of the high dimensional image descriptor space to avoid the curse of dimensionality. ML schemes are computationally expensive, requiring an eigenvalue decomposition (EVD) for every new query image to learn its low dimensional representation. We present out-of-sample extrapolation utilizing semi-supervised ML (OSE-SSL) to learn the low dimensional representation without recomputing the EVD for each query image. OSE-SSL incorporates semantic information, partial class label, into a ML scheme such that the low dimensional representation co-localizes semantically similar images. In the context of prostate histopathology, gland morphology is an integral component of the Gleason score which enables discrimination between prostate cancer aggressiveness. Images are represented by shape features extracted from the prostate gland. CBIR with OSE-SSL for prostate histology obtained from 58 patient studies, yielded an area under the precision recall curve (AUPRC) of 0.53 ± 0.03 comparatively a CBIR with Principal Component Analysis (PCA) to learn a low dimensional space yielded an AUPRC of 0.44 ± 0.01. PMID:27264985
Soto, Axel J; Zerva, Chrysoula; Batista-Navarro, Riza; Ananiadou, Sophia
2018-04-15
Pathway models are valuable resources that help us understand the various mechanisms underpinning complex biological processes. Their curation is typically carried out through manual inspection of published scientific literature to find information relevant to a model, which is a laborious and knowledge-intensive task. Furthermore, models curated manually cannot be easily updated and maintained with new evidence extracted from the literature without automated support. We have developed LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization, to facilitate the exploration and analysis of pathway models using statements (i.e. events) extracted automatically from the literature and organized according to levels of confidence. LitPathExplorer supports pathway modellers and curators alike by: (i) extracting events from the literature that corroborate existing models with evidence; (ii) discovering new events which can update models; and (iii) providing a confidence value for each event that is automatically computed based on linguistic features and article metadata. Our evaluation of event extraction showed a precision of 89% and a recall of 71%. Evaluation of our confidence measure, when used for ranking sampled events, showed an average precision ranging between 61 and 73%, which can be improved to 95% when the user is involved in the semi-supervised learning process. Qualitative evaluation using pair analytics based on the feedback of three domain experts confirmed the utility of our tool within the context of pathway model exploration. LitPathExplorer is available at http://nactem.ac.uk/LitPathExplorer_BI/. sophia.ananiadou@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
Zhang, Xiaotian; Yin, Jian; Zhang, Xu
2018-03-02
Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Currently, many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict underlying miRNA-disease association types, a semi-supervised model called the network-based label propagation algorithm is proposed to infer multiple types of miRNA-disease associations (NLPMMDA) by mutual information derived from the heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity information of miRNAs and diseases to construct a heterogeneous network. NLPMMDA is a semi-supervised model which does not require verified negative samples. Leave-one-out cross validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed effective performance of NLPMMDA to predict novel miRNA-disease associations and their association types.
ERIC Educational Resources Information Center
Hemer, Susan R.
2012-01-01
A great deal of literature in recent years has focused on the supervisory relationship, yet very little has been written about the nature or content of supervisory meetings, beyond commenting on the frequency and length of meetings. Through semi-structured interviews, informal discussions with colleagues and students, a critical review of…
A semi-supervised classification algorithm using the TAD-derived background as training data
NASA Astrophysics Data System (ADS)
Fan, Lei; Ambeau, Brittany; Messinger, David W.
2013-05-01
In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation.
Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira
2013-04-01
Brain magnetic resonance images (MRIs) tissue segmentation is one of the most important parts of the clinical diagnostic tools. Pixel classification methods have been frequently used in the image segmentation with two supervised and unsupervised approaches up to now. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to low level of performance. However, semi-supervised learning which uses a few labeled data together with a large amount of unlabeled data causes higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting of brain magnetic resonance imaging (MRI) tissues that it has been used results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms expectation filtering maximization and MCo_Training that are improved versions of semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers.
The helpfulness of category labels in semi-supervised learning depends on category structure.
Vong, Wai Keen; Navarro, Daniel J; Perfors, Amy
2016-02-01
The study of semi-supervised category learning has generally focused on how additional unlabeled information with given labeled information might benefit category learning. The literature is also somewhat contradictory, sometimes appearing to show a benefit to unlabeled information and sometimes not. In this paper, we frame the problem differently, focusing on when labels might be helpful to a learner who has access to lots of unlabeled information. Using an unconstrained free-sorting categorization experiment, we show that labels are useful to participants only when the category structure is ambiguous and that people's responses are driven by the specific set of labels they see. We present an extension of Anderson's Rational Model of Categorization that captures this effect.
Semi-supervised vibration-based classification and condition monitoring of compressors
NASA Astrophysics Data System (ADS)
Potočnik, Primož; Govekar, Edvard
2017-09-01
Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.
Graph-Based Semi-Supervised Hyperspectral Image Classification Using Spatial Information
NASA Astrophysics Data System (ADS)
Jamshidpour, N.; Homayouni, S.; Safari, A.
2017-09-01
Hyperspectral image classification has been one of the most popular research areas in the remote sensing community in the past decades. However, there are still some problems that need specific attentions. For example, the lack of enough labeled samples and the high dimensionality problem are two most important issues which degrade the performance of supervised classification dramatically. The main idea of semi-supervised learning is to overcome these issues by the contribution of unlabeled samples, which are available in an enormous amount. In this paper, we propose a graph-based semi-supervised classification method, which uses both spectral and spatial information for hyperspectral image classification. More specifically, two graphs were designed and constructed in order to exploit the relationship among pixels in spectral and spatial spaces respectively. Then, the Laplacians of both graphs were merged to form a weighted joint graph. The experiments were carried out on two different benchmark hyperspectral data sets. The proposed method performed significantly better than the well-known supervised classification methods, such as SVM. The assessments consisted of both accuracy and homogeneity analyses of the produced classification maps. The proposed spectral-spatial SSL method considerably increased the classification accuracy when the labeled training data set is too scarce.When there were only five labeled samples for each class, the performance improved 5.92% and 10.76% compared to spatial graph-based SSL, for AVIRIS Indian Pine and Pavia University data sets respectively.
Can Semi-Supervised Learning Explain Incorrect Beliefs about Categories?
ERIC Educational Resources Information Center
Kalish, Charles W.; Rogers, Timothy T.; Lang, Jonathan; Zhu, Xiaojin
2011-01-01
Three experiments with 88 college-aged participants explored how unlabeled experiences--learning episodes in which people encounter objects without information about their category membership--influence beliefs about category structure. Participants performed a simple one-dimensional categorization task in a brief supervised learning phase, then…
Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation
Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira
2013-01-01
Brain magnetic resonance images (MRIs) tissue segmentation is one of the most important parts of the clinical diagnostic tools. Pixel classification methods have been frequently used in the image segmentation with two supervised and unsupervised approaches up to now. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to low level of performance. However, semi-supervised learning which uses a few labeled data together with a large amount of unlabeled data causes higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting of brain magnetic resonance imaging (MRI) tissues that it has been used results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms expectation filtering maximization and MCo_Training that are improved versions of semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers. PMID:24098863
Learning Biological Networks via Bootstrapping with Optimized GO-based Gene Similarity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.
2010-08-02
Microarray gene expression data provide a unique information resource for learning biological networks using "reverse engineering" methods. However, there are a variety of cases in which we know which genes are involved in a given pathology of interest, but we do not have enough experimental evidence to support the use of fully-supervised/reverse-engineering learning methods. In this paper, we explore a novel semi-supervised approach in which biological networks are learned from a reference list of genes and a partial set of links for these genes extracted automatically from PubMed abstracts, using a knowledge-driven bootstrapping algorithm. We show how new relevant linksmore » across genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. We describe an application of this approach to the TGFB pathway as a case study and show how the ensuing results prove the feasibility of the approach as an alternate or complementary technique to fully supervised methods.« less
Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.
Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S
2014-03-01
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.
Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan
2015-01-01
Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.
Stanescu, Ana; Caragea, Doina
2015-01-01
Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.
2015-01-01
Background Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Results Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. Conclusions In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework. PMID:26356316
Semi-supervised learning for ordinal Kernel Discriminant Analysis.
Pérez-Ortiz, M; Gutiérrez, P A; Carbonero-Ruz, M; Hervás-Martínez, C
2016-12-01
Ordinal classification considers those classification problems where the labels of the variable to predict follow a given order. Naturally, labelled data is scarce or difficult to obtain in this type of problems because, in many cases, ordinal labels are given by a user or expert (e.g. in recommendation systems). Firstly, this paper develops a new strategy for ordinal classification where both labelled and unlabelled data are used in the model construction step (a scheme which is referred to as semi-supervised learning). More specifically, the ordinal version of kernel discriminant learning is extended for this setting considering the neighbourhood information of unlabelled data, which is proposed to be computed in the feature space induced by the kernel function. Secondly, a new method for semi-supervised kernel learning is devised in the context of ordinal classification, which is combined with our developed classification strategy to optimise the kernel parameters. The experiments conducted compare 6 different approaches for semi-supervised learning in the context of ordinal classification in a battery of 30 datasets, showing (1) the good synergy of the ordinal version of discriminant analysis and the use of unlabelled data and (2) the advantage of computing distances in the feature space induced by the kernel function. Copyright © 2016 Elsevier Ltd. All rights reserved.
Squire, P N; Parasuraman, R
2010-08-01
The present study assessed the impact of task load and level of automation (LOA) on task switching in participants supervising a team of four or eight semi-autonomous robots in a simulated 'capture the flag' game. Participants were faster to perform the same task than when they chose to switch between different task actions. They also took longer to switch between different tasks when supervising the robots at a high compared to a low LOA. Task load, as manipulated by the number of robots to be supervised, did not influence switch costs. The results suggest that the design of future unmanned vehicle (UV) systems should take into account not simply how many UVs an operator can supervise, but also the impact of LOA and task operations on task switching during supervision of multiple UVs. The findings of this study are relevant for the ergonomics practice of UV systems. This research extends the cognitive theory of task switching to inform the design of UV systems and results show that switching between UVs is an important factor to consider.
A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.
Peikari, Mohammad; Salama, Sherine; Nofech-Mozes, Sharon; Martel, Anne L
2018-05-08
Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.
Semi-Supervised Marginal Fisher Analysis for Hyperspectral Image Classification
NASA Astrophysics Data System (ADS)
Huang, H.; Liu, J.; Pan, Y.
2012-07-01
The problem of learning with both labeled and unlabeled examples arises frequently in Hyperspectral image (HSI) classification. While marginal Fisher analysis is a supervised method, which cannot be directly applied for Semi-supervised classification. In this paper, we proposed a novel method, called semi-supervised marginal Fisher analysis (SSMFA), to process HSI of natural scenes, which uses a combination of semi-supervised learning and manifold learning. In SSMFA, a new difference-based optimization objective function with unlabeled samples has been designed. SSMFA preserves the manifold structure of labeled and unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, and it can be computed based on eigen decomposition. Classification experiments with a challenging HSI task demonstrate that this method outperforms current state-of-the-art HSI-classification methods.
A qualitative investigation of the nature of "informal supervision" among therapists in training.
Coren, Sidney; Farber, Barry A
2017-11-29
This study investigated how, when, why, and with whom therapists in training utilize "informal supervision"-that is, engage individuals who are not their formally assigned supervisors in significant conversations about their clinical work. Participants were 16 doctoral trainees in clinical and counseling psychology programs. Semi-structured interviews were conducted and analyzed using the Consensual Qualitative Research (CQR) method. Seven domains emerged from the analysis, indicating that, in general, participants believe that informal and formal supervision offer many of the same benefits, including validation, support, and reassurance; freedom and safety to discuss doubts, anxieties, strong personal reactions to patients, clinical mistakes and challenges; and alternative approaches to clinical interventions. However, several differences also emerged between these modes of learning-for example, formal supervision is seen as more focused on didactics per se ("what to do"), whereas informal supervision is seen as providing more of a "holding environment." Overall, the findings of this study suggest that informal supervision is an important and valuable adjunctive practice by which clinical trainees augment their professional competencies. Recommendations are proposed for clinical practice and training, including the need to further specify the ethical boundaries of this unique and essentially unregulated type of supervision.
Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery
Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S.; Pusey, Marc L.; Aygün, Ramazan S.
2015-01-01
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset. PMID:25914518
Hu, Weiming; Gao, Jin; Xing, Junliang; Zhang, Chao; Maybank, Stephen
2017-01-01
An appearance model adaptable to changes in object appearance is critical in visual object tracking. In this paper, we treat an image patch as a two-order tensor which preserves the original image structure. We design two graphs for characterizing the intrinsic local geometrical structure of the tensor samples of the object and the background. Graph embedding is used to reduce the dimensions of the tensors while preserving the structure of the graphs. Then, a discriminant embedding space is constructed. We prove two propositions for finding the transformation matrices which are used to map the original tensor samples to the tensor-based graph embedding space. In order to encode more discriminant information in the embedding space, we propose a transfer-learning- based semi-supervised strategy to iteratively adjust the embedding space into which discriminative information obtained from earlier times is transferred. We apply the proposed semi-supervised tensor-based graph embedding learning algorithm to visual tracking. The new tracking algorithm captures an object's appearance characteristics during tracking and uses a particle filter to estimate the optimal object state. Experimental results on the CVPR 2013 benchmark dataset demonstrate the effectiveness of the proposed tracking algorithm.
Semi-supervised clustering methods
Bair, Eric
2013-01-01
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
Semi-supervised clustering methods.
Bair, Eric
2013-01-01
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised
USDA-ARS?s Scientific Manuscript database
In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by the post processing the rules with ...
Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.
Zhang, Xiang; Guan, Naiyang; Jia, Zhilong; Qiu, Xiaogang; Luo, Zhigang
2015-01-01
Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.
Semi-Supervised Learning to Identify UMLS Semantic Relations.
Luo, Yuan; Uzuner, Ozlem
2014-01-01
The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configuration. Using KL divergence consistently performs the best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top level UMLS relations (2-way), and above 50% for clustering the second level relations (7-way).
Cross-language opinion lexicon extraction using mutual-reinforcement label propagation.
Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke
2013-01-01
There is a growing interest in automatically building opinion lexicon from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits the applicability of these methods. Unsupervised or semi-supervised learning provides an optional solution to multilingual opinion lexicon extraction. However, the datasets are imbalanced in different languages. For some languages, the high-quality corpora are scarce or hard to obtain, which limits the research progress. To solve the above problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph, and then a bilingual dictionary is used as a bridge to transfer information between two languages. A key advantage of this model is its ability to make two languages learn from each other and boost each other. The experimental results show that the proposed approach outperforms baseline significantly.
Cross-Language Opinion Lexicon Extraction Using Mutual-Reinforcement Label Propagation
Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke
2013-01-01
There is a growing interest in automatically building opinion lexicon from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits the applicability of these methods. Unsupervised or semi-supervised learning provides an optional solution to multilingual opinion lexicon extraction. However, the datasets are imbalanced in different languages. For some languages, the high-quality corpora are scarce or hard to obtain, which limits the research progress. To solve the above problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph, and then a bilingual dictionary is used as a bridge to transfer information between two languages. A key advantage of this model is its ability to make two languages learn from each other and boost each other. The experimental results show that the proposed approach outperforms baseline significantly. PMID:24260190
Semi-supervised and unsupervised extreme learning machines.
Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng
2014-12-01
Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.
Semi-supervised prediction of gene regulatory networks using machine learning algorithms.
Patel, Nihir; Wang, Jason T L
2015-10-01
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
Semi-Supervised Geographical Feature Detection
NASA Astrophysics Data System (ADS)
Yu, H.; Yu, L.; Kuo, K. S.
2016-12-01
Extraction and tracking geographical features is a fundamental requirement in many geoscience fields. However, this operation has become an increasingly challenging task for domain scientists when tackling a large amount of geoscience data. Although domain scientists may have a relatively clear definition of features, it is difficult to capture the presence of features in an accurate and efficient fashion. We propose a semi-supervised approach to address large geographical feature detection. Our approach has two main components. First, we represent a heterogeneous geoscience data in a unified high-dimensional space, which can facilitate us to evaluate the similarity of data points with respect to geolocation, time, and variable values. We characterize the data from these measures, and use a set of hash functions to parameterize the initial knowledge of the data. Second, for any user query, our approach can automatically extract the initial results based on the hash functions. To improve the accuracy of querying, our approach provides a visualization interface to display the querying results and allow users to interactively explore and refine them. The user feedback will be used to enhance our knowledge base in an iterative manner. In our implementation, we use high-performance computing techniques to accelerate the construction of hash functions. Our design facilitates a parallelization scheme for feature detection and extraction, which is a traditionally challenging problem for large-scale data. We evaluate our approach and demonstrate the effectiveness using both synthetic and real world datasets.
Safe semi-supervised learning based on weighted likelihood.
Kawakita, Masanori; Takeuchi, Jun'ichi
2014-05-01
We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')
Enhanced low-rank representation via sparse manifold adaption for semi-supervised learning.
Peng, Yong; Lu, Bao-Liang; Wang, Suhang
2015-05-01
Constructing an informative and discriminative graph plays an important role in various pattern recognition tasks such as clustering and classification. Among the existing graph-based learning models, low-rank representation (LRR) is a very competitive one, which has been extensively employed in spectral clustering and semi-supervised learning (SSL). In SSL, the graph is composed of both labeled and unlabeled samples, where the edge weights are calculated based on the LRR coefficients. However, most of existing LRR related approaches fail to consider the geometrical structure of data, which has been shown beneficial for discriminative tasks. In this paper, we propose an enhanced LRR via sparse manifold adaption, termed manifold low-rank representation (MLRR), to learn low-rank data representation. MLRR can explicitly take the data local manifold structure into consideration, which can be identified by the geometric sparsity idea; specifically, the local tangent space of each data point was sought by solving a sparse representation objective. Therefore, the graph to depict the relationship of data points can be built once the manifold information is obtained. We incorporate a regularizer into LRR to make the learned coefficients preserve the geometric constraints revealed in the data space. As a result, MLRR combines both the global information emphasized by low-rank property and the local information emphasized by the identified manifold structure. Extensive experimental results on semi-supervised classification tasks demonstrate that MLRR is an excellent method in comparison with several state-of-the-art graph construction approaches. Copyright © 2015 Elsevier Ltd. All rights reserved.
Fully Decentralized Semi-supervised Learning via Privacy-preserving Matrix Completion.
Fierimonte, Roberto; Scardapane, Simone; Uncini, Aurelio; Panella, Massimo
2016-08-26
Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal.
In-situ trainable intrusion detection system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Symons, Christopher T.; Beaver, Justin M.; Gillen, Rob
A computer implemented method detects intrusions using a computer by analyzing network traffic. The method includes a semi-supervised learning module connected to a network node. The learning module uses labeled and unlabeled data to train a semi-supervised machine learning sensor. The method records events that include a feature set made up of unauthorized intrusions and benign computer requests. The method identifies at least some of the benign computer requests that occur during the recording of the events while treating the remainder of the data as unlabeled. The method trains the semi-supervised learning module at the network node in-situ, such thatmore » the semi-supervised learning modules may identify malicious traffic without relying on specific rules, signatures, or anomaly detection.« less
Semi Automated Land Cover Layer Updating Process Utilizing Spectral Analysis and GIS Data Fusion
NASA Astrophysics Data System (ADS)
Cohen, L.; Keinan, E.; Yaniv, M.; Tal, Y.; Felus, A.; Regev, R.
2018-04-01
Technological improvements made in recent years of mass data gathering and analyzing, influenced the traditional methods of updating and forming of the national topographic database. It has brought a significant increase in the number of use cases and detailed geo information demands. Processes which its purpose is to alternate traditional data collection methods developed in many National Mapping and Cadaster Agencies. There has been significant progress in semi-automated methodologies aiming to facilitate updating of a topographic national geodatabase. Implementation of those is expected to allow a considerable reduction of updating costs and operation times. Our previous activity has focused on building automatic extraction (Keinan, Zilberstein et al, 2015). Before semiautomatic updating method, it was common that interpreter identification has to be as detailed as possible to hold most reliable database eventually. When using semi-automatic updating methodologies, the ability to insert human insights based knowledge is limited. Therefore, our motivations were to reduce the created gap by allowing end-users to add their data inputs to the basic geometric database. In this article, we will present a simple Land cover database updating method which combines insights extracted from the analyzed image, and a given spatial data of vector layers. The main stages of the advanced practice are multispectral image segmentation and supervised classification together with given vector data geometric fusion while maintaining the principle of low shape editorial work to be done. All coding was done utilizing open source software components.
An efficient semi-supervised community detection framework in social networks.
Li, Zhen; Gong, Yong; Pan, Zhisong; Hu, Guyu
2017-01-01
Community detection is an important tasks across a number of research fields including social science, biology, and physics. In the real world, topology information alone is often inadequate to accurately find out community structure due to its sparsity and noise. The potential useful prior information such as pairwise constraints which contain must-link and cannot-link constraints can be obtained from domain knowledge in many applications. Thus, combining network topology with prior information to improve the community detection accuracy is promising. Previous methods mainly utilize the must-link constraints while cannot make full use of cannot-link constraints. In this paper, we propose a semi-supervised community detection framework which can effectively incorporate two types of pairwise constraints into the detection process. Particularly, must-link and cannot-link constraints are represented as positive and negative links, and we encode them by adding different graph regularization terms to penalize closeness of the nodes. Experiments on multiple real-world datasets show that the proposed framework significantly improves the accuracy of community detection.
A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.
Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe
2012-04-01
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.
Improved semi-supervised online boosting for object tracking
NASA Astrophysics Data System (ADS)
Li, Yicui; Qi, Lin; Tan, Shukun
2016-10-01
The advantage of an online semi-supervised boosting method which takes object tracking problem as a classification problem, is training a binary classifier from labeled and unlabeled examples. Appropriate object features are selected based on real time changes in the object. However, the online semi-supervised boosting method faces one key problem: The traditional self-training using the classification results to update the classifier itself, often leads to drifting or tracking failure, due to the accumulated error during each update of the tracker. To overcome the disadvantages of semi-supervised online boosting based on object tracking methods, the contribution of this paper is an improved online semi-supervised boosting method, in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classification by an online semi-supervised boosting. Then, this classification is used to process the next frame. Finally, the classification is analyzed by the P-N constraints, which are used to verify if the labels of unlabeled data assigned by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking of our tracker on several challenging test sequences where our tracker outperforms other related on-line tracking methods and achieves promising tracking performance.
Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.
Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun
2014-01-01
The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
Multilabel user classification using the community structure of online networks
Papadopoulos, Symeon; Kompatsiaris, Yiannis
2017-01-01
We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user’s graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score. PMID:28278242
Multilabel user classification using the community structure of online networks.
Rizos, Georgios; Papadopoulos, Symeon; Kompatsiaris, Yiannis
2017-01-01
We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user's graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score.
Qu, Jianfeng; Ouyang, Dantong; Hua, Wen; Ye, Yuxin; Li, Ximing
2018-04-01
Distant supervision for neural relation extraction is an efficient approach to extracting massive relations with reference to plain texts. However, the existing neural methods fail to capture the critical words in sentence encoding and meanwhile lack useful sentence information for some positive training instances. To address the above issues, we propose a novel neural relation extraction model. First, we develop a word-level attention mechanism to distinguish the importance of each individual word in a sentence, increasing the attention weights for those critical words. Second, we investigate the semantic information from word embeddings of target entities, which can be developed as a supplementary feature for the extractor. Experimental results show that our model outperforms previous state-of-the-art baselines. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Khan, Faisal M.; Kulikowski, Casimir A.
2016-03-01
A major focus area for precision medicine is in managing the treatment of newly diagnosed prostate cancer patients. For patients with a positive biopsy, clinicians aim to develop an individualized treatment plan based on a mechanistic understanding of the disease factors unique to each patient. Recently, there has been a movement towards a multi-modal view of the cancer through the fusion of quantitative information from multiple sources, imaging and otherwise. Simultaneously, there have been significant advances in machine learning methods for medical prognostics which integrate a multitude of predictive factors to develop an individualized risk assessment and prognosis for patients. An emerging area of research is in semi-supervised approaches which transduce the appropriate survival time for censored patients. In this work, we apply a novel semi-supervised approach for support vector regression to predict the prognosis for newly diagnosed prostate cancer patients. We integrate clinical characteristics of a patient's disease with imaging derived metrics for biomarker expression as well as glandular and nuclear morphology. In particular, our goal was to explore the performance of nuclear and glandular architecture within the transduction algorithm and assess their predictive power when compared with the Gleason score manually assigned by a pathologist. Our analysis in a multi-institutional cohort of 1027 patients indicates that not only do glandular and morphometric characteristics improve the predictive power of the semi-supervised transduction algorithm; they perform better when the pathological Gleason is absent. This work represents one of the first assessments of quantitative prostate biopsy architecture versus the Gleason grade in the context of a data fusion paradigm which leverages a semi-supervised approach for risk prognosis.
Bryan, Kenneth; Cunningham, Pádraig
2008-01-01
Background Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using of available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation. PMID:18831786
A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
Wei, Wei; Gao, Bin; Liu, Tie-Yan; Wang, Taifeng; Li, Guohui; Li, Hang
2016-04-01
Graph-based ranking has been extensively studied and frequently applied in many applications, such as webpage ranking. It aims at mining potentially valuable information from the raw graph-structured data. Recently, with the proliferation of rich heterogeneous information (e.g., node/edge features and prior knowledge) available in many real-world graphs, how to effectively and efficiently leverage all information to improve the ranking performance becomes a new challenging problem. Previous methods only utilize part of such information and attempt to rank graph nodes according to link-based methods, of which the ranking performances are severely affected by several well-known issues, e.g., over-fitting or high computational complexity, especially when the scale of graph is very large. In this paper, we address the large-scale graph-based ranking problem and focus on how to effectively exploit rich heterogeneous information of the graph to improve the ranking performance. Specifically, we propose an innovative and effective semi-supervised PageRank (SSP) approach to parameterize the derived information within a unified semi-supervised learning framework (SSLF-GR), then simultaneously optimize the parameters and the ranking scores of graph nodes. Experiments on the real-world large-scale graphs demonstrate that our method significantly outperforms the algorithms that consider such graph information only partially.
Method of Grassland Information Extraction Based on Multi-Level Segmentation and Cart Model
NASA Astrophysics Data System (ADS)
Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.
2018-04-01
It is difficult to extract grassland accurately by traditional classification methods, such as supervised method based on pixels or objects. This paper proposed a new method combing the multi-level segmentation with CART (classification and regression tree) model. The multi-level segmentation which combined the multi-resolution segmentation and the spectral difference segmentation could avoid the over and insufficient segmentation seen in the single segmentation mode. The CART model was established based on the spectral characteristics and texture feature which were excavated from training sample data. Xilinhaote City in Inner Mongolia Autonomous Region was chosen as the typical study area and the proposed method was verified by using visual interpretation results as approximate truth value. Meanwhile, the comparison with the nearest neighbor supervised classification method was obtained. The experimental results showed that the total precision of classification and the Kappa coefficient of the proposed method was 95 % and 0.9, respectively. However, the total precision of classification and the Kappa coefficient of the nearest neighbor supervised classification method was 80 % and 0.56, respectively. The result suggested that the accuracy of classification proposed in this paper was higher than the nearest neighbor supervised classification method. The experiment certificated that the proposed method was an effective extraction method of grassland information, which could enhance the boundary of grassland classification and avoid the restriction of grassland distribution scale. This method was also applicable to the extraction of grassland information in other regions with complicated spatial features, which could avoid the interference of woodland, arable land and water body effectively.
Self-Supervised Chinese Ontology Learning from Online Encyclopedias
Shao, Zhiqing; Ruan, Tong
2014-01-01
Constructing ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we proof that the self-supervised machine learning method is practicable in Chinese relation extraction (at least for synonymy and hyponymy) statistically and experimentally and train some self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantages of our methods are that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation results show that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other famous ontologies and knowledge bases; the experiment results also indicate that the self-supervised models obviously enrich SSCO. PMID:24715819
Self-supervised Chinese ontology learning from online encyclopedias.
Hu, Fanghuai; Shao, Zhiqing; Ruan, Tong
2014-01-01
Constructing ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we proof that the self-supervised machine learning method is practicable in Chinese relation extraction (at least for synonymy and hyponymy) statistically and experimentally and train some self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantages of our methods are that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation results show that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other famous ontologies and knowledge bases; the experiment results also indicate that the self-supervised models obviously enrich SSCO.
Gönen, Mehmet
2014-01-01
Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks. PMID:24532862
Gönen, Mehmet
2014-03-01
Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F 1 , and micro F 1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks.
Liu, Xiao; Shi, Jun; Zhou, Shichong; Lu, Minhua
2014-01-01
The dimensionality reduction is an important step in ultrasound image based computer-aided diagnosis (CAD) for breast cancer. A newly proposed l2,1 regularized correntropy algorithm for robust feature selection (CRFS) has achieved good performance for noise corrupted data. Therefore, it has the potential to reduce the dimensions of ultrasound image features. However, in clinical practice, the collection of labeled instances is usually expensive and time costing, while it is relatively easy to acquire the unlabeled or undetermined instances. Therefore, the semi-supervised learning is very suitable for clinical CAD. The iterated Laplacian regularization (Iter-LR) is a new regularization method, which has been proved to outperform the traditional graph Laplacian regularization in semi-supervised classification and ranking. In this study, to augment the classification accuracy of the breast ultrasound CAD based on texture feature, we propose an Iter-LR-based semi-supervised CRFS (Iter-LR-CRFS) algorithm, and then apply it to reduce the feature dimensions of ultrasound images for breast CAD. We compared the Iter-LR-CRFS with LR-CRFS, original supervised CRFS, and principal component analysis. The experimental results indicate that the proposed Iter-LR-CRFS significantly outperforms all other algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harvey, Neal R; Ruggiero, Christy E; Pawley, Norma H
2009-01-01
Detecting complex targets, such as facilities, in commercially available satellite imagery is a difficult problem that human analysts try to solve by applying world knowledge. Often there are known observables that can be extracted by pixel-level feature detectors that can assist in the facility detection process. Individually, each of these observables is not sufficient for an accurate and reliable detection, but in combination, these auxiliary observables may provide sufficient context for detection by a machine learning algorithm. We describe an approach for automatic detection of facilities that uses an automated feature extraction algorithm to extract auxiliary observables, and a semi-supervisedmore » assisted target recognition algorithm to then identify facilities of interest. We illustrate the approach using an example of finding schools in Quickbird image data of Albuquerque, New Mexico. We use Los Alamos National Laboratory's Genie Pro automated feature extraction algorithm to find a set of auxiliary features that should be useful in the search for schools, such as parking lots, large buildings, sports fields and residential areas and then combine these features using Genie Pro's assisted target recognition algorithm to learn a classifier that finds schools in the image data.« less
L1-norm locally linear representation regularization multi-source adaptation learning.
Tao, Jianwen; Wen, Shiting; Hu, Wenjun
2015-09-01
In most supervised domain adaptation learning (DAL) tasks, one has access only to a small number of labeled examples from target domain. Therefore the success of supervised DAL in this "small sample" regime needs the effective utilization of the large amounts of unlabeled data to extract information that is useful for generalization. Toward this end, we here use the geometric intuition of manifold assumption to extend the established frameworks in existing model-based DAL methods for function learning by incorporating additional information about the target geometric structure of the marginal distribution. We would like to ensure that the solution is smooth with respect to both the ambient space and the target marginal distribution. In doing this, we propose a novel L1-norm locally linear representation regularization multi-source adaptation learning framework which exploits the geometry of the probability distribution, which has two techniques. Firstly, an L1-norm locally linear representation method is presented for robust graph construction by replacing the L2-norm reconstruction measure in LLE with L1-norm one, which is termed as L1-LLR for short. Secondly, considering the robust graph regularization, we replace traditional graph Laplacian regularization with our new L1-LLR graph Laplacian regularization and therefore construct new graph-based semi-supervised learning framework with multi-source adaptation constraint, which is coined as L1-MSAL method. Moreover, to deal with the nonlinear learning problem, we also generalize the L1-MSAL method by mapping the input data points from the input space to a high-dimensional reproducing kernel Hilbert space (RKHS) via a nonlinear mapping. Promising experimental results have been obtained on several real-world datasets such as face, visual video and object. Copyright © 2015 Elsevier Ltd. All rights reserved.
Relation Extraction with Weak Supervision and Distributional Semantics
2013-05-01
DATES COVERED 00-00-2013 to 00-00-2013 4 . TITLE AND SUBTITLE Relation Extraction with Weak Supervision and Distributional Semantics 5a...ix List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x 1 Introduction 1 2 Prior Work 4 ...2.1 Supervised relation extraction . . . . . . . . . . . . . . . . . . . . . 4 2.2 Distant supervision for relation extraction
Semi-supervised clustering for parcellating brain regions based on resting state fMRI data
NASA Astrophysics Data System (ADS)
Cheng, Hewei; Fan, Yong
2014-03-01
Many unsupervised clustering techniques have been adopted for parcellating brain regions of interest into functionally homogeneous subregions based on resting state fMRI data. However, the unsupervised clustering techniques are not able to take advantage of exiting knowledge of the functional neuroanatomy readily available from studies of cytoarchitectonic parcellation or meta-analysis of the literature. In this study, we propose a semi-supervised clustering method for parcellating amygdala into functionally homogeneous subregions based on resting state fMRI data. Particularly, the semi-supervised clustering is implemented under the framework of graph partitioning, and adopts prior information and spatial consistent constraints to obtain a spatially contiguous parcellation result. The graph partitioning problem is solved using an efficient algorithm similar to the well-known weighted kernel k-means algorithm. Our method has been validated for parcellating amygdala into 3 subregions based on resting state fMRI data of 28 subjects. The experiment results have demonstrated that the proposed method is more robust than unsupervised clustering and able to parcellate amygdala into centromedial, laterobasal, and superficial parts with improved functionally homogeneity compared with the cytoarchitectonic parcellation result. The validity of the parcellation results is also supported by distinctive functional and structural connectivity patterns of the subregions and high consistency between coactivation patterns derived from a meta-analysis and functional connectivity patterns of corresponding subregions.
Multi-Modal Curriculum Learning for Semi-Supervised Image Classification.
Gong, Chen; Tao, Dacheng; Maybank, Stephen J; Liu, Wei; Kang, Guoliang; Yang, Jie
2016-07-01
Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.
Semi-supervised Learning for Phenotyping Tasks.
Dligach, Dmitriy; Miller, Timothy; Savova, Guergana K
2015-01-01
Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimation. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always lead to the best setting of the weighting factor and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled (and a large number of unlabeled) examples, potentially providing substantial savings in the amount of the required manual chart review.
Liang, Yong; Chai, Hua; Liu, Xiao-Ying; Xu, Zong-Ben; Zhang, Hai; Leung, Kwong-Sak
2016-03-01
One of the most important objectives of the clinical cancer research is to diagnose cancer more accurately based on the patients' gene expression profiles. Both Cox proportional hazards model (Cox) and accelerated failure time model (AFT) have been widely adopted to the high risk and low risk classification or survival time prediction for the patients' clinical treatment. Nevertheless, two main dilemmas limit the accuracy of these prediction methods. One is that the small sample size and censored data remain a bottleneck for training robust and accurate Cox classification model. In addition to that, similar phenotype tumours and prognoses are actually completely different diseases at the genotype and molecular level. Thus, the utility of the AFT model for the survival time prediction is limited when such biological differences of the diseases have not been previously identified. To try to overcome these two main dilemmas, we proposed a novel semi-supervised learning method based on the Cox and AFT models to accurately predict the treatment risk and the survival time of the patients. Moreover, we adopted the efficient L1/2 regularization approach in the semi-supervised learning method to select the relevant genes, which are significantly associated with the disease. The results of the simulation experiments show that the semi-supervised learning model can significant improve the predictive performance of Cox and AFT models in survival analysis. The proposed procedures have been successfully applied to four real microarray gene expression and artificial evaluation datasets. The advantages of our proposed semi-supervised learning method include: 1) significantly increase the available training samples from censored data; 2) high capability for identifying the survival risk classes of patient in Cox model; 3) high predictive accuracy for patients' survival time in AFT model; 4) strong capability of the relevant biomarker selection. Consequently, our proposed semi-supervised learning model is one more appropriate tool for survival analysis in clinical cancer research.
Prototype Vector Machine for Large Scale Semi-Supervised Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Kai; Kwok, James T.; Parvin, Bahram
2009-04-29
Practicaldataminingrarelyfalls exactlyinto the supervisedlearning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computationalintensivenessofgraph-based SSLarises largely from the manifold or graph regularization, which in turn lead to large models that are dificult to handle. To alleviate this, we proposed the prototype vector machine (PVM), a highlyscalable,graph-based algorithm for large-scale SSL. Our key innovation is the use of"prototypes vectors" for effcient approximation on both the graph-based regularizer and model representation. The choice of prototypes are grounded upon two important criteria: they not only perform effective low-rank approximation of themore » kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.« less
Semi-supervised tracking of extreme weather events in global spatio-temporal climate datasets
NASA Astrophysics Data System (ADS)
Kim, S. K.; Prabhat, M.; Williams, D. N.
2017-12-01
Deep neural networks have been successfully applied to solve problem to detect extreme weather events in large scale climate datasets and attend superior performance that overshadows all previous hand-crafted methods. Recent work has shown that multichannel spatiotemporal encoder-decoder CNN architecture is able to localize events in semi-supervised bounding box. Motivated by this work, we propose new learning metric based on Variational Auto-Encoders (VAE) and Long-Short-Term-Memory (LSTM) to track extreme weather events in spatio-temporal dataset. We consider spatio-temporal object tracking problems as learning probabilistic distribution of continuous latent features of auto-encoder using stochastic variational inference. For this, we assume that our datasets are i.i.d and latent features is able to be modeled by Gaussian distribution. In proposed metric, we first train VAE to generate approximate posterior given multichannel climate input with an extreme climate event at fixed time. Then, we predict bounding box, location and class of extreme climate events using convolutional layers given input concatenating three features including embedding, sampled mean and standard deviation. Lastly, we train LSTM with concatenated input to learn timely information of dataset by recurrently feeding output back to next time-step's input of VAE. Our contribution is two-fold. First, we show the first semi-supervised end-to-end architecture based on VAE to track extreme weather events which can apply to massive scaled unlabeled climate datasets. Second, the information of timely movement of events is considered for bounding box prediction using LSTM which can improve accuracy of localization. To our knowledge, this technique has not been explored neither in climate community or in Machine Learning community.
Chai, Hua; Li, Zi-Na; Meng, De-Yu; Xia, Liang-Yong; Liang, Yong
2017-10-12
Gene selection is an attractive and important task in cancer survival analysis. Most existing supervised learning methods can only use the labeled biological data, while the censored data (weakly labeled data) far more than the labeled data are ignored in model building. Trying to utilize such information in the censored data, a semi-supervised learning framework (Cox-AFT model) combined with Cox proportional hazard (Cox) and accelerated failure time (AFT) model was used in cancer research, which has better performance than the single Cox or AFT model. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with self-paced learning (SPL) method to more effectively employ the information in the censored data in a self-learning way. SPL is a kind of reliable and stable learning mechanism, which is recently proposed for simulating the human learning process to help the AFT model automatically identify and include samples of high confidence into training, minimizing interference from high noise. Utilizing the SPL method produces two direct advantages: (1) The utilization of censored data is further promoted; (2) the noise delivered to the model is greatly decreased. The experimental results demonstrate the effectiveness of the proposed model compared to the traditional Cox-AFT model.
Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe
2015-01-01
The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.
Guided Text Search Using Adaptive Visual Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steed, Chad A; Symons, Christopher T; Senter, James K
This research demonstrates the promise of augmenting interactive visualizations with semi- supervised machine learning techniques to improve the discovery of significant associations and insights in the search and analysis of textual information. More specifically, we have developed a system called Gryffin that hosts a unique collection of techniques that facilitate individualized investigative search pertaining to an ever-changing set of analytical questions over an indexed collection of open-source documents related to critical national infrastructure. The Gryffin client hosts dynamic displays of the search results via focus+context record listings, temporal timelines, term-frequency views, and multiple coordinate views. Furthermore, as the analyst interactsmore » with the display, the interactions are recorded and used to label the search records. These labeled records are then used to drive semi-supervised machine learning algorithms that re-rank the unlabeled search records such that potentially relevant records are moved to the top of the record listing. Gryffin is described in the context of the daily tasks encountered at the US Department of Homeland Security s Fusion Center, with whom we are collaborating in its development. The resulting system is capable of addressing the analysts information overload that can be directly attributed to the deluge of information that must be addressed in the search and investigative analysis of textual information.« less
Wang, Fei
2015-06-01
With the rapid development of information technologies, tremendous amount of data became readily available in various application domains. This big data era presents challenges to many conventional data analytics research directions including data capture, storage, search, sharing, analysis, and visualization. It is no surprise to see that the success of next-generation healthcare systems heavily relies on the effective utilization of gigantic amounts of medical data. The ability of analyzing big data in modern healthcare systems plays a vital role in the improvement of the quality of care delivery. Specifically, patient similarity evaluation aims at estimating the clinical affinity and diagnostic proximity of patients. As one of the successful data driven techniques adopted in healthcare systems, patient similarity evaluation plays a fundamental role in many healthcare research areas such as prognosis, risk assessment, and comparative effectiveness analysis. However, existing algorithms for patient similarity evaluation are inefficient in handling massive patient data. In this paper, we propose an Adaptive Semi-Supervised Recursive Tree Partitioning (ART) framework for large scale patient indexing such that the patients with similar clinical or diagnostic patterns can be correctly and efficiently retrieved. The framework is designed for semi-supervised settings since it is crucial to leverage experts' supervision knowledge in medical scenario, which are fairly limited compared to the available data. Starting from the proposed ART framework, we will discuss several specific instantiations and validate them on both benchmark and real world healthcare data. Our results show that with the ART framework, the patients can be efficiently and effectively indexed in the sense that (1) similarity patients can be retrieved in a very short time; (2) the retrieval performance can beat the state-of-the art indexing methods. Copyright © 2015. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Mafanya, Madodomzi; Tsele, Philemon; Botai, Joel; Manyama, Phetole; Swart, Barend; Monate, Thabang
2017-07-01
Invasive alien plants (IAPs) not only pose a serious threat to biodiversity and water resources but also have impacts on human and animal wellbeing. To support decision making in IAPs monitoring, semi-automated image classifiers which are capable of extracting valuable information in remotely sensed data are vital. This study evaluated the mapping accuracies of supervised and unsupervised image classifiers for mapping Harrisia pomanensis (a cactus plant commonly known as the Midnight Lady) using two interlinked evaluation strategies i.e. point and area based accuracy assessment. Results of the point-based accuracy assessment show that with reference to 219 ground control points, the supervised image classifiers (i.e. Maxver and Bhattacharya) mapped H. pomanensis better than the unsupervised image classifiers (i.e. K-mediuns, Euclidian Length and Isoseg). In this regard, user and producer accuracies were 82.4% and 84% respectively for the Maxver classifier. The user and producer accuracies for the Bhattacharya classifier were 90% and 95.7%, respectively. Though the Maxver produced a higher overall accuracy and Kappa estimate than the Bhattacharya classifier, the Maxver Kappa estimate of 0.8305 is not significantly (statistically) greater than the Bhattacharya Kappa estimate of 0.8088 at a 95% confidence interval. The area based accuracy assessment results show that the Bhattacharya classifier estimated the spatial extent of H. pomanensis with an average mapping accuracy of 86.1% whereas the Maxver classifier only gave an average mapping accuracy of 65.2%. Based on these results, the Bhattacharya classifier is therefore recommended for mapping H. pomanensis. These findings will aid in the algorithm choice making for the development of a semi-automated image classification system for mapping IAPs.
ERIC Educational Resources Information Center
Huang, Jian
2010-01-01
With the increasing wealth of information on the Web, information integration is ubiquitous as the same real-world entity may appear in a variety of forms extracted from different sources. This dissertation proposes supervised and unsupervised algorithms that are naturally integrated in a scalable framework to solve the entity resolution problem,…
Optimizing area under the ROC curve using semi-supervised learning
Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M.
2014-01-01
Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.1 PMID:25395692
Optimizing area under the ROC curve using semi-supervised learning.
Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M
2015-01-01
Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.
Pollock, Alex; Campbell, Pauline; Deery, Ruth; Fleming, Mick; Rankin, Jean; Sloan, Graham; Cheyne, Helen
2017-08-01
The aim of this study was to systematically review evidence relating to clinical supervision for nurses, midwives and allied health professionals. Since 1902 statutory supervision has been a requirement for UK midwives, but this is due to change. Evidence relating to clinical supervision for nurses and allied health professions could inform a new model of clinical supervision for midwives. A systematic review with a contingent design, comprising a broad map of research relating to clinical supervision and two focussed syntheses answering specific review questions. Electronic databases were searched from 2005 - September 2015, limited to English-language peer-reviewed publications. Systematic reviews evaluating the effectiveness of clinical supervision were included in Synthesis 1. Primary research studies including a description of a clinical supervision intervention were included in Synthesis 2. Quality of reviews were judged using a risk of bias tool and review results summarized in tables. Data describing the key components of clinical supervision interventions were extracted from studies included in Synthesis 2, categorized using a reporting framework and a narrative account provided. Ten reviews were included in Synthesis 1; these demonstrated an absence of convincing empirical evidence and lack of agreement over the nature of clinical supervision. Nineteen primary studies were included in Synthesis 2; these highlighted a lack of consistency and large variations between delivered interventions. Despite insufficient evidence to directly inform the selection and implementation of a framework, the limited available evidence can inform the design of a new model of clinical supervision for UK-based midwives. © 2017 John Wiley & Sons Ltd.
Can we replace curation with information extraction software?
Karp, Peter D
2016-01-01
Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.Database URL. © The Author(s) 2016. Published by Oxford University Press.
Wire connector classification with machine vision and a novel hybrid SVM
NASA Astrophysics Data System (ADS)
Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.
2018-04-01
A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.
Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes
ERIC Educational Resources Information Center
Finch, Dezon Kile
2012-01-01
Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…
FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection.
Noto, Keith; Brodley, Carla; Slonim, Donna
2012-01-01
Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called "normal" instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection
Brodley, Carla; Slonim, Donna
2011-01-01
Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach. PMID:22639542
Computerized breast cancer analysis system using three stage semi-supervised learning method.
Sun, Wenqing; Tseng, Tzu-Liang Bill; Zhang, Jianying; Qian, Wei
2016-10-01
A large number of labeled medical image data is usually a requirement to train a well-performed computer-aided detection (CAD) system. But the process of data labeling is time consuming, and potential ethical and logistical problems may also present complications. As a result, incorporating unlabeled data into CAD system can be a feasible way to combat these obstacles. In this study we developed a three stage semi-supervised learning (SSL) scheme that combines a small amount of labeled data and larger amount of unlabeled data. The scheme was modified on our existing CAD system using the following three stages: data weighing, feature selection, and newly proposed dividing co-training data labeling algorithm. Global density asymmetry features were incorporated to the feature pool to reduce the false positive rate. Area under the curve (AUC) and accuracy were computed using 10 fold cross validation method to evaluate the performance of our CAD system. The image dataset includes mammograms from 400 women who underwent routine screening examinations, and each pair contains either two cranio-caudal (CC) or two mediolateral-oblique (MLO) view mammograms from the right and the left breasts. From these mammograms 512 regions were extracted and used in this study, and among them 90 regions were treated as labeled while the rest were treated as unlabeled. Using our proposed scheme, the highest AUC observed in our research was 0.841, which included the 90 labeled data and all the unlabeled data. It was 7.4% higher than using labeled data only. With the increasing amount of labeled data, AUC difference between using mixed data and using labeled data only reached its peak when the amount of labeled data was around 60. This study demonstrated that our proposed three stage semi-supervised learning can improve the CAD performance by incorporating unlabeled data. Using unlabeled data is promising in computerized cancer research and may have a significant impact for future CAD system applications. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
He, Dengchao; Zhang, Hongjun; Hao, Wenning; Zhang, Rui; Cheng, Kai
2017-07-01
Distant supervision, a widely applied approach in the field of relation extraction can automatically generate large amounts of labeled training corpus with minimal manual effort. However, the labeled training corpus may have many false-positive data, which would hurt the performance of relation extraction. Moreover, in traditional feature-based distant supervised approaches, extraction models adopt human design features with natural language processing. It may also cause poor performance. To address these two shortcomings, we propose a customized attention-based long short-term memory network. Our approach adopts word-level attention to achieve better data representation for relation extraction without manually designed features to perform distant supervision instead of fully supervised relation extraction, and it utilizes instance-level attention to tackle the problem of false-positive data. Experimental results demonstrate that our proposed approach is effective and achieves better performance than traditional methods.
Zhang, Xianchang; Cheng, Hewei; Zuo, Zhentao; Zhou, Ke; Cong, Fei; Wang, Bo; Zhuo, Yan; Chen, Lin; Xue, Rong; Fan, Yong
2018-01-01
The amygdala plays an important role in emotional functions and its dysfunction is considered to be associated with multiple psychiatric disorders in humans. Cytoarchitectonic mapping has demonstrated that the human amygdala complex comprises several subregions. However, it's difficult to delineate boundaries of these subregions in vivo even if using state of the art high resolution structural MRI. Previous attempts to parcellate this small structure using unsupervised clustering methods based on resting state fMRI data suffered from the low spatial resolution of typical fMRI data, and it remains challenging for the unsupervised methods to define subregions of the amygdala in vivo . In this study, we developed a novel brain parcellation method to segment the human amygdala into spatially contiguous subregions based on 7T high resolution fMRI data. The parcellation was implemented using a semi-supervised spectral clustering (SSC) algorithm at an individual subject level. Under guidance of prior information derived from the Julich cytoarchitectonic atlas, our method clustered voxels of the amygdala into subregions according to similarity measures of their functional signals. As a result, three distinct amygdala subregions can be obtained in each hemisphere for every individual subject. Compared with the cytoarchitectonic atlas, our method achieved better performance in terms of subregional functional homogeneity. Validation experiments have also demonstrated that the amygdala subregions obtained by our method have distinctive, lateralized functional connectivity (FC) patterns. Our study has demonstrated that the semi-supervised brain parcellation method is a powerful tool for exploring amygdala subregional functions.
Wang, Hui; Zhang, Weide; Zeng, Qiang; Li, Zuofeng; Feng, Kaiyan; Liu, Lei
2014-04-01
Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although natural Language Processing (NLP) methods have been profoundly studied in electronic medical records (EMR), few studies have explored NLP in extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of extracting tumor-related information from operation notes of hepatic carcinomas which were written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluating on unseen 29 operation notes, our best approach yielded 69.6% in precision, 58.3% in recall and 63.5% F-score. Copyright © 2014 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Koltz, Rebecca L.; Feit, Stephen S.
2012-01-01
The experiences of live supervision for three, master's level, pre-practicum counseling students were explored using a phenomenological methodology. Using semi-structured interviews, this study resulted in a thick description of the experience of live supervision capturing participants' thoughts, emotions, and behaviors. Data revealed that live…
Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong
2016-01-01
Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning is to choose the most informative documents for the supervised learning in order to reduce the amount of required manual annotations. Previous works of active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. They also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition. We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.
Graph-Based Weakly-Supervised Methods for Information Extraction & Integration
ERIC Educational Resources Information Center
Talukdar, Partha Pratim
2010-01-01
The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…
NASA Astrophysics Data System (ADS)
Ma, Xiaoke; Wang, Bingbo; Yu, Liang
2018-01-01
Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.
NASA Astrophysics Data System (ADS)
Valizadegan, Hamed; Martin, Rodney; McCauliff, Sean D.; Jenkins, Jon Michael; Catanzarite, Joseph; Oza, Nikunj C.
2015-08-01
Building new catalogues of planetary candidates, astrophysical false alarms, and non-transiting phenomena is a challenging task that currently requires a reviewing team of astrophysicists and astronomers. These scientists need to examine more than 100 diagnostic metrics and associated graphics for each candidate exoplanet-transit-like signal to classify it into one of the three classes. Considering that the NASA Explorer Program's TESS mission and ESA's PLATO mission survey even a larger area of space, the classification of their transit-like signals is more time-consuming for human agents and a bottleneck to successfully construct the new catalogues in a timely manner. This encourages building automatic classification tools that can quickly and reliably classify the new signal data from these missions. The standard tool for building automatic classification systems is the supervised machine learning that requires a large set of highly accurate labeled examples in order to build an effective classifier. This requirement cannot be easily met for classifying transit-like signals because not only are existing labeled signals very limited, but also the current labels may not be reliable (because the labeling process is a subjective task). Our experiments with using different supervised classifiers to categorize transit-like signals verifies that the labeled signals are not rich enough to provide the classifier with enough power to generalize well beyond the observed cases (e.g. to unseen or test signals). That motivated us to utilize a new category of learning techniques, so-called semi-supervised learning, that combines the label information from the costly labeled signals, and distribution information from the cheaply available unlabeled signals in order to construct more effective classifiers. Our study on the Kepler Mission data shows that semi-supervised learning can significantly improve the result of multiple base classifiers (e.g. Support Vector Machines, AdaBoost, and Decision Tree) and is a good technique for automatic classification of exoplanet-transit-like signal.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pullum, Laura L; Symons, Christopher T
2011-01-01
Machine learning is used in many applications, from machine vision to speech recognition to decision support systems, and is used to test applications. However, though much has been done to evaluate the performance of machine learning algorithms, little has been done to verify the algorithms or examine their failure modes. Moreover, complex learning frameworks often require stepping beyond black box evaluation to distinguish between errors based on natural limits on learning and errors that arise from mistakes in implementation. We present a conceptual architecture, failure model and taxonomy, and failure modes and effects analysis (FMEA) of a semi-supervised, multi-modal learningmore » system, and provide specific examples from its use in a radiological analysis assistant system. The goal of the research described in this paper is to provide a foundation from which dependability analysis of systems using semi-supervised, multi-modal learning can be conducted. The methods presented provide a first step towards that overall goal.« less
Deep Learning for Extreme Weather Detection
NASA Astrophysics Data System (ADS)
Prabhat, M.; Racah, E.; Biard, J.; Liu, Y.; Mudigonda, M.; Kashinath, K.; Beckham, C.; Maharaj, T.; Kahou, S.; Pal, C.; O'Brien, T. A.; Wehner, M. F.; Kunkel, K.; Collins, W. D.
2017-12-01
We will present our latest results from the application of Deep Learning methods for detecting, localizing and segmenting extreme weather patterns in climate data. We have successfully applied supervised convolutional architectures for the binary classification tasks of detecting tropical cyclones and atmospheric rivers in centered, cropped patches. We have subsequently extended our architecture to a semi-supervised formulation, which is capable of learning a unified representation of multiple weather patterns, predicting bounding boxes and object categories, and has the capability to detect novel patterns (w/ few, or no labels). We will briefly present our efforts in scaling the semi-supervised architecture to 9600 nodes of the Cori supercomputer, obtaining 15PF performance. Time permitting, we will highlight our efforts in pixel-level segmentation of weather patterns.
Active learning based segmentation of Crohns disease from abdominal MRI.
Mahapatra, Dwarikanath; Vos, Franciscus M; Buhmann, Joachim M
2016-05-01
This paper proposes a novel active learning (AL) framework, and combines it with semi supervised learning (SSL) for segmenting Crohns disease (CD) tissues from abdominal magnetic resonance (MR) images. Robust fully supervised learning (FSL) based classifiers require lots of labeled data of different disease severities. Obtaining such data is time consuming and requires considerable expertise. SSL methods use a few labeled samples, and leverage the information from many unlabeled samples to train an accurate classifier. AL queries labels of most informative samples and maximizes gain from the labeling effort. Our primary contribution is in designing a query strategy that combines novel context information with classification uncertainty and feature similarity. Combining SSL and AL gives a robust segmentation method that: (1) optimally uses few labeled samples and many unlabeled samples; and (2) requires lower training time. Experimental results show our method achieves higher segmentation accuracy than FSL methods with fewer samples and reduced training effort. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Du, Shihong; Zhang, Fangli; Zhang, Xiuyuan
2015-07-01
While most existing studies have focused on extracting geometric information on buildings, only a few have concentrated on semantic information. The lack of semantic information cannot satisfy many demands on resolving environmental and social issues. This study presents an approach to semantically classify buildings into much finer categories than those of existing studies by learning random forest (RF) classifier from a large number of imbalanced samples with high-dimensional features. First, a two-level segmentation mechanism combining GIS and VHR image produces single image objects at a large scale and intra-object components at a small scale. Second, a semi-supervised method chooses a large number of unbiased samples by considering the spatial proximity and intra-cluster similarity of buildings. Third, two important improvements in RF classifier are made: a voting-distribution ranked rule for reducing the influences of imbalanced samples on classification accuracy and a feature importance measurement for evaluating each feature's contribution to the recognition of each category. Fourth, the semantic classification of urban buildings is practically conducted in Beijing city, and the results demonstrate that the proposed approach is effective and accurate. The seven categories used in the study are finer than those in existing work and more helpful to studying many environmental and social problems.
Perualila-Tan, Nolen Joy; Shkedy, Ziv; Talloen, Willem; Göhlmann, Hinrich W H; Moerbeke, Marijke Van; Kasim, Adetayo
2016-08-01
The modern process of discovering candidate molecules in early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical strategy in compound selection involves compound clustering based on chemical similarity to obtain representative chemically diverse compounds (not incorporating potency information). In this paper, we propose an integrative clustering approach that makes use of both biological (compound efficacy) and chemical (structural features) data sources for the purpose of discovering a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, serving as a generic input in any clustering algorithm. This new analysis work flow is semi-supervised method since, after the determination of clusters, a secondary analysis is performed wherein it finds differentially expressed genes associated to the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. In this paper, datasets from two drug development oncology projects are used to illustrate the usefulness of the weighted similarity-based clustering approach to integrate multi-source high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using this proposed integrative approach.
NASA Astrophysics Data System (ADS)
Salman, S. S.; Abbas, W. A.
2018-05-01
The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.
Supervised guiding long-short term memory for image caption generation based on object classes
NASA Astrophysics Data System (ADS)
Wang, Jian; Cao, Zhiguo; Xiao, Yang; Qi, Xinyuan
2018-03-01
The present models of image caption generation have the problems of image visual semantic information attenuation and errors in guidance information. In order to solve these problems, we propose a supervised guiding Long Short Term Memory model based on object classes, named S-gLSTM for short. It uses the object detection results from R-FCN as supervisory information with high confidence, and updates the guidance word set by judging whether the last output matches the supervisory information. S-gLSTM learns how to extract the current interested information from the image visual se-mantic information based on guidance word set. The interested information is fed into the S-gLSTM at each iteration as guidance information, to guide the caption generation. To acquire the text-related visual semantic information, the S-gLSTM fine-tunes the weights of the network through the back-propagation of the guiding loss. Complementing guidance information at each iteration solves the problem of visual semantic information attenuation in the traditional LSTM model. Besides, the supervised guidance information in our model can reduce the impact of the mismatched words on the caption generation. We test our model on MSCOCO2014 dataset, and obtain better performance than the state-of-the- art models.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.
Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu
2018-05-01
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms.
Semi-supervised morphosyntactic classification of Old Icelandic.
Urban, Kryztof; Tangherlini, Timothy R; Vijūnas, Aurelijus; Broadwell, Peter M
2014-01-01
We present IceMorph, a semi-supervised morphosyntactic analyzer of Old Icelandic. In addition to machine-read corpora and dictionaries, it applies a small set of declension prototypes to map corpus words to dictionary entries. A web-based GUI allows expert users to modify and augment data through an online process. A machine learning module incorporates prototype data, edit-distance metrics, and expert feedback to continuously update part-of-speech and morphosyntactic classification. An advantage of the analyzer is its ability to achieve competitive classification accuracy with minimum training data.
Guo, Yufan; Silins, Ilona; Stenius, Ulla; Korhonen, Anna
2013-06-01
Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of biomedical abstracts. However, is it realistic for full articles given their high linguistic and informational complexity? We introduce and release a novel corpus of 50 biomedical articles annotated according to the Argumentative Zoning (AZ) scheme, and investigate active learning with one of the most widely used ML models-Support Vector Machines (SVM)-on this corpus. Additionally, we introduce two novel applications that use AZ to support real-life literature review in biomedicine via question answering and summarization. We show that active learning with SVM trained on 500 labeled sentences (6% of the corpus) performs surprisingly well with the accuracy of 82%, just 2% lower than fully supervised learning. In our question answering task, biomedical researchers find relevant information significantly faster from AZ-annotated than unannotated articles. In the summarization task, sentences extracted from particular zones are significantly more similar to gold standard summaries than those extracted from particular sections of full articles. These results demonstrate that active learning of full articles' information structure is indeed realistic and the accuracy is high enough to support real-life literature review in biomedicine. The annotated corpus, our AZ classifier and the two novel applications are available at http://www.cl.cam.ac.uk/yg244/12bioinfo.html
NASA Astrophysics Data System (ADS)
Brook, A.; Cristofani, E.; Vandewal, M.; Matheis, C.; Jonuscheit, J.; Beigang, R.
2012-05-01
The present study proposes a fully integrated, semi-automatic and near real-time mode-operated image processing methodology developed for Frequency-Modulated Continuous-Wave (FMCW) THz images with the center frequencies around: 100 GHz and 300 GHz. The quality control of aeronautics composite multi-layered materials and structures using Non-Destructive Testing is the main focus of this work. Image processing is applied on the 3-D images to extract useful information. The data is processed by extracting areas of interest. The detected areas are subjected to image analysis for more particular investigation managed by a spatial model. Finally, the post-processing stage examines and evaluates the spatial accuracy of the extracted information.
Semi-supervised learning for photometric supernova classification
NASA Astrophysics Data System (ADS)
Richards, Joseph W.; Homrighausen, Darren; Freeman, Peter E.; Schafer, Chad M.; Poznanski, Dovi
2012-01-01
We present a semi-supervised method for photometric supernova typing. Our approach is to first use the non-linear dimension reduction technique diffusion map to detect structure in a data base of supernova light curves and subsequently employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi-supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template-based methods. Applied to supernova data simulated by Kessler et al. to mimic those of the Dark Energy Survey, our methods achieve (cross-validated) 95 per cent Type Ia purity and 87 per cent Type Ia efficiency on the spectroscopic sample, but only 50 per cent Type Ia purity and 50 per cent efficiency on the photometric sample due to their spectroscopic follow-up strategy. To improve the performance on the photometric sample, we search for better spectroscopic follow-up procedures by studying the sensitivity of our machine-learned supernova classification on the specific strategy used to obtain training sets. With a fixed amount of spectroscopic follow-up time, we find that, despite collecting data on a smaller number of supernovae, deeper magnitude-limited spectroscopic surveys are better for producing training sets. For supernova Ia (II-P) typing, we obtain a 44 per cent (1 per cent) increase in purity to 72 per cent (87 per cent) and 30 per cent (162 per cent) increase in efficiency to 65 per cent (84 per cent) of the sample using a 25th (24.5th) magnitude-limited survey instead of the shallower spectroscopic sample used in the original simulations. When redshift information is available, we incorporate it into our analysis using a novel method of altering the diffusion map representation of the supernovae. Incorporating host redshifts leads to a 5 per cent improvement in Type Ia purity and 13 per cent improvement in Type Ia efficiency. A web service for the supernova classification method used in this paper can be found at .
Adaptive distance metric learning for diffusion tensor image segmentation.
Kong, Youyong; Wang, Defeng; Shi, Lin; Hui, Steve C N; Chu, Winnie C W
2014-01-01
High quality segmentation of diffusion tensor images (DTI) is of key interest in biomedical research and clinical application. In previous studies, most efforts have been made to construct predefined metrics for different DTI segmentation tasks. These methods require adequate prior knowledge and tuning parameters. To overcome these disadvantages, we proposed to automatically learn an adaptive distance metric by a graph based semi-supervised learning model for DTI segmentation. An original discriminative distance vector was first formulated by combining both geometry and orientation distances derived from diffusion tensors. The kernel metric over the original distance and labels of all voxels were then simultaneously optimized in a graph based semi-supervised learning approach. Finally, the optimization task was efficiently solved with an iterative gradient descent method to achieve the optimal solution. With our approach, an adaptive distance metric could be available for each specific segmentation task. Experiments on synthetic and real brain DTI datasets were performed to demonstrate the effectiveness and robustness of the proposed distance metric learning approach. The performance of our approach was compared with three classical metrics in the graph based semi-supervised learning framework.
Adaptive Distance Metric Learning for Diffusion Tensor Image Segmentation
Kong, Youyong; Wang, Defeng; Shi, Lin; Hui, Steve C. N.; Chu, Winnie C. W.
2014-01-01
High quality segmentation of diffusion tensor images (DTI) is of key interest in biomedical research and clinical application. In previous studies, most efforts have been made to construct predefined metrics for different DTI segmentation tasks. These methods require adequate prior knowledge and tuning parameters. To overcome these disadvantages, we proposed to automatically learn an adaptive distance metric by a graph based semi-supervised learning model for DTI segmentation. An original discriminative distance vector was first formulated by combining both geometry and orientation distances derived from diffusion tensors. The kernel metric over the original distance and labels of all voxels were then simultaneously optimized in a graph based semi-supervised learning approach. Finally, the optimization task was efficiently solved with an iterative gradient descent method to achieve the optimal solution. With our approach, an adaptive distance metric could be available for each specific segmentation task. Experiments on synthetic and real brain DTI datasets were performed to demonstrate the effectiveness and robustness of the proposed distance metric learning approach. The performance of our approach was compared with three classical metrics in the graph based semi-supervised learning framework. PMID:24651858
Jiang, Yizhang; Wu, Dongrui; Deng, Zhaohong; Qian, Pengjiang; Wang, Jun; Wang, Guanjin; Chung, Fu-Lai; Choi, Kup-Sze; Wang, Shitong
2017-12-01
Recognition of epileptic seizures from offline EEG signals is very important in clinical diagnosis of epilepsy. Compared with manual labeling of EEG signals by doctors, machine learning approaches can be faster and more consistent. However, the classification accuracy is usually not satisfactory for two main reasons: the distributions of the data used for training and testing may be different, and the amount of training data may not be enough. In addition, most machine learning approaches generate black-box models that are difficult to interpret. In this paper, we integrate transductive transfer learning, semi-supervised learning and TSK fuzzy system to tackle these three problems. More specifically, we use transfer learning to reduce the discrepancy in data distribution between the training and testing data, employ semi-supervised learning to use the unlabeled testing data to remedy the shortage of training data, and adopt TSK fuzzy system to increase model interpretability. Two learning algorithms are proposed to train the system. Our experimental results show that the proposed approaches can achieve better performance than many state-of-the-art seizure classification algorithms.
Le, Hoang-Quynh; Tran, Mai-Vu; Dang, Thanh Hai; Ha, Quang-Thuy; Collier, Nigel
2016-07-01
The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCreative V CDR. The original UET-CAM system's performance was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization. In the chemical-induced disease (CID) relation extraction phase, we proposed a pipeline that includes a coreference resolution module and a Support Vector Machine relation extraction model. The former module utilized a multi-pass sieve to extend entity recall. In this article, the UET-CAM system was improved by adding a 'silver' CID corpus to train the prediction model. This silver standard corpus of more than 50 thousand sentences was automatically built based on the Comparative Toxicogenomics Database (CTD) database. We evaluated our method on the CDR test set. Results showed that our system could reach the state of the art performance with F1 of 82.44 for the DNER task and 58.90 for the CID task. Analysis demonstrated substantial benefits of both the multi-pass sieve coreference resolution method (F1 + 4.13%) and the silver CID corpus (F1 +7.3%).Database URL: SilverCID-The silver-standard corpus for CID relation extraction is freely online available at: https://zenodo.org/record/34530 (doi:10.5281/zenodo.34530). © The Author(s) 2016. Published by Oxford University Press.
Semi-Supervised Novelty Detection with Adaptive Eigenbases, and Application to Radio Transients
NASA Technical Reports Server (NTRS)
Thompson, David R.; Majid, Walid A.; Reed, Colorado J.; Wagstaff, Kiri L.
2011-01-01
We present a semi-supervised online method for novelty detection and evaluate its performance for radio astronomy time series data. Our approach uses adaptive eigenbases to combine 1) prior knowledge about uninteresting signals with 2) online estimation of the current data properties to enable highly sensitive and precise detection of novel signals. We apply the method to the problem of detecting fast transient radio anomalies and compare it to current alternative algorithms. Tests based on observations from the Parkes Multibeam Survey show both effective detection of interesting rare events and robustness to known false alarm anomalies.
The UK Postgraduate Masters Dissertation: An "Elusive Chameleon"?
ERIC Educational Resources Information Center
Pilcher, Nick
2011-01-01
Many studies into the process of producing and supervising dissertations exist, yet little research into the "product" of the Masters dissertation, or into how Masters supervision changes over time exist. Drawing on 62 semi-structured interviews with 31 Maths and Computer Science supervisors over a two-year period, this paper explores…
What's Not Being Said? Recollections of Nondisclosure in Clinical Supervision While in Training
ERIC Educational Resources Information Center
Sweeney, Jennifer; Creaner, Mary
2014-01-01
The aim of this qualitative study was to retrospectively examine nondisclosure in individual supervision while in training. Interviews were conducted with supervisees two years post-qualification. Specific nondisclosures were examined and reasons for these nondisclosures were explored. Six in-depth semi-structured interviews were conducted and…
A robust data-driven approach for gene ontology annotation.
Li, Yanpeng; Yu, Hong
2014-01-01
Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both the two tasks. © The Author(s) 2014. Published by Oxford University Press.
A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Bhaduri, Budhendra L
2011-01-01
Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, itmore » is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for mulitsource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.« less
Doostparast Torshizi, Abolfazl; Petzold, Linda R
2018-01-01
Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Maximum margin semi-supervised learning with irrelevant data.
Yang, Haiqin; Huang, Kaizhu; King, Irwin; Lyu, Michael R
2015-10-01
Semi-supervised learning (SSL) is a typical learning paradigms training a model from both labeled and unlabeled data. The traditional SSL models usually assume unlabeled data are relevant to the labeled data, i.e., following the same distributions of the targeted labeled data. In this paper, we address a different, yet formidable scenario in semi-supervised classification, where the unlabeled data may contain irrelevant data to the labeled data. To tackle this problem, we develop a maximum margin model, named tri-class support vector machine (3C-SVM), to utilize the available training data, while seeking a hyperplane for separating the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it does not need any prior knowledge and explicit assumption on the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and maximum entropy principle. That is, 3C-SVM approaches an ideal classifier. This classifier relies heavily on labeled data and is confident on the relevant data lying far away from the decision hyperplane, while maximally ignoring the irrelevant data, which are hardly distinguished. Second, theoretical analysis is provided to prove that in what condition, the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models, including standard SVMs, Semi-supervised SVMs (S(3)VMs), and SVMs learned from the universum (U-SVMs) as its special cases. More importantly, we deploy a concave-convex produce to solve the proposed 3C-SVM, transforming the original mixed integer programming, to a semi-definite programming relaxation, and finally to a sequence of quadratic programming subproblems, which yields the same worst case time complexity as that of S(3)VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematical experimental comparisons. Copyright © 2015 Elsevier Ltd. All rights reserved.
Extracting microRNA-gene relations from biomedical literature using distant supervision
Clarke, Luka A.; Couto, Francisco M.
2017-01-01
Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have been made available for many biological processes, while the availability of annotated corpora is still limited. We studied the extraction of microRNA-gene relations from text. MicroRNA regulation is an important biological process due to its close association with human diseases. The proposed method, IBRel, is based on distantly supervised multi-instance learning. We evaluated IBRel on three datasets, and the results were compared with a co-occurrence approach as well as a supervised machine learning algorithm. While supervised learning outperformed on two of those datasets, IBRel obtained an F-score 28.3 percentage points higher on the dataset for which there was no training set developed specifically. To demonstrate the applicability of IBRel, we used it to extract 27 miRNA-gene relations from recently published papers about cystic fibrosis. Our results demonstrate that our method can be successfully used to extract relations from literature about a biological process without an annotated corpus. The source code and data used in this study are available at https://github.com/AndreLamurias/IBRel. PMID:28263989
Extracting microRNA-gene relations from biomedical literature using distant supervision.
Lamurias, Andre; Clarke, Luka A; Couto, Francisco M
2017-01-01
Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have been made available for many biological processes, while the availability of annotated corpora is still limited. We studied the extraction of microRNA-gene relations from text. MicroRNA regulation is an important biological process due to its close association with human diseases. The proposed method, IBRel, is based on distantly supervised multi-instance learning. We evaluated IBRel on three datasets, and the results were compared with a co-occurrence approach as well as a supervised machine learning algorithm. While supervised learning outperformed on two of those datasets, IBRel obtained an F-score 28.3 percentage points higher on the dataset for which there was no training set developed specifically. To demonstrate the applicability of IBRel, we used it to extract 27 miRNA-gene relations from recently published papers about cystic fibrosis. Our results demonstrate that our method can be successfully used to extract relations from literature about a biological process without an annotated corpus. The source code and data used in this study are available at https://github.com/AndreLamurias/IBRel.
Fully Convolutional Network Based Shadow Extraction from GF-2 Imagery
NASA Astrophysics Data System (ADS)
Li, Z.; Cai, G.; Ren, H.
2018-04-01
There are many shadows on the high spatial resolution satellite images, especially in the urban areas. Although shadows on imagery severely affect the information extraction of land cover or land use, they provide auxiliary information for building extraction which is hard to achieve a satisfactory accuracy through image classification itself. This paper focused on the method of building shadow extraction by designing a fully convolutional network and training samples collected from GF-2 satellite imagery in the urban region of Changchun city. By means of spatial filtering and calculation of adjacent relationship along the sunlight direction, the small patches from vegetation or bridges have been eliminated from the preliminary extracted shadows. Finally, the building shadows were separated. The extracted building shadow information from the proposed method in this paper was compared with the results from the traditional object-oriented supervised classification algorihtms. It showed that the deep learning network approach can improve the accuracy to a large extent.
Semi supervised Learning of Feature Hierarchies for Object Detection in a Video (Open Access)
2013-10-03
dataset: PETS2009 Dataset, Oxford Town Center dataset [3], PNNL Parking Lot datasets [25] and CAVIAR cols1 dataset [1] for human detection. Be- sides, we...level features from TownCen- ter, ParkingLot, PETS09 and CAVIAR . As we can see that, the four set of features are visually very different from each other...information is more distinguished for detecting a person in the TownCen- ter than CAVIAR . Comparing figure 5(a) with 6(a), interest- ingly, the color
Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.
2014-01-01
Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953
Sweeney, Elizabeth M; Vogelstein, Joshua T; Cuzzocreo, Jennifer L; Calabresi, Peter A; Reich, Daniel S; Crainiceanu, Ciprian M; Shinohara, Russell T
2014-01-01
Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance.
A semi-supervised learning approach for RNA secondary structure prediction.
Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki
2015-08-01
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.
Accuracy of latent-variable estimation in Bayesian semi-supervised learning.
Yamazaki, Keisuke
2015-09-01
Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified. Copyright © 2015 Elsevier Ltd. All rights reserved.
32 CFR 634.38 - Involuntary extraction of bodily fluids in traffic cases.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 32 National Defense 4 2010-07-01 2010-07-01 true Involuntary extraction of bodily fluids in traffic cases. 634.38 Section 634.38 National Defense Department of Defense (Continued) DEPARTMENT OF THE ARMY (CONTINUED) LAW ENFORCEMENT AND CRIMINAL INVESTIGATIONS MOTOR VEHICLE TRAFFIC SUPERVISION Traffic Supervision § 634.38 Involuntary extraction of...
Classifying galaxy spectra at 0.5 < z < 1 with self-organizing maps
NASA Astrophysics Data System (ADS)
Rahmani, S.; Teimoorinia, H.; Barmby, P.
2018-05-01
The spectrum of a galaxy contains information about its physical properties. Classifying spectra using templates helps elucidate the nature of a galaxy's energy sources. In this paper, we investigate the use of self-organizing maps in classifying galaxy spectra against templates. We trained semi-supervised self-organizing map networks using a set of templates covering the wavelength range from far ultraviolet to near infrared. The trained networks were used to classify the spectra of a sample of 142 galaxies with 0.5 < z < 1 and the results compared to classifications performed using K-means clustering, a supervised neural network, and chi-squared minimization. Spectra corresponding to quiescent galaxies were more likely to be classified similarly by all methods while starburst spectra showed more variability. Compared to classification using chi-squared minimization or the supervised neural network, the galaxies classed together by the self-organizing map had more similar spectra. The class ordering provided by the one-dimensional self-organizing maps corresponds to an ordering in physical properties, a potentially important feature for the exploration of large datasets.
Weakly supervised visual dictionary learning by harnessing image attributes.
Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang
2014-12-01
Bag-of-features (BoFs) representation has been extensively applied to deal with various computer vision applications. To extract discriminative and descriptive BoF, one important step is to learn a good dictionary to minimize the quantization loss between local features and codewords. While most existing visual dictionary learning approaches are engaged with unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping the observed states, where the supervised information is stemmed from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.
An online semi-supervised brain-computer interface.
Gu, Zhenghui; Yu, Zhuliang; Shen, Zhifang; Li, Yuanqing
2013-09-01
Practical brain-computer interface (BCI) systems should require only low training effort for the user, and the algorithms used to classify the intent of the user should be computationally efficient. However, due to inter- and intra-subject variations in EEG signal, intermittent training/calibration is often unavoidable. In this paper, we present an online semi-supervised P300 BCI speller system. After a short initial training (around or less than 1 min in our experiments), the system is switched to a mode where the user can input characters through selective attention. In this mode, a self-training least squares support vector machine (LS-SVM) classifier is gradually enhanced in back end with the unlabeled EEG data collected online after every character input. In this way, the classifier is gradually enhanced. Even though the user may experience some errors in input at the beginning due to the small initial training dataset, the accuracy approaches that of fully supervised method in a few minutes. The algorithm based on LS-SVM and its sequential update has low computational complexity; thus, it is suitable for online applications. The effectiveness of the algorithm has been validated through data analysis on BCI Competition III dataset II (P300 speller BCI data). The performance of the online system was evaluated through experimental results on eight healthy subjects, where all of them achieved the spelling accuracy of 85 % or above within an average online semi-supervised learning time of around 3 min.
Detection and Evaluation of Cheating on College Exams Using Supervised Classification
ERIC Educational Resources Information Center
Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire
2012-01-01
Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…
An integrated system for land resources supervision based on the IoT and cloud computing
NASA Astrophysics Data System (ADS)
Fang, Shifeng; Zhu, Yunqiang; Xu, Lida; Zhang, Jinqu; Zhou, Peiji; Luo, Kan; Yang, Jie
2017-01-01
Integrated information systems are important safeguards for the utilisation and development of land resources. Information technologies, including the Internet of Things (IoT) and cloud computing, are inevitable requirements for the quality and efficiency of land resources supervision tasks. In this study, an economical and highly efficient supervision system for land resources has been established based on IoT and cloud computing technologies; a novel online and offline integrated system with synchronised internal and field data that includes the entire process of 'discovering breaches, analysing problems, verifying fieldwork and investigating cases' was constructed. The system integrates key technologies, such as the automatic extraction of high-precision information based on remote sensing, semantic ontology-based technology to excavate and discriminate public sentiment on the Internet that is related to illegal incidents, high-performance parallel computing based on MapReduce, uniform storing and compressing (bitwise) technology, global positioning system data communication and data synchronisation mode, intelligent recognition and four-level ('device, transfer, system and data') safety control technology. The integrated system based on a 'One Map' platform has been officially implemented by the Department of Land and Resources of Guizhou Province, China, and was found to significantly increase the efficiency and level of land resources supervision. The system promoted the overall development of informatisation in fields related to land resource management.
An information extraction framework for cohort identification using electronic health records.
Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G
2013-01-01
Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.
Feature-level sentiment analysis by using comparative domain corpora
NASA Astrophysics Data System (ADS)
Quan, Changqin; Ren, Fuji
2016-06-01
Feature-level sentiment analysis (SA) is able to provide more fine-grained SA on certain opinion targets and has a wider range of applications on E-business. This study proposes an approach based on comparative domain corpora for feature-level SA. The proposed approach makes use of word associations for domain-specific feature extraction. First, we assign a similarity score for each candidate feature to denote its similarity extent to a domain. Then we identify domain features based on their similarity scores on different comparative domain corpora. After that, dependency grammar and a general sentiment lexicon are applied to extract and expand feature-oriented opinion words. Lastly, the semantic orientation of a domain-specific feature is determined based on the feature-oriented opinion lexicons. In evaluation, we compare the proposed method with several state-of-the-art methods (including unsupervised and semi-supervised) using a standard product review test collection. The experimental results demonstrate the effectiveness of using comparative domain corpora.
A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature.
Tchoua, Roselyne B; Chard, Kyle; Audus, Debra; Qin, Jian; de Pablo, Juan; Foster, Ian
2016-01-01
A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.
Supervised Learning Based Hypothesis Generation from Biomedical Literature.
Sang, Shengtian; Yang, Zhihao; Li, Zongyao; Lin, Hongfei
2015-01-01
Nowadays, the amount of biomedical literatures is growing at an explosive speed, and there is much useful knowledge undiscovered in this literature. Researchers can form biomedical hypotheses through mining these works. In this paper, we propose a supervised learning based approach to generate hypotheses from biomedical literature. This approach splits the traditional processing of hypothesis generation with classic ABC model into AB model and BC model which are constructed with supervised learning method. Compared with the concept cooccurrence and grammar engineering-based approaches like SemRep, machine learning based models usually can achieve better performance in information extraction (IE) from texts. Then through combining the two models, the approach reconstructs the ABC model and generates biomedical hypotheses from literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms SemRep system.
Yang, Liang; Jin, Di; He, Dongxiao; Fu, Huazhu; Cao, Xiaochun; Fogelman-Soulie, Francoise
2017-03-29
Due to the importance of community structure in understanding network and a surge of interest aroused on community detectability, how to improve the community identification performance with pairwise prior information becomes a hot topic. However, most existing semi-supervised community detection algorithms only focus on improving the accuracy but ignore the impacts of priors on speeding detection. Besides, they always require to tune additional parameters and cannot guarantee pairwise constraints. To address these drawbacks, we propose a general, high-speed, effective and parameter-free semi-supervised community detection framework. By constructing the indivisible super-nodes according to the connected subgraph of the must-link constraints and by forming the weighted super-edge based on network topology and cannot-link constraints, our new framework transforms the original network into an equivalent but much smaller Super-Network. Super-Network perfectly ensures the must-link constraints and effectively encodes cannot-link constraints. Furthermore, the time complexity of super-network construction process is linear in the original network size, which makes it efficient. Meanwhile, since the constructed super-network is much smaller than the original one, any existing community detection algorithm is much faster when using our framework. Besides, the overall process will not introduce any additional parameters, making it more practical.
Semi-supervised anomaly detection - towards model-independent searches of new physics
NASA Astrophysics Data System (ADS)
Kuusela, Mikael; Vatanen, Tommi; Malmi, Eric; Raiko, Tapani; Aaltonen, Timo; Nagai, Yoshikazu
2012-06-01
Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should this training data be systematically inaccurate for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is a lot more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.
Active learning: a step towards automating medical concept extraction.
Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony
2016-03-01
This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort and (2) the robustness of incremental active learning framework across different selection criteria and data sets are determined. The comparative performance of an active learning framework and a fully supervised approach were investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields as the supervised method, and least confidence and information density as 2 selection criteria for active learning framework were used. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab. The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled. Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Semi-supervised classification tool for DubaiSat-2 multispectral imagery
NASA Astrophysics Data System (ADS)
Al-Mansoori, Saeed
2015-10-01
This paper addresses a semi-supervised classification tool based on a pixel-based approach of the multi-spectral satellite imagery. There are not many studies demonstrating such algorithm for the multispectral images, especially when the image consists of 4 bands (Red, Green, Blue and Near Infrared) as in DubaiSat-2 satellite images. The proposed approach utilizes both unsupervised and supervised classification schemes sequentially to identify four classes in the image, namely, water bodies, vegetation, land (developed and undeveloped areas) and paved areas (i.e. roads). The unsupervised classification concept is applied to identify two classes; water bodies and vegetation, based on a well-known index that uses the distinct wavelengths of visible and near-infrared sunlight that is absorbed and reflected by the plants to identify the classes; this index parameter is called "Normalized Difference Vegetation Index (NDVI)". Afterward, the supervised classification is performed by selecting training homogenous samples for roads and land areas. Here, a precise selection of training samples plays a vital role in the classification accuracy. Post classification is finally performed to enhance the classification accuracy, where the classified image is sieved, clumped and filtered before producing final output. Overall, the supervised classification approach produced higher accuracy than the unsupervised method. This paper shows some current preliminary research results which point out the effectiveness of the proposed technique in a virtual perspective.
A semi-automatic annotation tool for cooking video
NASA Astrophysics Data System (ADS)
Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe
2013-03-01
In order to create a cooking assistant application to guide the users in the preparation of the dishes relevant to their profile diets and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods of the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotate the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.
Encoding Dissimilarity Data for Statistical Model Building.
Wahba, Grace
2010-12-01
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233.G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.
Porosity estimation by semi-supervised learning with sparsely available labeled samples
NASA Astrophysics Data System (ADS)
Lima, Luiz Alberto; Görnitz, Nico; Varella, Luiz Eduardo; Vellasco, Marley; Müller, Klaus-Robert; Nakajima, Shinichi
2017-09-01
This paper addresses the porosity estimation problem from seismic impedance volumes and porosity samples located in a small group of exploratory wells. Regression methods, trained on the impedance as inputs and the porosity as output labels, generally suffer from extremely expensive (and hence sparsely available) porosity samples. To optimally make use of the valuable porosity data, a semi-supervised machine learning method was proposed, Transductive Conditional Random Field Regression (TCRFR), showing good performance (Görnitz et al., 2017). TCRFR, however, still requires more labeled data than those usually available, which creates a gap when applying the method to the porosity estimation problem in realistic situations. In this paper, we aim to fill this gap by introducing two graph-based preprocessing techniques, which adapt the original TCRFR for extremely weakly supervised scenarios. Our new method outperforms the previous automatic estimation methods on synthetic data and provides a comparable result to the manual labored, time-consuming geostatistics approach on real data, proving its potential as a practical industrial tool.
Visualization techniques for computer network defense
NASA Astrophysics Data System (ADS)
Beaver, Justin M.; Steed, Chad A.; Patton, Robert M.; Cui, Xiaohui; Schultz, Matthew
2011-06-01
Effective visual analysis of computer network defense (CND) information is challenging due to the volume and complexity of both the raw and analyzed network data. A typical CND is comprised of multiple niche intrusion detection tools, each of which performs network data analysis and produces a unique alerting output. The state-of-the-practice in the situational awareness of CND data is the prevalent use of custom-developed scripts by Information Technology (IT) professionals to retrieve, organize, and understand potential threat events. We propose a new visual analytics framework, called the Oak Ridge Cyber Analytics (ORCA) system, for CND data that allows an operator to interact with all detection tool outputs simultaneously. Aggregated alert events are presented in multiple coordinated views with timeline, cluster, and swarm model analysis displays. These displays are complemented with both supervised and semi-supervised machine learning classifiers. The intent of the visual analytics framework is to improve CND situational awareness, to enable an analyst to quickly navigate and analyze thousands of detected events, and to combine sophisticated data analysis techniques with interactive visualization such that patterns of anomalous activities may be more easily identified and investigated.
2009-01-01
Background The characterisation, or binning, of metagenome fragments is an important first step to further downstream analysis of microbial consortia. Here, we propose a one-dimensional signature, OFDEG, derived from the oligonucleotide frequency profile of a DNA sequence, and show that it is possible to obtain a meaningful phylogenetic signal for relatively short DNA sequences. The one-dimensional signal is essentially a compact representation of higher dimensional feature spaces of greater complexity and is intended to improve on the tetranucleotide frequency feature space preferred by current compositional binning methods. Results We compare the fidelity of OFDEG against tetranucleotide frequency in both an unsupervised and semi-supervised setting on simulated metagenome benchmark data. Four tests were conducted using assembler output of Arachne and phrap, and for each, performance was evaluated on contigs which are greater than or equal to 8 kbp in length and contigs which are composed of at least 10 reads. Using both G-C content in conjunction with OFDEG gave an average accuracy of 96.75% (semi-supervised) and 95.19% (unsupervised), versus 94.25% (semi-supervised) and 82.35% (unsupervised) for tetranucleotide frequency. Conclusion We have presented an observation of an alternative characteristic of DNA sequences. The proposed feature representation has proven to be more beneficial than the existing tetranucleotide frequency space to the metagenome binning problem. We do note, however, that our observation of OFDEG deserves further anlaysis and investigation. Unsupervised clustering revealed OFDEG related features performed better than standard tetranucleotide frequency in representing a relevant organism specific signal. Further improvement in binning accuracy is given by semi-supervised classification using OFDEG. The emphasis on a feature-driven, bottom-up approach to the problem of binning reveals promising avenues for future development of techniques to characterise short environmental sequences without bias toward cultivable organisms. PMID:19958473
An Extended Spectral-Spatial Classification Approach for Hyperspectral Data
NASA Astrophysics Data System (ADS)
Akbari, D.
2017-11-01
In this paper an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF) algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1) unsupervised feature extraction methods including principal component analysis (PCA), independent component analysis (ICA), and minimum noise fraction (MNF); (2) supervised feature extraction including decision boundary feature extraction (DBFE), discriminate analysis feature extraction (DAFE), and nonparametric weighted feature extraction (NWFE); (3) genetic algorithm (GA). The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from the classification maps obtained by both SVM and watershed segmentation algorithm. To evaluate the proposed approach, the Pavia University hyperspectral data is tested. Experimental results show that the proposed approach using GA achieves an approximately 8 % overall accuracy higher than the original MSF-based algorithm.
Titaley, C R; Jusril, H; Ariawan, I; Soeharno, N; Setiawan, T; Weber, M W
2014-01-01
The integrated management of childhood illness (IMCI) is a comprehensive approach to child health, which has been adopted in Indonesia since 1997. This study aims to provide an overview of IMCI implementation at community health centres (puskesmas) in West Java province, Indonesia. Data were derived from a cross-sectional study conducted in 10 districts of West Java province, from November to December 2012. Semi-structured interviews were used to obtain information from staff at 80 puskesmas, including the heads (80 informants), pharmacy staff (79 informants) and midwives/nurses trained in IMCI (148 informants), using semi-structured interviews. Quantitative data were analysed using frequency tabulations and qualitative data were analysed by identifying themes that emerged in informants' responses. Almost all (N = 79) puskesmas implemented the IMCI strategy; however, only 64% applied it to all visiting children. Several barriers to IMCI implementation were identified, including shortage of health workers trained in IMCI (only 43% of puskesmas had all health workers in the child care unit trained in IMCI and 40% of puskesmas conducted on-the-job training). Only 19% of puskesmas had all the essential drugs and equipment for IMCI. Nearly all health workers acknowledged the importance of IMCI in their routine services and very few did not perceive its benefits. Lack of supervision from district health office staff and low community awareness regarding the importance of IMCI were reported. Complaints received from patients'families were generally related to the long duration of treatment and no administration of medication after physical examination. Interventions aiming to create local regulations endorsing IMCI implementation; promoting monitoring and supervision; encouraging on-the-job training for health workers; and strengthening training programmes, counselling and other promotional activities are important for promoting IMCI implementation in West Java province, and are also likely to be useful elsewhere in the country.
An Information Extraction Framework for Cohort Identification Using Electronic Health Records
Liu, Hongfang; Bielinski, Suzette J.; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B.; Jonnalagadda, Siddhartha R.; Ravikumar, K.E.; Wu, Stephen T.; Kullo, Iftikhar J.; Chute, Christopher G
Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework. PMID:24303255
Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry
2014-01-01
Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. PMID:24784581
NASA Astrophysics Data System (ADS)
Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang
2005-04-01
Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient"s medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and ultimately, outcomes.
Interprofessional supervision in an intercultural context: a qualitative study.
Chipchase, Lucy; Allen, Shelley; Eley, Diann; McAllister, Lindy; Strong, Jenny
2012-11-01
Our understanding of the qualities and value of clinical supervision is based on uniprofessional clinical education models. There is little research regarding the role and qualities needed in the supervisor role for supporting interprofessional placements. This paper reports the views and perceptions of medical and allied heath students and supervisors on the characteristics of clinical supervision in an interprofessional, international context. A qualitative case study was used involving semi-structured interviews of eight health professional students and four clinical supervisors before and after an interprofessional, international clinical placement. Our findings suggest that supervision from educators whose profession differs from that of the students can be a beneficial and rewarding experience leading to the use of alternative learning strategies. Although all participants valued interprofessional supervision, there was agreement that profession-specific supervision was required throughout the placement. Further research is required to understand this view as interprofessional education aims to prepare graduates for collaborative practice where they may work in teams supervised by staff whose profession may differ from their own.
NASA Astrophysics Data System (ADS)
Partovi, T.; Fraundorfer, F.; Azimi, S.; Marmanis, D.; Reinartz, P.
2017-05-01
3D building reconstruction from remote sensing image data from satellites is still an active research topic and very valuable for 3D city modelling. The roof model is the most important component to reconstruct the Level of Details 2 (LoD2) for a building in 3D modelling. While the general solution for roof modelling relies on the detailed cues (such as lines, corners and planes) extracted from a Digital Surface Model (DSM), the correct detection of the roof type and its modelling can fail due to low quality of the DSM generated by dense stereo matching. To reduce dependencies of roof modelling on DSMs, the pansharpened satellite images as a rich resource of information are used in addition. In this paper, two strategies are employed for roof type classification. In the first one, building roof types are classified in a state-of-the-art supervised pre-trained convolutional neural network (CNN) framework. In the second strategy, deep features from deep layers of different pre-trained CNN model are extracted and then an RBF kernel using SVM is employed to classify the building roof type. Based on roof complexity of the scene, a roof library including seven types of roofs is defined. A new semi-automatic method is proposed to generate training and test patches of each roof type in the library. Using the pre-trained CNN model does not only decrease the computation time for training significantly but also increases the classification accuracy.
PI2GIS: processing image to geographical information systems, a learning tool for QGIS
NASA Astrophysics Data System (ADS)
Correia, R.; Teodoro, A.; Duarte, L.
2017-10-01
To perform an accurate interpretation of remote sensing images, it is necessary to extract information using different image processing techniques. Nowadays, it became usual to use image processing plugins to add new capabilities/functionalities integrated in Geographical Information System (GIS) software. The aim of this work was to develop an open source application to automatically process and classify remote sensing images from a set of satellite input data. The application was integrated in a GIS software (QGIS), automating several image processing steps. The use of QGIS for this purpose is justified since it is easy and quick to develop new plugins, using Python language. This plugin is inspired in the Semi-Automatic Classification Plugin (SCP) developed by Luca Congedo. SCP allows the supervised classification of remote sensing images, the calculation of vegetation indices such as NDVI (Normalized Difference Vegetation Index) and EVI (Enhanced Vegetation Index) and other image processing operations. When analysing SCP, it was realized that a set of operations, that are very useful in teaching classes of remote sensing and image processing tasks, were lacking, such as the visualization of histograms, the application of filters, different image corrections, unsupervised classification and several environmental indices computation. The new set of operations included in the PI2GIS plugin can be divided into three groups: pre-processing, processing, and classification procedures. The application was tested consider an image from Landsat 8 OLI from a North area of Portugal.
Classification of the Gabon SAR Mosaic Using a Wavelet Based Rule Classifier
NASA Technical Reports Server (NTRS)
Simard, Marc; Saatchi, Sasan; DeGrandi, Gianfranco
2000-01-01
A method is developed for semi-automated classification of SAR images of the tropical forest. Information is extracted using the wavelet transform (WT). The transform allows for extraction of structural information in the image as a function of scale. In order to classify the SAR image, a Desicion Tree Classifier is used. The method of pruning is used to optimize classification rate versus tree size. The results give explicit insight on the type of information useful for a given class.
System for definition of the central-chest vasculature
NASA Astrophysics Data System (ADS)
Taeprasartsit, Pinyo; Higgins, William E.
2009-02-01
Accurate definition of the central-chest vasculature from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. For instance, the aorta and pulmonary artery help in automatic definition of the Mountain lymph-node stations for lung-cancer staging. This work presents a system for defining major vascular structures in the central chest. The system provides automatic methods for extracting the aorta and pulmonary artery and semi-automatic methods for extracting the other major central chest arteries/veins, such as the superior vena cava and azygos vein. Automatic aorta and pulmonary artery extraction are performed by model fitting and selection. The system also extracts certain vascular structure information to validate outputs. A semi-automatic method extracts vasculature by finding the medial axes between provided important sites. Results of the system are applied to lymph-node station definition and guidance of bronchoscopic biopsy.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-06
...) March 10-11, 2014 AGENCY: Centers for Medicare & Medicaid Services (CMS), Department of Health and Human... Advisory Panel on Hospital Outpatient Payment (the Panel) for 2014. The purpose of the Panel is to advise... therapeutic services supervision issues. DATES: Meeting Dates: The first semi-annual meeting in 2014 is...
[How did health personnel perceive supervision of obstetric institutions?].
Arianson, Helga; Elvbakken, Kari Tove; Malterud, Kirsti
2008-05-15
Through audits, the Norwegian Board of Health supervises and ensures that health institutions adhere to rules and regulations that apply to them. Conduct of such supervision should be predictable and the basis for decisions should be documented and challengeable. Those in charge of the supervision must have the necessary professional competence and be able to integrate and understand the collected information so they can draw the right conclusions. The audit team should demonstrate consideration and respect to those they meet during audits. We therefore wanted to study the experience of being audited among health care providers and leaders of institutions and subsequent adjustments after the audit. We used a questionnaire to evaluate the national audit of 26 (of 60 totally) Norwegian obstetric institutions in 2004. A questionnaire was sent to leaders and health care providers in all institutions that had been inspected (208 persons). Data from semi-structured interviews were used to validate and explore the quantitative findings. 89% responded to the questionnaire. The supervision was well received by leaders and health care providers at the obstetric institutions. The respondents confirmed that the audit team's approach and conduct in principle adhered to the rules within the examined domains. The conclusions presented by the audit teams were accepted as correct by most of the respondents. A large number of adjustments were reported after the audits. We conclude that auditing can lead to improvements and that the described programme probably contributed to improving obstetric services in Norway. The audit team's conduct seems to have an effect on acceptance of the supervision. The performance of the teams may have an impact of the acceptance of auditing, but not on reporting of the adjustments carried out.
Multisource Data Classification Using A Hybrid Semi-supervised Learning Scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Bhaduri, Budhendra L; Shekhar, Shashi
2009-01-01
In many practical situations thematic classes can not be discriminated by spectral measurements alone. Often one needs additional features such as population density, road density, wetlands, elevation, soil types, etc. which are discrete attributes. On the other hand remote sensing image features are continuous attributes. Finding a suitable statistical model and estimation of parameters is a challenging task in multisource (e.g., discrete and continuous attributes) data classification. In this paper we present a semi-supervised learning method by assuming that the samples were generated by a mixture model, where each component could be either a continuous or discrete distribution. Overall classificationmore » accuracy of the proposed method is improved by 12% in our initial experiments.« less
Optimized Graph Learning Using Partial Tags and Multiple Features for Image and Video Annotation.
Song, Jingkuan; Gao, Lianli; Nie, Feiping; Shen, Heng Tao; Yan, Yan; Sebe, Nicu
2016-11-01
In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available. This is often done by adding a geometry-based regularization term in the objective function of a supervised learning model. In this case, a similarity graph is indispensable to exploit the geometrical relationships among the training data points, and the graph construction scheme essentially determines the performance of these graph-based learning algorithms. However, most of the existing works construct the graph empirically and are usually based on a single feature without using the label information. In this paper, we propose a semi-supervised annotation approach by learning an optimized graph (OGL) from multi-cues (i.e., partial tags and multiple features), which can more accurately embed the relationships among the data points. Since OGL is a transductive method and cannot deal with novel data points, we further extend our model to address the out-of-sample issue. Extensive experiments on image and video annotation show the consistent superiority of OGL over the state-of-the-art methods.
Impact of translation on named-entity recognition in radiology texts
Pedro, Vasco
2017-01-01
Abstract Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
Alzheimer's Disease Early Diagnosis Using Manifold-Based Semi-Supervised Learning.
Khajehnejad, Moein; Saatlou, Forough Habibollahi; Mohammadzade, Hoda
2017-08-20
Alzheimer's disease (AD) is currently ranked as the sixth leading cause of death in the United States and recent estimates indicate that the disorder may rank third, just behind heart disease and cancer, as a cause of death for older people. Clearly, predicting this disease in the early stages and preventing it from progressing is of great importance. The diagnosis of Alzheimer's disease (AD) requires a variety of medical tests, which leads to huge amounts of multivariate heterogeneous data. It can be difficult and exhausting to manually compare, visualize, and analyze this data due to the heterogeneous nature of medical tests; therefore, an efficient approach for accurate prediction of the condition of the brain through the classification of magnetic resonance imaging (MRI) images is greatly beneficial and yet very challenging. In this paper, a novel approach is proposed for the diagnosis of very early stages of AD through an efficient classification of brain MRI images, which uses label propagation in a manifold-based semi-supervised learning framework. We first apply voxel morphometry analysis to extract some of the most critical AD-related features of brain images from the original MRI volumes and also gray matter (GM) segmentation volumes. The features must capture the most discriminative properties that vary between a healthy and Alzheimer-affected brain. Next, we perform a principal component analysis (PCA)-based dimension reduction on the extracted features for faster yet sufficiently accurate analysis. To make the best use of the captured features, we present a hybrid manifold learning framework which embeds the feature vectors in a subspace. Next, using a small set of labeled training data, we apply a label propagation method in the created manifold space to predict the labels of the remaining images and classify them in the two groups of mild Alzheimer's and normal condition (MCI/NC). The accuracy of the classification using the proposed method is 93.86% for the Open Access Series of Imaging Studies (OASIS) database of MRI brain images, providing, compared to the best existing methods, a 3% lower error rate.
NASA Astrophysics Data System (ADS)
Fatehi, Moslem; Asadi, Hooshang H.
2017-04-01
In this study, the application of a transductive support vector machine (TSVM), an innovative semi-supervised learning algorithm, has been proposed for mapping the potential drill targets at a detailed exploration stage. The semi-supervised learning method is a hybrid of supervised and unsupervised learning approach that simultaneously uses both training and non-training data to design a classifier. By using the TSVM algorithm, exploration layers at the Dalli porphyry Cu-Au deposit in the central Iran were integrated to locate the boundary of the Cu-Au mineralization for further drilling. By applying this algorithm on the non-training (unlabeled) and limited training (labeled) Dalli exploration data, the study area was classified in two domains of Cu-Au ore and waste. Then, the results were validated by the earlier block models created, using the available borehole and trench data. In addition to TSVM, the support vector machine (SVM) algorithm was also implemented on the study area for comparison. Thirty percent of the labeled exploration data was used to evaluate the performance of these two algorithms. The results revealed 87 percent correct recognition accuracy for the TSVM algorithm and 82 percent for the SVM algorithm. The deepest inclined borehole, recently drilled in the western part of the Dalli deposit, indicated that the boundary of Cu-Au mineralization, as identified by the TSVM algorithm, was only 15 m off from the actual boundary intersected by this borehole. According to the results of the TSVM algorithm, six new boreholes were suggested for further drilling at the Dalli deposit. This study showed that the TSVM algorithm could be a useful tool for enhancing the mineralization zones and consequently, ensuring a more accurate drill hole planning.
Su, Hang; Yin, Zhaozheng; Huh, Seungil; Kanade, Takeo
2013-10-01
Phase-contrast microscopy is one of the most common and convenient imaging modalities to observe long-term multi-cellular processes, which generates images by the interference of lights passing through transparent specimens and background medium with different retarded phases. Despite many years of study, computer-aided phase contrast microscopy analysis on cell behavior is challenged by image qualities and artifacts caused by phase contrast optics. Addressing the unsolved challenges, the authors propose (1) a phase contrast microscopy image restoration method that produces phase retardation features, which are intrinsic features of phase contrast microscopy, and (2) a semi-supervised learning based algorithm for cell segmentation, which is a fundamental task for various cell behavior analysis. Specifically, the image formation process of phase contrast microscopy images is first computationally modeled with a dictionary of diffraction patterns; as a result, each pixel of a phase contrast microscopy image is represented by a linear combination of the bases, which we call phase retardation features. Images are then partitioned into phase-homogeneous atoms by clustering neighboring pixels with similar phase retardation features. Consequently, cell segmentation is performed via a semi-supervised classification technique over the phase-homogeneous atoms. Experiments demonstrate that the proposed approach produces quality segmentation of individual cells and outperforms previous approaches. Copyright © 2013 Elsevier B.V. All rights reserved.
Lima, Ana Carolina E S; de Castro, Leandro Nunes
2014-10-01
Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media has attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature, in that it works with groups of texts, instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and resulted in an approximately 83% accurate prediction, with some of the personality traits presenting better individual classification rates than others. Copyright © 2014 Elsevier Ltd. All rights reserved.
Semi-supervised protein subcellular localization.
Xu, Qian; Hu, Derek Hao; Xue, Hong; Yu, Weichuan; Yang, Qiang
2009-01-30
Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational method. The location information can indicate key functionalities of proteins. Accurate predictions of subcellular localizations of protein can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data should be labeled in order to let the prediction system learn a classifier of good generalization ability. However, in real world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data. In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use much fewer labeled instances to construct a high quality prediction model. We construct an initial classifier using a small set of labeled examples first, and then use unlabeled instances to refine the classifier for future predictions. Experimental results show that our methods can effectively reduce the workload for labeling data using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.
Contaminant source identification using semi-supervised machine learning
NASA Astrophysics Data System (ADS)
Vesselinov, Velimir V.; Alexandrov, Boian S.; O'Malley, Daniel
2018-05-01
Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may need to be simulated in these models which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. The NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).
Yang, Liang; Ge, Meng; Jin, Di; He, Dongxiao; Fu, Huazhu; Wang, Jing; Cao, Xiaochun
2017-01-01
Due to the demand for performance improvement and the existence of prior information, semi-supervised community detection with pairwise constraints becomes a hot topic. Most existing methods have been successfully encoding the must-link constraints, but neglect the opposite ones, i.e., the cannot-link constraints, which can force the exclusion between nodes. In this paper, we are interested in understanding the role of cannot-link constraints and effectively encoding pairwise constraints. Towards these goals, we define an integral generative process jointly considering the network topology, must-link and cannot-link constraints. We propose to characterize this process as a Multi-variance Mixed Gaussian Generative (MMGG) Model to address diverse degrees of confidences that exist in network topology and pairwise constraints and formulate it as a weighted nonnegative matrix factorization problem. The experiments on artificial and real-world networks not only illustrate the superiority of our proposed MMGG, but also, most importantly, reveal the roles of pairwise constraints. That is, though the must-link is more important than cannot-link when either of them is available, both must-link and cannot-link are equally important when both of them are available. To the best of our knowledge, this is the first work on discovering and exploring the importance of cannot-link constraints in semi-supervised community detection.
Contaminant source identification using semi-supervised machine learning
Vesselinov, Velimir Valentinov; Alexandrov, Boian S.; O’Malley, Dan
2017-11-08
Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may needmore » to be simulated in these models which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. Finally, the NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).« less
Contaminant source identification using semi-supervised machine learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vesselinov, Velimir Valentinov; Alexandrov, Boian S.; O’Malley, Dan
Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may needmore » to be simulated in these models which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. Finally, the NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).« less
Ge, Meng; Jin, Di; He, Dongxiao; Fu, Huazhu; Wang, Jing; Cao, Xiaochun
2017-01-01
Due to the demand for performance improvement and the existence of prior information, semi-supervised community detection with pairwise constraints becomes a hot topic. Most existing methods have been successfully encoding the must-link constraints, but neglect the opposite ones, i.e., the cannot-link constraints, which can force the exclusion between nodes. In this paper, we are interested in understanding the role of cannot-link constraints and effectively encoding pairwise constraints. Towards these goals, we define an integral generative process jointly considering the network topology, must-link and cannot-link constraints. We propose to characterize this process as a Multi-variance Mixed Gaussian Generative (MMGG) Model to address diverse degrees of confidences that exist in network topology and pairwise constraints and formulate it as a weighted nonnegative matrix factorization problem. The experiments on artificial and real-world networks not only illustrate the superiority of our proposed MMGG, but also, most importantly, reveal the roles of pairwise constraints. That is, though the must-link is more important than cannot-link when either of them is available, both must-link and cannot-link are equally important when both of them are available. To the best of our knowledge, this is the first work on discovering and exploring the importance of cannot-link constraints in semi-supervised community detection. PMID:28678864
The nature and structure of supervision in health visiting with victims of child sexual abuse.
Scott, L
1999-03-01
Part of a higher research degree to explore professional practice. To explore how health visitors work with victims of child sexual abuse and the supervision systems to support them. To seek the views and experiences of practising health visitors relating to complex care in order to consider the nature and structure of supervision. The research reported in this paper used a qualitative method of research and semi-structured interviews with practising health visitors of varying levels of experience in venues around England. Qualitative research enabled the exploration of experiences. Identification of the need for regular, structured, accountable opportunities in a 'private setting' to discuss whole caseload work and current practice issues. Supervision requires a structured, formalized process, in both regularity and content, as a means to explore and acknowledge work with increasingly complex care, to enable full discussion of whole caseloads. Supervision is demonstrated as a vehicle to enable the sharing of good practices and for weak practices to be identified and managed appropriately. Supervision seeks to fulfil the above whilst promoting a stimulating, learning experience, accommodating the notion that individuals learn at their own pace and bring a wealth of human experience to the service. The size of the study was dictated by the amount of time available within which to complete a research master's degree course primarily in the author's own time, over a 2-year period. The majority of participants volunteered their accounts in their own time. For others I obtained permission from their employers for them to participate once they approached me with an interest in being interviewed. This research provides a model of supervision based on practitioner views and experiences. The article highlights the value of research and evidence-based information to enhance practice accountability and the quality of care. Proactive risk management can safeguard the health and safety of the public, the practitioner and the organization.
The Need for Data-Informed Clinical Supervision in Substance Use Disorder Treatment
Ramsey, Alex T.; Baumann, Ana; Silver Wolf, David Patterson; Yan, Yan; Cooper, Ben; Proctor, Enola
2017-01-01
Background Effective clinical supervision is necessary for high-quality care in community-based substance use disorder treatment settings, yet little is known about current supervision practices. Some evidence suggests that supervisors and counselors differ in their experiences of clinical supervision; however, the impact of this misalignment on supervision quality is unclear. Clinical information monitoring systems may support supervision in substance use disorder treatment, but the potential use of these tools must first be explored. Aims First, this study examines the extent to which misaligned supervisor-counselor perceptions impact supervision satisfaction and emphasis on evidence-based treatments. This study also reports on formative work to develop a supervision-based clinical dashboard, an electronic information monitoring system and data visualization tool providing real-time clinical information to engage supervisors and counselors in a coordinated and data-informed manner, help align supervisor-counselor perceptions about supervision, and improve supervision effectiveness. Methods Clinical supervisors and frontline counselors (N=165) from five Midwestern agencies providing substance abuse services completed an online survey using Research Electronic Data Capture (REDCap) software, yielding a 75% response rate. Valid quantitative measures of supervision effectiveness were assessed, along with qualitative perceptions of a supervision-based clinical dashboard. Results Through within-dyad analyses, misalignment between supervisor and counselor perceptions of supervision practices was negatively associated with satisfaction of supervision and reported frequency of discussing several important clinical supervision topics, including evidence-based treatments and client rapport. Participants indicated the most useful clinical dashboard functions and reported important benefits and challenges to using the proposed tool. Discussion Clinical supervision tends to be largely an informal and unstructured process in substance abuse treatment, which may compromise the quality of care. Clinical dashboards may be a well-targeted approach to facilitate data-informed clinical supervision in community-based treatment agencies. PMID:28166480
The need for data-informed clinical supervision in substance use disorder treatment.
Ramsey, Alex T; Baumann, Ana; Patterson Silver Wolf, David; Yan, Yan; Cooper, Ben; Proctor, Enola
2017-01-01
Effective clinical supervision is necessary for high-quality care in community-based substance use disorder treatment settings, yet little is known about current supervision practices. Some evidence suggests that supervisors and counselors differ in their experiences of clinical supervision; however, the impact of this misalignment on supervision quality is unclear. Clinical information monitoring systems may support supervision in substance use disorder treatment, but the potential use of these tools must first be explored. First, the current study examines the extent to which misaligned supervisor-counselor perceptions impact supervision satisfaction and emphasis on evidence-based treatments. This study also reports on formative work to develop a supervision-based clinical dashboard, an electronic information monitoring system and data visualization tool providing real-time clinical information to engage supervisors and counselors in a coordinated and data-informed manner, help align supervisor-counselor perceptions about supervision, and improve supervision effectiveness. Clinical supervisors and frontline counselors (N = 165) from five Midwestern agencies providing substance abuse services completed an online survey using Research Electronic Data Capture software, yielding a 75% response rate. Valid quantitative measures of supervision effectiveness were administered, along with qualitative perceptions of a supervision-based clinical dashboard. Through within-dyad analyses, misalignment between supervisor and counselor perceptions of supervision practices was negatively associated with satisfaction of supervision and reported frequency of discussing several important clinical supervision topics, including evidence-based treatments and client rapport. Participants indicated the most useful clinical dashboard functions and reported important benefits and challenges to using the proposed tool. Clinical supervision tends to be largely an informal and unstructured process in substance abuse treatment, which may compromise the quality of care. Clinical dashboards may be a well-targeted approach to facilitate data-informed clinical supervision in community-based treatment agencies.
Arrangement and Applying of Movement Patterns in the Cerebellum Based on Semi-supervised Learning.
Solouki, Saeed; Pooyan, Mohammad
2016-06-01
Biological control systems have long been studied as a possible inspiration for the construction of robotic controllers. The cerebellum is known to be involved in the production and learning of smooth, coordinated movements. Therefore, highly regular structure of the cerebellum has been in the core of attention in theoretical and computational modeling. However, most of these models reflect some special features of the cerebellum without regarding the whole motor command computational process. In this paper, we try to make a logical relation between the most significant models of the cerebellum and introduce a new learning strategy to arrange the movement patterns: cerebellar modular arrangement and applying of movement patterns based on semi-supervised learning (CMAPS). We assume here the cerebellum like a big archive of patterns that has an efficient organization to classify and recall them. The main idea is to achieve an optimal use of memory locations by more than just a supervised learning and classification algorithm. Surely, more experimental and physiological researches are needed to confirm our hypothesis.
SU-F-I-10: Spatially Local Statistics for Adaptive Image Filtering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Iliopoulos, AS; Sun, X; Floros, D
Purpose: To facilitate adaptive image filtering operations, addressing spatial variations in both noise and signal. Such issues are prevalent in cone-beam projections, where physical effects such as X-ray scattering result in spatially variant noise, violating common assumptions of homogeneous noise and challenging conventional filtering approaches to signal extraction and noise suppression. Methods: We present a computational mechanism for probing into and quantifying the spatial variance of noise throughout an image. The mechanism builds a pyramid of local statistics at multiple spatial scales; local statistical information at each scale includes (weighted) mean, median, standard deviation, median absolute deviation, as well asmore » histogram or dynamic range after local mean/median shifting. Based on inter-scale differences of local statistics, the spatial scope of distinguishable noise variation is detected in a semi- or un-supervised manner. Additionally, we propose and demonstrate the incorporation of such information in globally parametrized (i.e., non-adaptive) filters, effectively transforming the latter into spatially adaptive filters. The multi-scale mechanism is materialized by efficient algorithms and implemented in parallel CPU/GPU architectures. Results: We demonstrate the impact of local statistics for adaptive image processing and analysis using cone-beam projections of a Catphan phantom, fitted within an annulus to increase X-ray scattering. The effective spatial scope of local statistics calculations is shown to vary throughout the image domain, necessitating multi-scale noise and signal structure analysis. Filtering results with and without spatial filter adaptation are compared visually, illustrating improvements in imaging signal extraction and noise suppression, and in preserving information in low-contrast regions. Conclusion: Local image statistics can be incorporated in filtering operations to equip them with spatial adaptivity to spatial signal/noise variations. An efficient multi-scale computational mechanism is developed to curtail processing latency. Spatially adaptive filtering may impact subsequent processing tasks such as reconstruction and numerical gradient computations for deformable registration. NIH Grant No. R01-184173.« less
Unsupervised, Robust Estimation-based Clustering for Multispectral Images
NASA Technical Reports Server (NTRS)
Netanyahu, Nathan S.
1997-01-01
To prepare for the challenge of handling the archiving and querying of terabyte-sized scientific spatial databases, the NASA Goddard Space Flight Center's Applied Information Sciences Branch (AISB, Code 935) developed a number of characterization algorithms that rely on supervised clustering techniques. The research reported upon here has been aimed at continuing the evolution of some of these supervised techniques, namely the neural network and decision tree-based classifiers, plus extending the approach to incorporating unsupervised clustering algorithms, such as those based on robust estimation (RE) techniques. The algorithms developed under this task should be suited for use by the Intelligent Information Fusion System (IIFS) metadata extraction modules, and as such these algorithms must be fast, robust, and anytime in nature. Finally, so that the planner/schedule module of the IlFS can oversee the use and execution of these algorithms, all information required by the planner/scheduler must be provided to the IIFS development team to ensure the timely integration of these algorithms into the overall system.
NASA Astrophysics Data System (ADS)
Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai
2017-08-01
Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.
Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision
Wallace, Byron C.; Kuiper, Joël; Sharma, Aakash; Zhu, Mingxi (Brian); Marshall, Iain J.
2016-01-01
Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (a PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet these criteria. Identifying PICO elements in the full-texts of trial reports is thus a critical yet time-consuming step in the systematic review process. We seek to expedite evidence synthesis by developing machine learning models to automatically extract sentences from articles relevant to PICO elements. Collecting a large corpus of training data for this task would be prohibitively expensive. Therefore, we derive distant supervision (DS) with which to train models using previously conducted reviews. DS entails heuristically deriving ‘soft’ labels from an available structured resource. However, we have access only to unstructured, free-text summaries of PICO elements for corresponding articles; we must derive from these the desired sentence-level annotations. To this end, we propose a novel method – supervised distant supervision (SDS) – that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances by learning to pseudo-annotate articles using the available DS. We show that this approach tends to outperform existing methods with respect to automated PICO extraction. PMID:27746703
Analysis of Technique to Extract Data from the Web for Improved Performance
NASA Astrophysics Data System (ADS)
Gupta, Neena; Singh, Manish
2010-11-01
The World Wide Web rapidly guides the world into a newly amazing electronic world, where everyone can publish anything in electronic form and extract almost all the information. Extraction of information from semi structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts the records from the HTML files automatically. Ontologies can achieve a high degree of accuracy in data extraction. We analyze method for data extraction OBDE (Ontology-Based Data Extraction), which automatically extracts the query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Symons, Christopher T; Arel, Itamar
2011-01-01
Budgeted learning under constraints on both the amount of labeled information and the availability of features at test time pertains to a large number of real world problems. Ideas from multi-view learning, semi-supervised learning, and even active learning have applicability, but a common framework whose assumptions fit these problem spaces is non-trivial to construct. We leverage ideas from these fields based on graph regularizers to construct a robust framework for learning from labeled and unlabeled samples in multiple views that are non-independent and include features that are inaccessible at the time the model would need to be applied. We describemore » examples of applications that fit this scenario, and we provide experimental results to demonstrate the effectiveness of knowledge carryover from training-only views. As learning algorithms are applied to more complex applications, relevant information can be found in a wider variety of forms, and the relationships between these information sources are often quite complex. The assumptions that underlie most learning algorithms do not readily or realistically permit the incorporation of many of the data sources that are available, despite an implicit understanding that useful information exists in these sources. When multiple information sources are available, they are often partially redundant, highly interdependent, and contain noise as well as other information that is irrelevant to the problem under study. In this paper, we are focused on a framework whose assumptions match this reality, as well as the reality that labeled information is usually sparse. Most significantly, we are interested in a framework that can also leverage information in scenarios where many features that would be useful for learning a model are not available when the resulting model will be applied. As with constraints on labels, there are many practical limitations on the acquisition of potentially useful features. A key difference in the case of feature acquisition is that the same constraints often don't pertain to the training samples. This difference provides an opportunity to allow features that are impractical in an applied setting to nevertheless add value during the model-building process. Unfortunately, there are few machine learning frameworks built on assumptions that allow effective utilization of features that are only available at training time. In this paper we formulate a knowledge carryover framework for the budgeted learning scenario with constraints on features and labels. The approach is based on multi-view and semi-supervised learning methods that use graph-encoded regularization. Our main contributions are the following: (1) we propose and provide justification for a methodology for ensuring that changes in the graph regularizer using alternate views are performed in a manner that is target-concept specific, allowing value to be obtained from noisy views; and (2) we demonstrate how this general set-up can be used to effectively improve models by leveraging features unavailable at test time. The rest of the paper is structured as follows. In Section 2, we outline real-world problems to motivate the approach and describe relevant prior work. Section 3 describes the graph construction process and the learning methodologies that are employed. Section 4 provides preliminary discussion regarding theoretical motivation for the method. In Section 5, effectiveness of the approach is demonstrated in a series of experiments employing modified versions of two well-known semi-supervised learning algorithms. Section 6 concludes the paper.« less
NASA Astrophysics Data System (ADS)
Cruz-Roa, Angel; Arevalo, John; Basavanhally, Ajay; Madabhushi, Anant; González, Fabio
2015-01-01
Learning data representations directly from the data itself is an approach that has shown great success in different pattern recognition problems, outperforming state-of-the-art feature extraction schemes for different tasks in computer vision, speech recognition and natural language processing. Representation learning applies unsupervised and supervised machine learning methods to large amounts of data to find building-blocks that better represent the information in it. Digitized histopathology images represents a very good testbed for representation learning since it involves large amounts of high complex, visual data. This paper presents a comparative evaluation of different supervised and unsupervised representation learning architectures to specifically address open questions on what type of learning architectures (deep or shallow), type of learning (unsupervised or supervised) is optimal. In this paper we limit ourselves to addressing these questions in the context of distinguishing between anaplastic and non-anaplastic medulloblastomas from routine haematoxylin and eosin stained images. The unsupervised approaches evaluated were sparse autoencoders and topographic reconstruct independent component analysis, and the supervised approach was convolutional neural networks. Experimental results show that shallow architectures with more neurons are better than deeper architectures without taking into account local space invariances and that topographic constraints provide useful invariant features in scale and rotations for efficient tumor differentiation.
Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin
2012-10-05
We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved a F-measure of 0.84 on automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
NASA Astrophysics Data System (ADS)
Amato, Gabriele; Eisank, Clemens; Albrecht, Florian
2017-04-01
Landslide detection from Earth observation imagery is an important preliminary work for landslide mapping, landslide inventories and landslide hazard assessment. In this context, the object-based image analysis (OBIA) concept has been increasingly used over the last decade. Within the framework of the Land@Slide project (Earth observation based landslide mapping: from methodological developments to automated web-based information delivery) a simple, unsupervised, semi-automatic and object-based approach for the detection of shallow landslides has been developed and implemented in the InterIMAGE open-source software. The method was applied to an Alpine case study in western Austria, exploiting spectral information from pansharpened 4-bands WorldView-2 satellite imagery (0.5 m spatial resolution) in combination with digital elevation models. First, we divided the image into sub-images, i.e. tiles, and then we applied the workflow to each of them without changing the parameters. The workflow was implemented as top-down approach: at the image tile level, an over-classification of the potential landslide area was produced; the over-estimated area was re-segmented and re-classified by several processing cycles until most false positive objects have been eliminated. In every step a Baatz algorithm based segmentation generates polygons "candidates" to be landslides. At the same time, the average values of normalized difference vegetation index (NDVI) and brightness are calculated for these polygons; after that, these values are used as thresholds to perform an objects selection in order to improve the quality of the classification results. In combination, also empirically determined values of slope and roughness are used in the selection process. Results for each tile were merged to obtain the landslide map for the test area. For final validation, the landslide map was compared to a geological map and a supervised landslide classification in order to estimate its accuracy. Results for the test area showed that the proposed method is capable of accurately distinguishing landslides from roofs and trees. Implementation of the workflow into InterIMAGE was straightforward. We conclude that the method is able to extract landslides in forested areas, but that there is still room for improvements concerning the extraction in non-forested high-alpine regions.
Visualization Techniques for Computer Network Defense
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beaver, Justin M; Steed, Chad A; Patton, Robert M
2011-01-01
Effective visual analysis of computer network defense (CND) information is challenging due to the volume and complexity of both the raw and analyzed network data. A typical CND is comprised of multiple niche intrusion detection tools, each of which performs network data analysis and produces a unique alerting output. The state-of-the-practice in the situational awareness of CND data is the prevalent use of custom-developed scripts by Information Technology (IT) professionals to retrieve, organize, and understand potential threat events. We propose a new visual analytics framework, called the Oak Ridge Cyber Analytics (ORCA) system, for CND data that allows an operatormore » to interact with all detection tool outputs simultaneously. Aggregated alert events are presented in multiple coordinated views with timeline, cluster, and swarm model analysis displays. These displays are complemented with both supervised and semi-supervised machine learning classifiers. The intent of the visual analytics framework is to improve CND situational awareness, to enable an analyst to quickly navigate and analyze thousands of detected events, and to combine sophisticated data analysis techniques with interactive visualization such that patterns of anomalous activities may be more easily identified and investigated.« less
Extracting laboratory test information from biomedical text
Kang, Yanna Shen; Kayaalp, Mehmet
2013-01-01
Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058
An image-guided tool to prevent hospital acquired infections
NASA Astrophysics Data System (ADS)
Nagy, Melinda; Szilágyi, László; Lehotsky, Ákos; Haidegger, Tamás; Benyó, Balázs
2011-03-01
Hospital Acquired Infections (HAI) represent the fourth leading cause of death in the United States, and claims hundreds of thousands of lives annually in the rest of the world. This paper presents a novel low-cost mobile device|called Stery-Hand|that helps to avoid HAI by improving hand hygiene control through providing an objective evaluation of the quality of hand washing. The use of the system is intuitive: having performed hand washing with a soap mixed with UV re ective powder, the skin appears brighter in UV illumination on the disinfected surfaces. Washed hands are inserted into the Stery-Hand box, where a digital image is taken under UV lighting. Automated image processing algorithms are employed in three steps to evaluate the quality of hand washing. First, the contour of the hand is extracted in order to distinguish the hand from the background. Next, a semi-supervised clustering algorithm classies the pixels of the hand into three groups, corresponding to clean, partially clean and dirty areas. The clustering algorithm is derived from the histogram-based quick fuzzy c-means approach, using a priori information extracted from reference images, evaluated by experts. Finally, the identied areas are adjusted to suppress shading eects, and quantied in order to give a verdict on hand disinfection quality. The proposed methodology was validated through tests using hundreds of images recorded in our laboratory. The proposed system was found robust and accurate, producing correct estimation for over 98% of the test cases. Stery-Hand may be employed in general practice, and it may also serve educational purposes.
Nonlinear Semi-Supervised Metric Learning Via Multiple Kernels and Local Topology.
Li, Xin; Bai, Yanqin; Peng, Yaxin; Du, Shaoyi; Ying, Shihui
2018-03-01
Changing the metric on the data may change the data distribution, hence a good distance metric can promote the performance of learning algorithm. In this paper, we address the semi-supervised distance metric learning (ML) problem to obtain the best nonlinear metric for the data. First, we describe the nonlinear metric by the multiple kernel representation. By this approach, we project the data into a high dimensional space, where the data can be well represented by linear ML. Then, we reformulate the linear ML by a minimization problem on the positive definite matrix group. Finally, we develop a two-step algorithm for solving this model and design an intrinsic steepest descent algorithm to learn the positive definite metric matrix. Experimental results validate that our proposed method is effective and outperforms several state-of-the-art ML methods.
Application of semi-supervised deep learning to lung sound analysis.
Chamberlain, Daniel; Kodgule, Rahul; Ganelin, Daniela; Miglani, Vivek; Fletcher, Richard Ribon
2016-08-01
The analysis of lung sounds, collected through auscultation, is a fundamental component of pulmonary disease diagnostics for primary care and general patient monitoring for telemedicine. Despite advances in computation and algorithms, the goal of automated lung sound identification and classification has remained elusive. Over the past 40 years, published work in this field has demonstrated only limited success in identifying lung sounds, with most published studies using only a small numbers of patients (typically N<;20) and usually limited to a single type of lung sound. Larger research studies have also been impeded by the challenge of labeling large volumes of data, which is extremely labor-intensive. In this paper, we present the development of a semi-supervised deep learning algorithm for automatically classify lung sounds from a relatively large number of patients (N=284). Focusing on the two most common lung sounds, wheeze and crackle, we present results from 11,627 sound files recorded from 11 different auscultation locations on these 284 patients with pulmonary disease. 890 of these sound files were labeled to evaluate the model, which is significantly larger than previously published studies. Data was collected with a custom mobile phone application and a low-cost (US$30) electronic stethoscope. On this data set, our algorithm achieves ROC curves with AUCs of 0.86 for wheeze and 0.74 for crackle. Most importantly, this study demonstrates how semi-supervised deep learning can be used with larger data sets without requiring extensive labeling of data.
A Modular Hierarchical Approach to 3D Electron Microscopy Image Segmentation
Liu, Ting; Jones, Cory; Seyedhosseini, Mojtaba; Tasdizen, Tolga
2014-01-01
The study of neural circuit reconstruction, i.e., connectomics, is a challenging problem in neuroscience. Automated and semi-automated electron microscopy (EM) image analysis can be tremendously helpful for connectomics research. In this paper, we propose a fully automatic approach for intra-section segmentation and inter-section reconstruction of neurons using EM images. A hierarchical merge tree structure is built to represent multiple region hypotheses and supervised classification techniques are used to evaluate their potentials, based on which we resolve the merge tree with consistency constraints to acquire final intra-section segmentation. Then, we use a supervised learning based linking procedure for the inter-section neuron reconstruction. Also, we develop a semi-automatic method that utilizes the intermediate outputs of our automatic algorithm and achieves intra-segmentation with minimal user intervention. The experimental results show that our automatic method can achieve close-to-human intra-segmentation accuracy and state-of-the-art inter-section reconstruction accuracy. We also show that our semi-automatic method can further improve the intra-segmentation accuracy. PMID:24491638
Som, Meena; Panda, Bhuputra; Pati, Sanghamitra; Nallala, Srinivas; Anasuya, Anita; Chauhan, Abhimanyu Singh; Sen, Ashish Kumar; Zodpey, Sanjay
2014-06-30
Routine immunization is a key child survival intervention. Issues related to quality of service delivery pose operational challenges in delivering effective immunization services. Accumulated evidences suggest that "supportive supervision" improves the quality of health care services. During 2009-10, Govt. of Odisha (GoO) and UNICEF jointly piloted this strategy in four districts to improve routine immunization. The present study aims to assess the effect of supportive supervision strategy on improvement of knowledge and practices on routine immunization among service providers. We adopted a 'post-test only' study design to compare the knowledge and practices of frontline health workers and their supervisors in four intervention districts with that of two control districts. Altogether we interviewed 170 supervisors and supervisees (health workers), each, using semi-structured interview schedules. We also directly observed 25 ice lined refrigerator (ILR) points in both groups of districts. The findings were compared with the baseline information, available only for the intervention districts. The health workers in the intervention districts displayed a higher knowledge score in selected items than in the control group. No significant difference in knowledge was observed between control and intervention supervisors. The management practices at ILR points on key routine immunization components were found to have improved significantly in intervention districts. The observed improvements in the ILR management practices indicate positive influence of supportive supervision. Higher level of domain knowledge among intervention health workers on specific items related to routine immunization could be due to successful transfer of knowledge from supervisors. A 'pre-post' study design should be undertaken to gain insights into the effectiveness of supportive supervision in improving routine immunization services.
Using statistical text classification to identify health information technology incidents
Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah
2013-01-01
Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
Multiple supervised residual network for osteosarcoma segmentation in CT images.
Zhang, Rui; Huang, Lin; Xia, Wei; Zhang, Bo; Qiu, Bensheng; Gao, Xin
2018-01-01
Automatic and accurate segmentation of osteosarcoma region in CT images can help doctor make a reasonable treatment plan, thus improving cure rate. In this paper, a multiple supervised residual network (MSRN) was proposed for osteosarcoma image segmentation. Three supervised side output modules were added to the residual network. The shallow side output module could extract image shape features, such as edge features and texture features. The deep side output module could extract semantic features. The side output module could compute the loss value between output probability map and ground truth and back-propagate the loss information. Then, the parameters of residual network could be modified by gradient descent method. This could guide the multi-scale feature learning of the network. The final segmentation results were obtained by fusing the results output by the three side output modules. A total of 1900 CT images from 15 osteosarcoma patients were used to train the network and a total of 405 CT images from another 8 osteosarcoma patients were used to test the network. Results indicated that MSRN enabled a dice similarity coefficient (DSC) of 89.22%, a sensitivity of 88.74% and a F1-measure of 0.9305, which were larger than those obtained by fully convolutional network (FCN) and U-net. Thus, MSRN for osteosarcoma segmentation could give more accurate results than FCN and U-Net. Copyright © 2018 Elsevier Ltd. All rights reserved.
Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks.
Wang, Chenguang; Song, Yangqiu; El-Kishky, Ahmed; Roth, Dan; Zhang, Ming; Han, Jiawei
2015-08-01
One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider the framework to use the world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain. Then the key challenges are how to adapt the world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network. Then we propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base, Word-Net. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform the state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.
Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks
Wang, Chenguang; Song, Yangqiu; El-Kishky, Ahmed; Roth, Dan; Zhang, Ming; Han, Jiawei
2015-01-01
One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider the framework to use the world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain. Then the key challenges are how to adapt the world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network. Then we propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base, Word-Net. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform the state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features. PMID:26705504
Deep Question Answering for protein annotation
Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick
2015-01-01
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/ PMID:26384372
Exploring Spanish health social media for detecting drug effects.
Segura-Bedmar, Isabel; Martínez, Paloma; Revert, Ricardo; Moreno-Schneider, Julián
2015-01-01
Adverse Drug reactions (ADR) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have devoted a great interest in the early detection of ADRs due to their high incidence and increasing health care costs. Reporting systems are available in order for both healthcare professionals and patients to alert about possible ADRs. However, several studies have shown that these adverse events are underestimated. Our hypothesis is that health social networks could be a significant information source for the early detection of ADRs as well as of new drug indications. In this work we present a system for detecting drug effects (which include both adverse drug reactions as well as drug indications) from user posts extracted from a Spanish health forum. Texts were processed using MeaningCloud, a multilingual text analysis engine, to identify drugs and effects. In addition, we developed the first Spanish database storing drugs as well as their effects automatically built from drug package inserts gathered from online websites. We then applied a distant-supervision method using the database on a collection of 84,000 messages in order to extract the relations between drugs and their effects. To classify the relation instances, we used a kernel method based only on shallow linguistic information of the sentences. Regarding Relation Extraction of drugs and their effects, the distant supervision approach achieved a recall of 0.59 and a precision of 0.48. The task of extracting relations between drugs and their effects from social media is a complex challenge due to the characteristics of social media texts. These texts, typically posts or tweets, usually contain many grammatical errors and spelling mistakes. Moreover, patients use lay terminology to refer to diseases, symptoms and indications that is not usually included in lexical resources in languages other than English.
Deep Question Answering for protein annotation.
Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick
2015-01-01
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.
Two Models for Semi-Supervised Terrorist Group Detection
NASA Astrophysics Data System (ADS)
Ozgul, Fatih; Erdem, Zeki; Bowerman, Chris
Since discovery of organization structure of offender groups leads the investigation to terrorist cells or organized crime groups, detecting covert networks from crime data are important to crime investigation. Two models, GDM and OGDM, which are based on another representation model - OGRM are developed and tested on nine terrorist groups. GDM, which is basically depending on police arrest data and “caught together” information and OGDM, which uses a feature matching on year-wise offender components from arrest and demographics data, performed well on terrorist groups, but OGDM produced high precision with low recall values. OGDM uses a terror crime modus operandi ontology which enabled matching of similar crimes.
TSAI, LING LING Y.; MCNAMARA, RENAE J.; DENNIS, SARAH M.; MODDEL, CHLOE; ALISON, JENNIFER A.; MCKENZIE, DAVID K.; MCKEOUGH, ZOE J.
2016-01-01
Telerehabilitation, consisting of supervised home-based exercise training via real-time videoconferencing, is an alternative method to deliver pulmonary rehabilitation with potential to improve access. The aims were to determine the level of satisfaction and experience of an eight-week supervised home-based telerehabilitation exercise program using real-time videoconferencing in people with COPD. Quantitative measures were the Client Satisfaction Questionnaire-8 (CSQ-8) and a purpose-designed satisfaction survey. A qualitative component was conducted using semi-structured interviews. Nineteen participants (mean (SD) age 73 (8) years, forced expiratory volume in 1 second (FEV1) 60 (23) % predicted) showed a high level of satisfaction in the CSQ-8 score and 100% of participants reported a high level of satisfaction with the quality of exercise sessions delivered using real-time videoconferencing in participant satisfaction survey. Eleven participants undertook semi-structured interviews. Key themes in four areas relating to the telerehabilitation service emerged: positive virtual interaction through technology; health benefits; and satisfaction with the convenience and use of equipment. Participants were highly satisfied with the telerehabilitation exercise program delivered via videoconferencing. PMID:28775799
An immune-inspired semi-supervised algorithm for breast cancer diagnosis.
Peng, Lingxi; Chen, Wenbin; Zhou, Wubai; Li, Fufang; Yang, Jin; Zhang, Jiandong
2016-10-01
Breast cancer is the most frequently and world widely diagnosed life-threatening cancer, which is the leading cause of cancer death among women. Early accurate diagnosis can be a big plus in treating breast cancer. Researchers have approached this problem using various data mining and machine learning techniques such as support vector machine, artificial neural network, etc. The computer immunology is also an intelligent method inspired by biological immune system, which has been successfully applied in pattern recognition, combination optimization, machine learning, etc. However, most of these diagnosis methods belong to a supervised diagnosis method. It is very expensive to obtain labeled data in biology and medicine. In this paper, we seamlessly integrate the state-of-the-art research on life science with artificial intelligence, and propose a semi-supervised learning algorithm to reduce the need for labeled data. We use two well-known benchmark breast cancer datasets in our study, which are acquired from the UCI machine learning repository. Extensive experiments are conducted and evaluated on those two datasets. Our experimental results demonstrate the effectiveness and efficiency of our proposed algorithm, which proves that our algorithm is a promising automatic diagnosis method for breast cancer. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Amis, Gregory P; Carpenter, Gail A
2010-03-01
Computational models of learning typically train on labeled input patterns (supervised learning), unlabeled input patterns (unsupervised learning), or a combination of the two (semi-supervised learning). In each case input patterns have a fixed number of features throughout training and testing. Human and machine learning contexts present additional opportunities for expanding incomplete knowledge from formal training, via self-directed learning that incorporates features not previously experienced. This article defines a new self-supervised learning paradigm to address these richer learning contexts, introducing a neural network called self-supervised ARTMAP. Self-supervised learning integrates knowledge from a teacher (labeled patterns with some features), knowledge from the environment (unlabeled patterns with more features), and knowledge from internal model activation (self-labeled patterns). Self-supervised ARTMAP learns about novel features from unlabeled patterns without destroying partial knowledge previously acquired from labeled patterns. A category selection function bases system predictions on known features, and distributed network activation scales unlabeled learning to prediction confidence. Slow distributed learning on unlabeled patterns focuses on novel features and confident predictions, defining classification boundaries that were ambiguous in the labeled patterns. Self-supervised ARTMAP improves test accuracy on illustrative low-dimensional problems and on high-dimensional benchmarks. Model code and benchmark data are available from: http://techlab.eu.edu/SSART/. Copyright 2009 Elsevier Ltd. All rights reserved.
A thyroid nodule classification method based on TI-RADS
NASA Astrophysics Data System (ADS)
Wang, Hao; Yang, Yang; Peng, Bo; Chen, Qin
2017-07-01
Thyroid Imaging Reporting and Data System(TI-RADS) is a valuable tool for differentiating the benign and the malignant thyroid nodules. In clinic, doctors can determine the extent of being benign or malignant in terms of different classes by using TI-RADS. Classification represents the degree of malignancy of thyroid nodules. TI-RADS as a classification standard can be used to guide the ultrasonic doctor to examine thyroid nodules more accurately and reliably. In this paper, we aim to classify the thyroid nodules with the help of TI-RADS. To this end, four ultrasound signs, i.e., cystic and solid, echo pattern, boundary feature and calcification of thyroid nodules are extracted and converted into feature vectors. Then semi-supervised fuzzy C-means ensemble (SS-FCME) model is applied to obtain the classification results. The experimental results demonstrate that the proposed method can help doctors diagnose the thyroid nodules effectively.
Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data.
Sun, Wenqing; Tseng, Tzu-Liang Bill; Zhang, Jianying; Qian, Wei
2017-04-01
In this study we developed a graph based semi-supervised learning (SSL) scheme using deep convolutional neural network (CNN) for breast cancer diagnosis. CNN usually needs a large amount of labeled data for training and fine tuning the parameters, and our proposed scheme only requires a small portion of labeled data in training set. Four modules were included in the diagnosis system: data weighing, feature selection, dividing co-training data labeling, and CNN. 3158 region of interests (ROIs) with each containing a mass extracted from 1874 pairs of mammogram images were used for this study. Among them 100 ROIs were treated as labeled data while the rest were treated as unlabeled. The area under the curve (AUC) observed in our study was 0.8818, and the accuracy of CNN is 0.8243 using the mixed labeled and unlabeled data. Copyright © 2016. Published by Elsevier Ltd.
Mental health nurses' experiences of managing work-related emotions through supervision.
MacLaren, Jessica; Stenhouse, Rosie; Ritchie, Deborah
2016-10-01
The aim of this study was to explore emotion cultures constructed in supervision and consider how supervision functions as an emotionally safe space promoting critical reflection. Research published between 1995-2015 suggests supervision has a positive impact on nurses' emotional well-being, but there is little understanding of the processes involved in this and how styles of emotion interaction are established in supervision. A narrative approach was used to investigate mental health nurses' understandings and experiences of supervision. Eight semi-structured interviews were conducted with community mental health nurses in the UK during 2011. Analysis of audio data used features of speech to identify narrative discourse and illuminate meanings. A topic-centred analysis of interview narratives explored discourses shared between the participants. This supported the identification of feeling rules in participants' narratives and the exploration of the emotion context of supervision. Effective supervision was associated with three feeling rules: safety and reflexivity; staying professional; managing feelings. These feeling rules allowed the expression and exploration of emotions, promoting critical reflection. A contrast was identified between the emotion culture of supervision and the nurses' experience of their workplace cultures as requiring the suppression of difficult emotions. Despite this, contrast supervision functioned as an emotion micro-culture with its own distinctive feeling rules. The analytical construct of feeling rules allows us to connect individual emotional experiences to shared normative discourses, highlighting how these shape emotional processes taking place in supervision. This understanding supports an explanation of how supervision may positively influence nurses' emotion management and perhaps reduce burnout. © 2016 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.
2017-12-01
Machine-learning achieved unprecedented results in high-dimensional data processing tasks with wide applications in various fields. Due to the growing number of complex nonlinear systems that have to be investigated in science and the bare raw size of data nowadays available, ML offers the unique ability to extract knowledge, regardless the specific application field. Examples are image segmentation, supervised/unsupervised/ semi-supervised classification, feature extraction, data dimensionality analysis/reduction.The MASCS instrument has mapped Mercury surface in the 400-1145 nm wavelength range during orbital observations by the MESSENGER spacecraft. We have conducted k-means unsupervised hierarchical clustering to identify and characterize spectral units from MASCS observations. The results display a dichotomy: a polar and equatorial units, possibly linked to compositional differences or weathering due to irradiation. To explore possible relations between composition and spectral behavior, we have compared the spectral provinces with elemental abundance maps derived from MESSENGER's X-Ray Spectrometer (XRS).For the Vesta application on DAWN Visible and infrared spectrometer (VIR) data, we explored several Machine Learning techniques: image segmentation method, stream algorithm and hierarchical clustering.The algorithm successfully separates the Olivine outcrops around two craters on Vesta's surface [1]. New maps summarizing the spectral and chemical signature of the surface could be automatically produced.We conclude that instead of hand digging in data, scientist could choose a subset of algorithms with well known feature (i.e. efficacy on the particular problem, speed, accuracy) and focus their effort in understanding what important characteristic of the groups found in the data mean. [1] E Ammannito et al. "Olivine in an unexpected location on Vesta's surface". In: Nature 504.7478 (2013), pp. 122-125.
Semi-supervised prediction of SH2-peptide interactions from imbalanced high-throughput data.
Kundu, Kousik; Costa, Fabrizio; Huber, Michael; Reth, Michael; Backofen, Rolf
2013-01-01
Src homology 2 (SH2) domains are the largest family of the peptide-recognition modules (PRMs) that bind to phosphotyrosine containing peptides. Knowledge about binding partners of SH2-domains is key for a deeper understanding of different cellular processes. Given the high binding specificity of SH2, in-silico ligand peptide prediction is of great interest. Currently however, only a few approaches have been published for the prediction of SH2-peptide interactions. Their main shortcomings range from limited coverage, to restrictive modeling assumptions (they are mainly based on position specific scoring matrices and do not take into consideration complex amino acids inter-dependencies) and high computational complexity. We propose a simple yet effective machine learning approach for a large set of known human SH2 domains. We used comprehensive data from micro-array and peptide-array experiments on 51 human SH2 domains. In order to deal with the high data imbalance problem and the high signal-to-noise ration, we casted the problem in a semi-supervised setting. We report competitive predictive performance w.r.t. state-of-the-art. Specifically we obtain 0.83 AUC ROC and 0.93 AUC PR in comparison to 0.71 AUC ROC and 0.87 AUC PR previously achieved by the position specific scoring matrices (PSSMs) based SMALI approach. Our work provides three main contributions. First, we showed that better models can be obtained when the information on the non-interacting peptides (negative examples) is also used. Second, we improve performance when considering high order correlations between the ligand positions employing regularization techniques to effectively avoid overfitting issues. Third, we developed an approach to tackle the data imbalance problem using a semi-supervised strategy. Finally, we performed a genome-wide prediction of human SH2-peptide binding, uncovering several findings of biological relevance. We make our models and genome-wide predictions, for all the 51 SH2-domains, freely available to the scientific community under the following URLs: http://www.bioinf.uni-freiburg.de/Software/SH2PepInt/SH2PepInt.tar.gz and http://www.bioinf.uni-freiburg.de/Software/SH2PepInt/Genome-wide-predictions.tar.gz, respectively.
A Large-scale Distributed Indexed Learning Framework for Data that Cannot Fit into Memory
2015-03-27
learn a classifier. Integrating three learning techniques (online, semi-supervised and active learning ) together with a selective sampling with minimum communication between the server and the clients solved this problem.
Reflective practice and guided discovery: clinical supervision.
Todd, G; Freshwater, D
This article explores the parallels between reflective practice as a model for clinical supervision, and guided discovery as a skill in cognitive psychotherapy. A description outlining the historical development of clinical supervision in relationship to positional papers and policies is followed by an exposé of the difficulties in developing a clear, consistent model of clinical supervision with a coherent focus; reflective practice is proposed as a model of choice for clinical supervision in nursing. The article examines the parallels and processes of a model of reflection in an individual clinical supervision session, and the use of guided discovery through Socratic dialogue with a depressed patient in cognitive psychotherapy. Extracts from both sessions are used to illuminate the subsequent discussion.
A semi-automatic method for extracting thin line structures in images as rooted tree network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brazzini, Jacopo; Dillard, Scott; Soille, Pierre
2010-01-01
This paper addresses the problem of semi-automatic extraction of line networks in digital images - e.g., road or hydrographic networks in satellite images, blood vessels in medical images, robust. For that purpose, we improve a generic method derived from morphological and hydrological concepts and consisting in minimum cost path estimation and flow simulation. While this approach fully exploits the local contrast and shape of the network, as well as its arborescent nature, we further incorporate local directional information about the structures in the image. Namely, an appropriate anisotropic metric is designed by using both the characteristic features of the targetmore » network and the eigen-decomposition of the gradient structure tensor of the image. Following, the geodesic propagation from a given seed with this metric is combined with hydrological operators for overland flow simulation to extract the line network. The algorithm is demonstrated for the extraction of blood vessels in a retina image and of a river network in a satellite image.« less
Joint learning of labels and distance metric.
Liu, Bo; Wang, Meng; Hong, Richang; Zha, Zhengjun; Hua, Xian-Sheng
2010-06-01
Machine learning algorithms frequently suffer from the insufficiency of training data and the usage of inappropriate distance metric. In this paper, we propose a joint learning of labels and distance metric (JLLDM) approach, which is able to simultaneously address the two difficulties. In comparison with the existing semi-supervised learning and distance metric learning methods that focus only on label prediction or distance metric construction, the JLLDM algorithm optimizes the labels of unlabeled samples and a Mahalanobis distance metric in a unified scheme. The advantage of JLLDM is multifold: 1) the problem of training data insufficiency can be tackled; 2) a good distance metric can be constructed with only very few training samples; and 3) no radius parameter is needed since the algorithm automatically determines the scale of the metric. Extensive experiments are conducted to compare the JLLDM approach with different semi-supervised learning and distance metric learning methods, and empirical results demonstrate its effectiveness.
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
Yazdavar, Amir Hossein; Al-Olimat, Hussein S.; Ebrahimi, Monireh; Bajaj, Goonmeet; Banerjee, Tanvi; Thirunarayan, Krishnaprasad; Pathak, Jyotishman; Sheth, Amit
2017-01-01
With the rise of social media, millions of people are routinely expressing their moods, feelings, and daily struggles with mental health issues on social media platforms like Twitter. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential for detecting clinical depression symptoms which emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool is able to identify clinical depressive symptoms with an accuracy of 68% and precision of 72%. PMID:29707701
Buus, Niels; Delgado, Cynthia; Traynor, Michael; Gonge, Henrik
2018-04-01
This present study is a report of an interview study exploring personal views on participating in group clinical supervision among mental health nursing staff members who do not participate in supervision. There is a paucity of empirical research on resistance to supervision, which has traditionally been theorized as a supervisee's maladaptive coping with anxiety in the supervision process. The aim of the present study was to examine resistance to group clinical supervision by interviewing nurses who did not participate in supervision. In 2015, we conducted semistructured interviews with 24 Danish mental health nursing staff members who had been observed not to participate in supervision in two periods of 3 months. Interviews were audio-recorded and subjected to discourse analysis. We constructed two discursive positions taken by the informants: (i) 'forced non-participation', where an informant was in favour of supervision, but presented practical reasons for not participating; and (ii) 'deliberate rejection', where an informant intentionally chose to not to participate in supervision. Furthermore, we described two typical themes drawn upon by informants in their positioning: 'difficulties related to participating in supervision' and 'limited need for and benefits from supervision'. The findings indicated that group clinical supervision extended a space for group discussion that generated or accentuated anxiety because of already-existing conflicts and a fundamental lack of trust between group members. Many informants perceived group clinical supervision as an unacceptable intrusion, which could indicate a need for developing more acceptable types of post-registration clinical education and reflective practice for this group. © 2017 Australian College of Mental Health Nurses Inc.
Combined rule extraction and feature elimination in supervised classification.
Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E
2012-09-01
There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.
Semi-automatic feedback using concurrence between mixture vectors for general databases
NASA Astrophysics Data System (ADS)
Larabi, Mohamed-Chaker; Richard, Noel; Colot, Olivier; Fernandez-Maloigne, Christine
2001-12-01
This paper describes how a query system can exploit the basic knowledge by employing semi-automatic relevance feedback to refine queries and runtimes. For general databases, it is often useless to call complex attributes, because we have not sufficient information about images in the database. Moreover, these images can be topologically very different from one to each other and an attribute that is powerful for a database category may be very powerless for the other categories. The idea is to use very simple features, such as color histogram, correlograms, Color Coherence Vectors (CCV), to fill out the signature vector. Then, a number of mixture vectors is prepared depending on the number of very distinctive categories in the database. Knowing that a mixture vector is a vector containing the weight of each attribute that will be used to compute a similarity distance. We post a query in the database using successively all the mixture vectors defined previously. We retain then the N first images for each vector in order to make a mapping using the following information: Is image I present in several mixture vectors results? What is its rank in the results? These informations allow us to switch the system on an unsupervised relevance feedback or user's feedback (supervised feedback).
NASA Astrophysics Data System (ADS)
Gao, Yuan; Ma, Jiayi; Yuille, Alan L.
2017-05-01
This paper addresses the problem of face recognition when there is only few, or even only a single, labeled examples of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive nuisance variables such as bad lighting, wearing of glasses) and non-linear (i.e., non-additive pixel-wise nuisance variables such as expression changes). The small number of labeled examples means that it is hard to remove these nuisance variables between the training and testing faces to obtain good recognition performance. To address the problem we propose a method called Semi-Supervised Sparse Representation based Classification (S$^3$RC). This is based on recent work on sparsity where faces are represented in terms of two dictionaries: a gallery dictionary consisting of one or more examples of each person, and a variation dictionary representing linear nuisance variables (e.g., different lighting conditions, different glasses). The main idea is that (i) we use the variation dictionary to characterize the linear nuisance variables via the sparsity framework, then (ii) prototype face images are estimated as a gallery dictionary via a Gaussian Mixture Model (GMM), with mixed labeled and unlabeled samples in a semi-supervised manner, to deal with the non-linear nuisance variations between labeled and unlabeled samples. We have done experiments with insufficient labeled samples, even when there is only a single labeled sample per person. Our results on the AR, Multi-PIE, CAS-PEAL, and LFW databases demonstrate that the proposed method is able to deliver significantly improved performance over existing methods.
Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah
2018-02-01
Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
NASA Astrophysics Data System (ADS)
Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah
2018-02-01
Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
Compressive Information Extraction: A Dynamical Systems Approach
2016-01-24
sparsely encoded in very large data streams. (a) Target tracking in an urban canyon; (b) and (c) sample frames showing contextually abnormal events: onset...extraction to identify contextually abnormal se- quences (see section 2.2.3). Formally, the problem of interest can be stated as establishing whether a noisy...relaxations with optimality guarantees can be obtained using tools from semi-algebraic geometry. 2.2 Application: Detecting Contextually Abnormal Events
Detecting Visually Observable Disease Symptoms from Faces.
Wang, Kuan; Luo, Jiebo
2016-12-01
Recent years have witnessed an increasing interest in the application of machine learning to clinical informatics and healthcare systems. A significant amount of research has been done on healthcare systems based on supervised learning. In this study, we present a generalized solution to detect visually observable symptoms on faces using semi-supervised anomaly detection combined with machine vision algorithms. We rely on the disease-related statistical facts to detect abnormalities and classify them into multiple categories to narrow down the possible medical reasons of detecting. Our method is in contrast with most existing approaches, which are limited by the availability of labeled training data required for supervised learning, and therefore offers the major advantage of flagging any unusual and visually observable symptoms.
Yao, Chen; Zhu, Xiaojin; Weigel, Kent A
2016-11-07
Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self-training model, and applied this method to genomic prediction of residual feed intake (RFI) in dairy cattle. We describe a self-training model that is wrapped around a support vector machine (SVM) algorithm, which enables it to use data from animals with and without measured phenotypes. Initially, a SVM model was trained using data from 792 animals with measured RFI phenotypes. Then, the resulting SVM was used to generate self-trained phenotypes for 3000 animals for which RFI measurements were not available. Finally, the SVM model was re-trained using data from up to 3792 animals, including those with measured and self-trained RFI phenotypes. Incorporation of additional animals with self-trained phenotypes enhanced the accuracy of genomic predictions compared to that of predictions that were derived from the subset of animals with measured phenotypes. The optimal ratio of animals with self-trained phenotypes to animals with measured phenotypes (2.5, 2.0, and 1.8) and the maximum increase achieved in prediction accuracy measured as the correlation between predicted and actual RFI phenotypes (5.9, 4.1, and 2.4%) decreased as the size of the initial training set (300, 400, and 500 animals with measured phenotypes) increased. The optimal number of animals with self-trained phenotypes may be smaller when prediction accuracy is measured as the mean squared error rather than the correlation between predicted and actual RFI phenotypes. Our results demonstrate that semi-supervised learning models that incorporate self-trained phenotypes can achieve genomic prediction accuracies that are comparable to those obtained with models using larger training sets that include only animals with measured phenotypes. Semi-supervised learning can be helpful for genomic prediction of novel traits, such as RFI, for which the size of reference population is limited, in particular, when the animals to be predicted and the animals in the reference population originate from the same herd-environment.
An Expertise Recommender using Web Mining
NASA Technical Reports Server (NTRS)
Joshi, Anupam; Chandrasekaran, Purnima; ShuYang, Michelle; Ramakrishnan, Ramya
2001-01-01
This report explored techniques to mine web pages of scientists to extract information regarding their expertise, build expertise chains and referral webs, and semi automatically combine this information with directory information services to create a recommender system that permits query by expertise. The approach included experimenting with existing techniques that have been reported in research literature in recent past , and adapted them as needed. In addition, software tools were developed to capture and use this information.
Code of Federal Regulations, 2010 CFR
2010-01-01
... available to supervised financial institutions and financial institution supervisory agencies. 261.20... Supervised Institutions, Financial Institution Supervisory Agencies, Law Enforcement Agencies, and Others in Certain Circumstances § 261.20 Confidential supervisory information made available to supervised financial...
Generating region proposals for histopathological whole slide image retrieval.
Ma, Yibing; Jiang, Zhiguo; Zhang, Haopeng; Xie, Fengying; Zheng, Yushan; Shi, Huaqiang; Zhao, Yu; Shi, Jun
2018-06-01
Content-based image retrieval is an effective method for histopathological image analysis. However, given a database of huge whole slide images (WSIs), acquiring appropriate region-of-interests (ROIs) for training is significant and difficult. Moreover, histopathological images can only be annotated by pathologists, resulting in the lack of labeling information. Therefore, it is an important and challenging task to generate ROIs from WSI and retrieve image with few labels. This paper presents a novel unsupervised region proposing method for histopathological WSI based on Selective Search. Specifically, the WSI is over-segmented into regions which are hierarchically merged until the WSI becomes a single region. Nucleus-oriented similarity measures for region mergence and Nucleus-Cytoplasm color space for histopathological image are specially defined to generate accurate region proposals. Additionally, we propose a new semi-supervised hashing method for image retrieval. The semantic features of images are extracted with Latent Dirichlet Allocation and transformed into binary hashing codes with Supervised Hashing. The methods are tested on a large-scale multi-class database of breast histopathological WSIs. The results demonstrate that for one WSI, our region proposing method can generate 7.3 thousand contoured regions which fit well with 95.8% of the ROIs annotated by pathologists. The proposed hashing method can retrieve a query image among 136 thousand images in 0.29 s and reach precision of 91% with only 10% of images labeled. The unsupervised region proposing method can generate regions as predictions of lesions in histopathological WSI. The region proposals can also serve as the training samples to train machine-learning models for image retrieval. The proposed hashing method can achieve fast and precise image retrieval with small amount of labels. Furthermore, the proposed methods can be potentially applied in online computer-aided-diagnosis systems. Copyright © 2018 Elsevier B.V. All rights reserved.
Yousefi, Alireza; Bazrafkan, Leila; Yamani, Nikoo
2015-07-01
The supervision of academic theses at the Universities of Medical Sciences is one of the most important issues with several challenges. The aim of the present study is to discover the nature of problems and challenges of thesis supervision in Iranian universities of medical sciences. The study was conducted with a qualitative method using conventional content analysis approach. Nineteen faculty members, using purposive sampling, and 11 postgraduate medical sciences students (Ph.D students and residents) were selected on the basis of theoretical sampling. The data were gathered through semi-structured interviews and field observations in Shiraz and Isfahan universities of medical sciences from September 2012 to December 2014. The qualitative content analysis was used with a conventional approach to analyze the data. While experiencing the nature of research supervision process, faculties and the students faced some complexities and challenges in the research supervision process. The obtained codes were categorized under 4 themes Based on the characteristics; included "contextual problem", "role ambiguity in thesis supervision", "poor reflection in supervision" and "ethical problems". The result of this study revealed that there is a need for more attention to planning and defining the supervisory, and research supervision. Also, improvement of the quality of supervisor and students relationship must be considered behind the research context improvement in research supervisory area.
Craig, Pippa L; Phillips, Christine; Hall, Sally
2016-08-01
To describe outcomes of a model of service learning in interprofessional learning (IPL) aimed at developing a sustainable model of training that also contributed to service strengthening. A total of 57 semi-structured interviews with key informants and document review exploring the impacts of interprofessional student teams engaged in locally relevant IPL activities. Six rural towns in South East New South Wales. Local facilitators, staff of local health and other services, health professionals who supervised the 89 students in 37 IPL teams, and academic and administrative staff. Perceived benefits as a consequence of interprofessional, service-learning interventions in these rural towns. Reported outcomes included increased local awareness of a particular issue addressed by the team; improved communication between different health professions; continued use of the team's product or a changed procedure in response to the teams' work; and evidence of improved use of a particular local health service. Given the limited workforce available in rural areas to supervise clinical IPL placements, a service-learning IPL model that aims to build social capital may be a useful educational model. © 2015 National Rural Health Alliance Inc.
Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation.
Xu, Zhe; Huang, Shaoli; Zhang, Ya; Tao, Dacheng
2018-05-01
Learning visual representations from web data has recently attracted attention for object recognition. Previous studies have mainly focused on overcoming label noise and data bias and have shown promising results by learning directly from web data. However, we argue that it might be better to transfer knowledge from existing human labeling resources to improve performance at nearly no additional cost. In this paper, we propose a new semi-supervised method for learning via web data. Our method has the unique design of exploiting strong supervision, i.e., in addition to standard image-level labels, our method also utilizes detailed annotations including object bounding boxes and part landmarks. By transferring as much knowledge as possible from existing strongly supervised datasets to weakly supervised web images, our method can benefit from sophisticated object recognition algorithms and overcome several typical problems found in webly-supervised learning. We consider the problem of fine-grained visual categorization, in which existing training resources are scarce, as our main research objective. Comprehensive experimentation and extensive analysis demonstrate encouraging performance of the proposed approach, which, at the same time, delivers a new pipeline for fine-grained visual categorization that is likely to be highly effective for real-world applications.
Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2016-01-01
This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007
2012-01-01
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using Map-Reduce frame-work. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network and to address the scalability problem, the features from a concept network are extracted using a cluster with Map-Reduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time duration. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impacts on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method has been presented in the paper. PMID:22759614
Ducat, Wendy; Martin, Priya; Kumar, Saravana; Burge, Vanessa; Abernathy, LuJuana
2016-02-01
Improving the quality and safety of health care in Australia is imperative to ensure the right treatment is delivered to the right person at the right time. Achieving this requires appropriate clinical governance and support for health professionals, including professional supervision. This study investigates the usefulness and effectiveness of and barriers to supervision in rural and remote Queensland. As part of the evaluation of the Allied Health Rural and Remote Training and Support program, a qualitative descriptive study was conducted involving semi-structured interviews with 42 rural or remote allied health professionals, nine operational managers and four supervisors. The interviews explored perspectives on their supervision arrangements, including the perceived usefulness, effect on practice and barriers. Themes of reduced isolation; enhanced professional enthusiasm, growth and commitment to the organisation; enhanced clinical skills, knowledge and confidence; and enhanced patient safety were identified as perceived outcomes of professional supervision. Time, technology and organisational factors were identified as potential facilitators as well as potential barriers to effective supervision. This research provides current evidence on the impact of professional supervision in rural and remote Queensland. A multidimensional model of organisational factors associated with effective supervision in rural and remote settings is proposed identifying positive supervision culture and a good supervisor-supervisee fit as key factors associated with effective arrangements. © 2015 Commonwealth of Australia. Australian Journal of Rural Health published by Wiley Publishing Asia Pty Ltd. on behalf of National Rural Health Alliance Inc.
Martin, Priya; Kumar, Saravana; Lizarondo, Lucylynn; VanErp, Ans
2015-09-24
Health professionals practising in countries with dispersed populations such as Australia rely on clinical supervision for professional support. While there are directives and guidelines in place to govern clinical supervision, little is known about how it is actually conducted and what makes it effective. The purpose of this study was to explore the enablers of and barriers to high quality clinical supervision among occupational therapists across Queensland in Australia. This qualitative study took place as part of a broader project. Individual, in-depth, semi-structured interviews were conducted with occupational therapy supervisees in Queensland. The interviews explored the enablers of and barriers to high quality clinical supervision in this group. They further explored some findings from the initial quantitative study. Content analysis of the interview data resulted in eight themes. These themes were broadly around the importance of the supervisory relationship, the impact of clinical supervision and the enablers of and barriers to high quality clinical supervision. This study identified a number of factors that were perceived to be associated with high quality clinical supervision. Supervisor-supervisee matching and fit, supervisory relationship and availability of supervisor for support in between clinical supervision sessions appeared to be associated with perceptions of higher quality of clinical supervision received. Some face-to-face contact augmented with telesupervision was found to improve perceptions of the quality of clinical supervision received via telephone. Lastly, dual roles where clinical supervision and line management were provided by the same person were not considered desirable by supervisees. A number of enablers of and barriers to high quality clinical supervision were also identified. With clinical supervision gaining increasing prominence as part of organisational and professional governance, this study provides important lessons for successful and sustainable clinical supervision in practice contexts.
NASA Astrophysics Data System (ADS)
Osmanoglu, B.; Ozkan, C.; Sunar, F.
2013-10-01
After air strikes on July 14 and 15, 2006 the Jiyeh Power Station started leaking oil into the eastern Mediterranean Sea. The power station is located about 30 km south of Beirut and the slick covered about 170 km of coastline threatening the neighboring countries Turkey and Cyprus. Due to the ongoing conflict between Israel and Lebanon, cleaning efforts could not start immediately resulting in 12 000 to 15 000 tons of fuel oil leaking into the sea. In this paper we compare results from automatic and semi-automatic slick detection algorithms. The automatic detection method combines the probabilities calculated for each pixel from each image to obtain a joint probability, minimizing the adverse effects of atmosphere on oil spill detection. The method can readily utilize X-, C- and L-band data where available. Furthermore wind and wave speed observations can be used for a more accurate analysis. For this study, we utilize Envisat ASAR ScanSAR data. A probability map is generated based on the radar backscatter, effect of wind and dampening value. The semi-automatic algorithm is based on supervised classification. As a classifier, Artificial Neural Network Multilayer Perceptron (ANN MLP) classifier is used since it is more flexible and efficient than conventional maximum likelihood classifier for multisource and multi-temporal data. The learning algorithm for ANN MLP is chosen as the Levenberg-Marquardt (LM). Training and test data for supervised classification are composed from the textural information created from SAR images. This approach is semiautomatic because tuning the parameters of classifier and composing training data need a human interaction. We point out the similarities and differences between the two methods and their results as well as underlining their advantages and disadvantages. Due to the lack of ground truth data, we compare obtained results to each other, as well as other published oil slick area assessments.
Satellite monitoring of vegetation and geology in semi-arid environments. [Tanzania
NASA Technical Reports Server (NTRS)
Kihlblom, U.; Johansson, D. (Principal Investigator)
1980-01-01
The possibility of mapping various characteristics of the natural environment of Tanzania by various LANDSAT techniques was assessed. Interpretation and mapping were carried out using black and white as well as color infrared images on the scale of 1:250,000. The advantages of several computer techniques were also assessed, including contrast-stretched rationing, differential edge enhancement; supervised classification; multitemporal classification; and change detection. Results Show the most useful image for interpretation comes from band 5, with additional information being obtained from either band 6 or band 7. The advantages of using color infrared images for interpreting vegetation and geology are so great that black and white should be used only to supplement the colored images.
Semi-Automatic Terminology Generation for Information Extraction from German Chest X-Ray Reports.
Krebs, Jonathan; Corovic, Hamo; Dietrich, Georg; Ertl, Max; Fette, Georg; Kaspar, Mathias; Krug, Markus; Stoerk, Stefan; Puppe, Frank
2017-01-01
Extraction of structured data from textual reports is an important subtask for building medical data warehouses for research and care. Many medical and most radiology reports are written in a telegraphic style with a concatenation of noun phrases describing the presence or absence of findings. Therefore a lexico-syntactical approach is promising, where key terms and their relations are recognized and mapped on a predefined standard terminology (ontology). We propose a two-phase algorithm for terminology matching: In the first pass, a local terminology for recognition is derived as close as possible to the terms used in the radiology reports. In the second pass, the local terminology is mapped to a standard terminology. In this paper, we report on an algorithm for the first step of semi-automatic generation of the local terminology and evaluate the algorithm with radiology reports of chest X-ray examinations from Würzburg university hospital. With an effort of about 20 hours work of a radiologist as domain expert and 10 hours for meetings, a local terminology with about 250 attributes and various value patterns was built. In an evaluation with 100 randomly chosen reports it achieved an F1-Score of about 95% for information extraction.
Zahm, Kimberly Wehner; Veach, Patricia McCarthy; Martyr, Meredith A; LeRoy, Bonnie S
2016-08-01
Research on genetic counselor professional development would characterize typical developmental processes, inform training and supervision, and promote life-long development opportunities. To date, however no studies have comprehensively examined this phenomenon. The aims of this study were to investigate the nature of professional development for genetic counselors (processes, influences, and outcomes) and whether professional development varies across experience levels. Thirty-four genetic counselors participated in semi-structured telephone interviews exploring their perspectives on their professional development. Participants were sampled from three levels of post-degree genetic counseling experience: novice (0-5 years), experienced (6-14 years), and seasoned (>15 years). Using modified Consensual Qualitative Research and grounded theory methods, themes, domains, and categories were extracted from the data. The themes reflect genetic counselors' evolving perceptions of their professional development and its relationship to: (a) being a clinician, (b) their professional identity, and (c) the field itself. Across experience levels, prevalent influences on professional development were interpersonal (e.g., experiences with patients, genetic counseling colleagues) and involved professional and personal life events. Common developmental experiences included greater confidence and less anxiety over time, being less information-driven and more emotion-focused with patients, delivering "bad news" to patients remains challenging, and individuals' professional development experiences parallel genetic counseling's development as a field. With a few noteworthy exceptions, professional development was similar across experience levels. A preliminary model of genetic counselor professional development is proposed suggesting development occurs in a non-linear fashion throughout the professional lifespan. Each component of the model mutually influences the others, and there are positive and negative avenues of development.
Views of Health System Experts on Macro Factors of Induced Demand
Khorasani, Elahe; Keyvanara, Mahmoud; Karimi, Saeed; Jazi, Marzie Jafarian
2014-01-01
Background: The probability and severity of effects of induced demand are because of the interaction between a range of factors that can affect physicians and patients behavior. It is also affected by the laws of the markets and organizational arrangements for medical services. This article studies major factors that affect the phenomenon of induced demand with the use of experts’ experiences of the Isfahan University of Medical Sciences. Methods: The research is applied a qualitative method. Semi-structured interview was used for data generation. Participants in this study were people who had been informed in this regard and had to be experienced and were known as experts. Purposive sampling was done for data saturation. Seventeen people were interviewed and criteria such as data “reliability of information” and stability were considered. The anonymity of the interviewees was preserved. The data are transcribed, categorized and then used the thematic analysis. Results: In this study, thematic analysis was conducted, and 77 sub-themes and 3 themes were extracted respectively. The three main themes include infrastructural factors, social factors, and organizational structural factors affecting induced demand. Each of these also has some sub-themes. Conclusions: Results of this research present a framework for analyzing the major causes of induced demand. The causes identified here include complexity of medicine, information mismatch between service providers and consumers, clinical uncertainty, false beliefs, advertisements, insufficient supervision, scarcity of clinical guidelines, weakness of education system, and ignorance of medical ethics. These findings help policymakers to investigate the induced demand phenomenon clear-sighted. PMID:25400888
Enhancing clinical concept extraction with distributional semantics
Cohen, Trevor; Wu, Stephen; Gonzalez, Graciela
2011-01-01
Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. Therefore, we first experimented with different sliding window models and found the model with parameters that led to best performance in a preliminary sequence labeling task. The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged f-measure for exact match increased from 80.3% to 82.3% and the micro-averaged f-measure based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data. PMID:22085698
A Graph-Based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications
Cameron, Delroy; Bodenreider, Olivier; Yalamanchili, Hima; Danh, Tu; Vallabhaneni, Sreeram; Thirunarayan, Krishnaprasad; Sheth, Amit P.; Rindflesch, Thomas C.
2014-01-01
Objectives This paper presents a methodology for recovering and decomposing Swanson’s Raynaud Syndrome–Fish Oil Hypothesis semi-automatically. The methodology leverages the semantics of assertions extracted from biomedical literature (called semantic predications) along with structured background knowledge and graph-based algorithms to semi-automatically capture the informative associations originally discovered manually by Swanson. Demonstrating that Swanson’s manually intensive techniques can be undertaken semi-automatically, paves the way for fully automatic semantics-based hypothesis generation from scientific literature. Methods Semantic predications obtained from biomedical literature allow the construction of labeled directed graphs which contain various associations among concepts from the literature. By aggregating such associations into informative subgraphs, some of the relevant details originally articulated by Swanson has been uncovered. However, by leveraging background knowledge to bridge important knowledge gaps in the literature, a methodology for semi-automatically capturing the detailed associations originally explicated in natural language by Swanson has been developed. Results Our methodology not only recovered the 3 associations commonly recognized as Swanson’s Hypothesis, but also decomposed them into an additional 16 detailed associations, formulated as chains of semantic predications. Altogether, 14 out of the 19 associations that can be attributed to Swanson were retrieved using our approach. To the best of our knowledge, such an in-depth recovery and decomposition of Swanson’s Hypothesis has never been attempted. Conclusion In this work therefore, we presented a methodology for semi- automatically recovering and decomposing Swanson’s RS-DFO Hypothesis using semantic representations and graph algorithms. Our methodology provides new insights into potential prerequisites for semantics-driven Literature-Based Discovery (LBD). These suggest that three critical aspects of LBD include: 1) the need for more expressive representations beyond Swanson’s ABC model; 2) an ability to accurately extract semantic information from text; and 3) the semantic integration of scientific literature with structured background knowledge. PMID:23026233
Semi-Supervised Multi-View Learning for Gene Network Reconstruction
Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo
2015-01-01
The task of gene regulatory network reconstruction from high-throughput data is receiving increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds additional complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns on the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state of the art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091
PDF text classification to leverage information extraction from publication reports.
Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha
2016-06-01
Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task, however majority of IE systems were not designed to work on Portable Document Format (PDF) document, an important and common extraction source for systematic review. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which add challenges to the underlining natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, and compared it with machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best performing machine learning classifier that used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and MEDADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced of number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
Learning Robust and Discriminative Subspace With Low-Rank Constraints.
Li, Sheng; Fu, Yun
2016-11-01
In this paper, we aim at learning robust and discriminative subspaces from noisy data. Subspace learning is widely used in extracting discriminative features for classification. However, when data are contaminated with severe noise, the performance of most existing subspace learning methods would be limited. Recent advances in low-rank modeling provide effective solutions for removing noise or outliers contained in sample sets, which motivates us to take advantage of low-rank constraints in order to exploit robust and discriminative subspace for classification. In particular, we present a discriminative subspace learning method called the supervised regularization-based robust subspace (SRRS) approach, by incorporating the low-rank constraint. SRRS seeks low-rank representations from the noisy data, and learns a discriminative subspace from the recovered clean data jointly. A supervised regularization function is designed to make use of the class label information, and therefore to enhance the discriminability of subspace. Our approach is formulated as a constrained rank-minimization problem. We design an inexact augmented Lagrange multiplier optimization algorithm to solve it. Unlike the existing sparse representation and low-rank learning methods, our approach learns a low-dimensional subspace from recovered data, and explicitly incorporates the supervised information. Our approach and some baselines are evaluated on the COIL-100, ALOI, Extended YaleB, FERET, AR, and KinFace databases. The experimental results demonstrate the effectiveness of our approach, especially when the data contain considerable noise or variations.
NASA Astrophysics Data System (ADS)
Lin, Chuang; Wang, Binghui; Jiang, Ning; Farina, Dario
2018-04-01
Objective. This paper proposes a novel simultaneous and proportional multiple degree of freedom (DOF) myoelectric control method for active prostheses. Approach. The approach is based on non-negative matrix factorization (NMF) of surface EMG signals with the inclusion of sparseness constraints. By applying a sparseness constraint to the control signal matrix, it is possible to extract the basis information from arbitrary movements (quasi-unsupervised approach) for multiple DOFs concurrently. Main Results. In online testing based on target hitting, able-bodied subjects reached a greater throughput (TP) when using sparse NMF (SNMF) than with classic NMF or with linear regression (LR). Accordingly, the completion time (CT) was shorter for SNMF than NMF or LR. The same observations were made in two patients with unilateral limb deficiencies. Significance. The addition of sparseness constraints to NMF allows for a quasi-unsupervised approach to myoelectric control with superior results with respect to previous methods for the simultaneous and proportional control of multi-DOF. The proposed factorization algorithm allows robust simultaneous and proportional control, is superior to previous supervised algorithms, and, because of minimal supervision, paves the way to online adaptation in myoelectric control.
Automatic age and gender classification using supervised appearance model
NASA Astrophysics Data System (ADS)
Bukar, Ali Maina; Ugail, Hassan; Connah, David
2016-11-01
Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.
Austin, Anne; Gulema, Hanna; Belizan, Maria; Colaci, Daniela S; Kendall, Tamil; Tebeka, Mahlet; Hailemariam, Mengistu; Bekele, Delayehu; Tadesse, Lia; Berhane, Yemane; Langer, Ana
2015-03-29
Increasing women's access to and use of facilities for childbirth is a critical national strategy to improve maternal health outcomes in Ethiopia; however coverage alone is not enough as the quality of emergency obstetric services affects maternal mortality and morbidity. Addis Ababa has a much higher proportion of facility-based births (82%) than the national average (11%), but timely provision of quality emergency obstetric care remains a significant challenge for reducing maternal mortality and improving maternal health. The purpose of this study was to assess barriers to the provision of emergency obstetric care in Addis Ababa from the perspective of healthcare providers by analyzing three factors: implementation of national referral guidelines, staff training, and staff supervision. A mixed methods approach was used to assess barriers to quality emergency obstetric care. Qualitative analyses included twenty-nine, semi-structured, key informant interviews with providers from an urban referral network consisting of a hospital and seven health centers. Quantitative survey data were collected from 111 providers, 80% (111/138) of those providing maternal health services in the same referral network. Respondents identified a lack of transportation and communication infrastructure, overcrowding at the referral hospital, insufficient pre-service and in-service training, and absence of supportive supervision as key barriers to provision of quality emergency obstetric care. Dedicated transportation and communication infrastructure, improvements in pre-service and in-service training, and supportive supervision are needed to maximize the effective use of existing human resources and infrastructure, thus increasing access to and the provision of timely, high quality emergency obstetric care in Addis Ababa, Ethiopia.
NASA Astrophysics Data System (ADS)
Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.
2016-09-01
Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications inherit from this detailed up-to-date data sources. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values which are critical to building up strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of the aerial photography. However, these techniques are associated with some constraints, such as labour intensive field work and a lot of financial requirement which can be overcome by means of integrated LiDAR and digital image datasets. Compared to predominant studies on trees extraction mainly in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presented a workflow about semi-automated approach for extracting urban trees from integrated processing of airborne based LiDAR point cloud and multispectral digital image datasets over Istanbul city of Turkey. The paper reveals that the integrated datasets is a suitable technology and viable source of information for urban trees management. As a conclusion, therefore, the extracted information provides a snapshot about location, composition and extent of trees in the study area useful to city planners and other decision makers in order to understand how much canopy cover exists, identify new planting, removal, or reforestation opportunities and what locations have the greatest need or potential to maximize benefits of return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.
Automated Detection of Microaneurysms Using Scale-Adapted Blob Analysis and Semi-Supervised Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adal, Kedir M.; Sidebe, Desire; Ali, Sharib
2014-01-07
Despite several attempts, automated detection of microaneurysm (MA) from digital fundus images still remains to be an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are then introduced to characterize these blob regions. A semi-supervised based learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier to detect true MAs. The developed system is built using onlymore » few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the art techniques as well as the applicability of the proposed features to analyze fundus images.« less
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2015-12-01
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
Value, Cost, and Sharing: Open Issues in Constrained Clustering
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.
2006-01-01
Clustering is an important tool for data mining, since it can identify major patterns or trends without any supervision (labeled data). Over the past five years, semi-supervised (constrained) clustering methods have become very popular. These methods began with incorporating pairwise constraints and have developed into more general methods that can learn appropriate distance metrics. However, several important open questions have arisen about which constraints are most useful, how they can be actively acquired, and when and how they should be propagated to neighboring points. This position paper describes these open questions and suggests future directions for constrained clustering research.
Exploring Spanish health social media for detecting drug effects
2015-01-01
Background Adverse Drug reactions (ADR) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have devoted a great interest in the early detection of ADRs due to their high incidence and increasing health care costs. Reporting systems are available in order for both healthcare professionals and patients to alert about possible ADRs. However, several studies have shown that these adverse events are underestimated. Our hypothesis is that health social networks could be a significant information source for the early detection of ADRs as well as of new drug indications. Methods In this work we present a system for detecting drug effects (which include both adverse drug reactions as well as drug indications) from user posts extracted from a Spanish health forum. Texts were processed using MeaningCloud, a multilingual text analysis engine, to identify drugs and effects. In addition, we developed the first Spanish database storing drugs as well as their effects automatically built from drug package inserts gathered from online websites. We then applied a distant-supervision method using the database on a collection of 84,000 messages in order to extract the relations between drugs and their effects. To classify the relation instances, we used a kernel method based only on shallow linguistic information of the sentences. Results Regarding Relation Extraction of drugs and their effects, the distant supervision approach achieved a recall of 0.59 and a precision of 0.48. Conclusions The task of extracting relations between drugs and their effects from social media is a complex challenge due to the characteristics of social media texts. These texts, typically posts or tweets, usually contain many grammatical errors and spelling mistakes. Moreover, patients use lay terminology to refer to diseases, symptoms and indications that is not usually included in lexical resources in languages other than English. PMID:26100267
25 CFR 115.403 - Who will receive information regarding a minor's supervised account?
Code of Federal Regulations, 2010 CFR
2010-04-01
... receive information regarding a minor's supervised account? (a) The parent(s) with legal custody of the... 25 Indians 1 2010-04-01 2010-04-01 false Who will receive information regarding a minor's supervised account? 115.403 Section 115.403 Indians BUREAU OF INDIAN AFFAIRS, DEPARTMENT OF THE INTERIOR...
Roberts, Kirk; Shooshan, Sonya E; Rodriguez, Laritza; Abhyankar, Swapna; Kilicoglu, Halil; Demner-Fushman, Dina
2015-12-01
This paper describes a supervised machine learning approach for identifying heart disease risk factors in clinical text, and assessing the impact of annotation granularity and quality on the system's ability to recognize these risk factors. We utilize a series of support vector machine models in conjunction with manually built lexicons to classify triggers specific to each risk factor. The features used for classification were quite simple, utilizing only lexical information and ignoring higher-level linguistic information such as syntax and semantics. Instead, we incorporated high-quality data to train the models by annotating additional information on top of a standard corpus. Despite the relative simplicity of the system, it achieves the highest scores (micro- and macro-F1, and micro- and macro-recall) out of the 20 participants in the 2014 i2b2/UTHealth Shared Task. This system obtains a micro- (macro-) precision of 0.8951 (0.8965), recall of 0.9625 (0.9611), and F1-measure of 0.9276 (0.9277). Additionally, we perform a series of experiments to assess the value of the annotated data we created. These experiments show how manually-labeled negative annotations can improve information extraction performance, demonstrating the importance of high-quality, fine-grained natural language annotations. Copyright © 2015 Elsevier Inc. All rights reserved.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-11
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision '34 Disclosures AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for comment. SUMMARY: The proposed information... Supervision, 1700 G Street, NW., Washington, DC 20552. SUPPLEMENTARY INFORMATION: OTS may not conduct or...
SUPERCRITICAL FLUID EXTRACTION OF SEMI-VOLATILE ORGANIC COMPOUNDS FROM PARTICLES
A nitrogen oxide flux chamber was modified to measure the flux of semi-volatile organic compounds (SVOCs). Part of the modification involved the development of methods to extract SVOCs from polyurethane foam (PUF), sand, and soil. Breakthroughs and extraction efficiencies were ...
Code of Federal Regulations, 2010 CFR
2010-01-01
... derogatory under the criteria listed in 10 CFR part 710, subpart A. Semi-structured interview means an interview by a Designated Psychologist, or a psychologist under his or her supervision, who has the latitude to vary the focus and content of the questions depending on the interviewee's responses. Site...
Code of Federal Regulations, 2011 CFR
2011-01-01
... derogatory under the criteria listed in 10 CFR part 710, subpart A. Semi-structured interview means an interview by a Designated Psychologist, or a psychologist under his or her supervision, who has the latitude to vary the focus and content of the questions depending on the interviewee's responses. Site...
Hidden discriminative features extraction for supervised high-order time series modeling.
Nguyen, Ngoc Anh Thi; Yang, Hyung-Jeong; Kim, Sunhee
2016-11-01
In this paper, an orthogonal Tucker-decomposition-based extraction of high-order discriminative subspaces from a tensor-based time series data structure is presented, named as Tensor Discriminative Feature Extraction (TDFE). TDFE relies on the employment of category information for the maximization of the between-class scatter and the minimization of the within-class scatter to extract optimal hidden discriminative feature subspaces that are simultaneously spanned by every modality for supervised tensor modeling. In this context, the proposed tensor-decomposition method provides the following benefits: i) reduces dimensionality while robustly mining the underlying discriminative features, ii) results in effective interpretable features that lead to an improved classification and visualization, and iii) reduces the processing time during the training stage and the filtering of the projection by solving the generalized eigenvalue issue at each alternation step. Two real third-order tensor-structures of time series datasets (an epilepsy electroencephalogram (EEG) that is modeled as channel×frequency bin×time frame and a microarray data that is modeled as gene×sample×time) were used for the evaluation of the TDFE. The experiment results corroborate the advantages of the proposed method with averages of 98.26% and 89.63% for the classification accuracies of the epilepsy dataset and the microarray dataset, respectively. These performance averages represent an improvement on those of the matrix-based algorithms and recent tensor-based, discriminant-decomposition approaches; this is especially the case considering the small number of samples that are used in practice. Copyright © 2016 Elsevier Ltd. All rights reserved.
van de Belt, Tom H; Engelen, Lucien J L P G; Verhoef, Lise M; van der Weide, Marian J A; Schoonhoven, Lisette; Kool, Rudolf B
2015-01-15
Social media has become mainstream and a growing number of people use it to share health care-related experiences, for example on health care rating sites. These users' experiences and ratings on social media seem to be associated with quality of care. Therefore, information shared by citizens on social media could be of additional value for supervising the quality and safety of health care services by regulatory bodies, thereby stimulating participation by consumers. The objective of the study was to identify the added value of social media for two types of supervision by the Dutch Healthcare Inspectorate (DHI), which is the regulatory body charged with supervising the quality and safety of health care services in the Netherlands. These were (1) supervision in response to incidents reported by individuals, and (2) risk-based supervision. We performed an exploratory study in cooperation with the DHI and searched different social media sources such as Twitter, Facebook, and healthcare rating sites to find additional information for these incidents and topics, from five different sectors. Supervision experts determined the added value for each individual result found, making use of pre-developed scales. Searches in social media resulted in relevant information for six of 40 incidents studied and provided relevant additional information in 72 of 116 cases in risk-based supervision of long-term elderly care. The results showed that social media could be used to include the patient's perspective in supervision. However, it appeared that the rating site ZorgkaartNederland was the only source that provided information that was of additional value for the DHI, while other sources such as forums and social networks like Twitter and Facebook did not result in additional information. This information could be of importance for health care inspectorates, particularly for its enforcement by risk-based supervision in care of the elderly. Further research is needed to determine the added value for other health care sectors.
Using Patient Experiences on Dutch Social Media to Supervise Health Care Services: Exploratory Study
Engelen, Lucien JLPG; Verhoef, Lise M; van der Weide, Marian JA; Schoonhoven, Lisette; Kool, Rudolf B
2015-01-01
Background Social media has become mainstream and a growing number of people use it to share health care-related experiences, for example on health care rating sites. These users’ experiences and ratings on social media seem to be associated with quality of care. Therefore, information shared by citizens on social media could be of additional value for supervising the quality and safety of health care services by regulatory bodies, thereby stimulating participation by consumers. Objective The objective of the study was to identify the added value of social media for two types of supervision by the Dutch Healthcare Inspectorate (DHI), which is the regulatory body charged with supervising the quality and safety of health care services in the Netherlands. These were (1) supervision in response to incidents reported by individuals, and (2) risk-based supervision. Methods We performed an exploratory study in cooperation with the DHI and searched different social media sources such as Twitter, Facebook, and healthcare rating sites to find additional information for these incidents and topics, from five different sectors. Supervision experts determined the added value for each individual result found, making use of pre-developed scales. Results Searches in social media resulted in relevant information for six of 40 incidents studied and provided relevant additional information in 72 of 116 cases in risk-based supervision of long-term elderly care. Conclusions The results showed that social media could be used to include the patient’s perspective in supervision. However, it appeared that the rating site ZorgkaartNederland was the only source that provided information that was of additional value for the DHI, while other sources such as forums and social networks like Twitter and Facebook did not result in additional information. This information could be of importance for health care inspectorates, particularly for its enforcement by risk-based supervision in care of the elderly. Further research is needed to determine the added value for other health care sectors. PMID:25592481
NASA Astrophysics Data System (ADS)
Skersys, Tomas; Butleris, Rimantas; Kapocius, Kestutis
2013-10-01
Approaches for the analysis and specification of business vocabularies and rules are very relevant topics in both Business Process Management and Information Systems Development disciplines. However, in common practice of Information Systems Development, the Business modeling activities still are of mostly empiric nature. In this paper, basic aspects of the approach for business vocabularies' semi-automated extraction from business process models are presented. The approach is based on novel business modeling-level OMG standards "Business Process Model and Notation" (BPMN) and "Semantics for Business Vocabularies and Business Rules" (SBVR), thus contributing to OMG's vision about Model-Driven Architecture (MDA) and to model-driven development in general.
Using soft-hard fusion for misinformation detection and pattern of life analysis in OSINT
NASA Astrophysics Data System (ADS)
Levchuk, Georgiy; Shabarekh, Charlotte
2017-05-01
Today's battlefields are shifting to "denied areas", where the use of U.S. Military air and ground assets is limited. To succeed, the U.S. intelligence analysts increasingly rely on available open-source intelligence (OSINT) which is fraught with inconsistencies, biased reporting and fake news. Analysts need automated tools for retrieval of information from OSINT sources, and these solutions must identify and resolve conflicting and deceptive information. In this paper, we present a misinformation detection model (MDM) which converts text to attributed knowledge graphs and runs graph-based analytics to identify misinformation. At the core of our solution is identification of knowledge conflicts in the fused multi-source knowledge graph, and semi-supervised learning to compute locally consistent reliability and credibility scores for the documents and sources, respectively. We present validation of proposed method using an open source dataset constructed from the online investigations of MH17 downing in Eastern Ukraine.
Topic Identification and Categorization of Public Information in Community-Based Social Media
NASA Astrophysics Data System (ADS)
Kusumawardani, RP; Basri, MH
2017-01-01
This paper presents a work on a semi-supervised method for topic identification and classification of short texts in the social media, and its application on tweets containing dialogues in a large community of dwellers in a city, written mostly in Indonesian. These dialogues comprise a wealth of information about the city, shared in real-time. We found that despite the high irregularity of the language used, and the scarcity of suitable linguistic resources, a meaningful identification of topics could be performed by clustering the tweets using the K-Means algorithm. The resulting clusters are found to be robust enough to be the basis of a classification. On three grouping schemes derived from the clusters, we get accuracy of 95.52%, 95.51%, and 96.7 using linear SVMs, reflecting the applicability of applying this method for generating topic identification and classification on such data.
Ziatdinov, Maxim; Dyck, Ondrej; Maksov, Artem; Li, Xufan; Sang, Xiahan; Xiao, Kai; Unocic, Raymond R; Vasudevan, Rama; Jesse, Stephen; Kalinin, Sergei V
2017-12-26
Recent advances in scanning transmission electron and scanning probe microscopies have opened exciting opportunities in probing the materials structural parameters and various functional properties in real space with angstrom-level precision. This progress has been accompanied by an exponential increase in the size and quality of data sets produced by microscopic and spectroscopic experimental techniques. These developments necessitate adequate methods for extracting relevant physical and chemical information from the large data sets, for which a priori information on the structures of various atomic configurations and lattice defects is limited or absent. Here we demonstrate an application of deep neural networks to extract information from atomically resolved images including location of the atomic species and type of defects. We develop a "weakly supervised" approach that uses information on the coordinates of all atomic species in the image, extracted via a deep neural network, to identify a rich variety of defects that are not part of an initial training set. We further apply our approach to interpret complex atomic and defect transformation, including switching between different coordination of silicon dopants in graphene as a function of time, formation of peculiar silicon dimer with mixed 3-fold and 4-fold coordination, and the motion of molecular "rotor". This deep learning-based approach resembles logic of a human operator, but can be scaled leading to significant shift in the way of extracting and analyzing information from raw experimental data.
2012-01-01
Background In-vivo single voxel proton magnetic resonance spectroscopy (SV 1H-MRS), coupled with supervised pattern recognition (PR) methods, has been widely used in clinical studies of discrimination of brain tumour types and follow-up of patients bearing abnormal brain masses. SV 1H-MRS provides useful biochemical information about the metabolic state of tumours and can be performed at short (< 45 ms) or long (> 45 ms) echo time (TE), each with particular advantages. Short-TE spectra are more adequate for detecting lipids, while the long-TE provides a much flatter signal baseline in between peaks but also negative signals for metabolites such as lactate. Both, lipids and lactate, are respectively indicative of specific metabolic processes taking place. Ideally, the information provided by both TE should be of use for clinical purposes. In this study, we characterise the performance of a range of Non-negative Matrix Factorisation (NMF) methods in two respects: first, to derive sources correlated with the mean spectra of known tissue types (tumours and normal tissue); second, taking the best performing NMF method for source separation, we compare its accuracy for class assignment when using the mixing matrix directly as a basis for classification, as against using the method for dimensionality reduction (DR). For this, we used SV 1H-MRS data with positive and negative peaks, from a widely tested SV 1H-MRS human brain tumour database. Results The results reported in this paper reveal the advantage of using a recently described variant of NMF, namely Convex-NMF, as an unsupervised method of source extraction from SV1H-MRS. Most of the sources extracted in our experiments closely correspond to the mean spectra of some of the analysed tumour types. This similarity allows accurate diagnostic predictions to be made both in fully unsupervised mode and using Convex-NMF as a DR step previous to standard supervised classification. The obtained results are comparable to, or more accurate than those obtained with supervised techniques. Conclusions The unsupervised properties of Convex-NMF place this approach one step ahead of classical label-requiring supervised methods for the discrimination of brain tumour types, as it accounts for their increasingly recognised molecular subtype heterogeneity. The application of Convex-NMF in computer assisted decision support systems is expected to facilitate further improvements in the uptake of MRS-derived information by clinicians. PMID:22401579
Changing over a Project Changing over: A Project-Research Supervision as a Conversation
ERIC Educational Resources Information Center
Clarke, Helen; Ryan, Charly
2006-01-01
Extracts from the written conversation between research student and supervisor show the nature of educative research supervision. The authors argue that researcher-supervisor relationships are methodological in nature as they shape and influence the people, the project and the field. Such relationships, which construct meanings, are complex. A…
Semi-Supervised Clustering for High-Dimensional and Sparse Features
ERIC Educational Resources Information Center
Yan, Su
2010-01-01
Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…
Intellectual Production Supervision Perform based on RFID Smart Electricity Meter
NASA Astrophysics Data System (ADS)
Chen, Xiangqun; Huang, Rui; Shen, Liman; chen, Hao; Xiong, Dezhi; Xiao, Xiangqi; Liu, Mouhai; Xu, Renheng
2018-03-01
This topic develops the RFID intelligent electricity meter production supervision project management system. The system is designed for energy meter production supervision in the management of the project schedule, quality and cost information management requirements in RFID intelligent power, and provide quantitative information more comprehensive, timely and accurate for supervision engineer and project manager management decisions, and to provide technical information for the product manufacturing stage file. From the angle of scheme analysis, design, implementation and test, the system development of production supervision project management system for RFID smart meter project is discussed. Focus on the development of the system, combined with the main business application and management mode at this stage, focuses on the energy meter to monitor progress information, quality information and cost based information on RFID intelligent power management function. The paper introduces the design scheme of the system, the overall client / server architecture, client oriented graphical user interface universal, complete the supervision of project management and interactive transaction information display, the server system of realizing the main program. The system is programmed with C# language and.NET operating environment, and the client and server platforms use Windows operating system, and the database server software uses Oracle. The overall platform supports mainstream information and standards and has good scalability.
On the Implementation of a Land Cover Classification System for SAR Images Using Khoros
NASA Technical Reports Server (NTRS)
Medina Revera, Edwin J.; Espinosa, Ramon Vasquez
1997-01-01
The Synthetic Aperture Radar (SAR) sensor is widely used to record data about the ground under all atmospheric conditions. The SAR acquired images have very good resolution which necessitates the development of a classification system that process the SAR images to extract useful information for different applications. In this work, a complete system for the land cover classification was designed and programmed using the Khoros, a data flow visual language environment, taking full advantages of the polymorphic data services that it provides. Image analysis was applied to SAR images to improve and automate the processes of recognition and classification of the different regions like mountains and lakes. Both unsupervised and supervised classification utilities were used. The unsupervised classification routines included the use of several Classification/Clustering algorithms like the K-means, ISO2, Weighted Minimum Distance, and the Localized Receptive Field (LRF) training/classifier. Different texture analysis approaches such as Invariant Moments, Fractal Dimension and Second Order statistics were implemented for supervised classification of the images. The results and conclusions for SAR image classification using the various unsupervised and supervised procedures are presented based on their accuracy and performance.
Kok, Maryse C; Kea, Aschenaki Z; Datiko, Daniel G; Broerse, Jacqueline E W; Dieleman, Marjolein; Taegtmeyer, Miriam; Tulloch, Olivia
2015-09-30
Health extension workers (HEWs) in Ethiopia have a unique position, connecting communities to the health sector. This intermediary position requires strong interpersonal relationships with actors in both the community and health sector, in order to enhance HEW performance. This study aimed to understand how relationships between HEWs, the community and health sector were shaped, in order to inform policy on optimizing HEW performance in providing maternal health services. We conducted a qualitative study in six districts in the Sidama zone, which included focus group discussions (FGDs) with HEWs, women and men from the community and semi-structured interviews with HEWs; key informants working in programme management, health service delivery and supervision of HEWs; mothers; and traditional birth attendants. Respondents were asked about facilitators and barriers regarding HEWs' relationships with the community and health sector. Interviews and FGDs were recorded, transcribed, translated, coded and thematically analysed. HEWs were selected by their communities, which enhanced trust and engagement between them. Relationships were facilitated by programme design elements related to support, referral, supervision, training, monitoring and accountability. Trust, communication and dialogue and expectations influenced the strength of relationships. From the community side, the health development army supported HEWs in liaising with community members. From the health sector side, top-down supervision and inadequate training possibilities hampered relationships and demotivated HEWs. Health professionals, administrators, HEWs and communities occasionally met to monitor HEW and programme performance. Expectations from the community and health sector regarding HEWs' tasks sometimes differed, negatively affecting motivation and satisfaction of HEWs. HEWs' relationships with the community and health sector can be constrained as a result of inadequate support systems, lack of trust, communication and dialogue and differing expectations. Clearly defined roles at all levels and standardized support, monitoring and accountability, referral, supervision and training, which are executed regularly with clear communication lines, could improve dialogue and trust between HEWs and actors from the community and health sector. This is important to increase HEW performance and maximize the value of HEWs' unique position.
Neural computing for numeric-to-symbolic conversion in control systems
NASA Technical Reports Server (NTRS)
Passino, Kevin M.; Sartori, Michael A.; Antsaklis, Panos J.
1989-01-01
A type of neural network, the multilayer perceptron, is used to classify numeric data and assign appropriate symbols to various classes. This numeric-to-symbolic conversion results in a type of information extraction, which is similar to what is called data reduction in pattern recognition. The use of the neural network as a numeric-to-symbolic converter is introduced, its application in autonomous control is discussed, and several applications are studied. The perceptron is used as a numeric-to-symbolic converter for a discrete-event system controller supervising a continuous variable dynamic system. It is also shown how the perceptron can implement fault trees, which provide useful information (alarms) in a biological system and information for failure diagnosis and control purposes in an aircraft example.
Semi-supervised Machine Learning for Analysis of Hydrogeochemical Data and Models
NASA Astrophysics Data System (ADS)
Vesselinov, Velimir; O'Malley, Daniel; Alexandrov, Boian; Moore, Bryan
2017-04-01
Data- and model-based analyses such as uncertainty quantification, sensitivity analysis, and decision support using complex physics models with numerous model parameters and typically require a huge number of model evaluations (on order of 10^6). Furthermore, model simulations of complex physics may require substantial computational time. For example, accounting for simultaneously occurring physical processes such as fluid flow and biogeochemical reactions in heterogeneous porous medium may require several hours of wall-clock computational time. To address these issues, we have developed a novel methodology for semi-supervised machine learning based on Non-negative Matrix Factorization (NMF) coupled with customized k-means clustering. The algorithm allows for automated, robust Blind Source Separation (BSS) of groundwater types (contamination sources) based on model-free analyses of observed hydrogeochemical data. We have also developed reduced order modeling tools, which coupling support vector regression (SVR), genetic algorithms (GA) and artificial and convolutional neural network (ANN/CNN). SVR is applied to predict the model behavior within prior uncertainty ranges associated with the model parameters. ANN and CNN procedures are applied to upscale heterogeneity of the porous medium. In the upscaling process, fine-scale high-resolution models of heterogeneity are applied to inform coarse-resolution models which have improved computational efficiency while capturing the impact of fine-scale effects at the course scale of interest. These techniques are tested independently on a series of synthetic problems. We also present a decision analysis related to contaminant remediation where the developed reduced order models are applied to reproduce groundwater flow and contaminant transport in a synthetic heterogeneous aquifer. The tools are coded in Julia and are a part of the MADS high-performance computational framework (https://github.com/madsjulia/Mads.jl).
Raffing, Rie; Jensen, Thor Bern; Tønnesen, Hanne
2017-10-23
Quality of supervision is a major predictor for successful PhD projects. A survey showed that almost all PhD students in the Health Sciences in Denmark indicated that good supervision was important for the completion of their PhD study. Interestingly, approximately half of the students who withdrew from their program had experienced insufficient supervision. This led the Research Education Committee at the University of Copenhagen to recommend that supervisors further develop their supervision competence. The aim of this study was to explore PhD supervisors' self-reported needs and wishes regarding the content of a new program in supervision, with a special focus on the supervision of PhD students in medical fields. A semi-structured interview guide was developed, and 20 PhD supervisors from the Graduate School of Health and Medical Sciences at the Faculty of Health and Medical Sciences at the University of Copenhagen were interviewed. Empirical data were analysed using qualitative methods of analysis. Overall, the results indicated a general interest in improved competence and development of a new supervision programme. Those who were not interested argued that, due to their extensive experience with supervision, they had no need to participate in such a programme. The analysis revealed seven overall themes to be included in the course. The clinical context offers PhD supervisors additional challenges that include the following sub-themes: patient recruitment, writing the first article, agreements and scheduled appointments and two main groups of students, in addition to the main themes. The PhD supervisors reported the clear need and desire for a competence enhancement programme targeting the supervision of PhD students at the Faculty of Health and Medical Sciences. Supervision in the clinical context appeared to require additional competence. The Scientific Ethical Committee for the Capital Region of Denmark. Number: H-3-2010-101, date: 2010.09.29.
Land mine detection using multispectral image fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, G.A.; Sengupta, S.K.; Aimonetti, W.D.
1995-03-29
Our system fuses information contained in registered images from multiple sensors to reduce the effects of clutter and improve the ability to detect surface and buried land mines. The sensor suite currently consists of a camera that acquires images in six bands (400nm, 500nm, 600nm, 700nm, 800nm and 900nm). Past research has shown that it is extremely difficult to distinguish land mines from background clutter in images obtained from a single sensor. It is hypothesized, however, that information fused from a suite of various sensors is likely to provide better detection reliability, because the suite of sensors detects a varietymore » of physical properties that are more separable in feature space. The materials surrounding the mines can include natural materials (soil, rocks, foliage, water, etc.) and some artifacts. We use a supervised learning pattern recognition approach to detecting the metal and plastic land mines. The overall process consists of four main parts: Preprocessing, feature extraction, feature selection, and classification. These parts are used in a two step process to classify a subimage. We extract features from the images, and use feature selection algorithms to select only the most important features according to their contribution to correct detections. This allows us to save computational complexity and determine which of the spectral bands add value to the detection system. The most important features from the various sensors are fused using a supervised learning pattern classifier (the probabilistic neural network). We present results of experiments to detect land mines from real data collected from an airborne platform, and evaluate the usefulness of fusing feature information from multiple spectral bands.« less
Huang, Tao; Li, Xiao-yu; Jin, Rui; Ku, Jing; Xu, Sen-miao; Xu, Meng-ling; Wu, Zhen-zhong; Kong, De-guo
2015-04-01
The present paper put forward a non-destructive detection method which combines semi-transmission hyperspectral imaging technology with manifold learning dimension reduction algorithm and least squares support vector machine (LSSVM) to recognize internal and external defects in potatoes simultaneously. Three hundred fifteen potatoes were bought in farmers market as research object, and semi-transmission hyperspectral image acquisition system was constructed to acquire the hyperspectral images of normal external defects (bud and green rind) and internal defect (hollow heart) potatoes. In order to conform to the actual production, defect part is randomly put right, side and back to the acquisition probe when the hyperspectral images of external defects potatoes are acquired. The average spectrums (390-1,040 nm) were extracted from the region of interests for spectral preprocessing. Then three kinds of manifold learning algorithm were respectively utilized to reduce the dimension of spectrum data, including supervised locally linear embedding (SLLE), locally linear embedding (LLE) and isometric mapping (ISOMAP), the low-dimensional data gotten by manifold learning algorithms is used as model input, Error Correcting Output Code (ECOC) and LSSVM were combined to develop the multi-target classification model. By comparing and analyzing results of the three models, we concluded that SLLE is the optimal manifold learning dimension reduction algorithm, and the SLLE-LSSVM model is determined to get the best recognition rate for recognizing internal and external defects potatoes. For test set data, the single recognition rate of normal, bud, green rind and hollow heart potato reached 96.83%, 86.96%, 86.96% and 95% respectively, and he hybrid recognition rate was 93.02%. The results indicate that combining the semi-transmission hyperspectral imaging technology with SLLE-LSSVM is a feasible qualitative analytical method which can simultaneously recognize the internal and external defects potatoes and also provide technical reference for rapid on-line non-destructive detecting of the internal and external defects potatoes.
Task-driven dictionary learning.
Mairal, Julien; Bach, Francis; Ponce, Jean
2012-04-01
Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.
Application of remote sensing to state and regional problems
NASA Technical Reports Server (NTRS)
Miller, W. F. (Principal Investigator); Quattrochi, D. A.; Carter, B. D.; Higgs, G. K.; Solomon, J. L.; Wax, C. L.
1979-01-01
The author has identified the following significant results. The Lowndes County data base is essentially complete with 18 primary variables and 16 proximity variables encoded into the geo-information system. The single purpose, decision tree classifier is now operational. Signatures for the thematic extraction of strip mines from LANDSAT Digital data were obtained by employing both supervised and nonsupervised procedures. Dry, blowing sand areas of beach were also identified from the LANDSAT data. The primary procedure was the analysis of analog data on the I2S signal slicer.
Roberton, Timothy; Applegate, Jennifer; Lefevre, Amnesty E; Mosha, Idda; Cooper, Chelsea M; Silverman, Marissa; Feldhaus, Isabelle; Chebet, Joy J; Mpembeni, Rose; Semu, Helen; Killewo, Japhet; Winch, Peter; Baqui, Abdullah H; George, Asha S
2015-04-09
Supervision is meant to improve the performance and motivation of community health workers (CHWs). However, most evidence on supervision relates to facility health workers. The Integrated Maternal, Newborn, and Child Health (MNCH) Program in Morogoro region, Tanzania, implemented a CHW pilot with a cascade supervision model where facility health workers were trained in supportive supervision for volunteer CHWs, supported by regional and district staff, and with village leaders to further support CHWs. We examine the initial experiences of CHWs, their supervisors, and village leaders to understand the strengths and challenges of such a supervision model for CHWs. Quantitative and qualitative data were collected concurrently from CHWs, supervisors, and village leaders. A survey was administered to 228 (96%) of the CHWs in the Integrated MNCH Program and semi-structured interviews were conducted with 15 CHWs, 8 supervisors, and 15 village leaders purposefully sampled to represent different actor perspectives from health centre catchment villages in Morogoro region. Descriptive statistics analysed the frequency and content of CHW supervision, while thematic content analysis explored CHW, supervisor, and village leader experiences with CHW supervision. CHWs meet with their facility-based supervisors an average of 1.2 times per month. CHWs value supervision and appreciate the sense of legitimacy that arises when supervisors visit them in their village. Village leaders and district staff are engaged and committed to supporting CHWs. Despite these successes, facility-based supervisors visit CHWs in their village an average of only once every 2.8 months, CHWs and supervisors still see supervision primarily as an opportunity to check reports, and meetings with district staff are infrequent and not well scheduled. Supervision of CHWs could be strengthened by streamlining supervision protocols to focus less on report checking and more on problem solving and skills development. Facility health workers, while important for technical oversight, may not be the best mentors for certain tasks such as community relationship-building. We suggest further exploring CHW supervision innovations, such as an enhanced role for community actors, who may be more suitable to support CHWs engaged primarily in health promotion than scarce and over-worked facility health workers.
Kulstein, Galina; Marienfeld, Ralf; Miltner, Erich; Wiegand, Peter
2016-10-01
In the last years, microRNA (miRNA) analysis came into focus in the field of forensic genetics. Yet, no standardized and recommendable protocols for co-isolation of miRNA and DNA from forensic relevant samples have been developed so far. Hence, this study evaluated the performance of an automated Maxwell® 16 System-based strategy (Promega) for co-extraction of DNA and miRNA from forensically relevant (blood and saliva) samples compared to (semi-)manual extraction methods. Three procedures were compared on the basis of recovered quantity of DNA and miRNA (as determined by real-time PCR and Bioanalyzer), miRNA profiling (shown by Cq values and extraction efficiency), STR profiles, duration, contamination risk and handling. All in all, the results highlight that the automated co-extraction procedure yielded the highest miRNA and DNA amounts from saliva and blood samples compared to both (semi-)manual protocols. Also, for aged and genuine samples of forensically relevant traces the miRNA and DNA yields were sufficient for subsequent downstream analysis. Furthermore, the strategy allows miRNA extraction only in cases where it is relevant to obtain additional information about the sample type. Besides, this system enables flexible sample throughput and labor-saving sample processing with reduced risk of cross-contamination. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Arias-Ramos, Nuria; Ferrer-Font, Laura; Lope-Piedrafita, Silvia; Mocioiu, Victor; Julià-Sapé, Margarida; Pumarola, Martí; Arús, Carles; Candiota, Ana Paula
2017-01-01
Glioblastoma (GBM) is the most common aggressive primary brain tumor in adults, with a short survival time even after aggressive therapy. Non-invasive surrogate biomarkers of therapy response may be relevant for improving patient survival. Previous work produced such biomarkers in preclinical GBM using semi-supervised source extraction and single-slice Magnetic Resonance Spectroscopic Imaging (MRSI). Nevertheless, GBMs are heterogeneous and single-slice studies could prevent obtaining relevant information. The purpose of this work was to evaluate whether a multi-slice MRSI approach, acquiring consecutive grids across the tumor, is feasible for preclinical models and may produce additional insight into therapy response. Nosological images were analyzed pixel-by-pixel and a relative responding volume, the Tumor Responding Index (TRI), was defined to quantify response. Heterogeneous response levels were observed and treated animals were ascribed to three arbitrary predefined groups: high response (HR, n = 2), TRI = 68.2 ± 2.8%, intermediate response (IR, n = 6), TRI = 41.1 ± 4.2% and low response (LR, n = 2), TRI = 13.4 ± 14.3%, producing therapy response categorization which had not been fully registered in single-slice studies. Results agreed with the multi-slice approach being feasible and producing an inverse correlation between TRI and Ki67 immunostaining. Additionally, ca. 7-day oscillations of TRI were observed, suggesting that host immune system activation in response to treatment could contribute to the responding patterns detected. PMID:28524099
Actively learning to distinguish suspicious from innocuous anomalies in a batch of vehicle tracks
NASA Astrophysics Data System (ADS)
Qiu, Zhicong; Miller, David J.; Stieber, Brian; Fair, Tim
2014-06-01
We investigate the problem of actively learning to distinguish between two sets of anomalous vehicle tracks, innocuous" and suspicious", starting from scratch, without any initial examples of suspicious" and with no prior knowledge of what an operator would deem suspicious. This two-class problem is challenging because it is a priori unknown which track features may characterize the suspicious class. Furthermore, there is inherent imbalance in the sizes of the labeled innocuous" and suspicious" sets, even after some suspicious examples are identified. We present a comprehensive solution wherein a classifier learns to discriminate suspicious from innocuous based on derived p-value track features. Through active learning, our classifier thus learns the types of anomalies on which to base its discrimination. Our solution encompasses: i) judicious choice of kinematic p-value based features conditioned on the road of origin, along with more explicit features that capture unique vehicle behavior (e.g. U-turns); ii) novel semi-supervised learning that exploits information in the unlabeled (test batch) tracks, and iii) evaluation of several classifier models (logistic regression, SVMs). We find that two active labeling streams are necessary in practice in order to have efficient classifier learning while also forwarding (for labeling) the most actionable tracks. Experiments on wide-area motion imagery (WAMI) tracks, extracted via a system developed by Toyon Research Corporation, demonstrate the strong ROC AUC performance of our system, with sparing use of operator-based active labeling.
Multi-viewpoint Coronal Mass Ejection Catalog Based on STEREO COR2 Observations
NASA Astrophysics Data System (ADS)
Vourlidas, Angelos; Balmaceda, Laura A.; Stenborg, Guillermo; Dal Lago, Alisson
2017-04-01
We present the first multi-viewpoint coronal mass ejection (CME) catalog. The events are identified visually in simultaneous total brightness observations from the twin SECCHI/COR2 coronagraphs on board the Solar Terrestrial Relations Observatory mission. The Multi-View CME Catalog differs from past catalogs in three key aspects: (1) all events between the two viewpoints are cross-linked, (2) each event is assigned a physics-motivated morphological classification (e.g., jet, wave, and flux rope), and (3) kinematic and geometric information is extracted semi-automatically via a supervised image segmentation algorithm. The database extends from the beginning of the COR2 synoptic program (2007 March) to the end of dual-viewpoint observations (2014 September). It contains 4473 unique events with 3358 events identified in both COR2s. Kinematic properties exist currently for 1747 events (26% of COR2-A events and 17% of COR2-B events). We examine several issues, made possible by this cross-linked CME database, including the role of projection on the perceived morphology of events, the missing CME rate, the existence of cool material in CMEs, the solar cycle dependence on CME rate, speeds and width, and the existence of flux rope within CMEs. We discuss the implications for past single-viewpoint studies and for Space Weather research. The database is publicly available on the web including all available measurements. We hope that it will become a useful resource for the community.
Drift in Children's Categories: When Experienced Distributions Conflict with Prior Learning
ERIC Educational Resources Information Center
Kalish, Charles W.; Zhu, XiaoJin; Rogers, Timothy T.
2015-01-01
Psychological intuitions about natural category structure do not always correspond to the true structure of the world. The current study explores young children's responses to conflict between intuitive structure and authoritative feedback using a semi-supervised learning (Zhu et al., 2007) paradigm. In three experiments, 160 children between the…
Adal, Kedir M; Sidibé, Désiré; Ali, Sharib; Chaum, Edward; Karnowski, Thomas P; Mériaudeau, Fabrice
2014-04-01
Despite several attempts, automated detection of microaneurysm (MA) from digital fundus images still remains to be an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are introduced to characterize these blob regions. A semi-supervised based learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier which can detect true MAs. The developed system is built using only few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the art techniques as well as the applicability of the proposed features to analyze fundus images. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Levchuk, Georgiy; Colonna-Romano, John; Eslami, Mohammed
2017-05-01
The United States increasingly relies on cyber-physical systems to conduct military and commercial operations. Attacks on these systems have increased dramatically around the globe. The attackers constantly change their methods, making state-of-the-art commercial and military intrusion detection systems ineffective. In this paper, we present a model to identify functional behavior of network devices from netflow traces. Our model includes two innovations. First, we define novel features for a host IP using detection of application graph patterns in IP's host graph constructed from 5-min aggregated packet flows. Second, we present the first application, to the best of our knowledge, of Graph Semi-Supervised Learning (GSSL) to the space of IP behavior classification. Using a cyber-attack dataset collected from NetFlow packet traces, we show that GSSL trained with only 20% of the data achieves higher attack detection rates than Support Vector Machines (SVM) and Naïve Bayes (NB) classifiers trained with 80% of data points. We also show how to improve detection quality by filtering out web browsing data, and conclude with discussion of future research directions.
Wulsin, D. F.; Gupta, J. R.; Mani, R.; Blanco, J. A.; Litt, B.
2011-01-01
Clinical electroencephalography (EEG) records vast amounts of human complex data yet is still reviewed primarily by human readers. Deep Belief Nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data, but are rarely applied to times-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification time was found to be 1.7 to 103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data—a rarity in automated physiological waveform analysis—to hand-chosen features and find that raw data produces comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques. PMID:21525569
Nonlinear Deep Kernel Learning for Image Annotation.
Jiu, Mingyuan; Sahbi, Hichem
2017-02-08
Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each one involves a combination of several elementary or intermediate kernels, and results into a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semisupervised and Laplacian-based semi-supervised. When plugged into support vector machines (SVMs), the resulting deep kernel networks show clear gain, compared to several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database and the Banana dataset validate the effectiveness of the proposed method.
Onder, Devrim; Sarioglu, Sulen; Karacali, Bilge
2013-04-01
Quasi-supervised learning is a statistical learning algorithm that contrasts two datasets by computing estimate for the posterior probability of each sample in either dataset. This method has not been applied to histopathological images before. The purpose of this study is to evaluate the performance of the method to identify colorectal tissues with or without adenocarcinoma. Light microscopic digital images from histopathological sections were obtained from 30 colorectal radical surgery materials including adenocarcinoma and non-neoplastic regions. The texture features were extracted by using local histograms and co-occurrence matrices. The quasi-supervised learning algorithm operates on two datasets, one containing samples of normal tissues labelled only indirectly, and the other containing an unlabeled collection of samples of both normal and cancer tissues. As such, the algorithm eliminates the need for manually labelled samples of normal and cancer tissues for conventional supervised learning and significantly reduces the expert intervention. Several texture feature vector datasets corresponding to different extraction parameters were tested within the proposed framework. The Independent Component Analysis dimensionality reduction approach was also identified as the one improving the labelling performance evaluated in this series. In this series, the proposed method was applied to the dataset of 22,080 vectors with reduced dimensionality 119 from 132. Regions containing cancer tissue could be identified accurately having false and true positive rates up to 19% and 88% respectively without using manually labelled ground-truth datasets in a quasi-supervised strategy. The resulting labelling performances were compared to that of a conventional powerful supervised classifier using manually labelled ground-truth data. The supervised classifier results were calculated as 3.5% and 95% for the same case. The results in this series in comparison with the benchmark classifier, suggest that quasi-supervised image texture labelling may be a useful method in the analysis and classification of pathological slides but further study is required to improve the results. Copyright © 2013 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Apostolidou, Zoe; Schweitzer, Robert
2017-01-01
The present study is the first study undertaken in Australia that seeks to explore practitioners' perspectives on the use of clinical supervision in their therapeutic engagement with asylum seekers and refugees. We used thematic analysis to analyse extracts of interviews that were conducted with nine professionals who worked therapeutically with…
[Object-oriented aquatic vegetation extracting approach based on visible vegetation indices.
Jing, Ran; Deng, Lei; Zhao, Wen Ji; Gong, Zhao Ning
2016-05-01
Using the estimation of scale parameters (ESP) image segmentation tool to determine the ideal image segmentation scale, the optimal segmented image was created by the multi-scale segmentation method. Based on the visible vegetation indices derived from mini-UAV imaging data, we chose a set of optimal vegetation indices from a series of visible vegetation indices, and built up a decision tree rule. A membership function was used to automatically classify the study area and an aquatic vegetation map was generated. The results showed the overall accuracy of image classification using the supervised classification was 53.7%, and the overall accuracy of object-oriented image analysis (OBIA) was 91.7%. Compared with pixel-based supervised classification method, the OBIA method improved significantly the image classification result and further increased the accuracy of extracting the aquatic vegetation. The Kappa value of supervised classification was 0.4, and the Kappa value based OBIA was 0.9. The experimental results demonstrated that using visible vegetation indices derived from the mini-UAV data and OBIA method extracting the aquatic vegetation developed in this study was feasible and could be applied in other physically similar areas.
Smart Annotation of Cyclic Data Using Hierarchical Hidden Markov Models.
Martindale, Christine F; Hoenig, Florian; Strohrmann, Christina; Eskofier, Bjoern M
2017-10-13
Cyclic signals are an intrinsic part of daily life, such as human motion and heart activity. The detailed analysis of them is important for clinical applications such as pathological gait analysis and for sports applications such as performance analysis. Labeled training data for algorithms that analyze these cyclic data come at a high annotation cost due to only limited annotations available under laboratory conditions or requiring manual segmentation of the data under less restricted conditions. This paper presents a smart annotation method that reduces this cost of labeling for sensor-based data, which is applicable to data collected outside of strict laboratory conditions. The method uses semi-supervised learning of sections of cyclic data with a known cycle number. A hierarchical hidden Markov model (hHMM) is used, achieving a mean absolute error of 0.041 ± 0.020 s relative to a manually-annotated reference. The resulting model was also used to simultaneously segment and classify continuous, 'in the wild' data, demonstrating the applicability of using hHMM, trained on limited data sections, to label a complete dataset. This technique achieved comparable results to its fully-supervised equivalent. Our semi-supervised method has the significant advantage of reduced annotation cost. Furthermore, it reduces the opportunity for human error in the labeling process normally required for training of segmentation algorithms. It also lowers the annotation cost of training a model capable of continuous monitoring of cycle characteristics such as those employed to analyze the progress of movement disorders or analysis of running technique.
NASA Astrophysics Data System (ADS)
Collins, W. D.; Wehner, M. F.; Prabhat, M.; Kurth, T.; Satish, N.; Mitliagkas, I.; Zhang, J.; Racah, E.; Patwary, M.; Sundaram, N.; Dubey, P.
2017-12-01
Anthropogenically-forced climate changes in the number and character of extreme storms have the potential to significantly impact human and natural systems. Current high-performance computing enables multidecadal simulations with global climate models at resolutions of 25km or finer. Such high-resolution simulations are demonstrably superior in simulating extreme storms such as tropical cyclones than the coarser simulations available in the Coupled Model Intercomparison Project (CMIP5) and provide the capability to more credibly project future changes in extreme storm statistics and properties. The identification and tracking of storms in the voluminous model output is very challenging as it is impractical to manually identify storms due to the enormous size of the datasets, and therefore automated procedures are used. Traditionally, these procedures are based on a multi-variate set of physical conditions based on known properties of the class of storms in question. In recent years, we have successfully demonstrated that Deep Learning produces state of the art results for pattern detection in climate data. We have developed supervised and semi-supervised convolutional architectures for detecting and localizing tropical cyclones, extra-tropical cyclones and atmospheric rivers in simulation data. One of the primary challenges in the applicability of Deep Learning to climate data is in the expensive training phase. Typical networks may take days to converge on 10GB-sized datasets, while the climate science community has ready access to O(10 TB)-O(PB) sized datasets. In this work, we present the most scalable implementation of Deep Learning to date. We successfully scale a unified, semi-supervised convolutional architecture on all of the Cori Phase II supercomputer at NERSC. We use IntelCaffe, MKL and MLSL libraries. We have optimized single node MKL libraries to obtain 1-4 TF on single KNL nodes. We have developed a novel hybrid parameter update strategy to improve scaling to 9600 KNL nodes (600,000 cores). We obtain 15PF performance over the course of the training run; setting a new watermark for the HPC and Deep Learning communities. This talk will share insights on how to obtain this extreme level of performance, current gaps/challenges and implications for the climate science community.
Semi-Supervised Data Summarization: Using Spectral Libraries to Improve Hyperspectral Clustering
NASA Technical Reports Server (NTRS)
Wagstaff, K. L.; Shu, H. P.; Mazzoni, D.; Castano, R.
2005-01-01
Hyperspectral imagers produce very large images, with each pixel recorded at hundreds or thousands of different wavelengths. The ability to automatically generate summaries of these data sets enables several important applications, such as quickly browsing through a large image repository or determining the best use of a limited bandwidth link (e.g., determining which images are most critical for full transmission). Clustering algorithms can be used to generate these summaries, but traditional clustering methods make decisions based only on the information contained in the data set. In contrast, we present a new method that additionally leverages existing spectral libraries to identify materials that are likely to be present in the image target area. We find that this approach simultaneously reduces runtime and produces summaries that are more relevant to science goals.
A Review on Data Stream Classification
NASA Astrophysics Data System (ADS)
Haneen, A. A.; Noraziah, A.; Wahab, Mohd Helmy Abd
2018-05-01
At this present time, the significance of data streams cannot be denied as many researchers have placed their focus on the research areas of databases, statistics, and computer science. In fact, data streams refer to some data points sequences that are found in order with the potential to be non-binding, which is generated from the process of generating information in a manner that is not stationary. As such the typical tasks of searching data have been linked to streams of data that are inclusive of clustering, classification, and repeated mining of pattern. This paper presents several data stream clustering approaches, which are based on density, besides attempting to comprehend the function of the related algorithms; both semi-supervised and active learning, along with reviews of a number of recent studies.
Form Follows Function: A Model for Clinical Supervision of Genetic Counseling Students.
Wherley, Colleen; Veach, Patricia McCarthy; Martyr, Meredith A; LeRoy, Bonnie S
2015-10-01
Supervision plays a vital role in genetic counselor training, yet models describing genetic counseling supervision processes and outcomes are lacking. This paper describes a proposed supervision model intended to provide a framework to promote comprehensive and consistent clinical supervision training for genetic counseling students. Based on the principle "form follows function," the model reflects and reinforces McCarthy Veach et al.'s empirically derived model of genetic counseling practice - the "Reciprocal Engagement Model" (REM). The REM consists of mutually interactive educational, relational, and psychosocial components. The Reciprocal Engagement Model of Supervision (REM-S) has similar components and corresponding tenets, goals, and outcomes. The 5 REM-S tenets are: Learning and applying genetic information are key; Relationship is integral to genetic counseling supervision; Student autonomy must be supported; Students are capable; and Student emotions matter. The REM-S outcomes are: Student understands and applies information to independently provide effective services, develop professionally, and engage in self-reflective practice. The 16 REM-S goals are informed by the REM of genetic counseling practice and supported by prior literature. A review of models in medicine and psychology confirms the REM-S contains supervision elements common in healthcare fields, while remaining unique to genetic counseling. The REM-S shows promise for enhancing genetic counselor supervision training and practice and for promoting research on clinical supervision. The REM-S is presented in detail along with specific examples and training and research suggestions.
Dos Santos, Fernanda M; de Souza, Maria Gorete; Crotti, Antônio E Miller; Martins, Carlos H G; Ambrósio, Sérgio R; Veneziani, Rodrigo C S; E Silva, Márcio L Andrade; Cunha, Wilson R
2012-04-01
This work describes the phytochemical study of the extracts from aerial parts of Tibouchina candolleana as well as the evaluation of the antimicrobial activity of extracts, isolated compounds, and semi-synthetic derivatives of ursolic acid against endodontic bacteria. HRGC analysis of the n-hexane extract of T. candolleana allowed identification of β-amyrin, α-amyrin, and β-sitosterol as major constituents. The triterpenes ursolic acid and oleanolic acid were isolated from the methylene chloride extract and identified. In addition, the flavonoids luteolin and genistein were isolated from the ethanol extract and identified. The antimicrobial activity was investigated via determination of the minimum inhibitory concentration (MIC) using the broth microdilution method. Amongst the isolated compounds, ursolic acid was the most effective against the selected endodontic bacteria. As for the semi-synthetic ursolic acid derivatives, only the methyl ester derivative potentiated the activity against Bacteroides fragilis.
dos Santos, Fernanda M.; de Souza, Maria Gorete; Crotti, Antônio E. Miller; Martins, Carlos H. G.; Ambrósio, Sérgio R.; Veneziani, Rodrigo C. S.; e Silva, Márcio L. Andrade; Cunha, Wilson R.
2012-01-01
This work describes the phytochemical study of the extracts from aerial parts of Tibouchina candolleana as well as the evaluation of the antimicrobial activity of extracts, isolated compounds, and semi-synthetic derivatives of ursolic acid against endodontic bacteria. HRGC analysis of the n-hexane extract of T. candolleana allowed identification of β-amyrin, α-amyrin, and β-sitosterol as major constituents. The triterpenes ursolic acid and oleanolic acid were isolated from the methylene chloride extract and identified. In addition, the flavonoids luteolin and genistein were isolated from the ethanol extract and identified. The antimicrobial activity was investigated via determination of the minimum inhibitory concentration (MIC) using the broth microdilution method. Amongst the isolated compounds, ursolic acid was the most effective against the selected endodontic bacteria. As for the semi-synthetic ursolic acid derivatives, only the methyl ester derivative potentiated the activity against Bacteroides fragilis. PMID:24031892
Computational approaches for predicting biomedical research collaborations.
Zhang, Qing; Yu, Hong
2014-01-01
Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous works of collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both the semantic features extracted from author research interest profile and the author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations, similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression with an ROC ranging from 0.766 to 0.980 on different datasets. To our knowledge we are the first to study in depth how research interest and productivities can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.
Extending information retrieval methods to personalized genomic-based studies of disease.
Ye, Shuyun; Dawson, John A; Kendziorski, Christina
2014-01-01
Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
Assessing support for supervised injection services among community stakeholders in London, Canada.
Bardwell, Geoff; Scheim, Ayden; Mitra, Sanjana; Kerr, Thomas
2017-10-01
Few qualitative studies have examined support for supervised injection services (SIS), and these have been restricted to large cities. This study aimed to assess support for SIS among a diverse representation of community stakeholders in London, a mid-sized city in southwestern Ontario, Canada. This qualitative study was undertaken as part of the Ontario Integrated Supervised Injection Services Feasibility Study. We used purposive sampling methods to recruit a diversity of key informants (n=20) from five sectors: healthcare; social services; government and municipal services; police and emergency services; and the business and community sector. Interview data, collected via one-to-one semi structured interviews, were coded and analyzed using thematic analyses through NVivo 10 software. Interview participants unanimously supported the implementation of SIS in London. However, participant support for SIS was met with some implementation-related preferences and/or conditions. These included centralization or decentralization of SIS; accessibility of SIS for people who inject drugs; proximity of SIS to interview participants; and other services and strategies offered alongside SIS. The results of this study challenge the assumptions that smaller cities like London may be unlikely to support SIS. Community stakeholders were supportive of the implementation of SIS with some preferences or conditions. Interview participants had differing perspectives, but ultimately supported similar end goals of accessibility and reducing community harms associated with injection drug use. Future research and SIS programming should consider these factors when determining optimal service delivery in ways that increase support from a diversity of community stakeholders. Copyright © 2017 Elsevier B.V. All rights reserved.
[Analysis of interventions designed to improve clinical supervision of student nurses in Benin].
Otti, André; Pirson, Magali; Piette, Danielle; Coppieters T Wallant, Yves
2017-12-05
The absence of an explicit and coherent conception of the articulation between theory and practice in the reform of nursing training in Benin has resulted in poor quality clinical supervision of student nurses. The objective of this article is to analyze two interventions designed to improve the quality of supervision. A student welcome booklet developed by means of a consultative and provocative participatory approach was tested with twelve student nurses versus a control group. Content analysis of the data collected by individual semi-directed interviews and during two focus groups demonstrated the value of this tool. Student nurses were also taught to use to training diaries inspired by the ?experiential learning? Training diaries were analysed using a grid based on the descriptive elements of the five types of Scheepers training diaries (2008). According to the student nurses, the welcome booklet provided them with structured information to be used as a reference during their training and a better understanding of their teachers, and allowed them to situate the resources of the training course with a lower level of stress. Fifty-eight per cent of the training diaries were are mosaics, reflecting the reflective practice and self-regulated learning of student nurses. This activity also promoted metacognitive dialogue with their supervisors. The student welcome booklet appeared to facilitate integration of student nurses into the clinical setting and promoted professional and organizational socialization. The training diary improved the quality of clinical learning by repeated reflective observation of student nurses and helped to maintain permanent communication with the supervisors.
NASA Astrophysics Data System (ADS)
Sarıyılmaz, F. B.; Musaoğlu, N.; Uluğtekin, N.
2017-11-01
The Sazlidere Basin is located on the European side of Istanbul within the borders of Arnavutkoy and Basaksehir districts. The total area of the basin, which is largely located within the province of Arnavutkoy, is approximately 177 km2. The Sazlidere Basin is faced with intense urbanization pressures and land use / cover change due to the Northern Marmara Motorway, 3rd airport and Channel Istanbul Projects, which are planned to be realized in the Arnavutkoy region. Due to the mentioned projects, intense land use /cover changes occur in the basin. In this study, 2000 and 2012 dated LANDSAT images were supervised classified based on CORINE Land Cover first level to determine the land use/cover classes. As a result, four information classes were identified. These classes are water bodies, forest and semi-natural areas, agricultural areas and artificial surfaces. Accuracy analysis of the images were performed following the classification process. The supervised classified images that have the smallest mapping units 0.09 ha and 0.64 ha were generalized to be compatible with the CORINE Land Cover data. The image pixels have been rearranged by using the thematic pixel aggregation method as the smallest mapping unit is 25 ha. These results were compared with CORINE Land Cover 2000 and CORINE Land Cover 2012, which were obtained by digitizing land cover and land use classes on satellite images. It has been determined that the compared results are compatible with each other in terms of quality and quantity.
Qian, Jing; Song, Baihe; Wang, Bin
2017-01-01
Although research on the antecedents of job dissatisfaction has been developed greatly, we know little about the role of abusive supervision in generating job dissatisfaction. The contingencies under which abusive supervision relates to employees’ job dissatisfaction are still unknown. The present study aimed to fill this research gap by empirically exploring the abusive supervision-job dissatisfaction relationship as well as examining the moderating roles of feedback avoidance and critical thinking on this relationship. We tested the hypotheses with data from a sample of 248 employees from a high-tech communications company in northern China and found that: (a) abusive supervision was positively related to job dissatisfaction; (b) the positive relationship was moderated by both employees’ feedback avoidance and critical thinking. We conclude by extracting the theoretical as well as practical contributions, along with a discussion of the promising directions for future research. PMID:28408899
Qian, Jing; Song, Baihe; Wang, Bin
2017-01-01
Although research on the antecedents of job dissatisfaction has been developed greatly, we know little about the role of abusive supervision in generating job dissatisfaction. The contingencies under which abusive supervision relates to employees' job dissatisfaction are still unknown. The present study aimed to fill this research gap by empirically exploring the abusive supervision-job dissatisfaction relationship as well as examining the moderating roles of feedback avoidance and critical thinking on this relationship. We tested the hypotheses with data from a sample of 248 employees from a high-tech communications company in northern China and found that: (a) abusive supervision was positively related to job dissatisfaction; (b) the positive relationship was moderated by both employees' feedback avoidance and critical thinking. We conclude by extracting the theoretical as well as practical contributions, along with a discussion of the promising directions for future research.
ERIC Educational Resources Information Center
Reynolds, Vikki
2010-01-01
This article illustrates an approach to therapeutic supervision informed by a philosophy of solidarity and social justice activism. Called a "Supervision of Solidarity", this approach addresses the particular challenges in the supervision of therapists who work alongside clients who are subjected to social injustice and extreme marginalization. It…
HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records
Aggarwal, Anshul; Garhwal, Sunita
2018-01-01
Objectives One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data is unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. Methods A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. Results The HEDEA system is working, covering a large set of formats, to extract and analyse health information. Conclusions This tool can be used to generate analysis report and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes. PMID:29770248
HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records.
Aggarwal, Anshul; Garhwal, Sunita; Kumar, Ajay
2018-04-01
One of the most important functions for a medical practitioner while treating a patient is to study the patient's complete medical history by going through all records, from test results to doctor's notes. With the increasing use of technology in medicine, these records are mostly digital, alleviating the problem of looking through a stack of papers, which are easily misplaced, but some of these are in an unstructured form. Large parts of clinical reports are in written text form and are tedious to use directly without appropriate pre-processing. In medical research, such health records may be a good, convenient source of medical data; however, lack of structure means that the data is unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene. A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach. The HEDEA system is working, covering a large set of formats, to extract and analyse health information. This tool can be used to generate analysis report and charts using the central database. This information is only provided after prior approval has been received from the patient for medical research purposes.
Machine learning applications in genetics and genomics.
Libbrecht, Maxwell W; Noble, William Stafford
2015-06-01
The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.
Security system signal supervision
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chritton, M.R.; Matter, J.C.
1991-09-01
This purpose of this NUREG is to present technical information that should be useful to NRC licensees for understanding and applying line supervision techniques to security communication links. A review of security communication links is followed by detailed discussions of link physical protection and DC/AC static supervision and dynamic supervision techniques. Material is also presented on security for atmospheric transmission and video line supervision. A glossary of security communication line supervision terms is appended. 16 figs.
Code of Federal Regulations, 2010 CFR
2010-01-01
... holding company's business, a component based on its organizational form, and a component based on its...-site supervision of a noncomplex, low risk savings and loan holding company structure. OTS will... company is the registered holding company at the highest level of ownership in a holding company structure...
Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets
2017-07-01
principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for
Improved Anomaly Detection using Integrated Supervised and Unsupervised Processing
NASA Astrophysics Data System (ADS)
Hunt, B.; Sheppard, D. G.; Wetterer, C. J.
There are two broad technologies of signal processing applicable to space object feature identification using nonresolved imagery: supervised processing analyzes a large set of data for common characteristics that can be then used to identify, transform, and extract information from new data taken of the same given class (e.g. support vector machine); unsupervised processing utilizes detailed physics-based models that generate comparison data that can then be used to estimate parameters presumed to be governed by the same models (e.g. estimation filters). Both processes have been used in non-resolved space object identification and yield similar results yet arrived at using vastly different processes. The goal of integrating the results of the two is to seek to achieve an even greater performance by building on the process diversity. Specifically, both supervised processing and unsupervised processing will jointly operate on the analysis of brightness (radiometric flux intensity) measurements reflected by space objects and observed by a ground station to determine whether a particular day conforms to a nominal operating mode (as determined from a training set) or exhibits anomalous behavior where a particular parameter (e.g. attitude, solar panel articulation angle) has changed in some way. It is demonstrated in a variety of different scenarios that the integrated process achieves a greater performance than each of the separate processes alone.
NASA Technical Reports Server (NTRS)
Odenyo, V. A. O.
1975-01-01
Remote sensing data on computer-compatible tapes of LANDSAT 1 multispectral scanner imager were analyzed to generate a land use map of the City of Virginia Beach. All four bands were used in both the supervised and unsupervised approaches with the LAYSYS software system. Color IR imagery of a U-2 flight of the same area was also digitized and two sample areas were analyzed via the unsupervised approach. The relationships between the mapped land use and the soils of the area were investigated. A land use land cover map at a scale of 1:24,000 was obtained from the supervised analysis of LANDSAT 1 data. It was concluded that machine analysis of remote sensing data to produce land use maps was feasible; that the LAYSYS software system was usable for this purpose; and that the machine analysis was capable of extracting detailed information from the relatively small scale LANDSAT data in a much shorter time without compromising accuracy.
Mahapatra, Dwarikanath; Schueffler, Peter; Tielbeek, Jeroen A W; Buhmann, Joachim M; Vos, Franciscus M
2013-10-01
Increasing incidence of Crohn's disease (CD) in the Western world has made its accurate diagnosis an important medical challenge. The current reference standard for diagnosis, colonoscopy, is time-consuming and invasive while magnetic resonance imaging (MRI) has emerged as the preferred noninvasive procedure over colonoscopy. Current MRI approaches assess rate of contrast enhancement and bowel wall thickness, and rely on extensive manual segmentation for accurate analysis. We propose a supervised learning method for the identification and localization of regions in abdominal magnetic resonance images that have been affected by CD. Low-level features like intensity and texture are used with shape asymmetry information to distinguish between diseased and normal regions. Particular emphasis is laid on a novel entropy-based shape asymmetry method and higher-order statistics like skewness and kurtosis. Multi-scale feature extraction renders the method robust. Experiments on real patient data show that our features achieve a high level of accuracy and perform better than two competing methods.
Classifying GABAergic interneurons with semi-supervised projected model-based clustering.
Mihaljević, Bojan; Benavides-Piccione, Ruth; Guerra, Luis; DeFelipe, Javier; Larrañaga, Pedro; Bielza, Concha
2015-09-01
A recently introduced pragmatic scheme promises to be a useful catalog of interneuron names. We sought to automatically classify digitally reconstructed interneuronal morphologies according to this scheme. Simultaneously, we sought to discover possible subtypes of these types that might emerge during automatic classification (clustering). We also investigated which morphometric properties were most relevant for this classification. A set of 118 digitally reconstructed interneuronal morphologies classified into the common basket (CB), horse-tail (HT), large basket (LB), and Martinotti (MA) interneuron types by 42 of the world's leading neuroscientists, quantified by five simple morphometric properties of the axon and four of the dendrites. We labeled each neuron with the type most commonly assigned to it by the experts. We then removed this class information for each type separately, and applied semi-supervised clustering to those cells (keeping the others' cluster membership fixed), to assess separation from other types and look for the formation of new groups (subtypes). We performed this same experiment unlabeling the cells of two types at a time, and of half the cells of a single type at a time. The clustering model is a finite mixture of Gaussians which we adapted for the estimation of local (per-cluster) feature relevance. We performed the described experiments on three different subsets of the data, formed according to how many experts agreed on type membership: at least 18 experts (the full data set), at least 21 (73 neurons), and at least 26 (47 neurons). Interneurons with more reliable type labels were classified more accurately. We classified HT cells with 100% accuracy, MA cells with 73% accuracy, and CB and LB cells with 56% and 58% accuracy, respectively. We identified three subtypes of the MA type, one subtype of CB and LB types each, and no subtypes of HT (it was a single, homogeneous type). We got maximum (adapted) Silhouette width and ARI values of 1, 0.83, 0.79, and 0.42, when unlabeling the HT, CB, LB, and MA types, respectively, confirming the quality of the formed cluster solutions. The subtypes identified when unlabeling a single type also emerged when unlabeling two types at a time, confirming their validity. Axonal morphometric properties were more relevant that dendritic ones, with the axonal polar histogram length in the [π, 2π) angle interval being particularly useful. The applied semi-supervised clustering method can accurately discriminate among CB, HT, LB, and MA interneuron types while discovering potential subtypes, and is therefore useful for neuronal classification. The discovery of potential subtypes suggests that some of these types are more heterogeneous that previously thought. Finally, axonal variables seem to be more relevant than dendritic ones for distinguishing among the CB, HT, LB, and MA interneuron types. Copyright © 2015 Elsevier B.V. All rights reserved.
Target discrimination method for SAR images based on semisupervised co-training
NASA Astrophysics Data System (ADS)
Wang, Yan; Du, Lan; Dai, Hui
2018-01-01
Synthetic aperture radar (SAR) target discrimination is usually performed in a supervised manner. However, supervised methods for SAR target discrimination may need lots of labeled training samples, whose acquirement is costly, time consuming, and sometimes impossible. This paper proposes an SAR target discrimination method based on semisupervised co-training, which utilizes a limited number of labeled samples and an abundant number of unlabeled samples. First, Lincoln features, widely used in SAR target discrimination, are extracted from the training samples and partitioned into two sets according to their physical meanings. Second, two support vector machine classifiers are iteratively co-trained with the extracted two feature sets based on the co-training algorithm. Finally, the trained classifiers are exploited to classify the test data. The experimental results on real SAR images data not only validate the effectiveness of the proposed method compared with the traditional supervised methods, but also demonstrate the superiority of co-training over self-training, which only uses one feature set.
Augmenting the decomposition of EMG signals using supervised feature extraction techniques.
Parsaei, Hossein; Gangeh, Mehrdad J; Stashuk, Daniel W; Kamel, Mohamed S
2012-01-01
Electromyographic (EMG) signal decomposition is the process of resolving an EMG signal into its constituent motor unit potential trains (MUPTs). In this work, the possibility of improving the decomposing results using two supervised feature extraction methods, i.e., Fisher discriminant analysis (FDA) and supervised principal component analysis (SPCA), is explored. Using the MUP labels provided by a decomposition-based quantitative EMG system as a training data for FDA and SPCA, the MUPs are transformed into a new feature space such that the MUPs of a single MU become as close as possible to each other while those created by different MUs become as far as possible. The MUPs are then reclassified using a certainty-based classification algorithm. Evaluation results using 10 simulated EMG signals comprised of 3-11 MUPTs demonstrate that FDA and SPCA on average improve the decomposition accuracy by 6%. The improvement for the most difficult-to-decompose signal is about 12%, which shows the proposed approach is most beneficial in the decomposition of more complex signals.
The effects of group supervision of nurses: a systematic literature review.
Francke, Anneke L; de Graaff, Fuusje M
2012-09-01
To gain insight into the existing scientific evidence on the effects of group supervision for nurses. A systematic literature study of original research publications. Searches were performed in February 2010 in PubMed, CINAHL, Cochrane Library, Embase, ERIC, the NIVEL catalogue, and PsycINFO. No limitations were applied regarding date of publication, language or country. Original research publications were eligible for review when they described group supervision programmes directed at nurses; used a control group or a pre-test post-test design; and gave information about the effects of group supervision on nurse or patient outcomes. The two review authors independently assessed studies for inclusion. The methodological quality of included studies was also independently assessed by the review authors, using a check list developed by Van Tulder et al. in collaboration with the Dutch Cochrane Centre. Data related to the original publications were extracted by one review author and checked by a second review author. No statistical pooling of outcomes was performed, because there was large heterogeneity of outcomes. A total of 1087 potentially relevant references were found. After screening of the references, eight studies with a control group and nine with a pre-test post-test design were included. Most of the 17 studies included have serious methodological limitations, but four Swedish publications in the field of dementia care had high methodological quality and all point to positive effects on nurses' attitudes and skills and/or nurse-patient interactions. However, in interpreting these positive results, it must be taken into account that these four high-quality publications concern sub-studies of one 'sliced' research project using the same study sample. Moreover, these four publications combined a group supervision intervention with the introduction of individual care planning, which also hampers conclusions about the effectiveness of group supervision alone. Although there are rather a lot of indications that group supervision of nurses is effective, evidence on the effects is still scarce. Further methodologically sound research is needed. Copyright © 2011 Elsevier Ltd. All rights reserved.
Radulović, Niko S; Blagojević, Polina D
2012-01-01
Unfortunately, contaminants of synthetic/artificial origin are sometimes identified as major constituents of essential oils or plant extracts and considered to be biologically active native plant metabolites. To explore the possibility of early recognition and to create a list of some of the most common semi-volatile contaminants of essential oils and plant extracts. Detailed GC and GC-MS analyses of the evaporation residues of six commercially available diethyl ethers and of a plastic bag hydrodistillate were performed. Average mass scans of the total ion chromatogram profiles of the analysed samples were performed. Almost 200 different compounds, subdivided into two groups, were identified in the analysed samples: (i) compounds that could be only of a synthetic/artificial origin, such as butylated hydroxytoluene and o-phthalic acid esters, i.e. requiring exclusion from the list of identified plant constituents; (ii) compounds possibly of synthetic and/or natural plant origin, i.e. compounds derived from the fatty acid metabolism or products of anaerobic intracellular/microbial fermentation. Average mass scans of the total ion chromatogram profiles provide meaningful and convenient information on uncovering important solvent-derived contamination. A database of the most common semi-volatile contaminants of essential oils and plant extracts has been generated that provides information on the likelihood of rejection or acceptance of contaminants as possible plant constituents. The suggested average mass scan approach enables fast and easy profiling of solvents, allowing even inexperienced researchers to pinpoint contaminants. Copyright © 2011 John Wiley & Sons, Ltd.
Application of MPEG-7 descriptors for content-based indexing of sports videos
NASA Astrophysics Data System (ADS)
Hoeynck, Michael; Auweiler, Thorsten; Ohm, Jens-Rainer
2003-06-01
The amount of multimedia data available worldwide is increasing every day. There is a vital need to annotate multimedia data in order to allow universal content access and to provide content-based search-and-retrieval functionalities. Since supervised video annotation can be time consuming, an automatic solution is appreciated. We review recent approaches to content-based indexing and annotation of videos for different kind of sports, and present our application for the automatic annotation of equestrian sports videos. Thereby, we especially concentrate on MPEG-7 based feature extraction and content description. We apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information and taking specific domain knowledge into account. Having determined single shot positions as well as the visual highlights, the information is jointly stored together with additional textual information in an MPEG-7 description scheme. Using this information, we generate content summaries which can be utilized in a user front-end in order to provide content-based access to the video stream, but further content-based queries and navigation on a video-on-demand streaming server.
Automatic classification of animal vocalizations
NASA Astrophysics Data System (ADS)
Clemins, Patrick J.
2005-11-01
Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Melfrequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite autonoma machine, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing for bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts. The development of a framework from which to build animal vocalization classifiers will provide bioacoustics researchers with a consistent platform to analyze and classify vocalizations. A common framework will also allow studies to compare results across species and institutions. In addition, the use of automated classification techniques can speed analysis and uncover behavioral correlations not readily apparent using traditional techniques.
Fast Localization in Large-Scale Environments Using Supervised Indexing of Binary Features.
Youji Feng; Lixin Fan; Yihong Wu
2016-01-01
The essence of image-based localization lies in matching 2D key points in the query image and 3D points in the database. State-of-the-art methods mostly employ sophisticated key point detectors and feature descriptors, e.g., Difference of Gaussian (DoG) and Scale Invariant Feature Transform (SIFT), to ensure robust matching. While a high registration rate is attained, the registration speed is impeded by the expensive key point detection and the descriptor extraction. In this paper, we propose to use efficient key point detectors along with binary feature descriptors, since the extraction of such binary features is extremely fast. The naive usage of binary features, however, does not lend itself to significant speedup of localization, since existing indexing approaches, such as hierarchical clustering trees and locality sensitive hashing, are not efficient enough in indexing binary features and matching binary features turns out to be much slower than matching SIFT features. To overcome this, we propose a much more efficient indexing approach for approximate nearest neighbor search of binary features. This approach resorts to randomized trees that are constructed in a supervised training process by exploiting the label information derived from that multiple features correspond to a common 3D point. In the tree construction process, node tests are selected in a way such that trees have uniform leaf sizes and low error rates, which are two desired properties for efficient approximate nearest neighbor search. To further improve the search efficiency, a probabilistic priority search strategy is adopted. Apart from the label information, this strategy also uses non-binary pixel intensity differences available in descriptor extraction. By using the proposed indexing approach, matching binary features is no longer much slower but slightly faster than matching SIFT features. Consequently, the overall localization speed is significantly improved due to the much faster key point detection and descriptor extraction. It is empirically demonstrated that the localization speed is improved by an order of magnitude as compared with state-of-the-art methods, while comparable registration rate and localization accuracy are still maintained.
NASA Astrophysics Data System (ADS)
Mallast, U.; Gloaguen, R.; Geyer, S.; Rödiger, T.; Siebert, C.
2011-08-01
In this paper we present a semi-automatic method to infer groundwater flow-paths based on the extraction of lineaments from digital elevation models. This method is especially adequate in remote and inaccessible areas where in-situ data are scarce. The combined method of linear filtering and object-based classification provides a lineament map with a high degree of accuracy. Subsequently, lineaments are differentiated into geological and morphological lineaments using auxiliary information and finally evaluated in terms of hydro-geological significance. Using the example of the western catchment of the Dead Sea (Israel/Palestine), the orientation and location of the differentiated lineaments are compared to characteristics of known structural features. We demonstrate that a strong correlation between lineaments and structural features exists. Using Euclidean distances between lineaments and wells provides an assessment criterion to evaluate the hydraulic significance of detected lineaments. Based on this analysis, we suggest that the statistical analysis of lineaments allows a delineation of flow-paths and thus significant information on groundwater movements. To validate the flow-paths we compare them to existing results of groundwater models that are based on well data.
Recognition techniques for extracting information from semistructured documents
NASA Astrophysics Data System (ADS)
Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna
2000-12-01
Archives of optical documents are more and more massively employed, the demand driven also by the new norms sanctioning the legal value of digital documents, provided they are stored on supports that are physically unalterable. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those for magnetic memories. The remaining bottleneck in these systems is the indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine supported to a large degree with evident advantages both in the organization of the work, and in extracting information, providing data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. This information, in our prototype application, is distributed among the database fields of sender, addressee, subject, date, and body of the document.
NASA Astrophysics Data System (ADS)
Anderson, Dylan; Bapst, Aleksander; Coon, Joshua; Pung, Aaron; Kudenov, Michael
2017-05-01
Hyperspectral imaging provides a highly discriminative and powerful signature for target detection and discrimination. Recent literature has shown that considering additional target characteristics, such as spatial or temporal profiles, simultaneously with spectral content can greatly increase classifier performance. Considering these additional characteristics in a traditional discriminative algorithm requires a feature extraction step be performed first. An example of such a pipeline is computing a filter bank response to extract spatial features followed by a support vector machine (SVM) to discriminate between targets. This decoupling between feature extraction and target discrimination yields features that are suboptimal for discrimination, reducing performance. This performance reduction is especially pronounced when the number of features or available data is limited. In this paper, we propose the use of Supervised Nonnegative Tensor Factorization (SNTF) to jointly perform feature extraction and target discrimination over hyperspectral data products. SNTF learns a tensor factorization and a classification boundary from labeled training data simultaneously. This ensures that the features learned via tensor factorization are optimal for both summarizing the input data and separating the targets of interest. Practical considerations for applying SNTF to hyperspectral data are presented, and results from this framework are compared to decoupled feature extraction/target discrimination pipelines.
Application of wavelet techniques for cancer diagnosis using ultrasound images: A Review.
Sudarshan, Vidya K; Mookiah, Muthu Rama Krishnan; Acharya, U Rajendra; Chandran, Vinod; Molinari, Filippo; Fujita, Hamido; Ng, Kwan Hoong
2016-02-01
Ultrasound is an important and low cost imaging modality used to study the internal organs of human body and blood flow through blood vessels. It uses high frequency sound waves to acquire images of internal organs. It is used to screen normal, benign and malignant tissues of various organs. Healthy and malignant tissues generate different echoes for ultrasound. Hence, it provides useful information about the potential tumor tissues that can be analyzed for diagnostic purposes before therapeutic procedures. Ultrasound images are affected with speckle noise due to an air gap between the transducer probe and the body. The challenge is to design and develop robust image preprocessing, segmentation and feature extraction algorithms to locate the tumor region and to extract subtle information from isolated tumor region for diagnosis. This information can be revealed using a scale space technique such as the Discrete Wavelet Transform (DWT). It decomposes an image into images at different scales using low pass and high pass filters. These filters help to identify the detail or sudden changes in intensity in the image. These changes are reflected in the wavelet coefficients. Various texture, statistical and image based features can be extracted from these coefficients. The extracted features are subjected to statistical analysis to identify the significant features to discriminate normal and malignant ultrasound images using supervised classifiers. This paper presents a review of wavelet techniques used for preprocessing, segmentation and feature extraction of breast, thyroid, ovarian and prostate cancer using ultrasound images. Copyright © 2015 Elsevier Ltd. All rights reserved.
Fine-grained information extraction from German transthoracic echocardiography reports.
Toepfer, Martin; Corovic, Hamo; Fette, Georg; Klügl, Peter; Störk, Stefan; Puppe, Frank
2015-11-12
Information extraction techniques that get structured representations out of unstructured data make a large amount of clinically relevant information about patients accessible for semantic applications. These methods typically rely on standardized terminologies that guide this process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite of their importance for clinical trials. This work therefore aimed at development and evaluation of an information extraction component with a fine-grained terminology that enables to recognize almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute value pairs. The final component has been mapped to the central elements of a standardized terminology, and it has been evaluated according to documents with different layouts. The final system achieved state-of-the-art precision (micro average.996) and recall (micro average.961) on 100 test documents that represent more than 90 % of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with f 1=.989 (micro average) and f 1=.963 (macro average). As a result of keyword matching and restraint concept extraction, the system obtained high precision also on unstructured or exceptionally short documents, and documents with uncommon layout. The developed terminology and the proposed information extraction system allow to extract fine-grained information from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse which supports clinical research.
Automated monitoring of medical protocols: a secure and distributed architecture.
Alsinet, T; Ansótegui, C; Béjar, R; Fernández, C; Manyà, F
2003-03-01
The control of the right application of medical protocols is a key issue in hospital environments. For the automated monitoring of medical protocols, we need a domain-independent language for their representation and a fully, or semi, autonomous system that understands the protocols and supervises their application. In this paper we describe a specification language and a multi-agent system architecture for monitoring medical protocols. We model medical services in hospital environments as specialized domain agents and interpret a medical protocol as a negotiation process between agents. A medical service can be involved in multiple medical protocols, and so specialized domain agents are independent of negotiation processes and autonomous system agents perform monitoring tasks. We present the detailed architecture of the system agents and of an important domain agent, the database broker agent, that is responsible of obtaining relevant information about the clinical history of patients. We also describe how we tackle the problems of privacy, integrity and authentication during the process of exchanging information between agents.
Data quality enhancement and knowledge discovery from relevant signals in acoustic emission
NASA Astrophysics Data System (ADS)
Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio
2015-10-01
The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly-observed data, and they can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithms' ability to discriminate outlier acoustic emission sources from a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that extracted rules can adequately describe crack-growth related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.
Robustness and Reliability of Synergy-Based Myocontrol of a Multiple Degree of Freedom Robotic Arm.
Lunardini, Francesca; Casellato, Claudia; d'Avella, Andrea; Sanger, Terence D; Pedrocchi, Alessandra
2016-09-01
In this study, we test the feasibility of the synergy- based approach for application in the realistic and clinically oriented framework of multi-degree of freedom (DOF) robotic control. We developed and tested online ten able-bodied subjects in a semi-supervised method to achieve simultaneous, continuous control of two DOFs of a robotic arm, using muscle synergies extracted from upper limb muscles while performing flexion-extension movements of the elbow and shoulder joints in the horizontal plane. To validate the efficacy of the synergy-based approach in extracting reliable control signals, compared to the simple muscle-pair method typically used in commercial applications, we evaluated the repeatability of the algorithm over days, the effect of the arm dynamics on the control performance, and the robustness of the control scheme to the presence of co-contraction between pairs of antagonist muscles. Results showed that, without the need for a daily calibration, all subjects were able to intuitively and easily control the synergy-based myoelectric interface in different scenarios, using both dynamic and isometric muscle contractions. The proposed control scheme was shown to be robust to co-contraction between antagonist muscles, providing better performance compared to the traditional muscle-pair approach. The current study is a first step toward user-friendly application of synergy-based myocontrol of assistive robotic devices.
Alfonsson, Sven; Spännargård, Åsa; Parling, Thomas; Andersson, Gerhard; Lundgren, Tobias
2017-05-11
Clinical supervision by a senior therapist is a very common practice in psychotherapist training and psychiatric care settings. Though clinical supervision is advocated by most educational and governing institutions, the effects of clinical supervision on the supervisees' competence, e.g., attitudes, behaviors, and skills, as well as on treatment outcomes and other patient variables are debated and largely unknown. Evidence-based practice is advocated in clinical settings but has not yet been fully implemented in educational or clinical training settings. The aim of this systematic review is to synthesize and present the empirical literature regarding effects of clinical supervision in cognitive-behavioral therapy. This study will include a systematic review of the literature to identify studies that have empirically investigated the effects of supervision on supervised psychotherapists and/or the supervisees' patients. A comprehensive search strategy will be conducted to identify published controlled studies indexed in the MEDLINE, EMBASE, PsycINFO, and Cochrane Library databases. Data on supervision outcomes in both psychotherapists and their patients will be extracted, synthesized, and reported. Risk of bias and quality of the included studies will be assessed systematically. This systematic review will rigorously follow established guidelines for systematic reviews in order to summarize and present the evidence base for clinical supervision in cognitive-behavioral therapy and may aid further research and discussion in this area. PROSPERO CRD42016046834.
The Tao of Supervision: Taoist Insights into the Theory and Practice of Educational Supervision.
ERIC Educational Resources Information Center
Glanz, Jeffrey
1997-01-01
There are three approaches to educational supervision: the applied science approach, the interpretive-practical approach, and the critical/emancipatory approach. From a Taoist perspective, conflicting supervision theories or proposals should be welcomed, not resisted. By accepting a diversity of views to inform practice, a balance or centeredness…
Gong, Zhihong; Chen, Si; Gao, Jiangtao; Li, Meihong; Wang, Xiaxia; Lin, Jun; Yu, Xiaomin
2017-11-08
An effective and simple method was established to simultaneously purify seven tea catechins (gallocatechin (GC), epigallocatechin (EGC), catechin (C), epigallocatechin-3- O -gallate (EGCG), epicatechin (EC), epigallocatechin-3- O -(3- O -methyl)-gallate (EGCG3"Me) and epicatechin-3- O -gallate (ECG)) from fresh tea leaves by semi-preparative high performance liquid chromatography (HPLC). Fresh leaves of Tieguanyin tea were successively extracted with methanol and chloroform. Then crude catechins were precipitated from the aqueous fraction of chloroform extraction by adding lead subacetate. Crude catechins were used for the isolation of the seven target catechin compounds by semi-preparative HPLC. Methanol-water and acetonitrile-water were sequentially used as mobile phases. After two rounds of semi-preparative HPLC, all target compounds were achieved with high purities (>90%). The proposed method was tested on two additional tea cultivars and showed similar results. This method demonstrated a simple and efficient strategy based on solvent extraction, ion precipitation and semi-preparative HPLC for the preparation of multiple catechins from tea leaves.
Dusabe-Richards, John N; Tesfaye, Hayley Teshome; Mekonnen, Jarso; Kea, Aschenaki; Theobald, Sally; Datiko, Daniel G
2016-12-27
This study assesses the feasibility of female health extension workers (HEWs) using eHealth within their core duties, supporting both the design and capacity building for an eHealth system project focussed initially on tuberculosis, maternal child health, and gender equity. Health extension workers, Health Centre Heads, District Health Officers, Zonal Health Department and Regional Health Bureau representatives in Southern Ethiopia. The study was undertaken in Southern Ethiopia with three districts in Sidama zone (population of 3.5 million) and one district in Gedeo zone (control zone with similar health service coverage and population density). Mixed method baseline data collection was undertaken, using quantitative questionnaires (n = 57) and purposively sampled qualitative face-to-face semi-structured interviews (n = 10) and focus group discussions (n = 3). Themes were identified relating to HEW commitment and role, supervision, and performance management. The Health Management Information System (HMIS) was seen as important by all participants, but with challenges of information quality, accuracy, reliability and timeliness. Participants' perceptions varied by group regarding the purpose and benefits of HMIS as well as the potential of an eHealth system. Mobile phones were used regularly by all participants. eHealth technology presents a new opportunity for the Ethiopian health system to improve data quality and community health. Front-line female HEWs are a critical bridge between communities and health systems. Empowering HEWs, supporting them and responding to the challenges they face will be an important part of ensuring the sustainability and responsiveness of eHealth strategies. Findings have informed the subsequent eHealth technology design and implementation, capacity strengthening approach, supervision, and performance management approach.
Dorsey, Shannon; Pullmann, Michael D; Kerns, Suzanne E U; Jungbluth, Nathaniel; Meza, Rosemary; Thompson, Kelly; Berliner, Lucy
2017-11-01
Supervisors are an underutilized resource for supporting evidence-based treatments (EBTs) in community mental health. Little is known about how EBT-trained supervisors use supervision time. Primary aims were to describe supervision (e.g., modality, frequency), examine functions of individual supervision, and examine factors associated with time allocation to supervision functions. Results from 56 supervisors and 207 clinicians from 25 organizations indicate high prevalence of individual supervision, often alongside group and informal supervision. Individual supervision serves a wide range of functions, with substantial variation at the supervisor-level. Implementation climate was the strongest predictor of time allocation to clinical and EBT-relevant functions.
Introducing clinical supervision across Western Australian public mental health services.
Taylor, Monica; Harrison, Carole A
2010-08-01
Retention and recruitment of the mental health nursing workforce is a critical issue in Australia and more specifically in Western Australia (WA), partly due to the isolation of the state. It has been suggested that these workforce issues might be minimized through the introduction of clinical supervision within WA mental health services, where, historically, it has been misunderstood and viewed with caution by mental health nurses. This may have been partly due to a lack of understanding of clinical supervision, its models, and its many benefits, due to a paucity of information delivered into initial nurse education programs. The aim of this pilot project is to explore and evaluate the introduction of clinical supervision in WA public mental health services. A quantitative approach informed the study and included the use of an information gathering survey initially, which was followed with evaluation questionnaires. The findings show that education can increase the uptake of clinical supervision. Further, the findings illustrate the importance of linking clinicians from all professional groups via a clinical supervision web-based database.
Chen, W; Kowatch, R; Lin, S; Splaingard, M; Huang, Y
2015-01-01
Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semistructured sleep study reports. The system showed to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and a NLP based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. 26,550 concepts were extracted with 99% being textual concepts. 1.01 million facts were extracted from sleep study documents such as demographic information, sleep study lab results, medications, procedures, diagnoses, among others. The average accuracy of terminology parsing was over 83% when comparing against those by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly from a few weeks to a few seconds. Natural language processing was shown to be powerful for quickly converting large amount of semi-structured or unstructured clinical data into discrete concepts, which in combination of intuitive domain specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
Chen, W.; Kowatch, R.; Lin, S.; Splaingard, M.
2015-01-01
Summary Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semistructured sleep study reports. The system showed to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. Objective We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. Methods We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and a NLP based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. Results 26,550 concepts were extracted with 99% being textual concepts. 1.01 million facts were extracted from sleep study documents such as demographic information, sleep study lab results, medications, procedures, diagnoses, among others. The average accuracy of terminology parsing was over 83% when comparing against those by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly from a few weeks to a few seconds. Conclusion Natural language processing was shown to be powerful for quickly converting large amount of semi-structured or unstructured clinical data into discrete concepts, which in combination of intuitive domain specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use. PMID:26171080
Mehryary, Farrokh; Kaewphan, Suwisa; Hakala, Kai; Ginter, Filip
2016-01-01
Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task. In this paper we propose a novel approach for filtering falsely identified triggers from large-scale event databases, thus improving the quality of knowledge extraction. Our method relies on state-of-the-art word embeddings, event statistics gathered from the whole biomedical literature, and both supervised and unsupervised machine learning techniques. We focus on EVEX, an event database covering the whole PubMed and PubMed Central Open Access literature containing more than 40 million extracted events. The top most frequent EVEX trigger words are hierarchically clustered, and the resulting cluster tree is pruned to identify words that can never act as triggers regardless of their context. For rarely occurring trigger words we introduce a supervised approach trained on the combination of trigger word classification produced by the unsupervised clustering method and manual annotation. The method is evaluated on the official test set of BioNLP Shared Task on Event Extraction. The evaluation shows that the method can be used to improve the performance of the state-of-the-art event extraction systems. This successful effort also translates into removing 1,338,075 of potentially incorrect events from EVEX, thus greatly improving the quality of the data. The method is not solely bound to the EVEX resource and can be thus used to improve the quality of any event extraction system or database. The data and source code for this work are available at: http://bionlp-www.utu.fi/trigger-clustering/.
NASA Astrophysics Data System (ADS)
Bruno, L. S.; Rodrigo, B. P.; Lucio, A. de C. Jorge
2016-10-01
This paper presents a system developed by an application of a neural network Multilayer Perceptron for drone acquired agricultural image segmentation. This application allows a supervised user training the classes that will posteriorly be interpreted by neural network. These classes will be generated manually with pre-selected attributes in the application. After the attribute selection a segmentation process is made to allow the relevant information extraction for different types of images, RGB or Hyperspectral. The application allows extracting the geographical coordinates from the image metadata, geo referencing all pixels on the image. In spite of excessive memory consume on hyperspectral images regions of interest, is possible to perform segmentation, using bands chosen by user that can be combined in different ways to obtain different results.
A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data.
Song, Hongchao; Jiang, Zhuqing; Men, Aidong; Yang, Bo
2017-01-01
Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k -nearest neighbor graphs- ( K -NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.
A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data
Jiang, Zhuqing; Men, Aidong; Yang, Bo
2017-01-01
Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity. PMID:29270197
An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions.
Kundu, Kousik; Backofen, Rolf
2017-01-01
Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. There are several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal to noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as linearity problem, high computational complexity, etc. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.
Target oriented dimensionality reduction of hyperspectral data by Kernel Fukunaga-Koontz Transform
NASA Astrophysics Data System (ADS)
Binol, Hamidullah; Ochilov, Shuhrat; Alam, Mohammad S.; Bal, Abdullah
2017-02-01
Principal component analysis (PCA) is a popular technique in remote sensing for dimensionality reduction. While PCA is suitable for data compression, it is not necessarily an optimal technique for feature extraction, particularly when the features are exploited in supervised learning applications (Cheriyadat and Bruce, 2003) [1]. Preserving features belonging to the target is very crucial to the performance of target detection/recognition techniques. Fukunaga-Koontz Transform (FKT) based supervised band reduction technique can be used to provide this requirement. FKT achieves feature selection by transforming into a new space in where feature classes have complimentary eigenvectors. Analysis of these eigenvectors under two classes, target and background clutter, can be utilized for target oriented band reduction since each basis functions best represent target class while carrying least information of the background class. By selecting few eigenvectors which are the most relevant to the target class, dimension of hyperspectral data can be reduced and thus, it presents significant advantages for near real time target detection applications. The nonlinear properties of the data can be extracted by kernel approach which provides better target features. Thus, we propose constructing kernel FKT (KFKT) to present target oriented band reduction. The performance of the proposed KFKT based target oriented dimensionality reduction algorithm has been tested employing two real-world hyperspectral data and results have been reported consequently.
Effects of additional data on Bayesian clustering.
Yamazaki, Keisuke
2017-10-01
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Cox, David E.; And Others
1991-01-01
Includes "It's Time to Stop Quibbling over the Acronym" (Cox); "Information Rich--Experience Poor" (Elliot et al.); "Supervised Agricultural Experience Selection Process" (Yokum, Boggs); "Point System" (Fraze, Vaughn); "Urban Diversity Rural Style" (Morgan, Henry); "Nonoccupational Supervised Experience" (Croom); "Reflecting Industry" (Miller);…
Xu, Guangkuan; Hao, Changchun; Tian, Suyang; Gao, Feng; Sun, Wenyuan; Sun, Runguang
2017-01-15
This study investigated a new and easy-to-industrialized extracting method for curcumin from Curcuma longa rhizomes using ultrasonic extraction technology combined with ammonium sulfate/ethanol aqueous two-phase system (ATPS), and the preparation of curcumin using the semi-preparative HPLC. The single-factor experiments and response surface methodology (RSM) were utilized to determine the optimal material-solvent ratio, ultrasonic intensity (UI) and ultrasonic time. The optimum extraction conditions were finally determined to be material-solvent rate of 3.29:100, ultrasonic intensity of 33.63W/cm 2 and ultrasonic time of 17min. At these optimum conditions, the extraction yield could reach 46.91mg/g. And the extraction yields of curcumin remained stable in the case of amplification, which indicated that scale-up extraction was feasible and efficient. Afterwards, the semi-preparative HPLC experiment was carried out, in which optimal preparation conditions were elected according to the single factor experiment. The prepared curcumin was obtained and the purity could up to 85.58% by the semi-preparative HPLC. Copyright © 2016 Elsevier B.V. All rights reserved.
Rare tradition of the folk medicinal use of Aconitum spp. is kept alive in Solčavsko, Slovenia.
Povšnar, Marija; Koželj, Gordana; Kreft, Samo; Lumpert, Mateja
2017-08-08
Aconitum species are poisonous plants that have been used in Western medicine for centuries. In the nineteenth century, these plants were part of official and folk medicine in the Slovenian territory. According to current ethnobotanical studies, folk use of Aconitum species is rarely reported in Europe. The purpose of this study was to research the folk medicinal use of Aconitum species in Solčavsko, Slovenia; to collect recipes for the preparation of Aconitum spp., indications for use, and dosing; and to investigate whether the folk use of aconite was connected to poisoning incidents. In Solčavsko, a remote alpine area in northern Slovenia, we performed semi-structured interviews with 19 informants in Solčavsko, 3 informants in Luče, and two retired physicians who worked in that area. Three samples of homemade ethanolic extracts were obtained from informants, and the concentration of aconitine was measured. In addition, four extracts were prepared according to reported recipes. All 22 informants knew of Aconitum spp. and their therapeutic use, and 5 of them provided a detailed description of the preparation and use of "voukuc", an ethanolic extract made from aconite roots. Seven informants were unable to describe the preparation in detail, since they knew of the extract only from the narration of others or they remembered it from childhood. Most likely, the roots of Aconitum tauricum and Aconitum napellus were used for the preparation of the extract, and the solvent was homemade spirits. Four informants kept the extract at home; two extracts were prepared recently (1998 and 2015). Three extracts were analyzed, and 2 contained aconitine. Informants reported many indications for the use of the extract; it was used internally and, in some cases, externally as well. The extract was also used in animals. The extract was measured in drops, but the number of drops differed among the informants. The informants reported nine poisonings with Aconitum spp., but none of them occurred as a result of medicinal use of the extract. In this study, we determined that folk knowledge of the medicinal use of Aconitum spp. is still present in Solčavsko, but Aconitum preparations are used only infrequently.
Miller, Vonda H; Jansen, Ben H
2008-12-01
Computer algorithms that match human performance in recognizing written text or spoken conversation remain elusive. The reasons why the human brain far exceeds any existing recognition scheme to date in the ability to generalize and to extract invariant characteristics relevant to category matching are not clear. However, it has been postulated that the dynamic distribution of brain activity (spatiotemporal activation patterns) is the mechanism by which stimuli are encoded and matched to categories. This research focuses on supervised learning using a trajectory based distance metric for category discrimination in an oscillatory neural network model. Classification is accomplished using a trajectory based distance metric. Since the distance metric is differentiable, a supervised learning algorithm based on gradient descent is demonstrated. Classification of spatiotemporal frequency transitions and their relation to a priori assessed categories is shown along with the improved classification results after supervised training. The results indicate that this spatiotemporal representation of stimuli and the associated distance metric is useful for simple pattern recognition tasks and that supervised learning improves classification results.
gamAID: Greedy CP tensor decomposition for supervised EHR-based disease trajectory differentiation.
Henderson, Jette; Ho, Joyce; Ghosh, Joydeep
2017-07-01
We propose gamAID, an exploratory, supervised nonnegative tensor factorization method that iteratively extracts phenotypes from tensors constructed from medical count data. Using data from diabetic patients who later on get diagnosed with chronic kidney disorder (CKD) as well as diabetic patients who do not receive a CKD diagnosis, we demonstrate the potential of gamAID to discover phenotypes that characterize patients who are at risk for developing a disease.
Sensor feature fusion for detecting buried objects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, G.A.; Sengupta, S.K.; Sherwood, R.J.
1993-04-01
Given multiple registered images of the earth`s surface from dual-band sensors, our system fuses information from the sensors to reduce the effects of clutter and improve the ability to detect buried or surface target sites. The sensor suite currently includes two sensors (5 micron and 10 micron wavelengths) and one ground penetrating radar (GPR) of the wide-band pulsed synthetic aperture type. We use a supervised teaming pattern recognition approach to detect metal and plastic land mines buried in soil. The overall process consists of four main parts: Preprocessing, feature extraction, feature selection, and classification. These parts are used in amore » two step process to classify a subimage. Thee first step, referred to as feature selection, determines the features of sub-images which result in the greatest separability among the classes. The second step, image labeling, uses the selected features and the decisions from a pattern classifier to label the regions in the image which are likely to correspond to buried mines. We extract features from the images, and use feature selection algorithms to select only the most important features according to their contribution to correct detections. This allows us to save computational complexity and determine which of the sensors add value to the detection system. The most important features from the various sensors are fused using supervised teaming pattern classifiers (including neural networks). We present results of experiments to detect buried land mines from real data, and evaluate the usefulness of fusing feature information from multiple sensor types, including dual-band infrared and ground penetrating radar. The novelty of the work lies mostly in the combination of the algorithms and their application to the very important and currently unsolved operational problem of detecting buried land mines from an airborne standoff platform.« less
NASA Astrophysics Data System (ADS)
Ressel, Rudolf; Singha, Suman; Lehner, Susanne
2016-08-01
Arctic Sea ice monitoring has attracted increasing attention over the last few decades. Besides the scientific interest in sea ice, the operational aspect of ice charting is becoming more important due to growing navigational possibilities in an increasingly ice free Arctic. For this purpose, satellite borne SAR imagery has become an invaluable tool. In past, mostly single polarimetric datasets were investigated with supervised or unsupervised classification schemes for sea ice investigation. Despite proven sea ice classification achievements on single polarimetric data, a fully automatic, general purpose classifier for single-pol data has not been established due to large variation of sea ice manifestations and incidence angle impact. Recently, through the advent of polarimetric SAR sensors, polarimetric features have moved into the focus of ice classification research. The higher information content four polarimetric channels promises to offer greater insight into sea ice scattering mechanism and overcome some of the shortcomings of single- polarimetric classifiers. Two spatially and temporally coincident pairs of fully polarimetric acquisitions from the TerraSAR-X/TanDEM-X and RADARSAT-2 satellites are investigated. Proposed supervised classification algorithm consists of two steps: The first step comprises a feature extraction, the results of which are ingested into a neural network classifier in the second step. Based on the common coherency and covariance matrix, we extract a number of features and analyze the relevance and redundancy by means of mutual information for the purpose of sea ice classification. Coherency matrix based features which require an eigendecomposition are found to be either of low relevance or redundant to other covariance matrix based features. Among the most useful features for classification are matrix invariant based features (Geometric Intensity, Scattering Diversity, Surface Scattering Fraction).
NASA Astrophysics Data System (ADS)
Zhang, Yunlu; Yan, Lei; Liou, Frank
2018-05-01
The quality initial guess of deformation parameters in digital image correlation (DIC) has a serious impact on convergence, robustness, and efficiency of the following subpixel level searching stage. In this work, an improved feature-based initial guess (FB-IG) scheme is presented to provide initial guess for points of interest (POIs) inside a large region. Oriented FAST and Rotated BRIEF (ORB) features are semi-uniformly extracted from the region of interest (ROI) and matched to provide initial deformation information. False matched pairs are eliminated by the novel feature guided Gaussian mixture model (FG-GMM) point set registration algorithm, and nonuniform deformation parameters of the versatile reproducing kernel Hilbert space (RKHS) function are calculated simultaneously. Validations on simulated images and real-world mini tensile test verify that this scheme can robustly and accurately compute initial guesses with semi-subpixel level accuracy in cases with small or large translation, deformation, or rotation.
Quasi-Epipolar Resampling of High Resolution Satellite Stereo Imagery for Semi Global Matching
NASA Astrophysics Data System (ADS)
Tatar, N.; Saadatseresht, M.; Arefi, H.; Hadavand, A.
2015-12-01
Semi-global matching is a well-known stereo matching algorithm in photogrammetric and computer vision society. Epipolar images are supposed as input of this algorithm. Epipolar geometry of linear array scanners is not a straight line as in case of frame camera. Traditional epipolar resampling algorithms demands for rational polynomial coefficients (RPCs), physical sensor model or ground control points. In this paper we propose a new solution for epipolar resampling method which works without the need for these information. In proposed method, automatic feature extraction algorithms are employed to generate corresponding features for registering stereo pairs. Also original images are divided into small tiles. In this way by omitting the need for extra information, the speed of matching algorithm increased and the need for high temporal memory decreased. Our experiments on GeoEye-1 stereo pair captured over Qom city in Iran demonstrates that the epipolar images are generated with sub-pixel accuracy.
75 FR 39329 - Interagency Guidance on Asset Securitization Activities
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-08
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Interagency Guidance on Asset Securitization Activities AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for...; and Information Collection Comments, Chief Counsel's Office, Office of Thrift Supervision, 1700 G...
75 FR 30107 - Community Reinvestment Act Sunshine
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-28
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Community Reinvestment Act Sunshine AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for comment. SUMMARY: The... Supervision within the Department of the Treasury will submit the proposed information collection requirement...
Kuhlmann, Andrea; Reuter, Verena; Schramek, Renate; Dimitrov, Todor; Görnig, Matthias; Matip, Eva-Maria; Matthies, Olaf; Naroska, Edwin
2018-01-01
The "OurPuppet" project comprises a sensor-based, interactive puppet that will be developed to communicate with people in need of care during a short period of absence of the informal caregiver. Specially qualified puppet guides will support the use of the new technical development. They instruct people with dementia and caregivers on how to use the puppet and supervise the (informal) care relationship through discussions on a regular basis. The article shows the specific components of users' needs for which the concrete technical development should find answers. It also focuses on the opportunities and challenges for the technical and social developmental process accompanied by these demands. The analysis of the users' needs is based on a participatory approach. Semi-structured focus group interviews were conducted with informal caregivers, nurses and volunteers in order to identify typical situations in home care settings. The interviews were paraphrased and summarized in order to deduce inductive categories (qualitative data analysis), which describe everyday situations that the technical system should address. Such analyses provide information about the needs of potential users and indicate how to design such technical systems. Furthermore, opportunities and challenges of the development process as well as important contextual information were identified.
ERIC Educational Resources Information Center
Cobia, Debra C.; Boes, Susan R.
2000-01-01
Discusses ethical conflicts related to issues of informed consent, due process, competence, confidentiality, and dual relationships in supervision. Proposes two strategies as ways to minimize the potential for ethical conflict in post-master's supervision: the use of professional disclosure statements by supervisors and the development of formal…
High-order distance-based multiview stochastic learning in image classification.
Yu, Jun; Rui, Yong; Tang, Yuan Yan; Tao, Dacheng
2014-12-01
How do we find all images in a larger set of images which have a specific content? Or estimate the position of a specific object relative to the camera? Image classification methods, like support vector machine (supervised) and transductive support vector machine (semi-supervised), are invaluable tools for the applications of content-based image retrieval, pose estimation, and optical character recognition. However, these methods only can handle the images represented by single feature. In many cases, different features (or multiview data) can be obtained, and how to efficiently utilize them is a challenge. It is inappropriate for the traditionally concatenating schema to link features of different views into a long vector. The reason is each view has its specific statistical property and physical interpretation. In this paper, we propose a high-order distance-based multiview stochastic learning (HD-MSL) method for image classification. HD-MSL effectively combines varied features into a unified representation and integrates the labeling information based on a probabilistic framework. In comparison with the existing strategies, our approach adopts the high-order distance obtained from the hypergraph to replace pairwise distance in estimating the probability matrix of data distribution. In addition, the proposed approach can automatically learn a combination coefficient for each view, which plays an important role in utilizing the complementary information of multiview data. An alternative optimization is designed to solve the objective functions of HD-MSL and obtain different views on coefficients and classification scores simultaneously. Experiments on two real world datasets demonstrate the effectiveness of HD-MSL in image classification.
Empirical Analysis of Exploiting Review Helpfulness for Extractive Summarization of Online Reviews
ERIC Educational Resources Information Center
Xiong, Wenting; Litman, Diane
2014-01-01
We propose a novel unsupervised extractive approach for summarizing online reviews by exploiting review helpfulness ratings. In addition to using the helpfulness ratings for review-level filtering, we suggest using them as the supervision of a topic model for sentence-level content scoring. The proposed method is metadata-driven, requiring no…
Spatiotemporal modelling of groundwater extraction in semi-arid central Queensland, Australia
NASA Astrophysics Data System (ADS)
Keir, Greg; Bulovic, Nevenka; McIntyre, Neil
2016-04-01
The semi-arid Surat Basin in central Queensland, Australia, forms part of the Great Artesian Basin, a groundwater resource of national significance. While this area relies heavily on groundwater supply bores to sustain agricultural industries and rural life in general, measurement of groundwater extraction rates is very limited. Consequently, regional groundwater extraction rates are not well known, which may have implications for regional numerical groundwater modelling. However, flows from a small number of bores are metered, and less precise anecdotal estimates of extraction are increasingly available. There is also an increasing number of other spatiotemporal datasets which may help predict extraction rates (e.g. rainfall, temperature, soils, stocking rates etc.). These can be used to construct spatial multivariate regression models to estimate extraction. The data exhibit complicated statistical features, such as zero-valued observations, non-Gaussianity, and non-stationarity, which limit the use of many classical estimation techniques, such as kriging. As well, water extraction histories may exhibit temporal autocorrelation. To account for these features, we employ a separable space-time model to predict bore extraction rates using the R-INLA package for computationally efficient Bayesian inference. A joint approach is used to model both the probability (using a binomial likelihood) and magnitude (using a gamma likelihood) of extraction. The correlation between extraction rates in space and time is modelled using a Gaussian Markov Random Field (GMRF) with a Matérn spatial covariance function which can evolve over time according to an autoregressive model. To reduce computational burden, we allow the GMRF to be evaluated at a relatively coarse temporal resolution, while still allowing predictions to be made at arbitrarily small time scales. We describe the process of model selection and inference using an information criterion approach, and present some preliminary results from the study area. We conclude by discussing issues related with upscaling of the modelling approach to the entire basin, including merging of extraction rate observations with different precision, temporal resolution, and even potentially different likelihoods.
Information retrieval and terminology extraction in online resources for patients with diabetes.
Seljan, Sanja; Baretić, Maja; Kucis, Vlasta
2014-06-01
Terminology use, as a mean for information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes need access to various online resources (on foreign and/or native language) searching for information on self-education of basic diabetic knowledge, on self-care activities regarding importance of dietetic food, medications, physical exercises and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers, can help in building terminology lists or list of "browsing phrases" useful in information retrieval or in document indexing. Specific terminology lists represent an intermediate step between free text search and controlled vocabulary, between user's demands and existing online resources in native and foreign language. The research aiming to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and divided into three interrelated parts: i) comparison of professional and popular terminology use ii) evaluation of automatic statistically-based terminology extraction on English and Croatian texts iii) comparison and evaluation of extracted terminology performed on English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: list created by professional medical person, list of highly professional vocabulary contained in MeSH and list created by non-medical persons, made as intersection of 15 lists. Results report on use of popular and professional terminology in online diabetes resources, on evaluation of automatically extracted terminology candidates in English and Croatian texts and on comparison of statistical and hybrid extraction methods in English text. Evaluation of automatic and semi-automatic terminology extraction methods is performed by recall, precision and f-measure.
Supervision of tunnelling constructions and software used for their evaluation
NASA Astrophysics Data System (ADS)
Caravanas, Aristotelis; Hilar, Matous
2017-09-01
Supervision is a common instrument for controlling constructions of tunnels. In order to suit relevant project’s purposes a supervision procedure is modified by local conditions, habits, codes and ways of allocating of a particular tunnelling project. The duties of tunnel supervision are specified in an agreement with the client and they can include a wide range of activities. On large scale tunnelling projects the supervision tasks are performed by a high number of people of different professions. Teamwork, smooth communication and coordination are required in order to successfully fulfil supervision tasks. The efficiency and quality of tunnel supervision work are enhanced when specialized software applications are used. Such applications should allow on-line data management and the prompt evaluation, reporting and sharing of relevant construction information and other aspects. The client is provided with an as-built database that contains all the relevant information related to a construction process, which is a valuable tool for the claim management as well as for the evaluation of structure defects that can occur in the future. As a result, the level of risks related to tunnel constructions is decreased.
25 CFR 115.403 - Who will receive information regarding a minor's supervised account?
Code of Federal Regulations, 2011 CFR
2011-04-01
... 25 Indians 1 2011-04-01 2011-04-01 false Who will receive information regarding a minor's... FINANCIAL ACTIVITIES TRUST FUNDS FOR TRIBES AND INDIVIDUAL INDIANS IIM Accounts: Minors § 115.403 Who will receive information regarding a minor's supervised account? (a) The parent(s) with legal custody of the...
7 CFR 1230.623 - Supervision of referendum.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 10 2011-01-01 2011-01-01 false Supervision of referendum. 1230.623 Section 1230.623 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING... CONSUMER INFORMATION Procedures for the Conduct of Referendum Referendum § 1230.623 Supervision of...
7 CFR 1230.623 - Supervision of referendum.
Code of Federal Regulations, 2012 CFR
2012-01-01
... 7 Agriculture 10 2012-01-01 2012-01-01 false Supervision of referendum. 1230.623 Section 1230.623 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING... CONSUMER INFORMATION Procedures for the Conduct of Referendum Referendum § 1230.623 Supervision of...
7 CFR 1230.623 - Supervision of referendum.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 10 2010-01-01 2010-01-01 false Supervision of referendum. 1230.623 Section 1230.623 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING... CONSUMER INFORMATION Procedures for the Conduct of Referendum Referendum § 1230.623 Supervision of...
76 FR 37894 - Amendment of a Savings Association's Bylaws
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-28
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Amendment of a Savings Association's Bylaws AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for comment... Office of Thrift Supervision within the Department of the Treasury will submit the proposed information...
7 CFR 1230.623 - Supervision of referendum.
Code of Federal Regulations, 2013 CFR
2013-01-01
... 7 Agriculture 10 2013-01-01 2013-01-01 false Supervision of referendum. 1230.623 Section 1230.623 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING... CONSUMER INFORMATION Procedures for the Conduct of Referendum Referendum § 1230.623 Supervision of...
76 FR 7630 - Interagency Charter and Federal Deposit Insurance Application
Federal Register 2010, 2011, 2012, 2013, 2014
2011-02-10
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Interagency Charter and Federal Deposit Insurance Application AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for...) 393-6974; and Information Collection Comments, Chief Counsel's Office, Office of Thrift Supervision...
76 FR 36626 - Amendment of a Federal Savings Association Charter
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-22
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Amendment of a Federal Savings Association Charter AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for comment...; and Information Collection Comments, Chief Counsel's Office, Office of Thrift Supervision, 1700 G...
7 CFR 1230.623 - Supervision of referendum.
Code of Federal Regulations, 2014 CFR
2014-01-01
... 7 Agriculture 10 2014-01-01 2014-01-01 false Supervision of referendum. 1230.623 Section 1230.623 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING... CONSUMER INFORMATION Procedures for the Conduct of Referendum Referendum § 1230.623 Supervision of...
Kamali, Hossein; Aminimoghadamfarouj, Noushin; Golmakani, Ebrahim; Nematollahi, Alireza
2015-01-01
Aim: The aim of this study was to examine and evaluate crucial variables in essential oils extraction process from Lavandula hybrida through static-dynamic and semi-continuous techniques using response surface method. Materials and Methods: Essential oil components were extracted from Lavandula hybrida (Lavandin) flowers using supercritical carbon dioxide via static-dynamic steps (SDS) procedure, and semi-continuous (SC) technique. Results: Using response surface method the optimum extraction yield (4.768%) was obtained via SDS at 108.7 bar, 48.5°C, 120 min (static: 8×15), 24 min (dynamic: 8×3 min) in contrast to the 4.620% extraction yield for the SC at 111.6 bar, 49.2°C, 14 min (static), 121.1 min (dynamic). Conclusion: The results indicated that a substantial reduction (81.56%) solvent usage (kg CO2/g oil) is observed in the SDS method versus the conventional SC method. PMID:25598636
NASA Astrophysics Data System (ADS)
Gangeh, Mehrdad J.; Fung, Brandon; Tadayyon, Hadi; Tran, William T.; Czarnota, Gregory J.
2016-03-01
A non-invasive computer-aided-theragnosis (CAT) system was developed for the early assessment of responses to neoadjuvant chemotherapy in patients with locally advanced breast cancer. The CAT system was based on quantitative ultrasound spectroscopy methods comprising several modules including feature extraction, a metric to measure the dissimilarity between "pre-" and "mid-treatment" scans, and a supervised learning algorithm for the classification of patients to responders/non-responders. One major requirement for the successful design of a high-performance CAT system is to accurately measure the changes in parametric maps before treatment onset and during the course of treatment. To this end, a unified framework based on Hilbert-Schmidt independence criterion (HSIC) was used for the design of feature extraction from parametric maps and the dissimilarity measure between the "pre-" and "mid-treatment" scans. For the feature extraction, HSIC was used to design a supervised dictionary learning (SDL) method by maximizing the dependency between the scans taken from "pre-" and "mid-treatment" with "dummy labels" given to the scans. For the dissimilarity measure, an HSIC-based metric was employed to effectively measure the changes in parametric maps as an indication of treatment effectiveness. The HSIC-based feature extraction and dissimilarity measure used a kernel function to nonlinearly transform input vectors into a higher dimensional feature space and computed the population means in the new space, where enhanced group separability was ideally obtained. The results of the classification using the developed CAT system indicated an improvement of performance compared to a CAT system with basic features using histogram of intensity.
[Nurse supervision in health basic units].
Correia, Valesca Silveira; Servo, Maria Lúcia Silva
2006-01-01
This qualitative study intends to evaluate the pattern of supervision of the nurse in health basic units, in Feira de Santana city (Bahía-Brasil), between August 2001 and June 2002. The objective was to describe the supervision and the existence of supervision systematics for the nurse. A questionnaire was used to take informations from a group of sixteen (16) nurses in actual professional work. Descriptive statistical procedures for data analysis were used. It can be concluded that systematic supervision is practiced in 64% of the nurses and in 36% of the cases systematic supervision do not occur.
Shultz, Laura A Schwent; Pedersen, Heather A; Roper, Brad L; Rey-Casserly, Celiane
2014-01-01
Within the psychology supervision literature, most theoretical models and practices pertain to general clinical or counseling psychology. Supervision specific to clinical neuropsychology has garnered little attention. This survey study explores supervision training, practices, and perspectives of neuropsychology supervisors. Practicing neuropsychologists were invited to participate in an online survey via listservs and email lists. Of 451 respondents, 382 provided supervision to students, interns, and/or fellows in settings such as VA medical centers (37%), university medical centers (35%), and private practice (15%). Most supervisors (84%) reported supervision was discussed in graduate school "minimally" or "not at all." Although 67% completed informal didactics or received continuing education in supervision, only 27% reported receiving training specific to neuropsychology supervision. Notably, only 39% were satisfied with their training in providing supervision and 77% indicated they would likely participate in training in providing supervision, if available at professional conferences. Results indicate that clinical neuropsychology as a specialty has paid scant attention to developing supervision models and explicit training in supervision skills. We recommend that the specialty develop models of supervision for neuropsychological practice, supervision standards and competencies, training methods in provision of supervision, and benchmark measures for supervision competencies.
Guidelines for clinical supervision in health service psychology.
2015-01-01
This document outlines guidelines for supervision of students in health service psychology education and training programs. The goal was to capture optimal performance expectations for psychologists who supervise. It is based on the premises that supervisors (a) strive to achieve competence in the provision of supervision and (b) employ a competency-based, meta-theoretical approach to the supervision process. The Guidelines on Supervision were developed as a resource to inform education and training regarding the implementation of competency-based supervision. The Guidelines on Supervision build on the robust literatures on competency-based education and clinical supervision. They are organized around seven domains: supervisor competence; diversity; relationships; professionalism; assessment/evaluation/feedback; problems of professional competence, and ethical, legal, and regulatory considerations. The Guidelines on Supervision represent the collective effort of a task force convened by the American Psychological Association (APA) Board of Educational Affairs (BEA). PsycINFO Database Record (c) 2015 APA, all rights reserved.
Sabey, Abigail; Harris, Michael; van Hamel, Clare
2016-03-01
General practice is a popular placement in the second year of Foundation training. Evaluations suggest this is a positive experience for most trainee doctors and benefits their perceptions of primary care, but the impact on primary care supervisors has not been considered. At a time when placements may need to increase, understanding the experience of the GP supervisors responsible for these placements is important. To explore the views, experiences and needs of GPs who supervise F2 doctors in their practices including their perceptions of the benefits to individuals and practices. A qualitative approach with GPs from across Severn Postgraduate Medical Education who supervise F2 doctors. Semi-structured interviews with 15 GPs between December 2012 and April 2013. GP supervisors are enthusiastic about helping F2 doctors to appreciate the uniqueness of primary care. Workload and responsibility around supervision is considerable making a supportive team important. Working with young, enthusiastic doctors boosts morale in the team. The presence of freshly trained minds prompts GPs to consider their own learning needs. Being a supervisor can increase job satisfaction; the teaching role gives respite from the demanding nature of GP work. Supervisors are positive about working with F2s, who lift morale in the team and challenge GPs in their own practice and learning. This boosts job and personal satisfaction. Nonetheless, consideration should be given to managing teaching workload and team support for supervision.
76 FR 21801 - Amendment of a Federal Savings Association Charter
Federal Register 2010, 2011, 2012, 2013, 2014
2011-04-18
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Amendment of a Federal Savings Association Charter AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for comment... Office of Thrift Supervision within the Department of the Treasury will submit the proposed information...
Unfinished Business: Subjectivity and Supervision
ERIC Educational Resources Information Center
Green, Bill
2005-01-01
Within the now burgeoning literature on doctoral research education, postgraduate research supervision continues to be a problematical issue, practically and theoretically. This paper seeks to explore and understand supervision as a distinctive kind of pedagogic practice. Informed by a larger research project, it draws on poststructuralism,…
A review of supervised object-based land-cover image classification
NASA Astrophysics Data System (ADS)
Ma, Lei; Li, Manchun; Ma, Xiaoxue; Cheng, Liang; Du, Peijun; Liu, Yongxue
2017-08-01
Object-based image classification for land-cover mapping purposes using remote-sensing imagery has attracted significant attention in recent years. Numerous studies conducted over the past decade have investigated a broad array of sensors, feature selection, classifiers, and other factors of interest. However, these research results have not yet been synthesized to provide coherent guidance on the effect of different supervised object-based land-cover classification processes. In this study, we first construct a database with 28 fields using qualitative and quantitative information extracted from 254 experimental cases described in 173 scientific papers. Second, the results of the meta-analysis are reported, including general characteristics of the studies (e.g., the geographic range of relevant institutes, preferred journals) and the relationships between factors of interest (e.g., spatial resolution and study area or optimal segmentation scale, accuracy and number of targeted classes), especially with respect to the classification accuracy of different sensors, segmentation scale, training set size, supervised classifiers, and land-cover types. Third, useful data on supervised object-based image classification are determined from the meta-analysis. For example, we find that supervised object-based classification is currently experiencing rapid advances, while development of the fuzzy technique is limited in the object-based framework. Furthermore, spatial resolution correlates with the optimal segmentation scale and study area, and Random Forest (RF) shows the best performance in object-based classification. The area-based accuracy assessment method can obtain stable classification performance, and indicates a strong correlation between accuracy and training set size, while the accuracy of the point-based method is likely to be unstable due to mixed objects. In addition, the overall accuracy benefits from higher spatial resolution images (e.g., unmanned aerial vehicle) or agricultural sites where it also correlates with the number of targeted classes. More than 95.6% of studies involve an area less than 300 ha, and the spatial resolution of images is predominantly between 0 and 2 m. Furthermore, we identify some methods that may advance supervised object-based image classification. For example, deep learning and type-2 fuzzy techniques may further improve classification accuracy. Lastly, scientists are strongly encouraged to report results of uncertainty studies to further explore the effects of varied factors on supervised object-based image classification.
Psoriasis image representation using patch-based dictionary learning for erythema severity scoring.
George, Yasmeen; Aldeen, Mohammad; Garnavi, Rahil
2018-06-01
Psoriasis is a chronic skin disease which can be life-threatening. Accurate severity scoring helps dermatologists to decide on the treatment. In this paper, we present a semi-supervised computer-aided system for automatic erythema severity scoring in psoriasis images. Firstly, the unsupervised stage includes a novel image representation method. We construct a dictionary, which is then used in the sparse representation for local feature extraction. To acquire the final image representation vector, an aggregation method is exploited over the local features. Secondly, the supervised phase is where various multi-class machine learning (ML) classifiers are trained for erythema severity scoring. Finally, we compare the proposed system with two popular unsupervised feature extractor methods, namely: bag of visual words model (BoVWs) and AlexNet pretrained model. Root mean square error (RMSE) and F1 score are used as performance measures for the learned dictionaries and the trained ML models, respectively. A psoriasis image set consisting of 676 images, is used in this study. Experimental results demonstrate that the use of the proposed procedure can provide a setup where erythema scoring is accurate and consistent. Also, it is revealed that dictionaries with large number of atoms and small patch sizes yield the best representative erythema severity features. Further, random forest (RF) outperforms other classifiers with F1 score 0.71, followed by support vector machine (SVM) and boosting with 0.66 and 0.64 scores, respectively. Furthermore, the conducted comparative studies confirm the effectiveness of the proposed approach with improvement of 9% and 12% over BoVWs and AlexNet based features, respectively. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks.
Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R; Nguyen, Tuan N; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T
2017-01-01
This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the features extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low-level prevents the network from overfitting and is able to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that using AR feature extractor and DBN classifiers, the classification performance achieves an improved classification performance with a of sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94 compared to ANN (sensitivity at 80.8%, specificity at 77.8%, accuracy at 79.3% with AUC-ROC of 0.83) and BNN classifiers (sensitivity at 84.3%, specificity at 83%, accuracy at 83.6% with AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further with sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5% over ANN, BNN, and DBN classifiers, respectively.
Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks
Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.
2017-01-01
This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the features extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low-level prevents the network from overfitting and is able to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that using AR feature extractor and DBN classifiers, the classification performance achieves an improved classification performance with a of sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94 compared to ANN (sensitivity at 80.8%, specificity at 77.8%, accuracy at 79.3% with AUC-ROC of 0.83) and BNN classifiers (sensitivity at 84.3%, specificity at 83%, accuracy at 83.6% with AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further with sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5% over ANN, BNN, and DBN classifiers, respectively. PMID:28326009
Lidar Cloud Detection with Fully Convolutional Networks
NASA Astrophysics Data System (ADS)
Cromwell, E.; Flynn, D.
2017-12-01
The vertical distribution of clouds from active remote sensing instrumentation is a widely used data product from global atmospheric measuring sites. The presence of clouds can be expressed as a binary cloud mask and is a primary input for climate modeling efforts and cloud formation studies. Current cloud detection algorithms producing these masks do not accurately identify the cloud boundaries and tend to oversample or over-represent the cloud. This translates as uncertainty for assessing the radiative impact of clouds and tracking changes in cloud climatologies. The Atmospheric Radiation Measurement (ARM) program has over 20 years of micro-pulse lidar (MPL) and High Spectral Resolution Lidar (HSRL) instrument data and companion automated cloud mask product at the mid-latitude Southern Great Plains (SGP) and the polar North Slope of Alaska (NSA) atmospheric observatory. Using this data, we train a fully convolutional network (FCN) with semi-supervised learning to segment lidar imagery into geometric time-height cloud locations for the SGP site and MPL instrument. We then use transfer learning to train a FCN for (1) the MPL instrument at the NSA site and (2) for the HSRL. In our semi-supervised approach, we pre-train the classification layers of the FCN with weakly labeled lidar data. Then, we facilitate end-to-end unsupervised pre-training and transition to fully supervised learning with ground truth labeled data. Our goal is to improve the cloud mask accuracy and precision for the MPL instrument to 95% and 80%, respectively, compared to the current cloud mask algorithms of 89% and 50%. For the transfer learning based FCN for the HSRL instrument, our goal is to achieve a cloud mask accuracy of 90% and a precision of 80%.
Multi-test cervical cancer diagnosis with missing data estimation
NASA Astrophysics Data System (ADS)
Xu, Tao; Huang, Xiaolei; Kim, Edward; Long, L. Rodney; Antani, Sameer
2015-03-01
Cervical cancer is a leading most common type of cancer for women worldwide. Existing screening programs for cervical cancer suffer from low sensitivity. Using images of the cervix (cervigrams) as an aid in detecting pre-cancerous changes to the cervix has good potential to improve sensitivity and help reduce the number of cervical cancer cases. In this paper, we present a method that utilizes multi-modality information extracted from multiple tests of a patient's visit to classify the patient visit to be either low-risk or high-risk. Our algorithm integrates image features and text features to make a diagnosis. We also present two strategies to estimate the missing values in text features: Image Classifier Supervised Mean Imputation (ICSMI) and Image Classifier Supervised Linear Interpolation (ICSLI). We evaluate our method on a large medical dataset and compare it with several alternative approaches. The results show that the proposed method with ICSLI strategy achieves the best result of 83.03% specificity and 76.36% sensitivity. When higher specificity is desired, our method can achieve 90% specificity with 62.12% sensitivity.
NASA Astrophysics Data System (ADS)
Dogon-yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.
2016-10-01
Timely and accurate acquisition of information on the condition and structural changes of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values which are critical to building up strategies for sustainable development. The conventional techniques used for extracting tree features include; ground surveying and interpretation of the aerial photography. However, these techniques are associated with some constraint, such as labour intensive field work, a lot of financial requirement, influences by weather condition and topographical covers which can be overcome by means of integrated airborne based LiDAR and very high resolution digital image datasets. This study presented a semi-automated approach for extracting urban trees from integrated airborne based LIDAR and multispectral digital image datasets over Istanbul city of Turkey. The above scheme includes detection and extraction of shadow free vegetation features based on spectral properties of digital images using shadow index and NDVI techniques and automated extraction of 3D information about vegetation features from the integrated processing of shadow free vegetation image and LiDAR point cloud datasets. The ability of the developed algorithms shows a promising result as an automated and cost effective approach to estimating and delineated 3D information of urban trees. The research also proved that integrated datasets is a suitable technology and a viable source of information for city managers to be used in urban trees management.
NASA Astrophysics Data System (ADS)
Rahardjo, Andhika Priotomo; Fauzantoro, Ahmad; Gozan, Misri
2018-02-01
The decline in cigarette production as the solution of health problems can interfere with the welfare of tobacco farmers in Indonesia. So, it is required to utilize the alternative uses of tobacco with chemical compounds inside it as the raw material for producing alternative products. One of the methods that is efficient in separating chemical compounds from plant extracts is fractionation and characterization method. This method has never been used for Nicotiana tabaccum L. extract using semi polar and polar solvents. This study begins with preparing Nicotiana tabaccum L. extract ingredients obtained through reflux ethanol extraction process. Extracts are analyzed by HPLC which serves to determine the chemical compounds in tobacco extract qualitatively. Extract that has been analyzed, is then fractionated using column chromatography with semi polar (ethyl acetate) and polar (ethane) solvents sequentially. Chemical compounds from tobacco extracts will be dissolved in accordance with the polarity of each solvents. The chemical compound is then characterized using HPLC quantitatively and qualitatively. Then, the data that has been obtained is used to find the partition coefficient of the main components in Nicotiana tabaccum L., which is Nicotine (kN) in Virginia 1 (Ethyl Acetate) fraction at 0.075; Virginia 2 (Ethyl Acetate) fraction at 0.037; And Virginia 3 (Ethyl Acetate) fraction at 0.043.
Identification of Active Compounds in the Root of Merung (Coptosapelta tomentosa Valeton K. Heyne)
NASA Astrophysics Data System (ADS)
Fitriyana
2018-04-01
The roots of Merung (Coptosapelta tomentosa Valeton K. Heyne) are a group of shrubs usually found on the margins of secondary dryland forest. Empirically, local people have been using the roots of Merung for medical treatment. However, some researches show that the plant extract is used as a poisonous material applied on the tip of the arrow (dart). Based on the online literature study, there are less than 5 articles that provide information about the active compound of this root extract. This study aimed to give additional information more deeply about the content of active compound of Merung root extract in three fractions, n-hexane (nonpolar), ethyl acetate (semi polar) and methanol (polar). The extract was then analysed using Gas Chromatography-Mass Spectrometry (GC-MS). GC-MS analysis of root extract in n-hexane showed there were 56 compounds, with the main compound being decanoic acid, methyl ester (peak 5, 10.13%), 11-Octadecenoic acid, methyl ester (peak 15, 10.43%) and 1H-Pyrazole, 3- (4-chlorophenyl) -4, 5-dihydro-1-phenyl (peak 43, 11.25%). Extracts in ethyl acetate fraction obtained 81 compounds. The largest component is Benzoic acid (peak 19, 22.40%), whereas in methanol there are 38 compounds, of which the main component is 2-Furancarboxaldehyde, 5-(hydroxyl methyl) (peak 29, 30.46%).
Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision
ERIC Educational Resources Information Center
Hinkle, Michelle Gimenez
2008-01-01
This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.
17 CFR 200.72 - Supervision of internal organization.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 17 Commodity and Securities Exchanges 2 2011-04-01 2011-04-01 false Supervision of internal organization. 200.72 Section 200.72 Commodity and Securities Exchanges SECURITIES AND EXCHANGE COMMISSION ORGANIZATION; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Canons of Ethics § 200.72 Supervision of...
17 CFR 200.72 - Supervision of internal organization.
Code of Federal Regulations, 2012 CFR
2012-04-01
... 17 Commodity and Securities Exchanges 2 2012-04-01 2012-04-01 false Supervision of internal organization. 200.72 Section 200.72 Commodity and Securities Exchanges SECURITIES AND EXCHANGE COMMISSION ORGANIZATION; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Canons of Ethics § 200.72 Supervision of...
17 CFR 200.72 - Supervision of internal organization.
Code of Federal Regulations, 2014 CFR
2014-04-01
... 17 Commodity and Securities Exchanges 3 2014-04-01 2014-04-01 false Supervision of internal organization. 200.72 Section 200.72 Commodity and Securities Exchanges SECURITIES AND EXCHANGE COMMISSION ORGANIZATION; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Canons of Ethics § 200.72 Supervision of...
17 CFR 200.72 - Supervision of internal organization.
Code of Federal Regulations, 2013 CFR
2013-04-01
... 17 Commodity and Securities Exchanges 2 2013-04-01 2013-04-01 false Supervision of internal organization. 200.72 Section 200.72 Commodity and Securities Exchanges SECURITIES AND EXCHANGE COMMISSION ORGANIZATION; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Canons of Ethics § 200.72 Supervision of...
17 CFR 200.72 - Supervision of internal organization.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 17 Commodity and Securities Exchanges 2 2010-04-01 2010-04-01 false Supervision of internal organization. 200.72 Section 200.72 Commodity and Securities Exchanges SECURITIES AND EXCHANGE COMMISSION ORGANIZATION; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Canons of Ethics § 200.72 Supervision of...
Informal sources of supervision in clinical training.
Farber, Barry A; Hazanov, Valery
2014-11-01
Although formal, assigned supervision is a potent source of learning and guidance for psychotherapy trainees, many beginning psychotherapists use other, informal sources of supervision or consultation for advice and support. Results of an online survey of beginning trainees (N = 146) indicate that other than their formally assigned supervisor, trainees most often consult with colleagues in their program, their own psychotherapist, and their significant other; that they're most likely to seek these other sources of help when they're feeling stuck or feel they've made a clinical mistake; that they do so because they need extra reassurance and suggestions; that they feel the advice given from these sources is helpful; and that they don't especially regret sharing this information. Several case examples are used to illustrate these points. Discussing clinical material with informal sources is, apparently, a great deal more common than typically acknowledged, and as such, has implications for training programs (including discussions of ethics) and formal supervision. © 2014 Wiley Periodicals, Inc.
Mao, Jin; Moore, Lisa R; Blank, Carrine E; Wu, Elvis Hsin-Hui; Ackerman, Marcia; Ranade, Sonali; Cui, Hong
2016-12-13
The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix-with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students. MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.
Optimal reinforcement of training datasets in semi-supervised landmark-based segmentation
NASA Astrophysics Data System (ADS)
Ibragimov, Bulat; Likar, Boštjan; Pernuš, Franjo; Vrtovec, Tomaž
2015-03-01
During the last couple of decades, the development of computerized image segmentation shifted from unsupervised to supervised methods, which made segmentation results more accurate and robust. However, the main disadvantage of supervised segmentation is a need for manual image annotation that is time-consuming and subjected to human error. To reduce the need for manual annotation, we propose a novel learning approach for training dataset reinforcement in the area of landmark-based segmentation, where newly detected landmarks are optimally combined with reference landmarks from the training dataset and therefore enriches the training process. The approach is formulated as a nonlinear optimization problem, where the solution is a vector of weighting factors that measures how reliable are the detected landmarks. The detected landmarks that are found to be more reliable are included into the training procedure with higher weighting factors, whereas the detected landmarks that are found to be less reliable are included with lower weighting factors. The approach is integrated into the landmark-based game-theoretic segmentation framework and validated against the problem of lung field segmentation from chest radiographs.
Hua, Chengcheng; Wang, Hong; Wang, Hong; Lu, Shaowen; Liu, Chong; Khalid, Syed Madiha
2018-04-11
Functional brain network (FBN) has become very popular to analyze the interaction between cortical regions in the last decade. But researchers always spend a long time to search the best way to compute FBN for their specific studies. The purpose of this study is to detect the proficiency of operators during their mineral grinding process controlling based on FBN. To save the search time, a novel semi-data-driven method of computing functional brain connection based on stacked autoencoder (BCSAE) is proposed in this paper. This method uses stacked autoencoder (SAE) to encode the multi-channel EEG data into codes and then computes the dissimilarity between the codes from every pair of electrodes to build FBN. The highlight of this method is that the SAE has a multi-layered structure and is semi-supervised, which means it can dig deeper information and generate better features. Then an experiment was performed, the EEG of the operators were collected while they were operating and analyzed to detect their proficiency. The results show that the BCSAE method generated more number of separable features with less redundancy, and the average accuracy of classification (96.18%) is higher than that of the control methods: PLV (92.19%) and PLI (78.39%).
12 CFR 911.3 - Prohibition on unauthorized use and disclosure of unpublished information.
Code of Federal Regulations, 2011 CFR
2011-01-01
... contract to provide services to the supervised entity or Bank member. (e) Government agencies. The Finance... control by any person, supervised entity, Bank member, government agency, or other entity of unpublished... as authorized in writing by the Finance Board, no person, supervised entity, Bank member, government...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-19
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Application for Issuance of Subordinated... AGENCY: Office of Thrift Supervision (OTS), Treasury. ACTION: Notice and request for comment. SUMMARY... Thrift Supervision within the Department of the Treasury will submit the proposed information collection...
The UXO Classification Demonstration at San Luis Obispo, CA
2010-09-01
Set ................................45 2.17.2 Active Learning Training and Test Set ..........................................47 2.17.3 Extended...optimized algorithm by applying it to only the unlabeled data in the test set. 2.17.2 Active Learning Training and Test Set SIG also used active ... learning [12]. Active learning , an alternative approach for constructing a training set, is used in conjunction with either supervised or semi
Efficient use of unlabeled data for protein sequence classification: a comparative study.
Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir
2009-04-29
Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags-the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably.
Atkinson, Jonathan A; Lobet, Guillaume; Noll, Manuel; Meyer, Patrick E; Griffiths, Marcus; Wells, Darren M
2017-10-01
Genetic analyses of plant root systems require large datasets of extracted architectural traits. To quantify such traits from images of root systems, researchers often have to choose between automated tools (that are prone to error and extract only a limited number of architectural traits) or semi-automated ones (that are highly time consuming). We trained a Random Forest algorithm to infer architectural traits from automatically extracted image descriptors. The training was performed on a subset of the dataset, then applied to its entirety. This strategy allowed us to (i) decrease the image analysis time by 73% and (ii) extract meaningful architectural traits based on image descriptors. We also show that these traits are sufficient to identify the quantitative trait loci that had previously been discovered using a semi-automated method. We have shown that combining semi-automated image analysis with machine learning algorithms has the power to increase the throughput of large-scale root studies. We expect that such an approach will enable the quantification of more complex root systems for genetic studies. We also believe that our approach could be extended to other areas of plant phenotyping. © The Authors 2017. Published by Oxford University Press.
Atkinson, Jonathan A.; Lobet, Guillaume; Noll, Manuel; Meyer, Patrick E.; Griffiths, Marcus
2017-01-01
Abstract Genetic analyses of plant root systems require large datasets of extracted architectural traits. To quantify such traits from images of root systems, researchers often have to choose between automated tools (that are prone to error and extract only a limited number of architectural traits) or semi-automated ones (that are highly time consuming). We trained a Random Forest algorithm to infer architectural traits from automatically extracted image descriptors. The training was performed on a subset of the dataset, then applied to its entirety. This strategy allowed us to (i) decrease the image analysis time by 73% and (ii) extract meaningful architectural traits based on image descriptors. We also show that these traits are sufficient to identify the quantitative trait loci that had previously been discovered using a semi-automated method. We have shown that combining semi-automated image analysis with machine learning algorithms has the power to increase the throughput of large-scale root studies. We expect that such an approach will enable the quantification of more complex root systems for genetic studies. We also believe that our approach could be extended to other areas of plant phenotyping. PMID:29020748
Contamination of semi-solid dosage forms by leachables from aluminium tubes.
Haverkamp, Jan Boris; Lipke, Uwe; Zapf, Thomas; Galensa, Rudolf; Lipperheide, Cornelia
2008-11-01
The objective of this study was to determine to what extent bisphenol A (BPA), bisphenol A diglycidyl ether (BADGE) and its derivatives are extractable from epoxy-based coatings of aluminium tubes for pharmaceutical use and to monitor their leaching into different kinds of semi-solid dosage forms. Migration increasing factors should be evaluated. Extraction tests using acetonitrile for 10 days at 40 degrees C turned out to be suitable to estimate the maximum amount of extractables. A plain variability in the nature and amount of extractables among tubes of different vendors (n=7) could be demonstrated. Leaching of the remnants into various semi-solid drug products (ointment, cream, gel) during storage (30 degrees C/40 degrees C) was verifiable. Leachable profiles were, apart from storage time and temperature, decisively influenced by the matrix. In particular, matrix polarity seemed to play a crucial role. Thus, the highest amount of leachables was found in isopropanol-based carbomer gel. Furthermore, in-use conditions (mechanical stress) enhanced migration significantly. In order to ensure quality and safety of semi-solid formulae, interactions between the coating material and the drug product should be thoroughly evaluated.
Current Risk Management Practices in Psychotherapy Supervision.
Mehrtens, Ilayna K; Crapanzano, Kathleen; Tynes, L Lee
2017-12-01
Psychotherapy competence is a core skill for psychiatry residents, and psychotherapy supervision is a time-honored approach to teaching this skill. To explore the current supervision practices of psychiatry training programs, a 24-item questionnaire was sent to all program directors of Accreditation Council for Graduate Medical Education (ACGME)-approved adult psychiatry programs. The questionnaire included items regarding adherence to recently proposed therapy supervision practices aimed at reducing potential liability risk. The results suggested that current therapy supervision practices do not include sufficient management of the potential liability involved in therapy supervision. Better protections for patients, residents, supervisors and the institutions would be possible with improved credentialing practices and better documentation of informed consent and supervision policies and procedures. © 2017 American Academy of Psychiatry and the Law.
Raffo, Antonio; Carcea, Marina; Castagna, Claudia; Magrì, Andrea
2015-08-07
An improved method based on headspace solid phase microextraction combined with gas chromatography-mass spectrometry (HS-SPME/GC-MS) was proposed for the semi-quantitative determination of wheat bread volatile compounds isolated from both whole slice and crust samples. A DVB/CAR/PDMS fibre was used to extract volatiles from the headspace of a bread powdered sample dispersed in a sodium chloride (20%) aqueous solution and kept for 60min at 50°C under controlled stirring. Thirty-nine out of all the extracted volatiles were fully identified, whereas for 95 other volatiles a tentative identification was proposed, to give a complete as possible profile of wheat bread volatile compounds. The use of an array of ten structurally and physicochemically similar internal standards allowed to markedly improve method precision with respect to previous HS-SPME/GC-MS methods for bread volatiles. Good linearity of the method was verified for a selection of volatiles from several chemical groups by calibration with matrix-matched extraction solutions. This simple, rapid, precise and sensitive method could represent a valuable tool to obtain semi-quantitative information when investigating the influence of technological factors on volatiles formation in wheat bread and other bakery products. Copyright © 2015 Elsevier B.V. All rights reserved.
Applying Information Processing Theory to Supervision: An Initial Exploration
ERIC Educational Resources Information Center
Tangen, Jodi L.; Borders, L. DiAnne
2017-01-01
Although clinical supervision is an educational endeavor (Borders & Brown, [Borders, L. D., 2005]), many scholars neglect theories of learning in working with supervisees. The authors describe 1 learning theory--information processing theory (Atkinson & Shiffrin, 1968, 1971; Schunk, 2016)--and the ways its associated interventions may…
Cancer Patients' Informational Needs: Qualitative Content Analysis.
Heidari, Haydeh; Mardani-Hamooleh, Marjan
2016-12-01
Understanding the informational needs of cancer patients is a requirement to plan any educative care program for them. The aim of this study was to identify Iranian cancer patients' perceptions of informational needs. The study took a qualitative approach. Semi-structured interviews were held with 25 cancer patients in two teaching hospitals in Iran. Transcripts of the interviews underwent conventional content analysis, and categories were extracted. The results came under two main categories: disease-related informational needs and information needs related to daily life. Disease-related informational needs had two subcategories: obtaining information about the nature of disease and obtaining information about disease prognosis. Information needs related to daily life also had two subcategories: obtaining information about healthy lifestyle and obtaining information about regular activities of daily life. The findings provide deep understanding of cancer patients' informational needs in Iran.
Data Warehouse Design from HL7 Clinical Document Architecture Schema.
Pecoraro, Fabrizio; Luzi, Daniela; Ricci, Fabrizio L
2015-01-01
This paper proposes a semi-automatic approach to extract clinical information structured in a HL7 Clinical Document Architecture (CDA) and transform it in a data warehouse dimensional model schema. It is based on a conceptual framework published in a previous work that maps the dimensional model primitives with CDA elements. Its feasibility is demonstrated providing a case study based on the analysis of vital signs gathered during laboratory tests.
On supervised graph Laplacian embedding CA model & kernel construction and its application
NASA Astrophysics Data System (ADS)
Zeng, Junwei; Qian, Yongsheng; Wang, Min; Yang, Yongzhong
2017-01-01
There are many methods to construct kernel with given data attribute information. Gaussian radial basis function (RBF) kernel is one of the most popular ways to construct a kernel. The key observation is that in real-world data, besides the data attribute information, data label information also exists, which indicates the data class. In order to make use of both data attribute information and data label information, in this work, we propose a supervised kernel construction method. Supervised information from training data is integrated into standard kernel construction process to improve the discriminative property of resulting kernel. A supervised Laplacian embedding cellular automaton model is another key application developed for two-lane heterogeneous traffic flow with the safe distance and large-scale truck. Based on the properties of traffic flow in China, we re-calibrate the cell length, velocity, random slowing mechanism and lane-change conditions and use simulation tests to study the relationships among the speed, density and flux. The numerical results show that the large-scale trucks will have great effects on the traffic flow, which are relevant to the proportion of the large-scale trucks, random slowing rate and the times of the lane space change.
Qi, Xiubin; Crooke, Emma; Ross, Andrew; Bastow, Trevor P; Stalvies, Charlotte
2011-09-21
This paper presents a system and method developed to identify a source oil's characteristic properties by testing the oil's dissolved components in water. Through close examination of the oil dissolution process in water, we hypothesise that when oil is in contact with water, the resulting oil-water extract, a complex hydrocarbon mixture, carries the signature property information of the parent oil. If the dominating differences in compositions between such extracts of different oils can be identified, this information could guide the selection of various sensors, capable of capturing such chemical variations. When used as an array, such a sensor system can be used to determine parent oil information from the oil-water extract. To test this hypothesis, 22 oils' water extracts were prepared and selected dominant hydrocarbons analyzed with Gas Chromatography-Mass Spectrometry (GC-MS); the subsequent Principal Component Analysis (PCA) indicates that the major difference between the extract solutions is the relative concentration between the volatile mono-aromatics and fluorescent polyaromatics. An integrated sensor array system that is composed of 3 volatile hydrocarbon sensors and 2 polyaromatic hydrocarbon sensors was built accordingly to capture the major and subtle differences of these extracts. It was tested by exposure to a total of 110 water extract solutions diluted from the 22 extracts. The sensor response data collected from the testing were processed with two multivariate analysis tools to reveal information retained in the response patterns of the arrayed sensors: by conducting PCA, we were able to demonstrate the ability to qualitatively identify and distinguish different oil samples from their sensor array response patterns. When a supervised PCA, Linear Discriminate Analysis (LDA), was applied, even quantitative classification can be achieved: the multivariate model generated from the LDA achieved 89.7% of successful classification of the type of the oil samples. By grouping the samples based on the level of viscosity and density we were able to reveal the correlation between the oil extracts' sensor array responses and their original oils' feature properties. The equipment and method developed in this study have promising potential to be readily applied in field studies and marine surveys for oil exploration or oil spill monitoring.
Valente, João; Vieira, Pedro M; Couto, Carlos; Lima, Carlos S
2018-02-01
Poor brain extraction in Magnetic Resonance Imaging (MRI) has negative consequences in several types of brain post-extraction such as tissue segmentation and related statistical measures or pattern recognition algorithms. Current state of the art algorithms for brain extraction work on weighted T1 and T2, being not adequate for non-whole brain images such as the case of T2*FLASH@7T partial volumes. This paper proposes two new methods that work directly in T2*FLASH@7T partial volumes. The first is an improvement of the semi-automatic threshold-with-morphology approach adapted to incomplete volumes. The second method uses an improved version of a current implementation of the fuzzy c-means algorithm with bias correction for brain segmentation. Under high inhomogeneity conditions the performance of the first method degrades, requiring user intervention which is unacceptable. The second method performed well for all volumes, being entirely automatic. State of the art algorithms for brain extraction are mainly semi-automatic, requiring a correct initialization by the user and knowledge of the software. These methods can't deal with partial volumes and/or need information from atlas which is not available in T2*FLASH@7T. Also, combined volumes suffer from manipulations such as re-sampling which deteriorates significantly voxel intensity structures making segmentation tasks difficult. The proposed method can overcome all these difficulties, reaching good results for brain extraction using only T2*FLASH@7T volumes. The development of this work will lead to an improvement of automatic brain lesions segmentation in T2*FLASH@7T volumes, becoming more important when lesions such as cortical Multiple-Sclerosis need to be detected. Copyright © 2017 Elsevier B.V. All rights reserved.
[Situations related to drug misuse in public schools in the city of São Paulo, Brazil].
Moreira, Fernanda Gonçalves; Silveira, Dartiu Xavier da; Andreoli, Sérgio Baxter
2006-10-01
To explore situations, attitudes and behavior of public elementary school education supervisors concerning psychoactive substance misuse. The study was carried out in the city of São Paulo, Southeastern Brazil, in 2002. Data were collected using a semi-structured questionnaire applied to eight key informants in the administrative area experienced in education supervision. Qualitative content analysis with ethnographic reference was conducted. Most discourses show that knowledge transmission is thought as essential for drug use prevention, though supervisors reported being ill-informed on this subject. The most frequent attitudes toward drug users are impotence and inability to act and sometimes a repressive attitude. These are motivated by misinformation and fear due to mistaken association of drug users and criminals. In situations indirectly related to drug abuse (family and behavior problems) more understanding and inclusive attitudes are reported, following the harm reduction paradigm. Theoretical capacity building of educators for preventive attitudes would support their skills developed through dealing with situations (in)directly related to drug abuse in schools. Thus, educators would feel more confident to make interventions for harm or risk reduction among drug users.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-12-01
... DEPARTMENT OF THE TREASURY Office of Thrift Supervision Agency Information Collection Activities: Submission for OMB Review: Interagency Charter and Federal Insurance Application AGENCY: Office of Thrift... required by the Paperwork Reduction Act of 1995, 44 U.S.C. 3507. The Office of Thrift Supervision within...
NASA Astrophysics Data System (ADS)
Ahmed, Z.; Habib, M.; Sid Ali, H.; Sofiane, K.
2015-04-01
The degradation of natural resources in arid and semi-arid areas was highlighted dramatically during this century due to population growth and transformation of land use systems. The Algerian steppe has undergone a regression over the past decade due to drought cycle, the extension of areas cultivated in marginal lands, population growth and overgrazing. These phenomena have led to different degradation processes, such as the destruction of vegetation, soil erosion, and deterioration of the physical environment. In this study, the work is mainly based on the criteria for classification and identification of physical parameters for spatial analysis, and multi-sources factors to determine the vulnerability of steppe formations and their impact on desertification. To do this, we used satellite data Alsat-1 (2009) IRS (2009) and LANDSAT TM (2001). These cross-sectional data with exogenous information could monitor the impact of the semi arid ecological diversity of steppe formations. A hierarchical process including the supervised image classification was used to characterize the main steppe formations. An analysis of the vulnerability of plant was conducted to assign weights and identify areas most susceptible to desertification. Vegetation indices combined with classification are used to characterize the forest and steppe formations to determine changes in land use. The results of this present study provide maps of different components of the steppe, formation that could assist in highlighting the magnitude of the degradation pathways, which affects the steppe environment, allowing an analysis of the process of desertification in the region.
Semi-automatic segmentation of brain tumors using population and individual information.
Wu, Yao; Yang, Wei; Jiang, Jun; Li, Shuanqian; Feng, Qianjin; Chen, Wufan
2013-08-01
Efficient segmentation of tumors in medical images is of great practical importance in early diagnosis and radiation plan. This paper proposes a novel semi-automatic segmentation method based on population and individual statistical information to segment brain tumors in magnetic resonance (MR) images. First, high-dimensional image features are extracted. Neighborhood components analysis is proposed to learn two optimal distance metrics, which contain population and patient-specific information, respectively. The probability of each pixel belonging to the foreground (tumor) and the background is estimated by the k-nearest neighborhood classifier under the learned optimal distance metrics. A cost function for segmentation is constructed through these probabilities and is optimized using graph cuts. Finally, some morphological operations are performed to improve the achieved segmentation results. Our dataset consists of 137 brain MR images, including 68 for training and 69 for testing. The proposed method overcomes segmentation difficulties caused by the uneven gray level distribution of the tumors and even can get satisfactory results if the tumors have fuzzy edges. Experimental results demonstrate that the proposed method is robust to brain tumor segmentation.
District health managers' perceptions of supervision in Malawi and Tanzania.
Bradley, Susan; Kamwendo, Francis; Masanja, Honorati; de Pinho, Helen; Waxman, Rachel; Boostrom, Camille; McAuliffe, Eilish
2013-09-05
Mid-level cadres are being used to address human resource shortages in many African contexts, but insufficient and ineffective human resource management is compromising their performance. Supervision plays a key role in performance and motivation, but is frequently characterised by periodic inspection and control, rather than support and feedback to improve performance. This paper explores the perceptions of district health management teams in Tanzania and Malawi on their role as supervisors and on the challenges to effective supervision at the district level. This qualitative study took place as part of a broader project, "Health Systems Strengthening for Equity: The Power and Potential of Mid-Level Providers". Semi-structured interviews were conducted with 20 district health management team personnel in Malawi and 37 council health team members in Tanzania. The interviews covered a range of human resource management issues, including supervision and performance assessment, staff job descriptions and roles, motivation and working conditions. Participants displayed varying attitudes to the nature and purpose of the supervision process. Much of the discourse in Malawi centred on inspection and control, while interviewees in Tanzania were more likely to articulate a paradigm characterised by support and improvement. In both countries, facility level performance metrics dominated. The lack of competency-based indicators or clear standards to assess individual health worker performance were considered problematic. Shortages of staff, at both district and facility level, were described as a major impediment to carrying out regular supervisory visits. Other challenges included conflicting and multiple responsibilities of district health team staff and financial constraints. Supervision is a central component of effective human resource management. Policy level attention is crucial to ensure a systematic, structured process that is based on common understandings of the role and purpose of supervision. This is particularly important in a context where the majority of staff are mid-level cadres for whom regulation and guidelines may not be as formalised or well-developed as for traditional cadres, such as registered nurses and medical doctors. Supervision needs to be adequately resourced and supported in order to improve performance and retention at the district level.
District health managers’ perceptions of supervision in Malawi and Tanzania
2013-01-01
Background Mid-level cadres are being used to address human resource shortages in many African contexts, but insufficient and ineffective human resource management is compromising their performance. Supervision plays a key role in performance and motivation, but is frequently characterised by periodic inspection and control, rather than support and feedback to improve performance. This paper explores the perceptions of district health management teams in Tanzania and Malawi on their role as supervisors and on the challenges to effective supervision at the district level. Methods This qualitative study took place as part of a broader project, “Health Systems Strengthening for Equity: The Power and Potential of Mid-Level Providers”. Semi-structured interviews were conducted with 20 district health management team personnel in Malawi and 37 council health team members in Tanzania. The interviews covered a range of human resource management issues, including supervision and performance assessment, staff job descriptions and roles, motivation and working conditions. Results Participants displayed varying attitudes to the nature and purpose of the supervision process. Much of the discourse in Malawi centred on inspection and control, while interviewees in Tanzania were more likely to articulate a paradigm characterised by support and improvement. In both countries, facility level performance metrics dominated. The lack of competency-based indicators or clear standards to assess individual health worker performance were considered problematic. Shortages of staff, at both district and facility level, were described as a major impediment to carrying out regular supervisory visits. Other challenges included conflicting and multiple responsibilities of district health team staff and financial constraints. Conclusion Supervision is a central component of effective human resource management. Policy level attention is crucial to ensure a systematic, structured process that is based on common understandings of the role and purpose of supervision. This is particularly important in a context where the majority of staff are mid-level cadres for whom regulation and guidelines may not be as formalised or well-developed as for traditional cadres, such as registered nurses and medical doctors. Supervision needs to be adequately resourced and supported in order to improve performance and retention at the district level. PMID:24007354
Automatically identifying health- and clinical-related content in wikipedia.
Liu, Feifan; Moosavinasab, Soheil; Agarwal, Shashank; Bennett, Andrew S; Yu, Hong
2013-01-01
Physicians are increasingly using the Internet for finding medical information related to patient care. Wikipedia is a valuable online medical resource to be integrated into existing clinical question answering (QA) systems. On the other hand, Wikipedia contains a full spectrum of world's knowledge and therefore comprises a large partition of non-health-related content, which makes disambiguation more challenging and consequently leads to large overhead for existing systems to effectively filter irrelevant information. To overcome this, we have developed both unsupervised and supervised approaches to identify health-related articles as well as clinically relevant articles. Furthermore, we explored novel features by extracting health related hierarchy from the Wikipedia category network, from which a variety of features were derived and evaluated. Our experiments show promising results and also demonstrate that employing the category hierarchy can effectively improve the system performance.
Choy-Brown, Mimi; Stanhope, Victoria; Tiderington, Emmy; Padgett, Deborah K
2016-07-01
Behavioral health organizations use clinical supervision to ensure professional development and practice quality. This qualitative study examined 35 service coordinators' perspectives on supervision in two distinct supportive housing program types (permanent and transitional). Thematic analysis of in-depth interviews yielded three contrast themes: support versus scrutiny, planned versus impromptu time, and housing first versus treatment first. Supervisory content and format resulted in differential perceptions of supervision, thereby influencing opportunities for learning. These findings suggest that unpacking discrete elements of supervision enactment in usual care settings can inform implementation of recovery-oriented practice.
Choy-Brown, Mimi; Stanhope, Victoria; Tiderington, Emmy; Padgett, Deborah K.
2015-01-01
Behavioral health organizations use clinical supervision to ensure professional development and practice quality. This qualitative study examined 35 service coordinators' perspectives on supervision in two distinct supportive housing program types (permanent and transitional). Thematic analysis of in-depth interviews yielded three contrast themes: support versus scrutiny, planned versus impromptu time, and Housing First versus Treatment First. Supervisory content and format resulted in differential perceptions of supervision, thereby influencing opportunities for learning. These findings suggest that unpacking discrete elements of supervision enactment in usual care settings can inform implementation of recovery-oriented practice. PMID:26066866
High-recall protein entity recognition using a dictionary
Kou, Zhenzhen; Cohen, William W.; Murphy, Robert F.
2010-01-01
Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently-proposed extension to conditional random fields that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that recognizes phrases from the dictionary, as well as variations of these phrases. Standard training methods for HMMs can be used to learn which variants should be recognized. We compared the performance of our new approaches to that of Maximum Entropy (Max-Ent) and normal CRFs on three datasets, and improvement was obtained for all four methods over the best published results for two of the datasets. CRFs and semiCRFs achieved the highest overall performance according to the widely-used F-measure, while the dictionary HMMs performed the best at finding entities that actually appear in the dictionary—the measure of most interest in our intended application. PMID:15961466
Direct Supervision in Outpatient Psychiatric Graduate Medical Education.
Galanter, Cathryn A; Nikolov, Roumen; Green, Norma; Naidoo, Shivana; Myers, Michael F; Merlino, Joseph P
2016-02-01
The authors describe a stimulus case that led training staff to examine and revise the supervision policy of the adult and child and adolescent psychiatry clinics. To inform the revisions, the authors reviewed the literature and national policies. The authors conducted a literature review in PubMed using the following criteria: Supervision, Residents, Training, Direct, and Indirect and a supplemental review in Academic Psychiatry. The authors reviewed institutional and Accreditation Council for Graduate Medical Education resident and fellow supervision policies to develop an outpatient and fellow supervision policy. Research is limited in psychiatry with three experimental articles demonstrating positive impact of direct supervision and several suggesting different techniques for direct supervision. In other areas of medicine, direct supervision is associated with improved educational and patient outcomes. The authors present details of our new supervision policy including triggers for direct supervision. The term direct supervision is relatively new in psychiatry and medical education. There is little published on the extent of implementation of direct supervision and on its impact on the educational experience of psychiatry trainees and other medical specialties. Direct supervision has been associated with improved educational and patient outcomes in nonpsychiatric fields of medicine. More research is needed on the implementation of, indications for, and effects of direct supervision on trainee education and on patient outcomes.
Panda, Bhuputra; Pati, Sanghamitra; Nallala, Srinivas; Chauhan, Abhimanyu S; Anasuya, Anita; Som, Meena; Zodpey, Sanjay
2015-01-01
Routine immunization (RI) is a key child survival intervention. Ensuring acceptable standards of RI service delivery is critical for optimal outcomes. Accumulated evidences suggest that 'supportive supervision' improves the quality of health care services in general. During 2009-2010, the Government of Odisha and UNICEF jointly piloted this strategy in four districts to improve RI program outcomes. The present study aims to assess the effect of this strategy on improvement of skills and practices at immunization session sites. A quasi-experimental 'post-test only' study design was adopted to compare the opinion and practices of frontline health workers and their supervisors in four intervention districts (IDs) with two control districts (CDs). Altogether, we interviewed 111 supervisor-supervisee (health worker) pairs using semi-structured interview schedules and case vignettes. We also directly observed health workers' practices during immunization sessions at 111 sites. Data were analyzed with SPSS version 16.0. The mean knowledge score of supervisors in CDs was significantly higher than in intervention groups. Variegated responses were obtained on case vignettes. The control group performed better in solving certain hypothetically asked problems, whereas the intervention group scored better in others. Health workers in IDs gave a lower rating to their respective supervisors' knowledge, skill, and frequency of supervision. Logistics and vaccine availability were better in CDs. Notwithstanding other limitations, supportive supervision may not have independent effects on improving the quality of immunization services. Addressing systemic issues, such as the availability of essential logistics, supply chain management, timely indenting, and financial resources, could complement the supportive supervision strategy in improving immunization service delivery.
Disciplinary supervision following ethics complaints: goals, tasks, and ethical dimensions.
Thomas, Janet T
2014-11-01
Clinical supervision is considered an integral component of the training of psychologists, and most of the professional literature is focused on this type of supervision. But psychologists also may supervise fully credentialed colleagues in other circumstances. One such context occurs when licensing boards mandate supervision as part of a disciplinary order. When supervision is provided in disciplinary cases, there are significant implications for the ethical dimensions of the supervisory relationship and concomitant ethical challenges for supervisors. Not only are the goals, objectives, and supervisory tasks of disciplinary supervision distinct from other types of supervision, but the supervisor's ethical responsibilities also encompass unique dimensions. Competence, informed consent, boundaries, confidentiality, and documentation are examined. Recommendations for reports to licensing boards include a statement of the clinical or ethical problems instigating discipline, description of how these problems have been addressed, and an assessment of the supervisee's current practices and ability to perform competently. © 2014 Wiley Periodicals, Inc.
Clinical supervision of psychotherapy: essential ethics issues for supervisors and supervisees.
Barnett, Jeffrey E; Molzon, Corey H
2014-11-01
Clinical supervision is an essential aspect of every mental health professional's training. The importance of ensuring that supervision is provided competently, ethically, and legally is explained. The elements of the ethical practice of supervision are described and explained. Specific issues addressed include informed consent and the supervision contract, supervisor and supervisee competence, attention to issues of diversity and multicultural competence, boundaries and multiple relationships in the supervision relationship, documentation and record keeping by both supervisor and supervisee, evaluation and feedback, self-care and the ongoing promotion of wellness, emergency coverage, and the ending of the supervision relationship. Additionally, the role of clinical supervisor as mentor, professional role model, and gatekeeper for the profession are discussed. Specific recommendations are provided for ethically and effectively conducting the supervision relationship and for addressing commonly arising dilemmas that supervisors and supervisees may confront. © 2014 Wiley Periodicals, Inc.
Fluctuating snow line altitudes in the Hunza basin (Karakoram) using Landsat OLI imagery
NASA Astrophysics Data System (ADS)
Racoviteanu, Adina; Rittger, Karl; Brodzik, Mary J.; Painter, Thomas H.; Armstrong, Richard
2016-04-01
Snowline altitudes (SLAs) on glacier surfaces are needed for separating snow and ice as input for melt models. When measured at the end of the ablation season, SLAs are used for inferring stable-state glacier equilibrium line altitudes (ELAs). Direct measurements of snowlines are rarely possible particularly in remote, high altitude glacierized terrain, but remote sensing data can be used to separate these snow and ice surfaces. Snow lines are commonly visible on optical satellite images acquired at the end of the ablation season if the images are contrasted enough, and are manually digitized on screen using various satellite band combinations for visual interpretation, which is a time-consuming, subjective process. Here we use Landsat OLI imagery at 30 m resolution to estimate glacier SLAs for a subset of the Hunza basin in the Upper Indus in the Karakoram. Clean glacier ice surfaces are delineated using a standardized semi-automated band ratio algorithm with image segmentation. Within the glacier surface, snow and ice are separated using supervised classification schemes based on regions of interest, and glacier SLAs are extracted on the basis of these areas. SLAs are compared with estimates from a new automated method that relies on fractional snow covered area rather than on band ratio algorithms for delineating clean glacier ice surfaces, and on grain size (instead of supervised classification) for separating snow from glacier ice on the glacier surface. The two methods produce comparable snow/ice outputs. The fSCA-derived glacierized areas are slightly larger than the band ratio estimates. Some of the additional area is the result of better detection in shadows from spectral mixture analysis (true positive) while the rest is shallow water, which is spectrally similar to snow/ice (false positive). On the glacier surface, a thresholding the snow grain size image (grain size > 500μm) results in similar glacier ice areas derived from the supervised classification, but there is noise (snow) on edges of dirty ice/ moraines at the glacier termini and around rock outcrops on the glacier surface. Neither of the two methods distinguishes the debris-covered ice, so these were mapped separately using a combination of topographic indices (slope, terrain curvature), along with remote sensing surface temperature and texture data. Using average elevation of snow and ice areas, we calculate an ELA of 5260 m for 2013. We construct yearly time series of the ELAs around the centerlines of selected glaciers in the Hunza for the period 2000 - 2014 using Landsat imagery. We explore spatial trends in glacier ELAs within the region, as well as relationships between ELA and topographic characteristics extracted on a glacier-by-glacier basis from a digital elevation model.
A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification
2016-07-01
financial predictions, etc. and is finding growing use in text mining studies. In this paper, we present an efficient algorithm for classification of high...video data, set of images, hyperspectral data, medical data, text data, etc. Moreover, the framework provides a way to analyze data whose different...also be incorporated. For text classification, one can use tfidf (term frequency inverse document frequency) to form feature vectors for each document
Semi-Supervised Multiple Feature Analysis for Action Recognition
2013-11-26
Technology and Electrical Engineering, University of Queensland, Brisbane, Australia ( e -mail: sen.wang@uq.edu.au; yi.yang@uq.edu.au). Z. Ma is with...the Language Technologies Institute, Carnegie Mellon Univer- sity, Pittsburgh, PA 15213 USA ( e -mail: kevinma@cs.cmu.edu). X. Li is with the School of...Service Computing in Cyber Physical Society, Chongqing University, Chongqing, China ( e -mail: xueli@itee.uq.edu.au). C. Pang is with the Australian e
NASA Astrophysics Data System (ADS)
Pilant, A. N.; Baynes, J.; Dannenberg, M.; Riegel, J.; Rudder, C.; Endres, K.
2013-12-01
US EPA EnviroAtlas is an online collection of tools and resources that provides geospatial data, maps, research, and analysis on the relationships between nature, people, health, and the economy (http://www.epa.gov/research/enviroatlas/index.htm). Using EnviroAtlas, you can see and explore information related to the benefits (e.g., ecosystem services) that humans receive from nature, including clean air, clean and plentiful water, natural hazard mitigation, biodiversity conservation, food, fuel, and materials, recreational opportunities, and cultural and aesthetic value. EPA developed several urban land cover maps at very high spatial resolution (one-meter pixel size) for a portion of EnviroAtlas devoted to urban studies. This urban mapping effort supported analysis of relations among land cover, human health and demographics at the US Census Block Group level. Supervised classification of 2010 USDA NAIP (National Agricultural Imagery Program) digital aerial photos produced eight-class land cover maps for several cities, including Durham, NC, Portland, ME, Tampa, FL, New Bedford, MA, Pittsburgh, PA, Portland, OR, and Milwaukee, WI. Semi-automated feature extraction methods were used to classify the NAIP imagery: genetic algorithms/machine learning, random forest, and object-based image analysis (OBIA). In this presentation we describe the image processing and fuzzy accuracy assessment methods used, and report on some sustainability and ecosystem service metrics computed using this land cover as input (e.g., carbon sequestration from USFS iTREE model; health and demographics in relation to road buffer forest width). We also discuss the land cover classification schema (a modified Anderson Level 1 after the National Land Cover Data (NLCD)), and offer some observations on lessons learned. Meter-scale urban land cover in Portland, OR overlaid on NAIP aerial photo. Streets, buildings and individual trees are identifiable.
Munkhdalai, Tsendsuren; Li, Meijing; Batsuren, Khuyagbaatar; Park, Hyeon Ah; Choi, Nak Hyeon; Ryu, Keun Ho
2015-01-01
Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon nor any dictionary in order to keep the system applicable to other NER tasks in bio-text data. We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
12 CFR 1070.60 - Exempt records.
Code of Federal Regulations, 2014 CFR
2014-01-01
... Supervision Database (2) CFPB.003 Non-Depository Institution Supervision Database (3) CFPB.004 Enforcement Database (4) CFPB.005 Consumer Response System (b) Information compiled for civil actions or proceedings...
12 CFR 1070.60 - Exempt records.
Code of Federal Regulations, 2012 CFR
2012-01-01
... Supervision Database (2) CFPB.003 Non-Depository Institution Supervision Database (3) CFPB.004 Enforcement Database (b) Information compiled for civil actions or proceedings. This regulation does not permit an...
12 CFR 1070.60 - Exempt records.
Code of Federal Regulations, 2013 CFR
2013-01-01
... Supervision Database (2) CFPB.003 Non-Depository Institution Supervision Database (3) CFPB.004 Enforcement Database (b) Information compiled for civil actions or proceedings. This regulation does not permit an...
Improving practice in community-based settings: a randomized trial of supervision - study protocol.
Dorsey, Shannon; Pullmann, Michael D; Deblinger, Esther; Berliner, Lucy; Kerns, Suzanne E; Thompson, Kelly; Unützer, Jürgen; Weisz, John R; Garland, Ann F
2013-08-10
Evidence-based treatments for child mental health problems are not consistently available in public mental health settings. Expanding availability requires workforce training. However, research has demonstrated that training alone is not sufficient for changing provider behavior, suggesting that ongoing intervention-specific supervision or consultation is required. Supervision is notably under-investigated, particularly as provided in public mental health. The degree to which supervision in this setting includes 'gold standard' supervision elements from efficacy trials (e.g., session review, model fidelity, outcome monitoring, skill-building) is unknown. The current federally-funded investigation leverages the Washington State Trauma-focused Cognitive Behavioral Therapy Initiative to describe usual supervision practices and test the impact of systematic implementation of gold standard supervision strategies on treatment fidelity and clinical outcomes. The study has two phases. We will conduct an initial descriptive study (Phase I) of supervision practices within public mental health in Washington State followed by a randomized controlled trial of gold standard supervision strategies (Phase II), with randomization at the clinician level (i.e., supervisors provide both conditions). Study participants will be 35 supervisors and 130 clinicians in community mental health centers. We will enroll one child per clinician in Phase I (N = 130) and three children per clinician in Phase II (N = 390). We use a multi-level mixed within- and between-subjects longitudinal design. Audio recordings of supervision and therapy sessions will be collected and coded throughout both phases. Child outcome data will be collected at the beginning of treatment and at three and six months into treatment. This study will provide insight into how supervisors can optimally support clinicians delivering evidence-based treatments. Phase I will provide descriptive information, currently unavailable in the literature, about commonly used supervision strategies in community mental health. The Phase II randomized controlled trial of gold standard supervision strategies is, to our knowledge, the first experimental study of gold standard supervision strategies in community mental health and will yield needed information about how to leverage supervision to improve clinician fidelity and client outcomes. ClinicalTrials.gov NCT01800266.
Improving practice in community-based settings: a randomized trial of supervision – study protocol
2013-01-01
Background Evidence-based treatments for child mental health problems are not consistently available in public mental health settings. Expanding availability requires workforce training. However, research has demonstrated that training alone is not sufficient for changing provider behavior, suggesting that ongoing intervention-specific supervision or consultation is required. Supervision is notably under-investigated, particularly as provided in public mental health. The degree to which supervision in this setting includes ‘gold standard’ supervision elements from efficacy trials (e.g., session review, model fidelity, outcome monitoring, skill-building) is unknown. The current federally-funded investigation leverages the Washington State Trauma-focused Cognitive Behavioral Therapy Initiative to describe usual supervision practices and test the impact of systematic implementation of gold standard supervision strategies on treatment fidelity and clinical outcomes. Methods/Design The study has two phases. We will conduct an initial descriptive study (Phase I) of supervision practices within public mental health in Washington State followed by a randomized controlled trial of gold standard supervision strategies (Phase II), with randomization at the clinician level (i.e., supervisors provide both conditions). Study participants will be 35 supervisors and 130 clinicians in community mental health centers. We will enroll one child per clinician in Phase I (N = 130) and three children per clinician in Phase II (N = 390). We use a multi-level mixed within- and between-subjects longitudinal design. Audio recordings of supervision and therapy sessions will be collected and coded throughout both phases. Child outcome data will be collected at the beginning of treatment and at three and six months into treatment. Discussion This study will provide insight into how supervisors can optimally support clinicians delivering evidence-based treatments. Phase I will provide descriptive information, currently unavailable in the literature, about commonly used supervision strategies in community mental health. The Phase II randomized controlled trial of gold standard supervision strategies is, to our knowledge, the first experimental study of gold standard supervision strategies in community mental health and will yield needed information about how to leverage supervision to improve clinician fidelity and client outcomes. Trial registration ClinicalTrials.gov NCT01800266 PMID:23937766
Near ground level sensing for spatial analysis of vegetation
NASA Technical Reports Server (NTRS)
Sauer, Tom; Rasure, John; Gage, Charlie
1991-01-01
Measured changes in vegetation indicate the dynamics of ecological processes and can identify the impacts from disturbances. Traditional methods of vegetation analysis tend to be slow because they are labor intensive; as a result, these methods are often confined to small local area measurements. Scientists need new algorithms and instruments that will allow them to efficiently study environmental dynamics across a range of different spatial scales. A new methodology that addresses this problem is presented. This methodology includes the acquisition, processing, and presentation of near ground level image data and its corresponding spatial characteristics. The systematic approach taken encompasses a feature extraction process, a supervised and unsupervised classification process, and a region labeling process yielding spatial information.
A software package for interactive motor unit potential classification using fuzzy k-NN classifier.
Rasheed, Sarbast; Stashuk, Daniel; Kamel, Mohamed
2008-01-01
We present an interactive software package for implementing the supervised classification task during electromyographic (EMG) signal decomposition process using a fuzzy k-NN classifier and utilizing the MATLAB high-level programming language and its interactive environment. The method employs an assertion-based classification that takes into account a combination of motor unit potential (MUP) shapes and two modes of use of motor unit firing pattern information: the passive and the active modes. The developed package consists of several graphical user interfaces used to detect individual MUP waveforms from a raw EMG signal, extract relevant features, and classify the MUPs into motor unit potential trains (MUPTs) using assertion-based classifiers.
Supporting and Supervising Teachers Working With Adults Learning English. CAELA Network Brief
ERIC Educational Resources Information Center
Young, Sarah
2009-01-01
This brief provides an overview of the knowledge and skills that administrators need in order to support and supervise teachers of adult English language learners. It begins with a review of resources and literature related to teacher supervision in general and to adult ESL education. It continues with information on the background and…
Ducat, Wendy H; Kumar, Saravana
2015-01-01
Introduction In regional, rural, and remote settings, allied health professional supervision is one organizational mechanism designed to support and retain the workforce, provide clinical governance, and enhance service delivery. A systematic approach to evaluating the evidence of the experience and effects of professional supervision for non-metropolitan allied health practitioners and their service delivery is needed. Methods Studies investigating the experience and effects of professional supervision across 17 allied health disciplines in non-metropolitan health services were systematically searched for using standardized keywords across seven databases. The initial search identified 1,574 references. Of these studies, five met inclusion criteria and were subject to full methodological appraisal by both reviewers. Two studies were primarily qualitative with three studies primarily quantitative in their approach. Studies were appraised using McMaster critical appraisal tools and data were extracted and synthesized. Results Studies reported the context specific benefits and challenges of supervision in non-metropolitan areas and the importance of supervision in enhancing satisfaction and support in these areas. Comparison of findings between metropolitan and non-metropolitan settings within one study suggested that allied health in non-metropolitan settings were more satisfied with supervision though less likely to access it and preferred supervision with other non-metropolitan practitioners over access to more experienced supervisors. One study in a regional health service identified the lack of an agreed upon definition and functions of supervision when supervisors from diverse allied health disciplines were surveyed. While methodologically weak, all studies reported positive perceptions of supervision across professionals, supervisors, and managers. This is in accordance with previous research in the wider supervision literature. Discussion Considering the large pool of studies retrieved for further investigation, few of these met inclusion criteria demonstrating the paucity of primary research in this area. Increased training, policies, and implementation frameworks to ensure the definition and functions of supervision are agreed upon across the allied health disciplines in non-metropolitan areas is needed. Furthermore, systematic evaluation of supervision implementation in non-metropolitan settings, investigation of the experience and effects of distance based supervision (versus face-to-face), and increased rigor in research studies investigating non-metropolitan allied health profession supervision is needed. PMID:26347446
Tuberculosis diagnosis support analysis for precarious health information systems.
Orjuela-Cañón, Alvaro David; Camargo Mendoza, Jorge Eliécer; Awad García, Carlos Enrique; Vergara Vela, Erika Paola
2018-04-01
Pulmonary tuberculosis is a world emergency for the World Health Organization. Techniques and new diagnosis tools are important to battle this bacterial infection. There have been many advances in all those fields, but in developing countries such as Colombia, where the resources and infrastructure are limited, new fast and less expensive strategies are increasingly needed. Artificial neural networks are computational intelligence techniques that can be used in this kind of problems and offer additional support in the tuberculosis diagnosis process, providing a tool to medical staff to make decisions about management of subjects under suspicious of tuberculosis. A database extracted from 105 subjects with precarious information of people under suspect of pulmonary tuberculosis was used in this study. Data extracted from sex, age, diabetes, homeless, AIDS status and a variable with clinical knowledge from the medical personnel were used. Models based on artificial neural networks were used, exploring supervised learning to detect the disease. Unsupervised learning was used to create three risk groups based on available information. Obtained results are comparable with traditional techniques for detection of tuberculosis, showing advantages such as fast and low implementation costs. Sensitivity of 97% and specificity of 71% where achieved. Used techniques allowed to obtain valuable information that can be useful for physicians who treat the disease in decision making processes, especially under limited infrastructure and data. Copyright © 2018 Elsevier B.V. All rights reserved.
A Collection Scheme for Tracing Information of Pig Safety Production
NASA Astrophysics Data System (ADS)
Luo, Qingyao; Xiong, Benhai; Yang, Liang
This study takes one main production pattern of smallhold pig farming in Tianjin as a study prototype, deeply analyzes characters of informations about tracing inputs including vaccines,feeds,veterinary drugs and supervision test in pig farming, proposesinputs metadata, criteria for integrating inputs event and interface norms for data transmision, developes and completes identification of 2D ear tags and traceability information collection system of pig safety production based on mobile PDA. The system has implemented functions including setting and invalidate of 2D ear tags, collection of tracing inputs and supervision in the mobile PDA and finally integration of tracing events (the epidemic event,feed event,drug event and supervision event) on the traceability data center (server). The PDA information collection system has been applied for demonstration in Tianjin, the collection is simple, convenient and feasible. It could meet with requirements of traceability information system of pig safety production
Extracting fields snow coverage information with HJ-1A/B satellites data
NASA Astrophysics Data System (ADS)
Dong, Wenquan; Meng, Jihua
2015-10-01
The distribution and change of snow coverage are sensitive factors of climate change. In northeast part of China, farmlands are still covered with snow in spring. Since sowing activity can only be done when the snow melted, fields snow coverage monitoring provides reference for the determination of sowing date. Because of the restriction of the sensors and application requirements, current researches on remote sensing of snow focus more on the study of musicale and large scale, rather than the study of small scale, and especially research on snow melting period is rarely reported.HJ-1A/B satellites are parts of little satellite constellation, focusing on environment and disaster monitoring and meteorological forecast. Compared to other data sources, HJ-1A/B satellites both have comparatively higher temporal and spatial resolution and are more conducive to monitor the variations of melting snow coverage at small watershed. This paper was based on HJ-1A/1B data, taking Hongxing farm of Bei'an, Heilongjiang Province, China as the study area. In this paper, we exploited the methods for extraction of snow cover information on farmland in two cases, both HJ-1A/1B CCD with HJ-1B IRS data and just HJ-1A/1B CCD data. The reason we chose the two cases is that, the two optical satellites HJ-1A/B are capable of providing a whole territory coverage period in visible light spectrum in two days, infrared spectrum in four days. So sometimes we can only obtain CCD image. In this case, the method of normalized snow index cannot be used to extract snow coverage information. Using HJ-1A/1B CCD with HJ-1B IRS data, combined with the theory of snow remote sensing monitoring, this paper analyzed spectral response characteristics of HJ-1A/1B satellites data, then the widely used Normalized Difference Snow Index(NDSI) and S3 Index were quoted to the HJ-1A/1B satellites data. The NDSI uses reflectance values of Red and SWIR spectral bands of HJ-1B, and S3 index uses reflectance values of NIR, Red and SWIR spectral bands. With multi-temporal HJ satellite data, the optimal threshold of normalized snow index was determined to divide the farmland into snow covering area, melting snow area and non-snow area. The results are quite similar to each other and of high accuracy, and the melting snow coverage can be well extracted by two types of normalized snow index. When we can only obtain CCD image, we use supervised classification method to extract melting snow coverage. With this method, the accuracy of fields snow coverage extraction is slightly lower than that using normalized snow index methods mentioned above. And in mountain area, the snow coverage area is slightly larger than that is extracted by normalized snow index methods, because the shadows make the color of snow in the valley darker, the supervised classification method divides it into non-snow coverage area, while the normalized snow index method well weakened the effect of shadow. This study shows that extraction accuracy in both cases is assessed, and both of them can meet the needs of practical applications. HJ-1A/1B satellites are conducive to monitor the variations of melting snow coverage over farmland, and they can provide reference for the determination of sowing date.
The validation of the Supervision of Thesis Questionnaire (STQ).
Henricson, Maria; Fridlund, Bengt; Mårtensson, Jan; Hedberg, Berith
2018-06-01
The supervision process is characterized by differences between the supervisors' and the students' expectations before the start of writing a bachelor thesis as well as after its completion. A review of the literature did not reveal any scientifically tested questionnaire for evaluating nursing students' expectations of the supervision process when writing a bachelor thesis. The aim of the study was to determine the construct validity and internal consistency reliability of a questionnaire for measuring nursing students' expectations of the bachelor thesis supervision process. The study had a developmental and methodological design carried out in four steps including construct validity and internal consistency reliability statistical procedures: construction of the items, assessment of face validity, data collection and data analysis. This study was conducted at a university in southern Sweden, where students on the "Nursing student thesis, 15 ECTS" course were consecutively selected for participation. Of the 512 questionnaires distributed, 327 were returned, a response rate of 64%. Five factors with a total variance of 74% and good communalities, ≥0.64, were extracted from the 10-item STQ. The internal consistency of the 10 items was 0.68. The five factors were labelled: The nature of the supervision process, The supervisor's role as a coach, The students' progression to self-support, The interaction between students and supervisor and supervisor competence. A didactic, useful and secure questionnaire measuring nursing students' expectations of the bachelor thesis supervision process based on three main forms of supervision was created. Copyright © 2018 Elsevier Ltd. All rights reserved.
Yee, Kwang Chien; Madden, Angela; Nash, Rosie; Connolly, Michael
2017-01-01
Clinical communication and clinical supervision of junior healthcare professionals are identified as the two most common preventable factors to reduce medical errors. While multiple strategies have been implemented to improve clinical communication, clinical supervision has not attracted as much attention. This is in part due to the lack of understanding of clinical supervision. Furthermore, there is a lack of exploration of information communication technology (ICT) in assisting the delivery of clinical supervision from the perspective of users (i.e. junior clinicians). This paper presents a study to understand clinical supervision from the perspective of medical and pharmacy interns. The important elements of good clinical supervisors and good clinical supervision have been presented in this paper based on our study. More importantly, our results suggest a distinction between good supervisors and good supervisions. Both these factors impact on patient safety. Through discussion of user requirements of good supervision by users (interns), this paper then explores and presents a conceptual framework to assist in the discussion and design of ICT by healthcare organisations to improve clinical supervision of interns and therefore improve patient safety.
Semi-Supervised Learning of Lift Optimization of Multi-Element Three-Segment Variable Camber Airfoil
NASA Technical Reports Server (NTRS)
Kaul, Upender K.; Nguyen, Nhan T.
2017-01-01
This chapter describes a new intelligent platform for learning optimal designs of morphing wings based on Variable Camber Continuous Trailing Edge Flaps (VCCTEF) in conjunction with a leading edge flap called the Variable Camber Krueger (VCK). The new platform consists of a Computational Fluid Dynamics (CFD) methodology coupled with a semi-supervised learning methodology. The CFD component of the intelligent platform comprises of a full Navier-Stokes solution capability (NASA OVERFLOW solver with Spalart-Allmaras turbulence model) that computes flow over a tri-element inboard NASA Generic Transport Model (GTM) wing section. Various VCCTEF/VCK settings and configurations were considered to explore optimal design for high-lift flight during take-off and landing. To determine globally optimal design of such a system, an extremely large set of CFD simulations is needed. This is not feasible to achieve in practice. To alleviate this problem, a recourse was taken to a semi-supervised learning (SSL) methodology, which is based on manifold regularization techniques. A reasonable space of CFD solutions was populated and then the SSL methodology was used to fit this manifold in its entirety, including the gaps in the manifold where there were no CFD solutions available. The SSL methodology in conjunction with an elastodynamic solver (FiDDLE) was demonstrated in an earlier study involving structural health monitoring. These CFD-SSL methodologies define the new intelligent platform that forms the basis for our search for optimal design of wings. Although the present platform can be used in various other design and operational problems in engineering, this chapter focuses on the high-lift study of the VCK-VCCTEF system. Top few candidate design configurations were identified by solving the CFD problem in a small subset of the design space. The SSL component was trained on the design space, and was then used in a predictive mode to populate a selected set of test points outside of the given design space. The new design test space thus populated was evaluated by using the CFD component by determining the error between the SSL predictions and the true (CFD) solutions, which was found to be small. This demonstrates the proposed CFD-SSL methodologies for isolating the best design of the VCK-VCCTEF system, and it holds promise for quantitatively identifying best designs of flight systems, in general.
Martin, Shannon K; Tulla, Kiara; Meltzer, David O; Arora, Vineet M; Farnan, Jeanne M
2017-12-01
Advances in information technology have increased remote access to the electronic health record (EHR). Concurrently, standards defining appropriate resident supervision have evolved. How often and under what circumstances inpatient attending physicians remotely access the EHR for resident supervision is unknown. We described a model of attending remote EHR use for resident supervision, and quantified the frequency and magnitude of use. Using a mixed methods approach, general medicine inpatient attendings were surveyed and interviewed about their remote EHR use. Frequency of use and supervisory actions were quantitatively examined via survey. Transcripts from semistructured interviews were analyzed using grounded theory to identify codes and themes. A total of 83% (59 of 71) of attendings participated. Fifty-seven (97%) reported using the EHR remotely, with 54 (92%) reporting they discovered new clinical information not relayed by residents via remote EHR use. A majority (93%, 55 of 59) reported that this resulted in management changes, and 54% (32 of 59) reported making immediate changes by contacting cross-covering teams. Six major factors around remote EHR use emerged: resident, clinical, educational, personal, technical, and administrative. Attendings described resident and clinical factors as facilitating "backstage" supervision via remote EHR use. In our study to assess attending remote EHR use for resident supervision, attendings reported frequent remote use with resulting supervisory actions, describing a previously uncharacterized form of "backstage" oversight supervision. Future work should explore best practices in remote EHR use to provide effective supervision and ultimately improve patient safety.
Automated Data Cleansing in Data Harvesting and Data Migration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Mark; Vowell, Lance; King, Ian
2011-03-16
In the proposal for this project, we noted how the explosion of digitized information available through corporate databases, data stores and online search systems has resulted in the knowledge worker being bombarded by information. Knowledge workers typically spend more than 20-30% of their time seeking and sorting information, only finding the information 50-60% of the time . This information exists as unstructured, semi-structured and structured data. The problem of information overload is compounded by the production of duplicate or near-duplicate information. In addition, near-duplicate items frequently have different origins, creating a situation in which each item may have unique informationmore » of value, but their differences are not significant enough to justify maintaining them as separate entities. Effective tools can be provided to eliminate duplicate and near-duplicate information. The proposed approach was to extract unique information from data sets and consolidation that information into a single comprehensive file.« less
Efficient use of unlabeled data for protein sequence classification: a comparative study
Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir
2009-01-01
Background Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags–the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450
Purification of Training Samples Based on Spectral Feature and Superpixel Segmentation
NASA Astrophysics Data System (ADS)
Guan, X.; Qi, W.; He, J.; Wen, Q.; Chen, T.; Wang, Z.
2018-04-01
Remote sensing image classification is an effective way to extract information from large volumes of high-spatial resolution remote sensing images. Generally, supervised image classification relies on abundant and high-precision training data, which is often manually interpreted by human experts to provide ground truth for training and evaluating the performance of the classifier. Remote sensing enterprises accumulated lots of manually interpreted products from early lower-spatial resolution remote sensing images by executing their routine research and business programs. However, these manually interpreted products may not match the very high resolution (VHR) image properly because of different dates or spatial resolution of both data, thus, hindering suitability of manually interpreted products in training classification models, or small coverage area of these manually interpreted products. We also face similar problems in our laboratory in 21st Century Aerospace Technology Co. Ltd (short for 21AT). In this work, we propose a method to purify the interpreted product to match newly available VHRI data and provide the best training data for supervised image classifiers in VHR image classification. And results indicate that our proposed method can efficiently purify the input data for future machine learning use.
Valous, Nektarios A; Mendoza, Fernando; Sun, Da-Wen; Allen, Paul
2010-03-01
The quaternionic singular value decomposition is a technique to decompose a quaternion matrix (representation of a colour image) into quaternion singular vector and singular value component matrices exposing useful properties. The objective of this study was to use a small portion of uncorrelated singular values, as robust features for the classification of sliced pork ham images, using a supervised artificial neural network classifier. Images were acquired from four qualities of sliced cooked pork ham typically consumed in Ireland (90 slices per quality), having similar appearances. Mahalanobis distances and Pearson product moment correlations were used for feature selection. Six highly discriminating features were used as input to train the neural network. An adaptive feedforward multilayer perceptron classifier was employed to obtain a suitable mapping from the input dataset. The overall correct classification performance for the training, validation and test set were 90.3%, 94.4%, and 86.1%, respectively. The results confirm that the classification performance was satisfactory. Extracting the most informative features led to the recognition of a set of different but visually quite similar textural patterns based on quaternionic singular values. Copyright 2009 Elsevier Ltd. All rights reserved.
Optoelectronic sensor device for monitoring ethanol concentration in winemaking applications
NASA Astrophysics Data System (ADS)
Jiménez-Márquez, F.; Vázquez, J.; Úbeda, J.; Rodríguez-Rey, J.; Sánchez-Rojas, J. L.
2015-05-01
The supervision of key variables such as sugar, alcohol, released CO2 and microbiological evolution in fermenting grape must is of great importance in the winemaking industry. However, the fermentation kinetics is assessed by monitoring the evolution of the density as it varies during a fermentation, since density is an indicator of the total amount of sugars, ethanol and glycerol. Even so, supervising the fermentation process is an awkward and non-comprehensive task, especially in wine cellars where production rates are massive, and enologists usually measure the density of the extracted samples from each fermentation tank manually twice a day. This work aims at the design of a fast, low-cost, portable and reliable optoelectronic sensor for measuring ethanol concentration in fermenting grape must samples. Different sets of model solutions, which contain ethanol, fructose, glucose, glycerol dissolved in water and emulate the grape must composition at different stages of the fermentation, were prepared both for calibration and validation. The absorption characteristics of these model solutions were analyzed by a commercial spectrophotometer in the NIR region, in order to identify key wavelengths from which valuable information regarding the sample composition can be extracted. Finally, a customized optoelectronic prototype based on absorbance measurements at two wavelengths belonging to the NIR region was designed, fabricated and successfully tested. The system, whose optoelectronics is reduced after a thorough analysis to only two LED lamps and their corresponding paired photodiodes operating at 1.2 and 1.3 μm respectively, calculates the ethanol content by a multiple linear regression.
Towards harmonized seismic analysis across Europe using supervised machine learning approaches
NASA Astrophysics Data System (ADS)
Zaccarelli, Riccardo; Bindi, Dino; Cotton, Fabrice; Strollo, Angelo
2017-04-01
In the framework of the Thematic Core Services for Seismology of EPOS-IP (European Plate Observing System-Implementation Phase), a service for disseminating a regionalized logic-tree of ground motions models for Europe is under development. While for the Mediterranean area the large availability of strong motion data qualified and disseminated through the Engineering Strong Motion database (ESM-EPOS), supports the development of both selection criteria and ground motion models, for the low-to-moderate seismic regions of continental Europe the development of ad-hoc models using weak motion recordings of moderate earthquakes is unavoidable. Aim of this work is to present a platform for creating application-oriented earthquake databases by retrieving information from EIDA (European Integrated Data Archive) and applying supervised learning models for earthquake records selection and processing suitable for any specific application of interest. Supervised learning models, i.e. the task of inferring a function from labelled training data, have been extensively used in several fields such as spam detection, speech and image recognition and in general pattern recognition. Their suitability to detect anomalies and perform a semi- to fully- automated filtering on large waveform data set easing the effort of (or replacing) human expertise is therefore straightforward. Being supervised learning algorithms capable of learning from a relatively small training set to predict and categorize unseen data, its advantage when processing large amount of data is crucial. Moreover, their intrinsic ability to make data driven predictions makes them suitable (and preferable) in those cases where explicit algorithms for detection might be unfeasible or too heuristic. In this study, we consider relatively simple statistical classifiers (e.g., Naive Bayes, Logistic Regression, Random Forest, SVMs) where label are assigned to waveform data based on "recognized classes" needed for our use case. These classes might be a simply binary case (e.g., "good for analysis" vs "bad") or more complex one (e.g., "good for analysis" vs "low SNR", "multi-event", "bad coda envelope"). It is important to stress the fact that our approach can be generalized to any use case providing, as in any supervised approach, an adequate training set of labelled data, a feature-set, a statistical classifier, and finally model validation and evaluation. Examples of use cases considered to develop the system prototype are the characterization of the ground motion in low seismic areas; harmonized spectral analysis across Europe for source and attenuation studies; magnitude calibration; coda analysis for attenuation studies.
An accelerated solvent extraction (ASE) device was evaluated as a semi-automated means for extracting arsenicals from quality control (QC) samples and DORM-2 [standard reference material (SRM)]. Unlike conventional extraction procedures, the ASE requires that the sample be dispe...
Dynamic Decision-Making in Multi-Task Environments: Theory and Experimental Results.
1981-03-15
The operator’s primary responsibility in this new role is to extract information from his environment, and to integrate it for’ action selection and its...of the human operator from one of a controller to one of a supervisory decision-maker. The operator’s primary responsibility in this new role is to...troller to that of a monitor of multiple tasks, or a supervisor of sev- ~ I eral semi-automated subsystems. The operator’s primary task in these
Toward global crop type mapping using a hybrid machine learning approach and multi-sensor imagery
NASA Astrophysics Data System (ADS)
Wang, S.; Le Bras, S.; Azzari, G.; Lobell, D. B.
2017-12-01
Current global scale datasets on agricultural land use do not have sufficient spatial or temporal resolution to meet the needs of many applications. The recent rapid increase in public availability of fine- to moderate-resolution satellite imagery from Landsat OLI and Copernicus Sentinel-2 provides a unique opportunity to improve agricultural land use datasets. This project leverages these new satellite data streams, existing census data, and a novel training approach to develop global, annual maps that indicate the presence of (i) cropland and (ii) specific crops at a 20m resolution. Our machine learning methodology consists of two steps. The first is a supervised classifier trained with explicitly labelled data to distinguish between crop and non-crop pixels, creating a binary mask. For ground truth, we use labels collected by previous mapping efforts (e.g. IIASA's crowdsourced data (Fritz et al. 2015) and AFSIS's geosurvey data) in combination with new data collected manually. The crop pixels output by the binary mask are input to the second step: a semi-supervised clustering algorithm to resolve different crop types and generate a crop type map. We do not use field-level information on crop type to train the algorithm, making this approach scalable spatially and temporally. We instead incorporate size constraints on clusters based on aggregated agricultural land use statistics and other, more generalizable domain knowledge. We employ field-level data from the U.S., Southern Europe, and Eastern Africa to validate crop-to-cluster assignments.
Identification of research hypotheses and new knowledge from scientific literature.
Shardlow, Matthew; Batista-Navarro, Riza; Thompson, Paul; Nawaz, Raheel; McNaught, John; Ananiadou, Sophia
2018-06-25
Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions. We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated. We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state of the art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-score in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR 0.802) and New Knowledge (GENIA: 0.829, EU-ADR 0.836). We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.
Simpson-Southward, Chloe; Waller, Glenn; Hardy, Gillian E
2016-02-01
Psychological treatments for depression are not always delivered effectively or consistently. Clinical supervision of therapists is often assumed to keep therapy on track, ensuring positive patient outcomes. However, there is a lack of empirical evidence supporting this assumption. This experimental study explored the focus of supervision of depression cases, comparing guidance given to supervisees of different genders and anxiety levels. Participants were clinical supervisors who supervised therapists working with patients with depression. Supervisors indicated their supervision focus for three supervision case vignettes. Supervisee anxiety and gender was varied across vignettes. Supervisors focused calm female supervisees more on therapeutic techniques than state anxious female supervisees. Males were supervised in the same way, regardless of their anxiety. Both male and female supervisors had this pattern of focus. Findings indicate that supervision is influenced by supervisors' own biases towards their supervisees. These factors may cause supervisors to drift from prompting their supervisees to deliver best practice. Suggestions are made for ways to improve the effectiveness of clinical supervision and how these results may inform future research practice. Copyright © 2015 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shang, Yu; Lin, Yu; Yu, Guoqiang, E-mail: guoqiang.yu@uky.edu
2014-05-12
Conventional semi-infinite solution for extracting blood flow index (BFI) from diffuse correlation spectroscopy (DCS) measurements may cause errors in estimation of BFI (αD{sub B}) in tissues with small volume and large curvature. We proposed an algorithm integrating Nth-order linear model of autocorrelation function with the Monte Carlo simulation of photon migrations in tissue for the extraction of αD{sub B}. The volume and geometry of the measured tissue were incorporated in the Monte Carlo simulation, which overcome the semi-infinite restrictions. The algorithm was tested using computer simulations on four tissue models with varied volumes/geometries and applied on an in vivo strokemore » model of mouse. Computer simulations shows that the high-order (N ≥ 5) linear algorithm was more accurate in extracting αD{sub B} (errors < ±2%) from the noise-free DCS data than the semi-infinite solution (errors: −5.3% to −18.0%) for different tissue models. Although adding random noises to DCS data resulted in αD{sub B} variations, the mean values of errors in extracting αD{sub B} were similar to those reconstructed from the noise-free DCS data. In addition, the errors in extracting the relative changes of αD{sub B} using both linear algorithm and semi-infinite solution were fairly small (errors < ±2.0%) and did not rely on the tissue volume/geometry. The experimental results from the in vivo stroke mice agreed with those in simulations, demonstrating the robustness of the linear algorithm. DCS with the high-order linear algorithm shows the potential for the inter-subject comparison and longitudinal monitoring of absolute BFI in a variety of tissues/organs with different volumes/geometries.« less
Schmidt, Mette L K; Østergren, Peter; Cormie, Prue; Ragle, Anne-Mette; Sønksen, Jens; Midtgaard, Julie
2018-06-21
Regular exercise is recommended to mitigate the adverse effects of androgen deprivation therapy in men with prostate cancer. The purpose of this study was to explore the experience of transition to unsupervised, community-based exercise among men who had participated in a hospital-based supervised exercise programme in order to propose components that supported transition to unsupervised exercise. Participants were selected by means of purposive, criteria-based sampling. Men undergoing androgen deprivation therapy who had completed a 12-week hospital-based, supervised, group exercise intervention were invited to participate. The programme involved aerobic and resistance training using machines and included a structured transition to a community-based fitness centre. Data were collected by means of semi-structured focus group interviews and analysed using thematic analysis. Five focus group interviews were conducted with a total of 29 men, of whom 25 reported to have continued to exercise at community-based facilities. Three thematic categories emerged: Development and practice of new skills; Establishing social relationships; and Familiarising with bodily well-being. These were combined into an overarching theme: From learning to doing. Components suggested to support transition were as follows: a structured transition involving supervised exercise sessions at a community-based facility; strategies to facilitate peer support; transferable tools including an individual exercise chart; and access to 'check-ups' by qualified exercise specialists. Hospital-based, supervised exercise provides a safe learning environment. Transferring to community-based exercise can be experienced as a confrontation with the real world and can be eased through securing a structured transition, having transferable tools, sustained peer support and monitoring.
LexValueSets: An Approach for Context-Driven Value Sets Extraction
Pathak, Jyotishman; Jiang, Guoqian; Dwarkanath, Sridhar O.; Buntrock, James D.; Chute, Christopher G.
2008-01-01
The ability to model, share and re-use value sets across multiple medical information systems is an important requirement. However, generating value sets semi-automatically from a terminology service is still an unresolved issue, in part due to the lack of linkage to clinical context patterns that provide the constraints in defining a concept domain and invocation of value sets extraction. Towards this goal, we develop and evaluate an approach for context-driven automatic value sets extraction based on a formal terminology model. The crux of the technique is to identify and define the context patterns from various domains of discourse and leverage them for value set extraction using two complementary ideas based on (i) local terms provided by the Subject Matter Experts (extensional) and (ii) semantic definition of the concepts in coding schemes (intensional). A prototype was implemented based on SNOMED CT rendered in the LexGrid terminology model and a preliminary evaluation is presented. PMID:18998955
Short term load forecasting using a self-supervised adaptive neural network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, H.; Pimmel, R.L.
The authors developed a self-supervised adaptive neural network to perform short term load forecasts (STLF) for a large power system covering a wide service area with several heavy load centers. They used the self-supervised network to extract correlational features from temperature and load data. In using data from the calendar year 1993 as a test case, they found a 0.90 percent error for hour-ahead forecasting and 1.92 percent error for day-ahead forecasting. These levels of error compare favorably with those obtained by other techniques. The algorithm ran in a couple of minutes on a PC containing an Intel Pentium --more » 120 MHz CPU. Since the algorithm included searching the historical database, training the network, and actually performing the forecasts, this approach provides a real-time, portable, and adaptable STLF.« less
Weakly supervised image semantic segmentation based on clustering superpixels
NASA Astrophysics Data System (ADS)
Yan, Xiong; Liu, Xiaohua
2018-04-01
In this paper, we propose an image semantic segmentation model which is trained from image-level labeled images. The proposed model starts with superpixel segmenting, and features of the superpixels are extracted by trained CNN. We introduce a superpixel-based graph followed by applying the graph partition method to group correlated superpixels into clusters. For the acquisition of inter-label correlations between the image-level labels in dataset, we not only utilize label co-occurrence statistics but also exploit visual contextual cues simultaneously. At last, we formulate the task of mapping appropriate image-level labels to the detected clusters as a problem of convex minimization. Experimental results on MSRC-21 dataset and LableMe dataset show that the proposed method has a better performance than most of the weakly supervised methods and is even comparable to fully supervised methods.
Martin, Priya; Kumar, Saravana; Lizarondo, Lucylynn; Tyack, Zephanie
2016-10-01
Clinical supervision is important for effective health service delivery, professional development and practice. Despite its importance there is a lack of evidence regarding the factors that improve its quality. This study aimed to investigate the factors that influence the quality of clinical supervision of occupational therapists employed in a large public sector health service covering mental health, paediatrics, adult physical and other practice areas. A mixed method, sequential explanatory study design was used consisting of two phases. This article reports the quantitative phase (Phase One) which involved administration of the Manchester Clinical Supervision Scale (MCSS-26) to 207 occupational therapists. Frequency of supervision sessions, choice of supervisor and the type of supervision were found to be the predictor variables with a positive and significant influence on the quality of clinical supervision. Factors such as age, length of supervision and the area of practice were found to be the predictor variables with a negative and significant influence on the quality of clinical supervision. Factors that influence the perceived quality of clinical supervision among occupational therapists have been identified. High quality clinical supervision is an important component of clinical governance and has been shown to be beneficial to practitioners, patients and the organisation. Information on factors that make clinical supervision effective identified in this study can be added to existing supervision training and practices to improve the quality of clinical supervision. © 2016 Occupational Therapy Australia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This interim notice covers the following: extractable organic halides in solids, total organic halides, analysis by gas chromatography/Fourier transform-infrared spectroscopy, hexadecane extracts for volatile organic compounds, GC/MS analysis of VOCs, GC/MS analysis of methanol extracts of cryogenic vapor samples, screening of semivolatile organic extracts, GPC cleanup for semivolatiles, sample preparation for GC/MS for semi-VOCs, analysis for pesticides/PCBs by GC with electron capture detection, sample preparation for pesticides/PCBs in water and soil sediment, report preparation, Florisil column cleanup for pesticide/PCBs, silica gel and acid-base partition cleanup of samples for semi-VOCs, concentrate acid wash cleanup, carbon determination in solids using Coulometrics` CO{submore » 2} coulometer, determination of total carbon/total organic carbon/total inorganic carbon in radioactive liquids/soils/sludges by hot persulfate method, analysis of solids for carbonates using Coulometrics` Model 5011 coulometer, and soxhlet extraction.« less
An accelerated solvent extraction (ASE) device was evaluated as a semi-automated means of extracting arsenicals from ribbon kelp. Objective was to investigate effect of experimentally controllable ASE parameters (pressure, temperature, static time and solvent composition) on extr...
Hsieh, Thomas M; Liu, Yi-Min; Liao, Chun-Chih; Xiao, Furen; Chiang, I-Jen; Wong, Jau-Min
2011-08-26
In recent years, magnetic resonance imaging (MRI) has become important in brain tumor diagnosis. Using this modality, physicians can locate specific pathologies by analyzing differences in tissue character presented in different types of MR images.This paper uses an algorithm integrating fuzzy-c-mean (FCM) and region growing techniques for automated tumor image segmentation from patients with menigioma. Only non-contrasted T1 and T2 -weighted MR images are included in the analysis. The study's aims are to correctly locate tumors in the images, and to detect those situated in the midline position of the brain. The study used non-contrasted T1- and T2-weighted MR images from 29 patients with menigioma. After FCM clustering, 32 groups of images from each patient group were put through the region-growing procedure for pixels aggregation. Later, using knowledge-based information, the system selected tumor-containing images from these groups and merged them into one tumor image. An alternative semi-supervised method was added at this stage for comparison with the automatic method. Finally, the tumor image was optimized by a morphology operator. Results from automatic segmentation were compared to the "ground truth" (GT) on a pixel level. Overall data were then evaluated using a quantified system. The quantified parameters, including the "percent match" (PM) and "correlation ratio" (CR), suggested a high match between GT and the present study's system, as well as a fair level of correspondence. The results were compatible with those from other related studies. The system successfully detected all of the tumors situated at the midline of brain.Six cases failed in the automatic group. One also failed in the semi-supervised alternative. The remaining five cases presented noticeable edema inside the brain. In the 23 successful cases, the PM and CR values in the two groups were highly related. Results indicated that, even when using only two sets of non-contrasted MR images, the system is a reliable and efficient method of brain-tumor detection. With further development the system demonstrates high potential for practical clinical use.
Automatic evidence quality prediction to support evidence-based decision making.
Sarker, Abeed; Mollá, Diego; Paris, Cécile
2015-06-01
Evidence-based medicine practice requires practitioners to obtain the best available medical evidence, and appraise the quality of the evidence when making clinical decisions. Primarily due to the plethora of electronically available data from the medical literature, the manual appraisal of the quality of evidence is a time-consuming process. We present a fully automatic approach for predicting the quality of medical evidence in order to aid practitioners at point-of-care. Our approach extracts relevant information from medical article abstracts and utilises data from a specialised corpus to apply supervised machine learning for the prediction of the quality grades. Following an in-depth analysis of the usefulness of features (e.g., publication types of articles), they are extracted from the text via rule-based approaches and from the meta-data associated with the articles, and then applied in the supervised classification model. We propose the use of a highly scalable and portable approach using a sequence of high precision classifiers, and introduce a simple evaluation metric called average error distance (AED) that simplifies the comparison of systems. We also perform elaborate human evaluations to compare the performance of our system against human judgments. We test and evaluate our approaches on a publicly available, specialised, annotated corpus containing 1132 evidence-based recommendations. Our rule-based approach performs exceptionally well at the automatic extraction of publication types of articles, with F-scores of up to 0.99 for high-quality publication types. For evidence quality classification, our approach obtains an accuracy of 63.84% and an AED of 0.271. The human evaluations show that the performance of our system, in terms of AED and accuracy, is comparable to the performance of humans on the same data. The experiments suggest that our structured text classification framework achieves evaluation results comparable to those of human performance. Our overall classification approach and evaluation technique are also highly portable and can be used for various evidence grading scales. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhang, Yuanyuan; Gao, Zhiqiang; Liu, Xiangyang; Xu, Ning; Liu, Chaoshun; Gao, Wei
2016-09-01
Reclamation caused a significant dynamic change in the coastal zone, the tidal flat zone is an unstable reserve land resource, it has important significance for its research. In order to realize the efficient extraction of the tidal flat area information, this paper takes Rudong County in Jiangsu Province as the research area, using the HJ1A/1B images as the data source, on the basis of previous research experience and literature review, the paper chooses the method of object-oriented classification as a semi-automatic extraction method to generate waterlines. Then waterlines are analyzed by DSAS software to obtain tide points, automatic extraction of outer boundary points are followed under the use of Python to determine the extent of tidal flats in 2014 of Rudong County, the extraction area was 55182hm2, the confusion matrix is used to verify the accuracy and the result shows that the kappa coefficient is 0.945. The method could improve deficiencies of previous studies and its available free nature on the Internet makes a generalization.
Peixoto, Juliana A; Andrade E Silva, Márcio Luis; Crotti, Antônio E M; Cassio Sola Veneziani, Rodrigo; Gimenez, Valéria M M; Januário, Ana H; Groppo, Milton; Magalhães, Lizandra G; Dos Santos, Fransérgio F; Albuquerque, Sérgio; da Silva Filho, Ademar A; Cunha, Wilson R
2011-02-22
The in vitro activity of the crude hydroalcoholic extract of the aerial parts of Miconia langsdorffii Cogn. was evaluated against the promastigote forms of L. amazonensis, the causative agent of cutaneous leishmaniasis in humans. The bioassay-guided fractionation of this extract led to identification of the triterpenes ursolic acid and oleanolic acid as the major compounds in the fraction that displayed the highest activity. Several ursolic acid semi-synthetic derivatives were prepared, to find out whether more active compounds could be obtained. Among these ursolic acid-derived substances, the C-28 methyl ester derivative exhibited the best antileishmanial activity.
Quasi-Supervised Scoring of Human Sleep in Polysomnograms Using Augmented Input Variables
Yaghouby, Farid; Sunderam, Sridhar
2015-01-01
The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18 to 79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models—specifically Gaussian mixtures and hidden Markov models—are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's K statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. PMID:25679475
Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.
Yaghouby, Farid; Sunderam, Sridhar
2015-04-01
The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models-specifically Gaussian mixtures and hidden Markov models--are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's Κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.