Sample records for naive Bayes classification

  1. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    NASA Astrophysics Data System (ADS)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper addresses feature selection using a genetic algorithm (GA) for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We examine how the Naive Bayes and decision tree models handle classification when the dataset's features are first selected by the GA, and then compare the two models' performance to determine whether feature selection improves accuracy. The results show an accuracy gain when features are selected with the GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning Repository.
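
    The GA wrapper idea this entry describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy fitness function below stands in for the cross-validated accuracy of a decision tree or Naive Bayes classifier trained on the selected features.

```python
import random

def ga_feature_select(n_features, fitness, pop_size=20, generations=30,
                      crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Minimal genetic algorithm over binary feature masks.

    `fitness(mask)` must return a score to maximize, e.g. the
    cross-validated accuracy of a classifier trained on the
    features where mask[i] == 1.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:          # one-point crossover
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy fitness: reward masks that keep features 0 and 1 and drop the rest,
# standing in for a classifier's validation accuracy.
informative = {0, 1}
def toy_fitness(mask):
    kept = {i for i, g in enumerate(mask) if g}
    return len(kept & informative) - 0.1 * len(kept - informative)

best = ga_feature_select(6, toy_fitness)
```

    In a real GADT/GANB setup the fitness call would wrap a train/evaluate cycle on the UCI dataset, which is also where nearly all the runtime goes.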

  2. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pon, R K; Cardenas, A F; Buttler, D J

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
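
    The correlation-based feature utility described above can be sketched as a Pearson correlation between a feature column and the class labels. This is an assumption about the general idea, not the paper's exact formula:

```python
import math

def feature_utility(xs, ys):
    """Pearson correlation magnitude between one feature and the class label.

    A score near 0 marks the feature as currently useless; an online
    selector can drop it and revisit the decision as new labeled
    articles arrive.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:      # constant feature or constant labels
        return 0.0
    return abs(cov / (sx * sy))

# A feature aligned with the labels scores near 1.0; a constant one scores 0.
labels  = [1, 1, 0, 0, 1, 0]
aligned = [5, 6, 1, 0, 7, 2]
flat    = [3, 3, 3, 3, 3, 3]
```

    Because each score depends only on running sums, the utilities can be updated incrementally as new labeled articles arrive, which is what makes the scheme usable online.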

  3. Naïve Bayes classification in R.

    PubMed

    Zhang, Zhongheng

    2016-06-01

    Naïve Bayes classification is a simple probabilistic classification method based on Bayes' theorem with the assumption of independence between features. The model is trained on a training dataset and makes predictions via the predict() function. This article introduces the naiveBayes() and train() functions for performing Naïve Bayes classification.
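
    The entry describes R's naiveBayes() and predict(); for readers outside R, the same train-then-predict workflow can be sketched in Python with a hand-rolled Gaussian naive Bayes (a simplified analogue, not the R implementation):

```python
import math
from collections import defaultdict

class GaussianNB:
    """Gaussian naive Bayes, mirroring the train-then-predict() workflow
    the abstract describes for R's naiveBayes()."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.priors, self.stats = {}, {}
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            cols = list(zip(*rows))
            # per-feature (mean, variance), variance floored for stability
            self.stats[label] = [
                (sum(c) / len(c),
                 max(sum((v - sum(c) / len(c)) ** 2 for v in c) / len(c), 1e-9))
                for c in cols
            ]
        return self

    def predict(self, X):
        def log_pdf(x, mean, var):
            return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        out = []
        for row in X:
            scores = {
                label: math.log(self.priors[label]) +
                       sum(log_pdf(x, m, v)
                           for x, (m, v) in zip(row, self.stats[label]))
                for label in self.priors
            }
            out.append(max(scores, key=scores.get))
        return out

X = [[1.0, 2.1], [1.2, 1.9], [4.0, 5.1], [4.2, 4.9]]
y = ["a", "a", "b", "b"]
model = GaussianNB().fit(X, y)
print(model.predict([[1.1, 2.0], [4.1, 5.0]]))  # prints ['a', 'b']
```

    R's naiveBayes() fits the same kind of per-class Gaussian for numeric predictors; the independence assumption shows up as the plain sum of per-feature log densities.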

  4. Modified Mahalanobis Taguchi System for Imbalance Data Classification

    PubMed Central

    2017-01-01

    The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated based on minimizing the distance between the MTS Receiver Operating Characteristic (ROC) curve and the theoretical optimal point. To validate the MMTS classification efficacy, it has been benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms, especially when the imbalance ratio is greater than 400. A real-life case study from the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with the Mahalanobis Genetic Algorithm (MGA). PMID:28811820

  5. Machine learning approach to automatic exudate detection in retinal images from diabetic patients

    NASA Astrophysics Data System (ADS)

    Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin

    2010-01-01

    Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.
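
    The remove-one-feature-at-a-time loop described above can be sketched generically. `score` here is a stand-in for validation accuracy of the naive Bayes model on the exudate data; the toy score below is purely illustrative:

```python
def backward_select(features, score, min_features=1):
    """Greedy backward elimination: repeatedly drop the single feature
    whose removal most improves `score(feature_set)`, stopping when no
    removal helps, as in the paper's naive Bayes pruning step.
    """
    current = list(features)
    best = score(current)
    improved = True
    while improved and len(current) > min_features:
        improved = False
        for f in list(current):
            trial = [g for g in current if g != f]
            s = score(trial)
            if s > best:
                best, current, improved = s, trial, True
                break
    return current, best

# Toy score: features 'a' and 'b' help, the rest are distracting noise.
def toy_score(fs):
    good = sum(1 for f in fs if f in ("a", "b"))
    noise = sum(1 for f in fs if f not in ("a", "b"))
    return good - 0.2 * noise

kept, s = backward_select(["a", "b", "c", "d"], toy_score)
```

    The paper's forward step for the SVM is the mirror image: starting from the pruned set, re-add previously removed features while the score keeps improving.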

  6. Detection of dechallenge in spontaneous reporting systems: a comparison of Bayes methods.

    PubMed

    Banu, A Bazila; Alias Balamurugan, S Appavu; Thirumalaikolundusubramanian, Ponniah

    2014-01-01

    Dechallenge is a response observed as the reduction or disappearance of adverse drug reactions (ADR) on withdrawal of a drug from a patient. Currently available algorithms to detect dechallenge have limitations; hence, there is a need to compare newly available methods. To detect dechallenge in spontaneous reporting systems, the data-mining algorithms Naive Bayes and Improved Naive Bayes were applied and their performance compared in terms of accuracy and error. Analyzing factors of dechallenge such as outcome and disease category will help medical practitioners and pharmaceutical industries determine the reasons for dechallenge and take essential steps toward drug safety. Adverse drug reaction data for 2011 and 2012 were downloaded from the United States Food and Drug Administration's database. The classification results showed that the Improved Naive Bayes algorithm outperformed Naive Bayes, with an accuracy of 90.11% and an error of 9.8% in detecting dechallenge. Detecting dechallenge for unknown samples is essential for proper prescription. To overcome the issues exposed by the Naive Bayes algorithm, the Improved Naive Bayes algorithm can be used to detect dechallenge with higher accuracy and minimal error.

  7. Classifying emotion in Twitter using Bayesian network

    NASA Astrophysics Data System (ADS)

    Surya Asriadie, Muhammad; Syahrul Mubarok, Mohamad; Adiwijaya

    2018-03-01

    Language is used to express not only facts, but also emotions. Emotions are noticeable in behavior and even in the social media statuses a person writes, and emotion in text is analyzed across a variety of media such as Twitter. This paper studies classification of emotions on Twitter using Bayesian networks because of their ability to model uncertainty and relationships between features. The result is two Bayesian-network models: the Full Bayesian Network (FBN) and the Bayesian Network with Mood Indicator (BNM). FBN is a massive Bayesian network in which each word is treated as a node. The study shows that the method used to train FBN is not very effective at producing the best model, and FBN performs worse than Naive Bayes: the F1-score for FBN is 53.71%, versus 54.07% for Naive Bayes. BNM is proposed as an alternative based on an improvement of multinomial Naive Bayes, with much lower computational complexity than FBN. Although BNM does not beat FBN, it successfully improves on multinomial Naive Bayes: the F1-score for the multinomial Naive Bayes model is 51.49%, versus 52.14% for BNM.

  8. Naive Bayes as opinion classifier to evaluate students satisfaction based on student sentiment in Twitter Social Media

    NASA Astrophysics Data System (ADS)

    Candra Permana, Fahmi; Rosmansyah, Yusep; Setiawan Abdullah, Atje

    2017-10-01

    Students' activity on social media can provide implicit knowledge and new perspectives for an educational system. Sentiment analysis is a part of text mining that can help analyze and classify opinion data. This research uses text mining with the naive Bayes method as an opinion classifier, as an alternative method for evaluating students' satisfaction with an educational institution. Based on test results, the system can classify opinions written in Bahasa Indonesia with an accuracy of 84%, and in evaluating students' satisfaction with the learning process the proposed system differs from the existing system by only 16.49%.

  9. Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds.

    PubMed

    Cannon, Edward O; Amini, Ata; Bender, Andreas; Sternberg, Michael J E; Muggleton, Stephen H; Glen, Robert C; Mitchell, John B O

    2007-05-01

    We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.

  10. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadliest diseases; according to WHO data, by 2015 there were 8.8 million deaths caused by cancer, and this number will increase every year if the disease is not addressed early. Microarray data has become one of the most popular subjects of cancer-identification studies in the health field, since it can be used to examine gene-expression levels in particular cell samples, allowing thousands of genes to be analyzed simultaneously. Using data-mining techniques, microarray samples can be classified as cancerous or not. In this paper we discuss research applying several data-mining techniques to microarray data, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, together with a simulation of the Random Forest algorithm with dimensionality reduction using Relief. The results report the accuracy of each classification algorithm, with Random Forest achieving higher accuracy than the other classifiers (SVM, ANN, Naive Bayes, kNN, and C4.5). It is hoped that this paper provides some information about the speed, accuracy, performance and computational cost of each data-mining classification technique on microarray data.

  11. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm.

    PubMed

    Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W

    2011-10-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes classifier using bigrams. Copyright © 2011 Elsevier Inc. All rights reserved.
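
    ConText identifies modifiers such as negation and uncertainty around clinical findings. A heavily simplified sketch of that windowed-trigger idea follows; this is not the published algorithm, which also handles termination terms and triggers that follow the finding:

```python
NEGATION = {"no", "without", "negative for"}
UNCERTAINTY = {"possible", "cannot exclude", "may represent"}

def classify_mention(sentence, finding, window=6):
    """Very simplified ConText-style check: look for negation or
    uncertainty trigger phrases in a window of words before the finding.
    Trigger matching is padded with spaces to approximate word boundaries.
    """
    words = sentence.lower().replace(",", " ").split()
    text = " ".join(words)
    if finding not in text:
        return "not mentioned"
    idx = words.index(finding.split()[0])
    scope = " " + " ".join(words[max(0, idx - window):idx]) + " "
    if any(f" {t} " in scope for t in NEGATION):
        return "absent"
    if any(f" {t} " in scope for t in UNCERTAINTY):
        return "uncertain"
    return "present"
```

    For example, "No evidence of pulmonary embolism" is classified as absent, while "Cannot exclude pulmonary embolism" is classified as uncertain, which mirrors two of the four document-level questions peFinder answers.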

  12. A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis.

    PubMed

    Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan

    2014-01-01

    In earlier days, cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability. The recent arrival of DNA microarray technology has enabled the concurrent monitoring of thousands of gene expressions on a single chip, which has stimulated progress in cancer classification. In this paper, we propose a hybrid approach for microarray data classification based on k-nearest neighbor (KNN), naive Bayes, and support vector machine (SVM) classifiers. Feature selection prior to classification plays a vital role, and a feature selection technique combining the discrete wavelet transform (DWT) and a moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets, and the results indicate that the ensemble approach produces higher classification accuracy than the conventional classifiers. This work serves as an automated system for the classification of cancer that doctors can apply to real cases, and it further reduces the misclassification of cancers, which is unacceptable in cancer detection.

  13. Classification of Indonesian quote on Twitter using Naïve Bayes

    NASA Astrophysics Data System (ADS)

    Rachmadany, A.; Pranoto, Y. M.; Gunawan; Multazam, M. T.; Nandiyanto, A. B. D.; Abdullah, A. G.; Widiaty, I.

    2018-01-01

    A quote is a sentence written in the hope that it can help someone become a strong personality, an individual who keeps improving and moves forward to achieve success. Social media is a place where people express their feelings to the world, and sometimes those expressions are quotes. The purpose of this study was to classify Indonesian quotes on Twitter using Naïve Bayes. The experiment applies text classification to Twitter posts that are quotes, grouping them into six categories (Love, Life, Motivation, Education, Religion, Others). The language used is Indonesian; the method used is Naive Bayes. The result of this experiment is a web application containing a collection of classified Indonesian quotes. The classification makes it easy for users to find quotes by category or keyword; for example, when a user wants to find a 'motivation' quote, the classification is very useful.

  14. A SVM-based method for sentiment analysis in Persian language

    NASA Astrophysics Data System (ADS)

    Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana

    2013-03-01

    Persian is the official language of Iran, Tajikistan and Afghanistan. Local online users often share their opinions and experiences on the web in written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge volume of web reviews makes it difficult to give an unbiased evaluation of a product. In this paper, the standard machine learning techniques SVM and naive Bayes are applied to online Persian movie reviews to automatically classify user reviews as positive or negative, and the performance of the two classifiers in this language is compared. The effects of feature representations on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The SVM classifier achieves accuracy as good as or better than naive Bayes on Persian movie reviews. Unigrams prove to be better features than bigrams and trigrams for capturing Persian sentiment orientation.
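
    The unigram naive Bayes baseline used in sentiment studies like this one can be sketched in a few lines: a multinomial model over word counts with Laplace smoothing. The toy corpus below is illustrative, not the Persian review data:

```python
import math
from collections import Counter

def train_nb(docs):
    """Multinomial naive Bayes over unigram counts with Laplace smoothing,
    the classic baseline compared against SVM in sentiment work."""
    vocab = set()
    counts, totals, priors = {}, {}, {}
    n_docs = sum(len(texts) for texts in docs.values())
    for label, texts in docs.items():
        c = Counter(w for t in texts for w in t.split())
        counts[label], totals[label] = c, sum(c.values())
        priors[label] = len(texts) / n_docs
        vocab |= set(c)
    return vocab, counts, totals, priors

def predict_nb(model, text):
    vocab, counts, totals, priors = model
    v = len(vocab)
    scores = {
        label: math.log(priors[label]) +
               sum(math.log((counts[label][w] + 1) / (totals[label] + v))
                   for w in text.split())
        for label in priors
    }
    return max(scores, key=scores.get)

docs = {"pos": ["great acting great story", "wonderful film"],
        "neg": ["boring plot", "terrible acting waste of time"]}
model = train_nb(docs)
print(predict_nb(model, "great story wonderful acting"))  # prints pos
```

    Swapping the unigram tokenizer for bigrams or trigrams is a one-line change in the counting step, which is how the feature-option comparisons in such papers are typically run.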

  15. Text Classification for Intelligent Portfolio Management

    DTIC Science & Technology

    2002-05-01

    years including nearest neighbor classification [15], naive Bayes with EM (Expectation Maximization) [11] [13], Winnow with active learning [10]... Active Learning and Expectation Maximization (EM). In particular, active learning is used to actively select documents for labeling, then EM assigns... generalization with active learning. Machine Learning, 15(2):201-221, 1994. [3] I. Dagan and P. Engelson. Committee-based sampling for training...

  16. Relevance popularity: A term event model based feature selection scheme for text classification.

    PubMed

    Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao

    2017-01-01

    Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. Traditional feature selection methods such as information gain and chi-square often use the number of documents that contain a particular term (i.e. the document frequency). However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising signal for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. Under the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing the inner parameters by their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms representative feature selection methods.
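
    A simplified stand-in for a term-event scoring function is shown below: each term is ranked by the magnitude of its smoothed log-probability ratio between two classes under a multinomial model. This illustrates the flavor of a factorized per-term score, not the paper's exact measurement:

```python
import math
from collections import Counter

def term_scores(class_docs, alpha=1.0):
    """Score each term by |log P(t | c1) - log P(t | c2)| under a
    Laplace-smoothed multinomial (term event) model over two classes.
    Terms with lopsided usage between the classes score highest.
    """
    (_, d1), (_, d2) = class_docs.items()
    f1 = Counter(w for t in d1 for w in t.split())
    f2 = Counter(w for t in d2 for w in t.split())
    vocab = set(f1) | set(f2)
    n1, n2 = sum(f1.values()), sum(f2.values())
    return {
        w: abs(math.log((f1[w] + alpha) / (n1 + alpha * len(vocab))) -
               math.log((f2[w] + alpha) / (n2 + alpha * len(vocab))))
        for w in vocab
    }

docs = {"sport": ["match goal goal team", "team win match"],
        "tech": ["chip code code release", "chip benchmark"]}
scores = term_scores(docs)
top = sorted(scores, key=scores.get, reverse=True)
```

    Feature selection then amounts to keeping the top-k terms of this ranking before training the final naive Bayes or SVM classifier.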

  17. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

    PubMed

    Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J

    2018-05-17

    Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

  18. Content Abstract Classification Using Naive Bayes

    NASA Astrophysics Data System (ADS)

    Latif, Syukriyanto; Suwardoyo, Untung; Aldrin Wihelmus Sanadi, Edwin

    2018-03-01

    This study aims to classify abstract content based on the most frequent words in abstracts from English-language journals. The research uses text mining, which extracts text data to find information in a set of documents. 120 abstracts were downloaded from www.computer.org. The data are grouped into three categories: DM (Data Mining), ITS (Intelligent Transport System) and MM (Multimedia). The system uses the naive Bayes algorithm to classify the abstracts, with a feature selection process using term weighting to weight each word and a dimensionality-reduction technique that removes words appearing rarely across documents, tested with reduction parameters of 10%-90% of 5,344 words. The performance of the classification system is evaluated with a confusion matrix comparing training and test data. The results show that the best classification was obtained with a 75% training and 25% test split of the total data: accuracy for the DM, ITS and MM categories was 100%, 100% and 86%, respectively, with a dimension-reduction parameter of 30% and a learning rate between 0.1 and 0.5.

  19. Protein classification based on text document classification techniques.

    PubMed

    Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2005-03-01

    The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
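
    The two preprocessing steps named above, n-gram counting over peptide sequences and chi-square feature scoring, can be sketched directly. This is a generic illustration, not the paper's pipeline:

```python
from collections import Counter

def ngram_counts(seq, n):
    """Counts of overlapping length-n peptide substrings (n-grams)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def chi_square(present_pos, absent_pos, present_neg, absent_neg):
    """2x2 chi-square statistic for one n-gram vs one class: how unevenly
    sequences containing the n-gram split across the two classes."""
    a, b, c, d = present_pos, absent_pos, present_neg, absent_neg
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

counts = ngram_counts("MKVLAAGMKV", 3)
# "MKV" occurs twice (positions 0 and 7)
```

    Ranking all n-grams by their chi-square statistic and keeping the top-scoring ones yields the feature set fed to the Naive Bayes or Decision Tree classifier.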

  20. Evaluation of supervised machine-learning algorithms to distinguish between inflammatory bowel disease and alimentary lymphoma in cats.

    PubMed

    Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L

    2016-11-01

    Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories were 83% by naive Bayes, 79% by decision tree, and 82% by artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).

  21. Automated Classification of Pathology Reports.

    PubMed

    Oleynik, Michel; Finger, Marcelo; Patrão, Diogo F C

    2015-01-01

    This work develops an automated classifier of pathology reports which infers the topography and the morphology classes of a tumor using codes from the International Classification of Diseases for Oncology (ICD-O). Data from 94,980 patients of the A.C. Camargo Cancer Center was used for training and validation of Naive Bayes classifiers, evaluated by the F1-score. Measures greater than 74% in the topographic group and 61% in the morphologic group are reported. Our work provides a successful baseline for future research for the classification of medical documents written in Portuguese and in other domains.

  22. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    PubMed Central

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 - 99.1%), geometric mean (60.9 - 98.1%), sensitivity (44.2 - 100%), and specificity (52.7 - 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection only 2 - 9 variables were required for classification and varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171

  23. Breast cancer Ki67 expression preoperative discrimination by DCE-MRI radiomics features

    NASA Astrophysics Data System (ADS)

    Ma, Wenjuan; Ji, Yu; Qin, Zhuanping; Guo, Xinpeng; Jian, Xiqi; Liu, Peifang

    2018-02-01

    To investigate whether quantitative radiomics features extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) are associated with Ki67 expression of breast cancer. In this institutional review board approved retrospective study, we collected 377 Chinese women who were diagnosed with invasive breast cancer in 2015. This cohort included 53 cases with low Ki67 expression (Ki67 proliferation index less than 14%) and 324 cases with high Ki67 expression (Ki67 proliferation index more than 14%). A binary classification of low vs. high Ki67 expression was performed. A set of 52 quantitative radiomics features, including morphological, gray-scale statistical, and texture features, was extracted from the segmented lesion area. Three common machine learning classification methods, naive Bayes, k-nearest neighbor and support vector machine with a Gaussian kernel, were employed for the classification, and the least absolute shrinkage and selection operator (LASSO) method was used to select the most predictive feature set for the classifiers. Classification performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity. The model using the naive Bayes classification method achieved better performance than the other two methods, yielding an AUC of 0.773, accuracy of 0.757, sensitivity of 0.777 and specificity of 0.769. Our study showed that quantitative radiomics imaging features of breast tumors extracted from DCE-MRI are associated with breast cancer Ki67 expression. Future larger studies are needed to further evaluate these findings.

  4. Sentiment analysis system for movie review in Bahasa Indonesia using naive bayes classifier method

    NASA Astrophysics Data System (ADS)

    Nurdiansyah, Yanuar; Bukhori, Saiful; Hidayat, Rahmad

    2018-04-01

    There are many ways of implementing the use of sentiments found in documents; one of these is the sentiment found in product or service reviews. It is therefore important to be able to process and extract textual data from such documents. We propose a system that is able to classify the sentiment of review documents into two classes: positive sentiment and negative sentiment. We use the Naive Bayes Classifier method in the document classification system that we build. We chose Movienthusiast, a website of movie reviews in Bahasa Indonesia, as the source of our review documents. From it, we were able to collect 1201 movie reviews: 783 positive reviews and 418 negative reviews, which we use as the dataset for this machine learning classifier. The classification accuracy averages 88.37% over five accuracy-measurement runs on the aforementioned dataset.

  5. Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization

    NASA Astrophysics Data System (ADS)

    Jamaluddin; Siringoringo, Rimbun

    2017-12-01

    Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawback of FkNN is that it is difficult to determine its parameters: the number of neighbors (k) and the fuzzy strength (m). Both parameters are very sensitive, which makes FkNN difficult to control, because no theory or guide can deduce what proper values of ‘m’ and ‘k’ should be. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best values of ‘k’ and ‘m’. MPSO is based on the Constriction Factor Method, an improvement of PSO intended to avoid local optima. The model proposed in this study was tested on the German Credit Dataset, a standardized benchmark from the UCI Machine Learning Repository that is widely applied to classification problems. The application of MPSO to the determination of the FkNN parameters is expected to increase classification performance. The experiments indicate that the model offered in this research yields better classification performance than the plain FkNN model: the proposed model has an accuracy rate of 81%, while the FkNN model achieves 70%. Finally, the proposed model is compared with two other classification models, Naive Bayes and Decision Tree; it again performs better, with Naive Bayes achieving 75% accuracy and the decision tree model 70%.
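    The Constriction Factor Method named above is, in most formulations, Clerc's constriction coefficient applied to the PSO velocity update. The sketch below is a generic illustration of that update on a toy sphere function, not the authors' FkNN-tuning implementation; the particle count, iteration limit, and search bounds are arbitrary choices:

    ```python
    import math
    import random

    # Generic constriction-factor PSO minimizing a toy sphere function.
    # All hyperparameters here are illustrative assumptions.
    random.seed(0)

    def sphere(x):
        return sum(v * v for v in x)

    c1 = c2 = 2.05
    phi = c1 + c2                     # must exceed 4 for the factor to be real
    chi = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))  # ~0.7298

    dim, n_particles, n_iters = 2, 15, 60
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal bests
    gbest = min(pbest, key=sphere)               # global best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Constriction factor chi damps the whole velocity update
                vel[i][d] = chi * (vel[i][d]
                                   + c1 * r1 * (pbest[i][d] - pos[i][d])
                                   + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if sphere(pos[i]) < sphere(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=sphere)

    print(sphere(gbest))  # close to 0, the global minimum
    ```

    For parameter tuning as in the abstract, the objective would instead be cross-validated FkNN error as a function of (k, m), with k rounded to an integer before evaluation.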

  6. Hierarchical Naive Bayes for genetic association studies.

    PubMed

    Malovini, Alberto; Barbarini, Nicola; Bellazzi, Riccardo; de Michelis, Francesca

    2012-01-01

    Genome Wide Association Studies represent powerful approaches that aim at disentangling the genetic and molecular mechanisms underlying complex traits. The usual "one-SNP-at-a-time" testing strategy cannot capture the multi-factorial nature of this kind of disorder. We propose a Hierarchical Naïve Bayes classification model for taking into account associations in SNP data characterized by Linkage Disequilibrium. Validation shows that our model reaches classification performances superior to those obtained by the standard Naïve Bayes classifier for simulated and real datasets. In the Hierarchical Naïve Bayes implemented, the SNPs mapping to the same region of Linkage Disequilibrium are considered as "details" or "replicates" of the locus, each contributing to the overall effect of the region on the phenotype. A latent variable for each block, which models the "population" of correlated SNPs, can then be used to summarize the available information. The classification is thus performed relying on the latent variables' conditional probability distributions and on the SNP data available. The developed methodology has been tested on simulated datasets, each composed of 300 cases, 300 controls and a variable number of SNPs. Our approach has also been applied to two real datasets on the genetic bases of Type 1 Diabetes and Type 2 Diabetes generated by the Wellcome Trust Case Control Consortium. The approach proposed in this paper, called Hierarchical Naïve Bayes, allows dealing with classification of examples for which genetic information of structurally correlated SNPs is available. It improves the Naïve Bayes performance by properly handling the within-loci variability.

  7. Know your data: understanding implicit usage versus explicit action in video content classification

    NASA Astrophysics Data System (ADS)

    Yew, Jude; Shamma, David A.

    2011-02-01

    In this paper, we present a method for video category classification using only social metadata from websites like YouTube. In place of content analysis, we utilize the communicative and social contexts surrounding videos as a means to determine a categorical genre, e.g. Comedy, Music. We hypothesize that video clips belonging to different genre categories have distinct signatures and patterns that are reflected in their collected metadata. In particular, we define and describe social metadata as usage or action to aid in classification. We trained a Naive Bayes classifier to predict categories from a sample of 1,740 YouTube videos representing the top five genre categories. Using just a small number of the available metadata features, we compare the classifications produced by our Naive Bayes classifier with those provided by the uploader of each particular video. Compared to random predictions with the YouTube data (21% accurate), our classifier attained a mediocre 33% accuracy in predicting video genres. However, we found that the accuracy of our classifier significantly improves with nominal factoring of the explicit data features. By factoring the ratings of the videos in the dataset, the classifier was able to accurately predict the genres of 75% of the videos. We argue that the patterns of social activity found in the metadata are not just meaningful in their own right, but are indicative of the meaning of the shared video content. The results presented by this project represent a first step in investigating the potential meaning and significance of social metadata and its relation to the media experience.

  8. Understanding of the naive Bayes classifier in spam filtering

    NASA Astrophysics Data System (ADS)

    Wei, Qijia

    2018-05-01

    Along with the development of the Internet, the information stream is experiencing an unprecedented burst. The methods of information transmission are becoming more and more important, and how people can receive effective information is a hot topic in both research and industry. As one of the most common methods of information communication, email has its own advantages. However, spam always floods the inbox, and automatic filtering is needed. This paper discusses this issue from the perspective of the Naive Bayes Classifier, which is one of the applications of Bayes' Theorem. The concepts and process of the Naive Bayes Classifier are introduced, followed by two examples. A discussion of its relation to machine learning is given in the last section. The Naive Bayes Classifier has proved to be surprisingly effective, despite its limiting assumption of independence among the attributes, which are usually email words or phrases.
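    The classifier described in this record can be sketched in a few lines. The following minimal multinomial Naive Bayes spam filter with add-one (Laplace) smoothing uses an invented four-message corpus; it illustrates the technique generally, not the paper's own examples:

    ```python
    import math
    from collections import Counter

    # Tiny invented training corpus, for illustration only.
    train = [
        ("win money now claim prize", "spam"),
        ("free prize win win", "spam"),
        ("meeting agenda for monday", "ham"),
        ("lunch plans for the project meeting", "ham"),
    ]

    word_counts = {"spam": Counter(), "ham": Counter()}
    class_docs = Counter()
    for text, label in train:
        class_docs[label] += 1
        word_counts[label].update(text.split())

    vocab = set(w for c in word_counts.values() for w in c)

    def classify(text):
        # Score = log P(class) + sum of log P(word | class),
        # with add-one (Laplace) smoothing for unseen words.
        scores = {}
        for label in word_counts:
            log_prob = math.log(class_docs[label] / sum(class_docs.values()))
            total = sum(word_counts[label].values())
            for w in text.split():
                count = word_counts[label][w]
                log_prob += math.log((count + 1) / (total + len(vocab)))
            scores[label] = log_prob
        return max(scores, key=scores.get)

    print(classify("claim your free prize"))   # -> "spam"
    print(classify("agenda for the meeting"))  # -> "ham"
    ```

    The independence assumption noted above shows up here as the per-word sum of log-probabilities: word co-occurrence within a message is ignored entirely.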

  9. Optimization of Candidate Selection Using Naive Bayes: Case Study in Company X

    NASA Astrophysics Data System (ADS)

    Kadar, JA; Agustono, D.; Napitupulu, D.

    2018-01-01

    This research was conducted to build a decision-making system offering an alternative way to complete candidate assessment for a particular position. The human resources (HR) department of company X is responsible for selecting candidates in accordance with the assessments of their superiors. Selection is performed by having managers fill out assessment questionnaires on their subordinate candidates. Three (3) managers were assigned to assess the 11 subordinate candidates. By applying a human resources quality-classification questionnaire and the naive Bayes formula, results are obtained and finally grouped using a criteria scale. The HR department also determined that only candidates meeting criterion 5 are accepted. The result is three (3) candidates who can be proposed for certain positions in company X and have met all required calculations. The candidate list is then given to management as alternative input in the selection of candidates.

  10. The comprehensive health care orientation process indicators explain hospital organisation's attractiveness: a Bayesian analysis of newly hired nurse and physician survey data.

    PubMed

    Peltokoski, Jaana; Vehviläinen-Julkunen, Katri; Pitkäaho, Taina; Mikkonen, Santtu; Miettinen, Merja

    2015-10-01

    To examine the relationship of a comprehensive health care orientation process with a hospital's attractiveness. Little is known about the indicators of the employee orientation process that most likely explain a hospital organisation's attractiveness. Empirical data were collected from registered nurses (n = 145) and physicians (n = 37) working in two specialised hospital districts. A Naive Bayes Classification was applied to examine the comprehensive orientation process indicators that predict a hospital's attractiveness. The model was composed of five orientation process indicators: the contribution of the orientation process to nurses' and physicians' intention to stay; the defined responsibilities of the orientation process; interaction between newcomer and colleagues; responsibilities that are adapted for tasks; and newcomers' baseline knowledge assessment, which should be done before the orientation phase. The Naive Bayes Classification was used to explore the employee orientation process and related indicators. The model constructed provides insight that can be used in designing and implementing the orientation process to promote the hospital organisation's attractiveness. Managers should focus on developing fluently organised orientation practices based on the indicators that predict the hospital's attractiveness. For the purpose of personalised orientation, employees' baseline knowledge and competence level should be assessed before the orientation phase. © 2014 John Wiley & Sons Ltd.

  11. Privacy-Preserving Evaluation of Generalization Error and Its Application to Model and Attribute Selection

    NASA Astrophysics Data System (ADS)

    Sakuma, Jun; Wright, Rebecca N.

    Privacy-preserving classification is the task of learning or training a classifier on the union of privately distributed datasets without sharing the datasets. The emphasis of existing studies in privacy-preserving classification has primarily been put on the design of privacy-preserving versions of particular data mining algorithms. However, in classification problems, preprocessing and postprocessing, such as model selection or attribute selection, play a prominent role in achieving higher classification accuracy. In this paper, we show that the generalization error of classifiers in privacy-preserving classification can be securely evaluated without sharing prediction results. Our main technical contribution is a new generalized Hamming distance protocol that is universally applicable to the preprocessing and postprocessing of various privacy-preserving classification problems, such as model selection in support vector machines and attribute selection in naive Bayes classification.

  12. The impact of modeling the dependencies among patient findings on classification accuracy and calibration.

    PubMed Central

    Monti, S.; Cooper, G. F.

    1998-01-01

    We present a new Bayesian classifier for computer-aided diagnosis. The new classifier builds upon the naive-Bayes classifier, and models the dependencies among patient findings in an attempt to improve its performance, both in terms of classification accuracy and in terms of calibration of the estimated probabilities. This work finds motivation in the argument that highly calibrated probabilities are necessary for the clinician to be able to rely on the model's recommendations. Experimental results are presented, supporting the conclusion that modeling the dependencies among findings improves calibration. PMID:9929288

  13. Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos

    2017-04-13

    Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform on diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating the diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition, factors that influence pain perception. We selected 52 adult subjects for the study, divided into three groups: a control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance imaging with diffusion tensor to assess the white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gathered data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce the features of the first dataset, and classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform a classification of the migraine groups. Moreover, we implemented a committee method to improve the classification accuracy based on the feature selection algorithms. When classifying the migraine groups, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, and from 93 to 94% with boosting. The features determined to be most useful for classification were related to pain, analgesics and the left uncinate brain region (connected with pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.

  14. Dynamic Dimensionality Selection for Bayesian Classifier Ensembles

    DTIC Science & Technology

    2015-03-19

    …discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but much more… Keywords: classifier, generative learning, discriminative learning, naïve Bayes, feature selection, logistic regression, higher-order attribute independence.

  15. A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data.

    PubMed

    Stephens, David; Diesing, Markus

    2014-01-01

    Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. 
The models that used all input features did not generally perform well, highlighting the need for some means of feature selection.

  16. A Machine Learning Concept for DTN Routing

    NASA Technical Reports Server (NTRS)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.

  17. Automatic classification of small bowel mucosa alterations in celiac disease for confocal laser endomicroscopy

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Di Claudio, Gianluca; Mirzaei, Hadis; Leong, Rupert; Grisan, Enrico

    2016-03-01

    Celiac disease (CD) is an immune-mediated enteropathy triggered by exposure to gluten and similar proteins, affecting genetically susceptible persons and increasing their risk of different complications. Small bowel mucosal damage due to CD involves various degrees of endoscopically relevant lesions, which are not easily recognized: their overall sensitivity and positive predictive values are poor even when zoom-endoscopy is used. Confocal Laser Endomicroscopy (CLE) allows skilled and trained experts to qualitatively evaluate mucosal alterations such as a decrease in goblet cell density, presence of villous atrophy or crypt hypertrophy. We present a method for automatically classifying CLE images into three different classes: normal regions, villous atrophy and crypt hypertrophy. This classification is performed after a feature selection process, in which four features are extracted from each image through the application of homomorphic filtering and border identification using the Canny and Sobel operators. Three different classifiers have been tested on a dataset of 67 different images labeled by experts in three classes (normal, VA and CH): a linear approach, a Naive-Bayes quadratic approach and a standard quadratic analysis, all validated with ten-fold cross validation. Linear classification achieves 82.09% accuracy (class accuracies: 90.32% for normal villi, 82.35% for VA and 68.42% for CH; sensitivity: 0.68, specificity: 1.00), Naive Bayes analysis returns 83.58% accuracy (90.32% for normal villi, 70.59% for VA and 84.21% for CH; sensitivity: 0.84, specificity: 0.92), while the quadratic analysis achieves a final accuracy of 94.03% (96.77% accuracy for normal villi, 94.12% for VA and 89.47% for CH; sensitivity: 0.89, specificity: 0.98).

  18. Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse.

    PubMed

    Garrard, Peter; Rentoumi, Vassiliki; Gesierich, Benno; Miller, Bruce; Gorno-Tempini, Maria Luisa

    2014-06-01

    Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can 'learn' from data: for instance a ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low frequency content words, generic terms and components of metanarrative statements. For the right versus left task the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA) may shed further light on this little understood distinction. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. Naive Bayes Bearing Fault Diagnosis Based on Enhanced Independence of Data

    PubMed Central

    Zhang, Nannan; Wu, Lifeng; Yang, Jing; Guan, Yong

    2018-01-01

    The bearing is the key component of rotating machinery, and its performance directly determines the reliability and safety of the system. Data-based bearing fault diagnosis has become a research hotspot. Naive Bayes (NB), which is based on an independence presumption, is widely used in fault diagnosis. However, bearing data are not completely independent, which reduces the performance of NB algorithms. In order to solve this problem, we propose an NB bearing fault diagnosis method based on enhanced independence of data. The method treats the data vector from two aspects: the attribute features and the sample dimension. After processing, the limitation that the independence hypothesis places on NB classification is reduced. First, we effectively extract the statistical characteristics of the original bearing signals. Then, the Decision Tree algorithm is used to select the important features of the time domain signal, and features with low correlation are selected. Next, the Selective Support Vector Machine (SSVM) is used to prune the dimension data and remove redundant vectors. Finally, we use NB to diagnose the fault with the low-correlation data. The experimental results show that the independence enhancement of data is effective for bearing fault diagnosis. PMID:29401730

  20. Identifying Wrist Fracture Patients with High Accuracy by Automatic Categorization of X-ray Reports

    PubMed Central

    de Bruijn, Berry; Cranney, Ann; O’Donnell, Siobhan; Martin, Joel D.; Forster, Alan J.

    2006-01-01

    The authors performed this study to determine the accuracy of several text classification methods to categorize wrist x-ray reports. We randomly sampled 751 textual wrist x-ray reports. Two expert reviewers rated the presence (n = 301) or absence (n = 450) of an acute fracture of wrist. We developed two information retrieval (IR) text classification methods and a machine learning method using a support vector machine (TC-1). In cross-validation on the derivation set (n = 493), TC-1 outperformed the two IR based methods and six benchmark classifiers, including Naive Bayes and a Neural Network. In the validation set (n = 258), TC-1 demonstrated consistent performance with 93.8% accuracy; 95.5% sensitivity; 92.9% specificity; and 87.5% positive predictive value. TC-1 was easy to implement and superior in performance to the other classification methods. PMID:16929046

  1. Wood identification of Dalbergia nigra (CITES Appendix I) using quantitative wood anatomy, principal components analysis and naive Bayes classification.

    PubMed

    Gasson, Peter; Miller, Regis; Stekel, Dov J; Whinder, Frances; Zieminska, Kasia

    2010-01-01

    Dalbergia nigra is one of the most valuable timber species of its genus, having been traded for over 300 years. Due to over-exploitation it is facing extinction and trade has been banned under CITES Appendix I since 1992. Current methods, primarily comparative wood anatomy, are inadequate for conclusive species identification. This study aims to find a set of anatomical characters that distinguish the wood of D. nigra from other commercially important species of Dalbergia from Latin America. Qualitative and quantitative wood anatomy, principal components analysis and naïve Bayes classification were conducted on 43 specimens of Dalbergia, eight D. nigra and 35 from six other Latin American species. Dalbergia cearensis and D. miscolobium can be distinguished from D. nigra on the basis of vessel frequency for the former, and ray frequency for the latter. Principal components analysis was unable to provide any further basis for separating the species. Naïve Bayes classification using the four characters: minimum vessel diameter; frequency of solitary vessels; mean ray width; and frequency of axially fused rays, classified all eight D. nigra correctly with no false negatives, but there was a false positive rate of 36.36 %. Wood anatomy alone cannot distinguish D. nigra from all other commercially important Dalbergia species likely to be encountered by customs officials, but can be used to reduce the number of specimens that would need further study.

  2. Naive Probability: A Mental Model Theory of Extensional Reasoning.

    ERIC Educational Resources Information Center

    Johnson-Laird, P. N.; Legrenzi, Paolo; Girotto, Vittorio; Legrenzi, Maria Sonino; Caverni, Jean-Paul

    1999-01-01

    Outlines a theory of naive probability in which individuals who are unfamiliar with the probability calculus can infer the probabilities of events in an "extensional" way. The theory accommodates reasoning based on numerical premises, and explains how naive reasoners can infer posterior probabilities without relying on Bayes's theorem.…

  3. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of the Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on one microarray dataset and five different medical datasets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest Neighbor). Repeated five-fold cross-validation was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.

  4. An ant colony optimization based feature selection for web page classification.

    PubMed

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of a huge amount of information on the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, so as to improve the runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both the accuracy and runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi square feature selection methods.

  5. Using clustering and a modified classification algorithm for automatic text summarization

    NASA Astrophysics Data System (ADS)

    Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

    2013-01-01

    In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text itself. First, we cluster the document sentences to exploit the diversity of topics, then we use a learning algorithm (here, Naive Bayes) on each cluster, considering it as a class. After obtaining the classification model, we calculate the score of a sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences and extract the first ones as the output summary. We conducted experiments using a corpus of scientific papers, and we compared our results to those of another summarization system called UNIS. We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method is interesting and gives good performance, and that the addition of new features (which is simple with this method) can improve the summary's accuracy.

  6. A Neuro-Fuzzy Approach in the Classification of Students' Academic Performance

    PubMed Central

    2013-01-01

    Classifying the student academic performance with high accuracy facilitates admission decisions and enhances educational services at educational institutions. The purpose of this paper is to present a neuro-fuzzy approach for classifying students into different groups. The neuro-fuzzy classifier used previous exam results and other related factors as input variables and labeled students based on their expected academic performance. The results showed that the proposed approach achieved a high accuracy. The results were also compared with those obtained from other well-known classification approaches, including support vector machine, Naive Bayes, neural network, and decision tree approaches. The comparative analysis indicated that the neuro-fuzzy approach performed better than the others. It is expected that this work may be used to support student admission procedures and to strengthen the services of educational institutions. PMID:24302928

  7. A neuro-fuzzy approach in the classification of students' academic performance.

    PubMed

    Do, Quang Hung; Chen, Jeng-Fung

    2013-01-01

    Classifying the student academic performance with high accuracy facilitates admission decisions and enhances educational services at educational institutions. The purpose of this paper is to present a neuro-fuzzy approach for classifying students into different groups. The neuro-fuzzy classifier used previous exam results and other related factors as input variables and labeled students based on their expected academic performance. The results showed that the proposed approach achieved a high accuracy. The results were also compared with those obtained from other well-known classification approaches, including support vector machine, Naive Bayes, neural network, and decision tree approaches. The comparative analysis indicated that the neuro-fuzzy approach performed better than the others. It is expected that this work may be used to support student admission procedures and to strengthen the services of educational institutions.

  8. Naive scoring of human sleep based on a hidden Markov model of the electroencephalogram.

    PubMed

    Yaghouby, Farid; Modur, Pradeep; Sunderam, Sridhar

    2014-01-01

    Clinical sleep scoring involves tedious visual review of overnight polysomnograms by a human expert. Many attempts have been made to automate the process by training computer algorithms such as support vector machines and hidden Markov models (HMMs) to replicate human scoring. Such supervised classifiers are typically trained on scored data and then validated on scored out-of-sample data. Here we describe a methodology based on HMMs for scoring an overnight sleep recording without the benefit of a trained initial model. The number of states in the data is not known a priori and is optimized using a Bayes information criterion. When tested on a 22-subject database, this unsupervised classifier agreed well with human scores (mean of Cohen's kappa > 0.7). The HMM also outperformed other unsupervised classifiers (Gaussian mixture models, k-means, and linkage trees), which are capable of naive classification but do not model dynamics, by a significant margin (p < 0.05).
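    The Bayes information criterion used above to choose the number of HMM states trades goodness of fit against model size: BIC = k*ln(n) - 2*ln(L). A minimal sketch of the selection step; the log-likelihoods and parameter counts below are hypothetical, not values from the study.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayes information criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical fits: likelihood improves with more states,
# but the parameter-count penalty grows with model size.
candidates = {2: (-1450.0, 10), 3: (-1380.0, 18), 4: (-1372.0, 28)}
n_obs = 1000
scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
best_n_states = min(scores, key=scores.get)
```

    Here the 4-state model fits slightly better than the 3-state one, but not by enough to justify its extra parameters, so BIC selects 3 states.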

  9. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    PubMed

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers for effectively detecting and classifying different types of fall and non-fall events. The first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and a support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features, including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.

  10. Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.

    PubMed

    Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi

    2015-04-22

    Genetic studies of complex traits have uncovered only a small number of risk markers, explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single-marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set-specific risk models that relate each gene set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived, and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.

  11. Gender classification from face images by using local binary pattern and gray-level co-occurrence matrix

    NASA Astrophysics Data System (ADS)

    Uzbaş, Betül; Arslan, Ahmet

    2018-04-01

    Gender classification is an important step in human-computer interaction and identification processes, and the human face image is one of the most important sources for determining gender. In the present study, gender classification is performed automatically from facial images. To classify gender, we propose a combination of features extracted from the face, eye and lip regions using a hybrid of the Local Binary Pattern and Gray-Level Co-Occurrence Matrix methods. The features are extracted from automatically obtained face, eye and lip regions, then combined and given as input to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor) for gender classification. The Nottingham Scan face database, which consists of frontal face images of 100 people (50 male and 50 female), is used for this purpose. In the experimental studies, the highest success rate, 98%, was achieved using the Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.

  12. Comparisons and Selections of Features and Classifiers for Short Text Classification

    NASA Astrophysics Data System (ADS)

    Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi

    2017-10-01

    Short text is considerably different from traditional long text documents due to its shortness and conciseness, which hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence practice, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison, and in this paper we illustrate step by step how we approached our goals. Specifically, in feature selection we compared the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec and paragraph2vec; in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. We then compared and analysed the classifiers both horizontally, against each other, and vertically, across feature selections. Regarding the datasets, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them into two class sets, the big and the small: there are eight labels in the big class and 59 labels in the small class.
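    Of the feature weightings compared above, tf-idf is easy to sketch from first principles: a term's frequency in a document is scaled down by how many documents contain it. The whitespace tokenization and toy documents below are simplifying assumptions, not the paper's setup.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse tf-idf vector (dict) per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()                     # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * idf[t] for t in tf})
    return vectors

docs = ["stock price rises", "stock price falls", "quarterly report released"]
vecs = tf_idf(docs)
```

    Terms appearing in every document get idf = log(1) = 0 and so carry no weight, which is exactly the behavior wanted for boilerplate words in exchange filings.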

  13. Texture classification of lung computed tomography images

    NASA Astrophysics Data System (ADS)

    Pheng, Hang See; Shamsuddin, Siti M.

    2013-03-01

    Development of algorithms for computer-aided diagnosis (CAD) is growing rapidly to assist radiologists in medical image interpretation. Texture analysis of computed tomography (CT) scans is an important preliminary stage in computerized detection and classification systems for lung cancer. Among the different types of image feature analysis, Haralick texture features with a variety of statistical measures have been widely used for image texture description. Extracting texture feature values is essential for a CAD system, especially for classifying normal and abnormal tissue in cross-sectional CT images. This paper compares experimental results using texture extraction and different machine learning methods for classifying normal and abnormal tissue in lung CT images. The machine learning methods involved in this assessment are the Artificial Immune Recognition System (AIRS), Naive Bayes, Decision Tree (J48) and Backpropagation Neural Network. AIRS is found to provide high accuracy (99.2%) and sensitivity (98.0%) in the assessment. For experiments and testing purposes, publicly available datasets from the Reference Image Database to Evaluate Therapy Response (RIDER) are used as study cases.

  14. Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides.

    PubMed

    Panwar, Bharat; Raghava, Gajendra P S

    2015-04-01

    RNA-protein interactions play a diverse role in the cell, so identification of the RNA-protein interface is essential for biologists to understand their function. In the past, several methods have been developed for predicting RNA-interacting residues in proteins, but limited efforts have been made toward identifying protein-interacting nucleotides in RNAs. In order to discriminate protein-interacting from non-interacting nucleotides, we used various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest and SVM(light)) to develop prediction models using various features, and achieved at best 83.92% sensitivity, 84.82% specificity, 84.62% accuracy and a 0.62 Matthews correlation coefficient with SVM(light)-based models. We observed that certain tri-nucleotides, such as ACA, ACC, AGA, CAC, CCA, GAG, UGA and UUU, are preferred at protein-interaction sites. All models were developed using a non-redundant dataset and evaluated using the five-fold cross-validation technique. A web server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/). Copyright © 2015 Elsevier Inc. All rights reserved.
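    The tri-nucleotide composition profile used as the input feature set can be sketched as follows; this is a minimal illustration of the feature extraction idea, not the RNApin pipeline, and the example sequence is made up.

```python
from collections import Counter

def tri_nucleotide_profile(seq):
    """Frequencies of overlapping tri-nucleotides in an RNA sequence."""
    seq = seq.upper().replace("T", "U")   # tolerate DNA-style input
    tris = [seq[i:i + 3] for i in range(len(seq) - 2)]
    counts = Counter(tris)
    return {t: c / len(tris) for t, c in counts.items()}

# A 9-nucleotide toy sequence yields 7 overlapping tri-nucleotides.
profile = tri_nucleotide_profile("ACACCAGAG")
```

    In practice each of the 64 possible tri-nucleotides becomes one dimension of a fixed-length feature vector (absent tri-nucleotides get frequency 0) that a classifier can consume.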

  15. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    PubMed Central

    2014-01-01

    The increased popularity of the web has led to an enormous amount of information being published online, and as a result of this explosive growth, automated web page classification systems are needed to improve search engine performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text content, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, improving both the runtime and the accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then applied the well-known C4.5, naive Bayes, and k-nearest-neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods. PMID:25136678

  16. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
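    The Hoeffding bound at the heart of VFDT-style learners states that, with probability 1 - delta, the observed mean of n samples of a variable with range R is within epsilon = sqrt(R^2 * ln(1/delta) / (2n)) of its true mean; a leaf splits once the observed gain gap between its two best attributes exceeds epsilon. A minimal sketch of the bound itself; the delta and n values are illustrative.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n i.i.d. samples of a
    variable with the given range lies within this epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# More examples shrink the bound, so the tree eventually becomes
# confident enough in the best split attribute to commit to a split.
eps_at_200 = hoeffding_bound(1.0, 1e-7, 200)
eps_at_20000 = hoeffding_bound(1.0, 1e-7, 20000)
```

    For information gain measured in bits over c classes, the range R is log2(c), so binary problems use R = 1.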

  17. A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

    NASA Astrophysics Data System (ADS)

    Techo, Jakkrit; Nattee, Cholwich; Theeramunkong, Thanaruk

    While classification techniques can be applied to automatic unknown-word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown-word candidates is dominantly smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model by weighting each of its candidates according to their ranks and correctness, where the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected, and 97.26±0.26% when the top ten candidates are considered; that is an 8.45% and 6.79% improvement, respectively, over the conventional record-based naive Bayes classifier and the vanilla version. Another experiment using only the best features shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively, a 3.97% and 9.78% improvement over naive Bayes and the vanilla version. Finally, an error analysis is given.

  18. Multivariate and Naive Bayes Text Classification Approach to Cost Growth Risk in Department of Defense Acquisition Programs

    DTIC Science & Technology

    2013-03-01

    [Table residue: the original record here contained fragments of a per-term probability table from the naive Bayes text model (terms such as "alerts", "alternative", "amplifier", "angular", "java", "refactoring" with class-conditional probabilities); the table did not survive text extraction.]

  19. Pattern classification of fMRI data: applications for analysis of spatially distributed cortical networks.

    PubMed

    Yourganov, Grigori; Schmah, Tanya; Churchill, Nathan W; Berman, Marc G; Grady, Cheryl L; Strother, Stephen C

    2014-08-01

    The field of fMRI data analysis is rapidly growing in sophistication, particularly in the domain of multivariate pattern classification. However, the interaction between the properties of the analytical model and the parameters of the BOLD signal (e.g. signal magnitude, temporal variance and functional connectivity) is still an open problem. We addressed this problem by evaluating a set of pattern classification algorithms on simulated and experimental block-design fMRI data. The set of classifiers consisted of linear and quadratic discriminants, linear support vector machine, and linear and nonlinear Gaussian naive Bayes classifiers. For linear discriminant, we used two methods of regularization: principal component analysis, and ridge regularization. The classifiers were used (1) to classify the volumes according to the behavioral task that was performed by the subject, and (2) to construct spatial maps that indicated the relative contribution of each voxel to classification. Our evaluation metrics were: (1) accuracy of out-of-sample classification and (2) reproducibility of spatial maps. In simulated data sets, we performed an additional evaluation of spatial maps with ROC analysis. We varied the magnitude, temporal variance and connectivity of simulated fMRI signal and identified the optimal classifier for each simulated environment. Overall, the best performers were linear and quadratic discriminants (operating on principal components of the data matrix) and, in some rare situations, a nonlinear Gaussian naïve Bayes classifier. The results from the simulated data were supported by within-subject analysis of experimental fMRI data, collected in a study of aging. This is the first study that systematically characterizes interactions between analysis model and signal parameters (such as magnitude, variance and correlation) on the performance of pattern classifiers for fMRI. Copyright © 2014 Elsevier Inc. All rights reserved.
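    A Gaussian naive Bayes classifier like the one evaluated above fits an independent normal distribution per feature and class and predicts by maximum posterior. A minimal stdlib-only sketch; the toy data is illustrative and the variance floor is a numerical-stability assumption.

```python
import math
from collections import defaultdict

class GaussianNB:
    """Gaussian naive Bayes: one independent normal per feature and class."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        self.priors, self.stats = {}, {}
        for c, rows in by_class.items():
            self.priors[c] = len(rows) / len(X)
            stats = []
            for col in zip(*rows):           # iterate features
                mu = sum(col) / len(col)
                var = sum((v - mu) ** 2 for v in col) / len(col)
                stats.append((mu, max(var, 1e-9)))  # floor avoids div by zero
            self.stats[c] = stats
        return self

    def predict(self, x):
        def log_posterior(c):
            lp = math.log(self.priors[c])
            for v, (mu, var) in zip(x, self.stats[c]):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            return lp
        return max(self.priors, key=log_posterior)

X = [[1.0, 2.0], [1.2, 1.9], [4.0, 5.0], [4.2, 5.1]]
y = [0, 0, 1, 1]
model = GaussianNB().fit(X, y)
```

    Working in log space keeps the product of many small per-feature densities from underflowing, which matters when the feature count is in the thousands, as with fMRI voxels.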

  20. Improving imbalanced scientific text classification using sampling strategies and dictionaries.

    PubMed

    Borrajo, L; Romero, R; Iglesias, E L; Redondo Marey, C M

    2011-09-15

    Many real applications suffer from the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the others. Among the systems affected are those for the retrieval and classification of scientific documentation. Sampling strategies such as oversampling and subsampling are popular approaches to tackling class imbalance. In this work, we study their effects on three types of classifiers (Knn, SVM and Naive-Bayes) when applied to searches on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the mentioned classifiers and sampling strategies. The best results were obtained with the NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
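    The two sampling strategies studied above can be sketched in a few lines; this is a generic illustration of random over- and undersampling, not the authors' experimental setup.

```python
import random

def undersample(X, y, seed=0):
    """Randomly drop majority-class examples until all classes balance."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for c, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            Xb.append(xi)
            yb.append(c)
    return Xb, yb

def oversample(X, y, seed=0):
    """Randomly duplicate minority-class examples until all classes balance."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_max = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for c, rows in by_class.items():
        extras = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for xi in rows + extras:
            Xb.append(xi)
            yb.append(c)
    return Xb, yb
```

    Undersampling discards information from the majority class, while oversampling risks overfitting to duplicated minority examples; which trade-off wins is exactly the kind of question the study evaluates empirically.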

  1. Automatic classification of protein structures using physicochemical parameters.

    PubMed

    Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam

    2014-09-01

    Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
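    The information-gain criterion used above for feature selection scores a feature by how much it reduces label entropy when the data are split on it. A minimal sketch for discrete features; the toy data is illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    split = {}
    for v, lab in zip(feature_values, labels):
        split.setdefault(v, []).append(lab)
    remainder = sum(len(sub) / n * entropy(sub) for sub in split.values())
    return entropy(labels) - remainder

labels = ["alpha", "alpha", "beta", "beta"]
informative = ["hi", "hi", "lo", "lo"]   # perfectly predicts the label
noisy = ["hi", "lo", "hi", "lo"]         # carries no label information
```

    Ranking features by this score and keeping the top k per family is one common way to realize the per-family feature selection the abstract describes.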

  2. Classifying smoking urges via machine learning

    PubMed Central

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-01-01

    Background and objective Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. Methods To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. Results The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. Conclusions In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. PMID:28110725

  3. Classifying smoking urges via machine learning.

    PubMed

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. Evolving optimised decision rules for intrusion detection using particle swarm paradigm

    NASA Astrophysics Data System (ADS)

    Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

    2012-12-01

    The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective is to show that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree, is introduced to detect anomalous network patterns. In particular, the proposed swarm-optimisation-based approach selects the instances that compose the training set, and the optimised decision tree operates over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) dataset, which contains information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.

  5. Classification of older adults with/without a fall history using machine learning methods.

    PubMed

    Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society, so assessing individuals' risk of falls is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier capable of classifying individual older adults into a high-risk group and a low-risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training on these features, classification hypotheses are obtained based on machine learning techniques (K-Nearest Neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies, with sensitivity and specificity, of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.

  6. Ideal discrimination of discrete clinical endpoints using multilocus genotypes.

    PubMed

    Hahn, Lance W; Moore, Jason H

    2004-01-01

    Multifactor Dimensionality Reduction (MDR) is a method for the classification and prediction of discrete clinical endpoints using attributes constructed from multilocus genotype data. Empirical studies with both real and simulated data suggest that MDR has good power for detecting gene-gene interactions in the absence of independent main effects. The purpose of this study is to develop an objective, theory-driven approach to evaluate the strengths and limitations of MDR. To accomplish this goal, we borrow concepts from ideal observer analysis used in visual perception to evaluate the theoretical limits of classifying and predicting discrete clinical endpoints using multilocus genotype data. We conclude that MDR ideally discriminates between low-risk and high-risk subjects using attributes constructed from multilocus genotype data. We also show that the classification approach used once a multilocus attribute is constructed is similar to that of a naive Bayes classifier. This study provides a theoretical foundation for the continued development, evaluation, and application of MDR as a data mining tool in the domain of statistical genetics and genetic epidemiology.

  7. Applying Data Mining Techniques to Improve Breast Cancer Diagnosis.

    PubMed

    Diz, Joana; Marreiros, Goreti; Freitas, Alberto

    2016-09-01

    In the field of breast cancer research, and more than ever, new computer-aided diagnosis systems have been developed with the aim of reducing false positives in diagnostic tests. In this work, we present a data mining based approach that might support oncologists in the process of breast cancer classification and diagnosis. The present study aims to compare two breast cancer datasets and find the best methods for predicting benign/malignant lesions, breast density classification, and finding identification (mass/microcalcification distinction). To carry out these tasks, two matrices of texture features were extracted using Matlab and classified with data mining algorithms in WEKA. Results revealed good accuracy for each class: 64.7 to 89.3% for benign/malignant; 75.8 to 78.3% for dense/fatty tissue; 71.0 to 83.1% for finding identification. Among the tested classifiers, Naive Bayes was the best at identifying mass texture, and Random Forests was the first or second best classifier for the majority of the tested groups.

  8. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
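
    The AUC metric used above can be computed directly from classifier scores as the probability that a randomly chosen positive outscores a randomly chosen negative, counting ties as one half; the scores below are invented, not pipeline output.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability a random positive outscores a random
    negative, counting ties as one half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented classifier scores for simulated Ia vs non-Ia light curves.
score = auc([0.9, 0.8, 0.7], [0.2, 0.4, 0.75])
```

    A perfectly separating score assignment gives an AUC of 1, matching the "1 represents perfect classification" reading above.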

  9. Application of texture analysis method for mammogram density classification

    NASA Astrophysics Data System (ADS)

    Nithya, R.; Santhi, B.

    2017-07-01

    Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
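
    The ANOVA selection step scores each texture feature by its one-way F statistic, the ratio of between-class to within-class variance; a minimal sketch with invented feature values for the three density classes:

```python
def anova_f(groups):
    """One-way ANOVA F: between-class over within-class variance of one feature."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented values of one texture feature for fatty / glandular / dense mammograms.
fatty, glandular, dense = [1.0, 1.2, 0.9], [2.0, 2.1, 1.8], [3.0, 3.2, 2.9]
f_score = anova_f([fatty, glandular, dense])  # a well-separated feature scores high
```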

  10. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  11. Automated system for characterization and classification of malaria-infected stages using light microscopic images of thin blood smears.

    PubMed

    Das, D K; Maiti, A K; Chakraborty, C

    2015-03-01

    In this paper, we propose a comprehensive image characterization cum classification framework for malaria-infected stage detection using microscopic images of thin blood smears. The methodology mainly includes microscopic imaging of Leishman-stained blood slides, noise reduction and illumination correction, erythrocyte segmentation, and feature selection followed by machine classification. Amongst three image segmentation algorithms (namely, rule-based, Chan-Vese-based and marker-controlled watershed methods), the marker-controlled watershed technique provides better boundary detection of erythrocytes, especially in overlapping situations. Microscopic features at intensity, texture and morphology levels are extracted to discriminate infected and noninfected erythrocytes. In order to obtain a subgroup of potential features, feature selection techniques, namely, F-statistic and information gain criteria, are considered here for ranking. Finally, five different classifiers, namely, Naive Bayes, multilayer perceptron neural network, logistic regression, classification and regression tree (CART), and RBF neural network, have been trained and tested with 888 erythrocytes (infected and noninfected) for each feature subset. Performance evaluation of the proposed methodology shows that the multilayer perceptron network provides higher accuracy for malaria-infected erythrocyte recognition and infected stage classification. Results show that the top 90 features ranked by F-statistic (specificity: 98.64%, sensitivity: 100%, PPV: 99.73% and overall accuracy: 96.84%) and the top 60 features ranked by information gain (specificity: 97.29%, sensitivity: 100%, PPV: 99.46% and overall accuracy: 96.73%) provide the best results for malaria-infected stage classification. © 2014 The Authors Journal of Microscopy © 2014 Royal Microscopical Society.

  12. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    PubMed

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
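
    The core FANS transformation can be sketched in one dimension: estimate each class's marginal density, replace the raw feature with the estimated log density ratio, and feed the augmented feature to a linear rule. Gaussian fits stand in for the paper's nonparametric estimates, and the training values are invented.

```python
import math

def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def fit(values):
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

# Invented one-feature training data for classes 0 and 1. FANS estimates the
# marginal densities nonparametrically; Gaussians stand in here for brevity.
x0, x1 = [0.1, 0.2, 0.3, 0.4], [1.1, 1.2, 1.3, 1.4]
(m0, s0), (m1, s1) = fit(x0), fit(x1)

def augment(x):
    """FANS-style transformed feature: estimated log marginal density ratio."""
    return gauss_logpdf(x, m1, s1) - gauss_logpdf(x, m0, s0)

# With equal priors, a positive augmented feature favours class 1.
```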

  13. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification

    PubMed Central

    Feng, Yang; Jiang, Jiancheng; Tong, Xin

    2015-01-01

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970

  14. Implementation and performance evaluation of acoustic denoising algorithms for UAV

    NASA Astrophysics Data System (ADS)

    Chowdhury, Ahmed Sony Kamal

    Unmanned Aerial Vehicles (UAVs) have become a popular alternative for wildlife monitoring and border surveillance applications. Eliminating the UAV's background noise and classifying the target audio signal effectively remain major challenges. The main goal of this thesis is to remove the UAV's background noise by means of acoustic denoising techniques. Existing denoising algorithms, such as Adaptive Least Mean Square (LMS), Wavelet Denoising, Time-Frequency Block Thresholding, and Wiener Filter, were implemented and their performance evaluated. The denoising algorithms were evaluated on average Signal to Noise Ratio (SNR), Segmental SNR (SSNR), Log Likelihood Ratio (LLR), and Log Spectral Distance (LSD) metrics. To evaluate the effectiveness of the denoising algorithms on classification of target audio, we implemented the Support Vector Machine (SVM) and Naive Bayes classification algorithms. Simulation results demonstrate that the LMS and Discrete Wavelet Transform (DWT) denoising algorithms offered superior performance to the other algorithms. Finally, we implemented the LMS and DWT algorithms on a DSP board for hardware evaluation. Experimental results showed that the LMS algorithm's performance is more robust than DWT's across various noise types when classifying target audio signals.
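
    The adaptive LMS scheme evaluated above can be sketched as a standard noise canceller: a filtered copy of a noise reference is subtracted from the primary signal, and the error drives the weight update. The toy sinusoidal signals, filter order, and step size below are illustrative assumptions, not the thesis settings.

```python
import math

def lms(primary, reference, order=4, mu=0.05):
    """Adaptive LMS noise canceller: filter the noise reference to track the
    noise in the primary channel; the error output is the denoised signal."""
    w = [0.0] * order
    out = []
    for i in range(len(primary)):
        x = [reference[i - j] if i - j >= 0 else 0.0 for j in range(order)]
        y = sum(wj * xj for wj, xj in zip(w, x))            # noise estimate
        e = primary[i] - y                                  # error = denoised sample
        w = [wj + 2 * mu * e * xj for wj, xj in zip(w, x)]  # weight update
        out.append(e)
    return out

# Toy signals: a slow "target" tone buried in a sinusoidal "rotor" noise.
noise = [math.sin(0.3 * i) for i in range(400)]
clean = [0.2 * math.sin(0.01 * i) for i in range(400)]
primary = [c + n for c, n in zip(clean, noise)]
denoised = lms(primary, noise)   # residual noise shrinks as the filter adapts
```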

  15. Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

    NASA Astrophysics Data System (ADS)

    Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

    2017-07-01

    Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it is becoming clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” has emerged to address the problem of automatically determining the polarity (positive, negative, neutral, …) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions; thus, building a classifier which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study, three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K-nearest neighbors (KNN), were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms the other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4.08.

  16. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    PubMed Central

    Belo, David; Gamboa, Hugo

    2017-01-01

    The paper presents an accuracy analysis of machine learning approaches applied to cardiac activity data. The study evaluates the diagnostic possibilities for arterial hypertension by means of short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from arterial hypertension of degree II-III. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, in the study, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of searching for a noncorrelated feature set achieved better results than a feature set based on the principal components. PMID:28831239

  17. Bayesian network modelling of upper gastrointestinal bleeding

    NASA Astrophysics Data System (ADS)

    Aisha, Nazziwa; Shohaimi, Shamarina; Adam, Mohd Bakri

    2013-09-01

    Bayesian networks are graphical probabilistic models that represent causal and other relationships between domain variables. In the context of medical decision making, these models have been explored to help in medical diagnosis and prognosis. In this paper, we discuss the Bayesian network formalism in building medical support systems and we learn a tree augmented naive Bayes network (TAN) from gastrointestinal bleeding (GIB) data. The accuracy of the TAN in classifying the source of gastrointestinal bleeding as upper or lower is obtained. The TAN achieves a high classification accuracy of 86% and an area under the curve of 92%. A sensitivity analysis of the model shows relatively high levels of entropy reduction for color of the stool, history of gastrointestinal bleeding, consistency, and the ratio of blood urea nitrogen to creatinine. The TAN facilitates the identification of the source of GIB and requires further validation.
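
    The TAN structure can be illustrated with a toy score: unlike naive Bayes, a feature may condition on the class and on one parent feature. The variables and all conditional probabilities below are invented, not the parameters learned from the bleeding data.

```python
# Prior and CPTs (all invented): stool colour depends on the class only, while
# the BUN:creatinine ratio depends on the class and on stool colour (its parent).
p_upper = 0.5
p_melena = {"upper": 0.8, "lower": 0.2}                  # P(black stool | source)
p_ratio = {("upper", True): 0.7, ("upper", False): 0.4,  # P(high ratio | source, stool)
           ("lower", True): 0.3, ("lower", False): 0.1}

def posterior_upper(melena, high_ratio):
    def joint(src):
        prior = p_upper if src == "upper" else 1 - p_upper
        pm = p_melena[src] if melena else 1 - p_melena[src]
        pr = p_ratio[(src, melena)] if high_ratio else 1 - p_ratio[(src, melena)]
        return prior * pm * pr
    ju, jl = joint("upper"), joint("lower")
    return ju / (ju + jl)
```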

  18. Predicting flight delay based on multiple linear regression

    NASA Astrophysics Data System (ADS)

    Ding, Yi

    2017-08-01

    Delay of flights has been regarded as one of the toughest difficulties in aviation control, and establishing an effective model to handle the delay prediction problem is significant work. To address the difficulty of predicting flight delay, this study proposes a method to model arriving flights and a multiple linear regression algorithm to predict delay, compared with the Naive Bayes and C4.5 approaches. Experiments based on a realistic dataset of domestic airports show that the accuracy of the proposed model approximates 80%, an improvement over the Naive Bayes and C4.5 approaches. Testing shows that this method is computationally convenient and can also predict flight delays effectively. It can provide a decision basis for airport authorities.
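
    A multiple linear regression of the kind used for delay prediction can be fit by solving the normal equations; the flight features and delay values below are fabricated for illustration.

```python
def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination; each row of X starts with a constant 1."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):
        piv = A[i][i]
        A[i] = [a / piv for a in A[i]]
        b[i] /= piv
        for r in range(k):
            if r != i and A[r][i] != 0.0:
                f = A[r][i]
                A[r] = [a - f * ai for a, ai in zip(A[r], A[i])]
                b[r] -= f * b[i]
    return b

# Fabricated rows: [1, scheduled hour, airport queue length] -> delay in minutes.
X = [[1, 8, 2], [1, 12, 5], [1, 18, 9], [1, 21, 4], [1, 9, 7]]
y = [10, 25, 48, 30, 31]
coef = ols(X, y)
predicted_delay = sum(c * v for c, v in zip(coef, [1, 15, 6]))
```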

  19. Overlapped Partitioning for Ensemble Classifiers of P300-Based Brain-Computer Interfaces

    PubMed Central

    Onishi, Akinari; Natsume, Kiyohisa

    2014-01-01

    A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15,300 samples). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training samples. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and that the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance. PMID:24695550

  20. Overlapped partitioning for ensemble classifiers of P300-based brain-computer interfaces.

    PubMed

    Onishi, Akinari; Natsume, Kiyohisa

    2014-01-01

    A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15,300 samples). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training samples. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and that the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.

  1. Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

    2018-07-01

    Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e. overall accuracy, macro precision, macro F-measure, and macro recall. From the experiments, it was found that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, among feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-the-art techniques against which to compare future proposals. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
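
    The chi-square feature reduction step that performed best above scores a term from its 2x2 term/class contingency table; a minimal sketch with invented document counts:

```python
def chi2(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table. n11: class documents containing the
    term, n10: other documents with the term, n01/n00: the same without it."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den

# A term concentrated in one cause of death outranks an evenly spread one.
discriminative = chi2(40, 10, 10, 140)
spread = chi2(20, 20, 30, 130)
```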

  2. A NAIVE BAYES SOURCE CLASSIFIER FOR X-RAY SOURCES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Broos, Patrick S.; Getman, Konstantin V.; Townsley, Leisa K.

    2011-05-01

    The Chandra Carina Complex Project (CCCP) provides a sensitive X-ray survey of a nearby starburst region over >1 deg^2 in extent. Thousands of faint X-ray sources are found, many concentrated into rich young stellar clusters. However, significant contamination from unrelated Galactic and extragalactic sources is present in the X-ray catalog. We describe the use of a naive Bayes classifier to assign membership probabilities to individual sources, based on source location, X-ray properties, and visual/infrared properties. For the particular membership decision rule adopted, 75% of CCCP sources are classified as members, 11% are classified as contaminants, and 14% remain unclassified. The resulting sample of stars likely to be Carina members is used in several other studies, which appear in this special issue devoted to the CCCP.

  3. Prediction of cold and heat patterns using anthropometric measures based on machine learning.

    PubMed

    Lee, Bum Ju; Lee, Jae Chul; Nam, Jiho; Kim, Jong Yeol

    2018-01-01

    To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether using a combination of measures can improve the predictive power to diagnose these patterns. Based on a total of 4,859 subjects (3,000 women and 1,859 men), statistical analyses using binary logistic regression were performed to assess the significance of the difference and the predictive power of each anthropometric measure, and binary logistic regression and Naive Bayes with the variable selection technique were used to assess the improvement in the predictive power of the patterns using the combined measures. In women, the strongest indicators for determining the cold and heat patterns among anthropometric measures were body mass index (BMI) and rib circumference; in men, the best indicator was BMI. In experiments using a combination of measures, the values of the area under the receiver operating characteristic curve in women were 0.776 by Naive Bayes and 0.772 by logistic regression, and the values in men were 0.788 by Naive Bayes and 0.779 by logistic regression. Individuals with a higher BMI have a tendency toward a heat pattern in both women and men. The use of a combination of anthropometric measures can slightly improve the diagnostic accuracy. Our findings can provide fundamental information for the diagnosis of cold and heat patterns based on body shape for personalized medicine.
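
    The reported BMI effect can be illustrated with a one-feature Gaussian naive Bayes; the class means, standard deviations, and equal priors below are invented, not the study's fitted values.

```python
import math

params = {"cold": (20.5, 2.5, 0.5), "heat": (25.0, 3.0, 0.5)}  # invented mu, sd, prior

def posterior(bmi):
    def score(mu, sd, prior):
        # Gaussian class-conditional likelihood times the class prior
        return prior * math.exp(-0.5 * ((bmi - mu) / sd) ** 2) / sd
    s = {c: score(*p) for c, p in params.items()}
    z = sum(s.values())
    return {c: v / z for c, v in s.items()}

# Under these parameters, a higher BMI pushes the posterior toward the heat pattern.
```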

  4. Automated Assessment of Patients' Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining.

    PubMed

    He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo

    2017-03-01

    Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms, including decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model, were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.

  5. The diagnose of oil palm disease using Naive Bayes Method based on Expert System Technology

    NASA Astrophysics Data System (ADS)

    Nababan, Marlince; Laia, Yonata; Sitanggang, Delima; Sihombing, Oloan; Indra, Evta; Siregar, Saut; Purba, Windania; Mancur, Roy

    2018-04-01

    An expert system is a computer-based system that applies human expert intelligence to solve a particular problem normally handled by an expert. A frequent problem faced by oil palm farmers is difficulty in determining the type of plant disease; the resulting delay in treating the disease reduces farm yields. An application system is needed to overcome this obstacle and diagnose the type of oil palm plant disease. The researchers designed an intelligence-based application, with a planned input and output, that diagnoses the type of oil palm plant disease by applying the naive Bayes method. Based on the research results, applying the Bayes method to the recognized symptoms allows the oil palm plant disease to be diagnosed. The symptom data found are: leaves turned yellow 0.4, dead leaves 0.4, black and brown color among the veins of leaves 0.5, young and old fruit with whole space 0.4, and decay of bunches 0.3. The roots are tender in the amount of 0.5, and damage on the sheath is 0.3. From the symptoms chosen above, the Bayes value is 80%, with the type of disease being rotten bunch.
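
    The kind of Bayes computation described can be sketched by reading the symptom weights as P(symptom | disease) and combining them against a neutral alternative; the 0.2 alternative likelihood and the 0.5 prior are assumptions, while the weights follow the abstract.

```python
# Symptom weights for "rotten bunch", read as P(symptom | disease); values from
# the abstract. The 0.2 alternative likelihood and 0.5 prior are assumptions.
symptoms = {"decay of bunches": 0.3, "tender roots": 0.5, "damage on sheath": 0.3}

def bayes_value(observed, p_other=0.2, prior=0.5):
    like_disease, like_other = prior, 1 - prior
    for s in observed:
        like_disease *= symptoms[s]
        like_other *= p_other
    return like_disease / (like_disease + like_other)

confidence = bayes_value(["decay of bunches", "tender roots"])
```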

  6. Opinion mining feature-level using Naive Bayes and feature extraction based analysis dependencies

    NASA Astrophysics Data System (ADS)

    Sanda, Regi; Baizal, Z. K. Abdurahman; Nhita, Fhira

    2015-12-01

    The development of the internet and technology has had a major impact, creating a new kind of business called e-commerce. Many e-commerce sites provide convenient transactions, and consumers can also post reviews or opinions on the products they purchased. These opinions can be used by both consumers and producers: consumers learn the advantages and disadvantages of particular features of a product, while producers can analyse their own strengths and weaknesses as well as those of competitors' products. With so many opinions, a method is needed that lets the reader grasp the gist of the opinions as a whole. The idea emerged from review summarization, which summarizes the overall opinion based on the sentiment and features it contains. In this study, the domain of focus is digital cameras. The research consists of four steps: 1) giving the system the knowledge to recognize the semantic orientation of an opinion; 2) identifying the features of the product; 3) identifying whether an opinion is positive or negative; 4) summarizing the results. The methods discussed include Naive Bayes for sentiment classification, a feature extraction algorithm based on dependency analysis, which is one of the tools in Natural Language Processing (NLP), and a knowledge-based dictionary useful for handling implicit features. The end result of the research is a summary containing a collection of consumer reviews organized by feature and sentiment. With the proposed method, the accuracy of sentiment classification is 81.2% for positive test data and 80.2% for negative test data, and the accuracy of feature extraction reaches 90.3%.
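
    The sentiment classification step can be sketched as a multinomial Naive Bayes with Laplace smoothing over camera-review unigrams; the tiny training set below is invented.

```python
import math

# Invented camera-review snippets labelled with sentiment.
train = [("great lens sharp image", "pos"), ("love the zoom", "pos"),
         ("poor battery short life", "neg"), ("blurry image bad zoom", "neg")]

counts = {"pos": {}, "neg": {}}
totals = {"pos": 0, "neg": 0}
for text, label in train:
    for w in text.split():
        counts[label][w] = counts[label].get(w, 0) + 1
        totals[label] += 1
vocab = {w for c in counts.values() for w in c}

def predict(text):
    scores = {}
    for label in counts:
        s = math.log(0.5)                  # equal class priors
        for w in text.split():
            # Laplace smoothing over the shared vocabulary
            p = (counts[label].get(w, 0) + 1) / (totals[label] + len(vocab))
            s += math.log(p)
        scores[label] = s
    return max(scores, key=scores.get)
```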

  7. A naive Bayes algorithm for tissue origin diagnosis (TOD-Bayes) of synchronous multifocal tumors in the hepatobiliary and pancreatic system.

    PubMed

    Jiang, Weiqin; Shen, Yifei; Ding, Yongfeng; Ye, Chuyu; Zheng, Yi; Zhao, Peng; Liu, Lulu; Tong, Zhou; Zhou, Linfu; Sun, Shuo; Zhang, Xingchen; Teng, Lisong; Timko, Michael P; Fan, Longjiang; Fang, Weijia

    2018-01-15

    Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system but because of similarities in their histological features, oncologists have difficulty in identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on naive Bayes algorithm (TOD-Bayes) using ubiquitous RNA-Seq data. Massive tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA) and ∼1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross validation by the data from TCGA. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation and 17 of the 18 samples were classified correctly in our study (94.4%). Furthermore, we included as cases studies seven tumor samples, taken from two individuals who suffered from synchronous multifocal tumors across tissues, where the efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using our TOD-Bayes analysis, the two clinical test cases were successfully diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology to accurately identify the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data and an important step toward more precision-based medicine in cancer diagnosis and treatment. © 2017 UICC.

  8. Biomarker selection and classification of "-omics" data using a two-step bayes classification framework.

    PubMed

    Assawamakin, Anunchai; Prueksaaroon, Supakit; Kulawonganunchai, Supasak; Shaw, Philip James; Varavithya, Vara; Ruangrajitpakorn, Taneth; Tongsima, Sissades

    2013-01-01

    Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank features from which the top-ranked will most likely contain the most informative features for prediction of the underlying biological classes. The top-ranked features are then used in a Hidden Naïve Bayes classifier to construct a classification prediction model from these filtered attributes. In order to obtain the minimum set of the most informative biomarkers, the bottom-ranked features are successively removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on different types of -omics datasets including gene expression microarray, single nucleotide polymorphism microarray (SNP array), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed two-step Bayes classification framework was equal to and, in some cases, outperformed other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.

  9. Textual and visual content-based anti-phishing: a Bayesian approach.

    PubMed

    Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin

    2011-10-01

    A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from the two classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold, which the classifier requires to determine the class of a web page, i.e., whether or not it is phishing. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, Bayes' theorem is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined on a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, that the fusion algorithm outperforms either of the individual classifiers, and that our model can be adapted to different phishing cases. © 2011 IEEE.
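
    The fusion step can be illustrated with a minimal sketch. Assuming each classifier outputs a calibrated posterior probability computed under the same prior, and assuming (naively) that textual and visual evidence are conditionally independent given the class, the two posteriors combine through their odds. This is a generic Bayesian fusion sketch, not the paper's exact algorithm:

```python
def fuse_bayes(p_text, p_image, prior=0.5):
    """Combine two posteriors P(phishing | evidence) into one, assuming
    conditional independence of text and image evidence given the class.
    Each factor is the classifier's posterior odds with the shared prior
    odds divided out, so the prior is counted only once."""
    odds = lambda p: p / (1.0 - p)
    prior_odds = odds(prior)
    post_odds = (prior_odds
                 * (odds(p_text) / prior_odds)
                 * (odds(p_image) / prior_odds))
    return post_odds / (1.0 + post_odds)
```

    Two agreeing, confident classifiers reinforce each other, while a confident and an equally anti-confident one cancel back to the prior.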

  10. Muscle categorization using PDF estimation and Naive Bayes classification.

    PubMed

    Adel, Tameem M; Smith, Benn E; Stashuk, Daniel W

    2012-01-01

    The structure of motor unit potentials (MUPs) and their times of occurrence provide information about the motor units (MUs) that created them. As such, electromyographic (EMG) data can be used to categorize muscles as normal or suffering from a neuromuscular disease. Using pattern discovery (PD) allows clinicians to understand the rationale underlying a certain muscle characterization; i.e., it is transparent. Discretization is required in PD, which leads to some loss in accuracy. In this work, characterization techniques based on estimating probability density functions (PDFs) for each muscle category are implemented. Characterization probabilities of each motor unit potential train (MUPT) are obtained from these PDFs, and Bayes rule is then used to aggregate the MUPT characterization probabilities into muscle-level probabilities. Even though this technique is not as transparent as PD, its accuracy is higher than that of discrete PD. Ultimately, the goal is to use a technique based on both PDFs and PD and make it as transparent and as efficient as possible, but first it was necessary to thoroughly assess how accurate a fully continuous approach can be. Using Gaussian PDF estimation achieved improvements in muscle categorization accuracy over PD, and further improvements resulted from using feature-value histograms to choose more representative PDFs; for instance, using a log-normal distribution to represent skewed histograms.
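
    The Bayes-rule aggregation from MUPT-level probabilities to a muscle-level probability can be sketched as below. This is a hedged illustration under an assumed conditional-independence model of the trains; the per-feature PDF estimation that produces the train probabilities is omitted:

```python
import math

def muscle_posterior(train_probs, prior=0.5):
    """Aggregate per-MUPT P(disease) values into a muscle-level posterior,
    assuming the trains are conditionally independent given the muscle
    state (a naive-Bayes-style aggregation, done in the log-odds domain
    for numerical stability)."""
    prior_lo = math.log(prior / (1.0 - prior))
    log_odds = prior_lo
    for p in train_probs:
        p = min(max(p, 1e-9), 1.0 - 1e-9)      # clamp to avoid log(0)
        # each train contributes its likelihood ratio, prior divided out
        log_odds += math.log(p / (1.0 - p)) - prior_lo
    return 1.0 / (1.0 + math.exp(-log_odds))
```

    Several moderately confident trains compound into a more confident muscle-level decision; uninformative trains (p = prior) leave the posterior unchanged.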

  11. Machine learning for the assessment of Alzheimer's disease through DTI

    NASA Astrophysics Data System (ADS)

    Lella, Eufemia; Amoroso, Nicola; Bellotti, Roberto; Diacono, Domenico; La Rocca, Marianna; Maggipinto, Tommaso; Monaco, Alfonso; Tangaro, Sabina

    2017-09-01

    Digital imaging techniques have found several medical applications in the development of computer-aided detection systems, especially in neuroimaging. Recent advances in Diffusion Tensor Imaging (DTI) aim to discover biological markers for the early diagnosis of Alzheimer's disease (AD), one of the most widespread neurodegenerative disorders. We explore here how different supervised classification models provide robust support to the diagnosis of AD patients. We use DTI measures, assessing the structural integrity of white matter (WM) fiber tracts, to reveal patterns of disrupted brain connectivity. In particular, we provide voxel-wise measures of fractional anisotropy (FA) and mean diffusivity (MD), thus identifying the regions of the brain most affected by neurodegeneration, and then compute intensity features to feed supervised classification algorithms. Specifically, we evaluate the accuracy of discriminating AD patients from healthy controls (HC) on a dataset of 80 subjects (40 HC, 40 AD) from the Alzheimer's Disease Neuroimaging Initiative (ADNI). In this study, we compare three state-of-the-art classification models: Random Forests, Naive Bayes and Support Vector Machines (SVMs). We use a repeated five-fold cross-validation framework with nested feature selection to perform a fair comparison between these algorithms and evaluate the information content they provide. Results show that AD patterns are well localized within the brain, and thus DTI features can support the AD diagnosis.
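
    The evaluation protocol, repeated five-fold cross-validation with feature selection nested inside each training fold, can be sketched as follows. The Fisher-style univariate score and the nearest-centroid stand-in classifier are assumptions for illustration; the study itself compares Random Forests, Naive Bayes and SVMs:

```python
import numpy as np

def fisher_scores(X, y):
    """Univariate score: squared class-mean gap over summed class variances."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    v = X[y == 0].var(0) + X[y == 1].var(0) + 1e-9
    return (m0 - m1) ** 2 / v

def nearest_centroid_predict(Xtr, ytr, Xte):
    """Simple stand-in classifier: assign to the nearer class centroid."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    d0 = ((Xte - c0) ** 2).sum(1)
    d1 = ((Xte - c1) ** 2).sum(1)
    return (d1 < d0).astype(int)

def repeated_cv(X, y, k=5, repeats=10, n_feat=10, seed=0):
    """Repeated k-fold CV with feature selection nested inside each
    training fold, so test folds never influence the selection."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        for fold in np.array_split(idx, k):
            tr = np.setdiff1d(idx, fold)
            feats = np.argsort(fisher_scores(X[tr], y[tr]))[-n_feat:]
            pred = nearest_centroid_predict(X[tr][:, feats], y[tr],
                                            X[fold][:, feats])
            accs.append((pred == y[fold]).mean())
    return float(np.mean(accs))
```

    Selecting features on the full dataset before splitting would leak test information; nesting the selection inside each fold avoids that optimistic bias.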

  12. Comparative analysis of tree classification models for detecting Fusarium oxysporum f. sp. cubense (TR4) based on multi soil sensor parameters

    NASA Astrophysics Data System (ADS)

    Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica

    2017-09-01

    Use of wireless sensor networks and smartphone integration to monitor environmental parameters surrounding plantations is made possible by readily available and affordable sensors. Providing low-cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that uses multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node, from which the smartphone collects data via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report the infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will yield coherent results or not. Soil sensor data residing on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where the decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters, indicating variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1-measure scores shows homogeneity among the NBTree, J48graft and J48 tree classification models.

  13. A novel, fast and efficient single-sensor automatic sleep-stage classification based on complementary cross-frequency coupling estimates.

    PubMed

    Dimitriadis, Stavros I; Salis, Christos; Linden, David

    2018-04-01

    Limitations of the manual scoring of polysomnograms, which include data from electroencephalogram (EEG), electro-oculogram (EOG), electrocardiogram (ECG) and electromyogram (EMG) channels have long been recognized. Manual staging is resource intensive and time consuming, and thus considerable effort must be spent to ensure inter-rater reliability. As a result, there is a great interest in techniques based on signal processing and machine learning for a completely Automatic Sleep Stage Classification (ASSC). In this paper, we present a single-EEG-sensor ASSC technique based on the dynamic reconfiguration of different aspects of cross-frequency coupling (CFC) estimated between predefined frequency pairs over 5 s epoch lengths. The proposed analytic scheme is demonstrated using the PhysioNet Sleep European Data Format (EDF) Database with repeat recordings from 20 healthy young adults. We validate our methodology in a second sleep dataset. We achieved very high classification sensitivity, specificity and accuracy of 96.2 ± 2.2%, 94.2 ± 2.3%, and 94.4 ± 2.2% across 20 folds, respectively, and also a high mean F1 score (92%, range 90-94%) when a multi-class Naive Bayes classifier was applied. High classification performance has been achieved also in the second sleep dataset. Our method outperformed the accuracy of previous studies not only on different datasets but also on the same database. Single-sensor ASSC makes the entire methodology appropriate for longitudinal monitoring using wearable EEG in real-world and laboratory-oriented environments. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  14. Simple hybrid method for fine microaneurysm detection from non-dilated diabetic retinopathy retinal images.

    PubMed

    Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah

    2013-01-01

    Microaneurysm detection is an important task in the computer-aided diagnosis of diabetic retinopathy. Microaneurysms are the first clinical sign of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early microaneurysm detection can help reduce the incidence of blindness. Automatic detection of microaneurysms is still an open problem due to their tiny size, low contrast and similarity to blood vessels. It is particularly difficult to detect fine microaneurysms, especially from non-dilated pupils, and that is the goal of this paper. Simple yet effective methods are used: coarse segmentation using mathematical morphology and fine segmentation using a naive Bayes classifier. A total of 18 microaneurysm features are proposed in this paper and extracted for the naive Bayes classifier. The detected microaneurysms are validated by comparing them at pixel level with ophthalmologists' hand-drawn ground truth. The sensitivity, specificity, precision and accuracy are 85.68, 99.99, 83.34 and 99.99%, respectively. Copyright © 2013 Elsevier Ltd. All rights reserved.
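
    The four reported validation metrics follow directly from pixel-level confusion counts against the hand-drawn ground truth. A minimal sketch using the generic definitions (not the authors' code):

```python
def pixel_metrics(pred, truth):
    """Pixel-level validation against ground truth.
    pred, truth: iterables of 0/1 per pixel (1 = microaneurysm)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    fp = sum(p and (not t) for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    return {
        "sensitivity": tp / (tp + fn),             # recall on lesion pixels
        "specificity": tn / (tn + fp),             # recall on background
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
    }
```

    With the vast background majority typical of retinal images, specificity and accuracy can both be near-perfect even when precision is much lower, as in the figures reported above.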

  15. Health Problems Discovery from Motion-Capture Data of Elderly

    NASA Astrophysics Data System (ADS)

    Pogorelc, B.; Gams, M.

    Rapid aging of the populations of developed countries could exceed society's capacity to care for the elderly. To help address this problem, we propose a system for automatic discovery of health problems from motion-capture data of the gait of elderly users. The gait of the user is captured with a motion-capture system consisting of tags attached to the body and sensors situated in the apartment. The positions of the tags are acquired by the sensors, and the resulting time series of position coordinates are analyzed with machine learning algorithms in order to identify the specific health problem. We propose novel features for training a machine learning classifier that classifies the user's gait as: i) normal, ii) with hemiplegia, iii) with Parkinson's disease, iv) with pain in the back, or v) with pain in the leg. Results show that naive Bayes needs more tags and less noise to reach its classification accuracy of 98% than support vector machines need to reach 99%.

  16. System steganalysis with automatic fingerprint extraction

    PubMed Central

    Sloan, Tom; Hernandez-Castro, Julio; Isasi, Pedro

    2018-01-01

    This paper tackles the modern challenge of practical steganalysis over large data by presenting a novel approach that aims to perform with perfect accuracy and in a completely automatic manner. The objective is to detect changes introduced by the steganographic process in data objects, including signatures related to the tools being used. Our approach achieves this by first extracting reliable regularities by analyzing pairs of modified and unmodified data objects, and then combining these findings into general patterns present in the data used for training. Finally, we construct a Naive Bayes model that performs classification, operating on attributes extracted using the aforementioned patterns. This technique has been applied to different steganographic tools that operate on media files of several types. We are able to replicate or improve on a number of previously published results, but more importantly, we also present new steganalytic findings for a number of popular tools that had no previously known attacks. PMID:29694366

  17. Study on bayes discriminant analysis of EEG data.

    PubMed

    Shi, Yuan; He, DanDan; Qin, Fang

    2014-01-01

    In this paper, we applied Bayes discriminant analysis to objectively recorded EEG data to arrive at a relatively accurate method for feature extraction and classification decisions. According to the strength of the α wave, the head electrodes are divided into four classes. Using part of the 21-electrode EEG data of 63 people, we performed Bayes discriminant analysis on the EEG data of six subjects; the electrode classification accuracy rate was 64.4%. Bayes discriminant analysis has high prediction accuracy and extracts EEG features (mainly the α wave) more accurately. Bayes discriminant analysis is therefore well suited to feature extraction and classification decisions for EEG data.
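
    For a two-class case, a Bayes (linear) discriminant under a shared covariance can be sketched with NumPy as below. This is a generic reconstruction for illustration, not the authors' four-class electrode model:

```python
import numpy as np

def bayes_discriminant(Xtr, ytr):
    """Two-class Bayes (linear) discriminant under a shared covariance S:
    assign x to the class maximizing g_c(x) = x·(S⁻¹μ_c) − ½μ_c·S⁻¹μ_c + log π_c."""
    # pooled covariance from the class-centered samples
    centered = np.vstack([Xtr[ytr == c] - Xtr[ytr == c].mean(0) for c in (0, 1)])
    Sinv = np.linalg.pinv(np.cov(centered.T))
    params = []
    for c in (0, 1):
        mu = Xtr[ytr == c].mean(0)
        pi = (ytr == c).mean()                  # class prior from the data
        w = Sinv @ mu
        b = -0.5 * mu @ Sinv @ mu + np.log(pi)
        params.append((w, b))
    def predict(X):
        scores = np.stack([X @ w + b for w, b in params], axis=1)
        return scores.argmax(axis=1)
    return predict
```

    On well-separated Gaussian classes this linear rule approaches the Bayes-optimal decision boundary.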

  18. Investigations on classification categories for wetlands of Chesapeake Bay using remotely sensed data

    NASA Technical Reports Server (NTRS)

    Williamson, F. S. L.

    1974-01-01

    The use of remote sensors to determine the characteristics of the wetlands of the Chesapeake Bay and surrounding areas is discussed. The objectives of the program are stated as follows: (1) to use data and remote sensing techniques developed from studies of Rhode River, West River, and South River salt marshes to develop a wetland classification scheme useful in other regions of the Chesapeake Bay and to evaluate the classification system with respect to vegetation types, marsh physiography, man-induced perturbation, and salinity; and (2) to develop a program using remote sensing techniques, for the extension of the classification to Chesapeake Bay salt marshes and to coordinate this program with the goals of the Chesapeake Research Consortium and the states of Maryland and Virginia. Maps of the Chesapeake Bay areas are developed from aerial photographs to display the wetland structure and vegetation.

  19. Classification of iRBD and Parkinson's disease patients based on eye movements during sleep.

    PubMed

    Christensen, Julie A E; Koch, Henriette; Frandsen, Rune; Kempfner, Jacob; Arvastson, Lars; Christensen, Soren R; Sorensen, Helge B D; Jennum, Poul

    2013-01-01

    Patients suffering from the sleep disorder idiopathic rapid-eye-movement sleep behavior disorder (iRBD) have been observed to be at high risk of developing Parkinson's disease (PD). This makes it essential to analyze them in the search for PD biomarkers. This study aims at classifying patients suffering from iRBD or PD based on features reflecting eye movements (EMs) during sleep. A Latent Dirichlet Allocation (LDA) topic model was developed based on features extracted from two electrooculographic (EOG) signals measured as part of full-night polysomnographic (PSG) recordings from ten control subjects. The trained model was tested on ten other control subjects, ten iRBD patients and ten PD patients, obtaining an EM topic mixture diagram for each subject in the test dataset. Three features were extracted from the topic mixture diagrams, reflecting "certainty", "fragmentation" and "stability" in the temporal distribution of the EM topics. Using a Naive Bayes (NB) classifier and the features "certainty" and "stability" yielded the best classification result, and the subjects were classified with a sensitivity of 95%, a specificity of 80% and an accuracy of 90%. This study demonstrates, in a data-driven approach, that iRBD and PD patients may exhibit abnormal form and/or temporal distribution of EMs during sleep.

  20. Automation of motor dexterity assessment.

    PubMed

    Heyer, Patrick; Castrejon, Luis R; Orihuela-Espina, Felipe; Sucar, Luis Enrique

    2017-07-01

    Motor dexterity assessment is regularly performed in rehabilitation wards to establish patient status, and automation of this routine task is sought. A system for automating the assessment of motor dexterity based on the Fugl-Meyer scale and with loose restrictions on sensing technologies is presented. The system consists of two main elements: 1) a data representation that abstracts the low-level information obtained from a variety of sensors into a highly separable low-dimensionality encoding employing t-distributed Stochastic Neighbourhood Embedding, and 2) central to this communication, a multi-label classifier that boosts classification rates by exploiting the fact that the classes corresponding to the individual exercises are naturally organized as a network. Depending on the targeted therapeutic movement, class labels (i.e., exercise scores) are highly correlated: patients who perform well in one exercise tend to perform well in related exercises. Critically, however, no node can be used as a proxy for the others, since an exercise does not encode the information of other exercises. Over data from a cohort of 20 patients, the novel classifier outperforms classical Naive Bayes, random forest and variants of support vector machines (ANOVA: p < 0.001). The novel multi-label classification strategy completes an automatic system for motor dexterity assessment, with implications for lessening therapists' workloads, reducing healthcare costs and providing support for home-based virtual rehabilitation and telerehabilitation alternatives.

  1. Segmentation schema for enhancing land cover identification: A case study using Sentinel 2 data

    NASA Astrophysics Data System (ADS)

    Mongus, Domen; Žalik, Borut

    2018-04-01

    Land monitoring is performed increasingly using high- and medium-resolution optical satellites, such as Sentinel-2. However, optical data is inevitably subject to the variable operational conditions under which it was acquired. Overlapping of features caused by shadows, soft transitions between shadowed and non-shadowed regions, and temporal variability of the observed land-cover types require radiometric corrections. This study examines a new approach to enhancing the accuracy of land cover identification that resolves this problem. The proposed method constructs an ensemble-type classification model with weak classifiers tuned to the particular operational conditions under which the data was acquired. Iterative segmentation over the learning set is applied for this purpose, where the feature space is partitioned according to the likelihood of misclassifications introduced by the classification model. As these are a consequence of overlapping features, such partitioning avoids the need for radiometric corrections of the data and divides land cover types implicitly into subclasses. As a result, improved performance of all tested classification approaches was measured during the validation conducted on Sentinel-2 data. The highest accuracies in terms of F1-scores were achieved using the Naive Bayes classifier as the weak classifier, while supplementing the original spectral signatures with the normalised difference vegetation index and texture analysis features, namely average intensity, contrast, homogeneity, and dissimilarity. In total, an F1-score of nearly 95% was achieved in this way, with the F1-scores of each particular land cover type reaching above 90%.

  2. Classification of postural profiles among mouth-breathing children by learning vector quantization.

    PubMed

    Mancini, F; Sousa, F S; Hummel, A D; Falcão, A E J; Yi, L C; Ortolani, C F; Sigulem, D; Pisa, I T

    2011-01-01

    Mouth breathing is a chronic syndrome that may bring about postural changes. Finding characteristic patterns of the changes occurring in the complex musculoskeletal system of mouth-breathing children has been a challenge. Learning vector quantization (LVQ) is an artificial neural network model that can be applied for this purpose. The aim of the present study was to apply LVQ to determine the characteristic postural profiles shown by mouth-breathing children, in order to further understand abnormal posture among mouth breathers. Postural training data on 52 children (30 mouth breathers and 22 nose breathers) and postural validation data on 32 children (22 mouth breathers and 10 nose breathers) were used. The performance of LVQ was compared with that of other classification models: self-organizing maps, back-propagation applied to multilayer perceptrons, Bayesian networks, naive Bayes, J48 decision trees, and k-nearest-neighbor classifiers. Classifier accuracy was assessed by means of leave-one-out cross-validation, area under the ROC curve (AUC), and inter-rater agreement (Kappa statistics). By using the LVQ model, five postural profiles for mouth-breathing children could be determined. LVQ showed satisfactory results for mouth-breathing and nose-breathing classification: sensitivity and specificity rates of 0.90 and 0.95, respectively, when using the training dataset, and 0.95 and 0.90, respectively, when using the validation dataset. The five postural profiles for mouth-breathing children suggested by LVQ were incorporated into application software for classifying the severity of mouth breathers' abnormal posture.

  3. Gradient Analysis and Classification of Carolina Bay Vegetation: A Framework for Bay Wetlands Conservation and Restoration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Diane De Steven, Ph.D.; Maureen Tone, Ph.D.

    1997-10-01

    This report addresses four project objectives: (1) Gradient model of Carolina bay vegetation on the SRS--The authors use ordination analyses to identify environmental and landscape factors that are correlated with vegetation composition. Significant factors can provide a framework for site-based conservation of existing diversity, and they may also be useful site predictors for potential vegetation in bay restorations. (2) Regional analysis of Carolina bay vegetation diversity--They expand the ordination analyses to assess the degree to which SRS bays encompass the range of vegetation diversity found in the regional landscape of South Carolina's western Upper Coastal Plain. Such comparisons can indicate floristic status relative to regional potentials and identify missing species or community elements that might be re-introduced or restored. (3) Classification of vegetation communities in Upper Coastal Plain bays--They use cluster analysis to identify plant community-types at the regional scale, and explore how this classification may be functional with respect to significant environmental and landscape factors. An environmentally-based classification at the whole-bay level can provide a system of templates for managing bays as individual units and for restoring bays to desired plant communities. (4) Qualitative model for bay vegetation dynamics--They analyze present-day vegetation in relation to historic land uses and disturbances. The distinctive history of SRS bays provides the possibility of assessing pathways of post-disturbance succession. They attempt to develop a coarse-scale model of vegetation shifts in response to changing site factors; such qualitative models can provide a basis for suggesting management interventions that may be needed to maintain desired vegetation in protected or restored bays.

  4. Application of a Hidden Naive Bayes Multiclass Classifier in Network Intrusion Detection

    ERIC Educational Resources Information Center

    Koc, Levent

    2013-01-01

    With increasing Internet connectivity and traffic volume, recent intrusion incidents have reemphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Techniques such as pattern recognition and the data mining of network events are often used by intrusion detection systems to classify…

  5. A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.

    PubMed

    Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed

    2014-01-01

    With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, the implementation of naïve Bayes is simple; on the other hand, it also requires less training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, the naïve Bayes classifier becomes unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based first on univariate feature selection and then on feature clustering, where the univariate feature selection reduces the search space and clustering then selects relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy-search-based wrappers or CFS.
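
    The two steps, univariate filtering to shrink the search space and then clustering to keep relatively independent features, can be sketched as follows. The scoring function and the greedy correlation-threshold clustering are assumptions for illustration; the paper's exact criteria may differ:

```python
import numpy as np

def select_features(X, y, m=20, corr_thresh=0.8):
    """Step 1: a univariate filter keeps the m highest-scoring features
    (class-mean gap over feature spread, a simple assumed score).
    Step 2: greedy clustering by |correlation| keeps one representative
    per cluster, yielding a relatively independent feature set."""
    gap = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0))
    score = gap / (X.std(0) + 1e-9)
    top = np.argsort(score)[::-1][:m]           # survivors, best first
    C = np.abs(np.corrcoef(X[:, top].T))        # |correlation| among survivors
    kept = []
    for i in range(len(top)):                   # visit in score order
        if all(C[i, j] < corr_thresh for j in kept):
            kept.append(i)                      # far enough from all kept
    return top[kept]
```

    A feature that is nearly a copy of an already-kept one is discarded, so redundant pairs collapse to a single representative.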

  6. Document-Level Classification of CT Pulmonary Angiography Reports based on an Extension of the ConText Algorithm

    PubMed Central

    Chapman, Brian E.; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W.

    2011-01-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes’ classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes’ classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes’ classifier using bigrams. PMID:21459155
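
    A comparison baseline like the unigram/bigram naive Bayes classifiers can be sketched from scratch. This is a generic multinomial naive Bayes with add-one smoothing, not the peFinder study's code; the report strings and labels in the usage below are hypothetical:

```python
import math
from collections import Counter

def tokens(text, n=1):
    """Unigram (n=1) or bigram (n=2) features from a report string."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

class MultinomialNB:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, docs, labels, n=1):
        self.n = n
        self.counts = {c: Counter() for c in set(labels)}
        self.priors = {c: labels.count(c) / len(labels) for c in set(labels)}
        for d, c in zip(docs, labels):
            self.counts[c].update(tokens(d, n))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        for c, cnt in self.counts.items():
            total = sum(cnt.values())
            lp = math.log(self.priors[c])
            for t in tokens(doc, self.n):        # sum of log-likelihoods
                lp += math.log((cnt[t] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

    Passing n=2 swaps in bigram features; as the abstract's figures suggest, richer n-grams do not automatically beat a rule-based system like ConText on clinical text.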

  7. Gold-standard for computer-assisted morphological sperm analysis.

    PubMed

    Chang, Violeta; Garcia, Alejandra; Hitschfeld, Nancy; Härtel, Steffen

    2017-04-01

    Published algorithms for classification of human sperm heads are based on relatively small image databases that are not open to the public, and thus no direct comparison is available for competing methods. We describe a gold-standard for morphological sperm analysis (SCIAN-MorphoSpermGS), a dataset of sperm head images with expert classification labels in one of the following classes: normal, tapered, pyriform, small or amorphous. This gold-standard is for evaluating and comparing known techniques and future improvements to present approaches for classification of human sperm heads for semen analysis. Although this paper does not provide a computational tool for morphological sperm analysis, we present a set of experiments comparing common sperm head description and classification techniques. This classification baseline is intended as a reference for future improvements to present approaches for human sperm head classification. The gold-standard provides a label for each sperm head, which is achieved by majority voting among experts. The classification baseline compares four supervised learning methods (1-Nearest Neighbor, naive Bayes, decision trees and Support Vector Machine (SVM)) and three shape-based descriptors (Hu moments, Zernike moments and Fourier descriptors), reporting the accuracy and the true positive rate for each experiment. We used Fleiss' Kappa Coefficient to evaluate the inter-expert agreement and Fisher's exact test for inter-expert variability and statistically significant differences between descriptors and learning techniques. Our results confirm the high degree of inter-expert variability in morphological sperm analysis. Regarding the classification baseline, we show that none of the standard descriptors or classification approaches is best suited to tackling the problem of sperm head classification.
We discovered that the correct classification rate was highly variable when trying to discriminate among non-normal sperm heads. By using the Fourier descriptor and SVM, we achieved the best mean correct classification: only 49%. We conclude that the SCIAN-MorphoSpermGS will provide a standard tool for evaluation of characterization and classification approaches for human sperm heads. Indeed, there is a clear need for a specific shape-based descriptor for human sperm heads and a specific classification approach to tackle the problem of high variability within subcategories of abnormal sperm cells. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. A Naive Bayes Approach for Converging Learning Objects with Open Educational Resources

    ERIC Educational Resources Information Center

    Sabitha, A. Sai; Mehrotra, Deepti; Bansal, Abhay; Sharma, B. K.

    2016-01-01

    Open educational resources (OER) are digitised material freely available to the students and self learners. Many institutions had initiated in incorporating these OERs in their higher educational system, to improve the quality of teaching and learning. These resources promote individualised study, collaborative learning. If they are coupled with…

  9. A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data.

    PubMed

    Wolfson, Julian; Bandyopadhyay, Sunayan; Elidrisi, Mohamed; Vazquez-Benitez, Gabriela; Vock, David M; Musgrove, Donald; Adomavicius, Gediminas; Johnson, Paul E; O'Connor, Patrick J

    2015-09-20

    Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system. Copyright © 2015 John Wiley & Sons, Ltd.
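
    A common simplification when applying a standard classifier to censored follow-up data is to binarize outcomes at a fixed prediction horizon; subjects censored before the horizon carry no label there and are dropped. The paper's contribution is precisely an adaptation of naive Bayes so that such subjects need not be discarded, so the sketch below shows only the naive baseline it improves on (function and variable names are illustrative):

```python
def horizon_labels(times, events, tau):
    """Convert censored time-to-event data into binary labels at horizon tau.
    times: follow-up time per subject; events: 1 if the event occurred at
    that time, 0 if the subject was censored. Returns (kept_indices, labels).
    Subjects censored before tau are dropped -- the simplification that an
    adapted, censoring-aware naive Bayes avoids."""
    kept, labels = [], []
    for i, (t, e) in enumerate(zip(times, events)):
        if e == 1 and t <= tau:
            kept.append(i); labels.append(1)   # event within the horizon
        elif t > tau:
            kept.append(i); labels.append(0)   # known event-free at tau
        # else: censored before tau, no label information at the horizon
    return kept, labels
```

    Discarding early-censored subjects both shrinks the training set and can bias the risk estimate, which motivates handling censoring inside the model instead.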

  10. Mapping South San Francisco Bay's seabed diversity for use in wetland restoration planning

    USGS Publications Warehouse

    Fregoso, Theresa A.; Jaffe, B.; Rathwell, G.; Collins, W.; Rhynas, K.; Tomlin, V.; Sullivan, S.

    2006-01-01

    Data for an acoustic seabed classification were collected as part of a California Coastal Conservancy funded bathymetric survey of South Bay in early 2005. A QTC VIEW seabed classification system recorded echoes from a single beam 50 kHz echosounder. Approximately 450,000 seabed classification records were generated from an area of about 30 sq. miles. Ten distinct acoustic classes were identified through an unsupervised classification system using principal component and cluster analyses. One hundred and sixty-one grab samples and forty-five benthic community composition data samples, collected in the study area shortly before and after the seabed classification survey, further refined the ten classes into groups based on grain size. A preliminary map of surficial grain size of South Bay was developed from the combination of the seabed classification and the grab and benthic samples. The initial seabed classification map, the grain size map, and locations of sediment samples will be displayed along with the methods of acoustic seabed classification.
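
The pipeline in this record is classic unsupervised classification: reduce per-ping echo descriptors with principal component analysis, then group the scores with cluster analysis. A minimal sketch under stated assumptions (the synthetic array stands in for the proprietary QTC VIEW descriptors, and k-means stands in for whatever clustering the survey software used):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for echo-derived features (e.g. energy, duration,
# spectral moments per ping); the real survey used QTC VIEW descriptors.
echoes = rng.normal(size=(500, 8))
echoes[:250] += 3.0  # two artificial seabed types for illustration

# Principal component analysis compresses the echo descriptors...
scores = PCA(n_components=3).fit_transform(echoes)

# ...and cluster analysis groups the pings into acoustic classes.
classes = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print(len(np.unique(classes)))  # → 2 acoustic classes found
```

In the survey, the resulting acoustic classes were then ground-truthed against grab samples to assign grain-size meanings.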

  11. Bayesian learning for spatial filtering in an EEG-based brain-computer interface.

    PubMed

    Zhang, Haihong; Yang, Huijuan; Guan, Cuntai

    2013-07-01

    Spatial filtering for EEG feature extraction and classification is an important tool in brain-computer interface. However, there is generally no established theory that links spatial filtering directly to Bayes classification error. To address this issue, this paper proposes and studies a Bayesian analysis theory for spatial filtering in relation to Bayes error. Following the maximum entropy principle, we introduce a gamma probability model for describing single-trial EEG power features. We then formulate and analyze the theoretical relationship between Bayes classification error and the so-called Rayleigh quotient, which is a function of spatial filters and basically measures the ratio in power features between two classes. This paper also reports our extensive study that examines the theory and its use in classification, using three publicly available EEG data sets and state-of-the-art spatial filtering techniques and various classifiers. Specifically, we validate the positive relationship between Bayes error and Rayleigh quotient in real EEG power features. Finally, we demonstrate that the Bayes error can be practically reduced by applying a new spatial filter with lower Rayleigh quotient.
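
The Rayleigh quotient in this abstract is the ratio of filtered power between the two classes, w'C1w / w'(C1+C2)w for a spatial filter w and class covariances C1, C2. A hedged sketch with simulated trials (CSP-style generalized eigendecomposition is one standard way to obtain filters that extremize the quotient; the paper's gamma power model is not reproduced here):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
# Simulated EEG trials for two classes (trials x channels x samples),
# with opposite channel-wise power profiles so the classes are separable.
X1 = rng.normal(size=(40, 8, 200)) * np.linspace(1.0, 2.0, 8)[None, :, None]
X2 = rng.normal(size=(40, 8, 200)) * np.linspace(2.0, 1.0, 8)[None, :, None]

def class_cov(X):
    # average per-trial spatial covariance
    return np.mean([x @ x.T / x.shape[1] for x in X], axis=0)

C1, C2 = class_cov(X1), class_cov(X2)

# CSP-style filters: generalized eigenvectors of (C1, C1 + C2); each
# eigenvalue is the Rayleigh quotient w'C1w / w'(C1+C2)w of its filter.
vals, vecs = eigh(C1, C1 + C2)
w = vecs[:, 0]  # filter with the lowest Rayleigh quotient
rq = (w @ C1 @ w) / (w @ (C1 + C2) @ w)
print(round(float(rq), 3), round(float(vals[0]), 3))  # the two agree
```

The paper's contribution is to tie this quotient analytically to the Bayes classification error of the resulting power features.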

  12. Feature weight estimation for gene selection: a local hyperlinear learning approach

    PubMed Central

    2014-01-01

    Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust to degradation from noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms. PMID:24625071
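
LHR builds on the RELIEF margin idea: a feature earns weight when it separates a sample from its nearest miss (other class) and loses weight when it separates it from its nearest hit (same class). A minimal binary RELIEF sketch on synthetic data, for orientation only (LHR replaces the global nearest neighbours below with a local hyperplane approximation):

```python
import numpy as np

def relief(X, y, n_iter=100, rng=None):
    """Minimal binary RELIEF: reward features that separate the nearest
    miss, penalize those that separate the nearest hit."""
    rng = rng or np.random.default_rng(0)
    X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)  # scale to [0, 1]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        d = np.abs(X - X[i]).sum(1)   # Manhattan distances to sample i
        d[i] = np.inf                 # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += 3 * y  # only feature 0 is informative
print(np.argmax(relief(X, y)))  # → 0
```

The instability the paper targets shows up when noisy features or outliers corrupt the nearest-hit/nearest-miss choice above.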

  13. Comparing supervised learning methods for classifying sex, age, context and individual Mudi dogs from barking.

    PubMed

    Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro

    2015-03-01

    Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, [Formula: see text]-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of [Formula: see text]-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was [Formula: see text]-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.
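
The evaluation protocol in this record (four supervised learners, accuracy estimated by k-fold cross-validation) is easy to reproduce in outline. A hedged sketch in which synthetic features stand in for the measured bark acoustics:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Stand-in acoustic features (the study used measured bark descriptors).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "classification tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
}
# 10-fold cross-validated accuracy for each of the four learners
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {acc:.3f}")
```

The paper additionally wraps each learner in filter and wrapper feature-subset selection, which this sketch omits.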

  14. Posture Detection Based on Smart Cushion for Wheelchair Users

    PubMed Central

    Ma, Congcong; Li, Wenfeng; Gravina, Raffaele; Fortino, Giancarlo

    2017-01-01

    The postures of wheelchair users can reveal their sitting habit, mood, and even predict health risks such as pressure ulcers or lower back pain. Mining the hidden information of the postures can reveal their wellness and general health conditions. In this paper, a cushion-based posture recognition system is used to process pressure sensor signals for the detection of user’s posture in the wheelchair. The proposed posture detection method is composed of three main steps: data level classification for posture detection, backward selection of sensor configuration, and recognition results compared with previous literature. Five supervised classification techniques—Decision Tree (J48), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Naive Bayes, and k-Nearest Neighbor (k-NN)—are compared in terms of classification accuracy, precision, recall, and F-measure. Results indicate that the J48 classifier provides the highest accuracy compared to other techniques. The backward selection method was used to determine the best sensor deployment configuration of the wheelchair. Several kinds of pressure sensor deployments are compared and our new method of deployment is shown to better detect postures of the wheelchair users. Performance analysis also took into account the Body Mass Index (BMI), useful for evaluating the robustness of the method across individual physical differences. Results show that our proposed sensor deployment is effective, achieving 99.47% posture recognition accuracy. Our proposed method is very competitive for posture recognition and robust in comparison with other former research. Accurate posture detection represents a fundamental basic block to develop several applications, including fatigue estimation and activity level assessment. PMID:28353684

  15. Identifying patients in target customer segments using a two-stage clustering-classification approach: a hospital-based assessment.

    PubMed

    Chen, You-Shyang; Cheng, Ching-Hsue; Lai, Chien-Jung; Hsu, Cheng-Yi; Syu, Han-Jhou

    2012-02-01

    Identifying patients in a Target Customer Segment (TCS) is important to determine the demand for, and to appropriately allocate resources for, health care services. The purpose of this study is to propose a two-stage clustering-classification model through (1) initially integrating the RFM attribute and K-means algorithm for clustering the TCS patients and (2) then integrating the global discretization method and rough set theory for classifying hospitalized departments and optimizing health care services. To assess the performance of the proposed model, a dataset from a representative hospital (termed Hospital-A) was used, extracted from the database of an empirical study in Taiwan comprising 183,947 samples characterized by 44 attributes during 2008. The proposed model was compared with three techniques, Decision Tree, Naive Bayes, and Multilayer Perceptron, and the empirical results showed significant promise in its accuracy. The generated knowledge-based rules provide useful information to maximize resource utilization and support the development of a strategy for decision-making in hospitals. From the findings, 75 patients in the TCS, three hospital departments, and specific diagnostic items were discovered in the data for Hospital-A. A potential determinant for gender differences was found, and the age attribute was not significant to the hospital departments. Copyright © 2011 Elsevier Ltd. All rights reserved.
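
Stage 1 of this model clusters patients on RFM (recency, frequency, monetary) attributes with K-means. A minimal sketch with invented RFM values, illustrative only (stage 2, the rough-set classification, is not reproduced):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical RFM table: recency (days since last visit), frequency
# (visit count), monetary (total billing) per patient -- invented values.
rng = np.random.default_rng(0)
rfm = np.column_stack([
    rng.integers(1, 365, 200),    # recency
    rng.integers(1, 30, 200),     # frequency
    rng.uniform(100, 5000, 200),  # monetary
]).astype(float)

# K-means on standardized RFM attributes, as in the paper's first stage
z = StandardScaler().fit_transform(rfm)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(z)

# The cluster with the most valuable mean RFM profile would be the
# target customer segment handed to the second (rough set) stage.
for c in range(3):
    print(c, z[km.labels_ == c].mean(axis=0).round(2))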

  16. Mapping online transportation service quality and multiclass classification problem solving priorities

    NASA Astrophysics Data System (ADS)

    Alamsyah, Andry; Rachmadiansyah, Imam

    2018-03-01

    Online transportation services are known for their accessibility, transparency, and tariff affordability. These points give online transportation an advantage over existing conventional transportation services. Online transportation service is an example of a disruptive technology that changes the relationship between customers and companies. In Indonesia, there is high competition among online transportation providers, hence the companies must maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to understand customer opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online transportation providers in Indonesia: Gojek and Grab. Since many customers actively give compliments and complaints about the companies’ service level on Twitter, we collected 61,721 tweets in Bahasa Indonesia during one month of observation. We apply Naive Bayes and Support Vector Machine methods to see which model performs best for our data. The results reveal that Gojek has better service quality, with 19.76% positive and 80.23% negative sentiments, than Grab, with 9.2% positive and 90.8% negative. Gojek’s highest problem-solving priority concerns application problems, while Grab’s concerns unusable promos. The overall result shows that the general problems of both case studies are related to the accessibility dimension, which indicates a lack of capability to provide good digital access to the end users.
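
The Naive Bayes vs. SVM comparison for tweet sentiment follows a standard text-classification pattern: vectorize, fit, predict. A hedged sketch with a tiny invented corpus (the real study used 61,721 Indonesian-language tweets and its own labeling):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny invented stand-in corpus (Indonesian-style phrases) with labels.
tweets = ["driver ramah dan cepat", "aplikasi error terus",
          "promo tidak bisa dipakai", "pelayanan sangat baik",
          "order dibatalkan sepihak", "cepat sampai tujuan"]
labels = ["pos", "neg", "neg", "pos", "neg", "pos"]

# Fit both models the study compares and classify an unseen tweet.
for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf).fit(tweets, labels)
    print(type(clf).__name__, model.predict(["aplikasi sering error"]))
```

On realistic data the two models would be compared on held-out accuracy per sentiment class, as the paper does.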

  17. Text Categorization for Multi-Page Documents: A Hybrid Naive Bayes HMM Approach.

    ERIC Educational Resources Information Center

    Frasconi, Paolo; Soda, Giovanni; Vullo, Alessandro

    Text categorization is typically formulated as a concept learning problem where each instance is a single isolated document. This paper is interested in a more general formulation where documents are organized as page sequences, as naturally occurring in digital libraries of scanned books and magazines. The paper describes a method for classifying…

  18. Automated annotation of functional imaging experiments via multi-label classification

    PubMed Central

    Turner, Matthew D.; Chakrabarti, Chayan; Jones, Thomas B.; Xu, Jiawei F.; Fox, Peter T.; Luger, George F.; Laird, Angela R.; Turner, Jessica A.

    2013-01-01

    Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text. PMID:24409112
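
The strongest configuration reported here is naive Bayes under a binary relevance transformation: one independent binary classifier per label. A minimal sketch with an invented toy corpus and labels (not the real abstracts or the CogPO ontology):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import make_pipeline

# Toy abstracts, each with multiple labels (illustrative terms only).
docs = ["visual stimuli button press task",
        "auditory tones passive listening",
        "visual words covert reading",
        "auditory stimuli button press response"]
label_sets = [{"visual", "button"}, {"auditory"},
              {"visual"}, {"auditory", "button"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)  # binary relevance: one column per label

# One independent NB classifier per label column.
model = make_pipeline(CountVectorizer(),
                      OneVsRestClassifier(MultinomialNB())).fit(docs, Y)
pred = model.predict(["auditory tones button press"])
print(sorted(mlb.inverse_transform(pred)[0]))
```

"Exact match" performance, as reported in the abstract, counts a prediction correct only if the entire label set matches.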

  19. Identifying typical physical activity on smartphone with varying positions and orientations.

    PubMed

    Miao, Fen; He, Yi; Liu, Jinlei; Li, Ye; Ayoola, Idowu

    2015-04-13

    Traditional activity recognition solutions are not widely applicable due to the high cost and inconvenience of using numerous sensors. This paper aims to automatically recognize physical activity with the help of the built-in sensors of the widespread smartphone, without requiring firm attachment to the human body. By introducing a method to judge whether the phone is in a pocket, we investigated data collected from six positions on seven subjects and chose five signals that are insensitive to orientation for activity classification. Decision trees (J48), Naive Bayes and sequential minimal optimization (SMO) were employed to recognize five activities: static, walking, running, walking upstairs and walking downstairs. The experimental results, based on 8,097 activity records, demonstrated that the J48 classifier produced the best performance among the three classifiers, with an average recognition accuracy of 89.6%, and thus would serve as the optimal online classifier. The utilization of the built-in sensors of the smartphone to recognize typical physical activities without any requirement of firm attachment is feasible.

  20. Link prediction in multiplex online social networks

    NASA Astrophysics Data System (ADS)

    Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž

    2017-02-01

    Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
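
The key finding here is that adding cross-layer information to the feature vector of a candidate pair improves link prediction in the target layer. A hedged illustration on a synthetic two-layer network (common-neighbour counts plus a cross-layer edge flag fed to an SVM; this is not the paper's meta-path algorithm, and the graphs are invented):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 60

def sym(upper):
    # symmetrize an upper-triangular boolean adjacency matrix
    return upper | upper.T

# Hypothetical two-layer multiplex over the same users: a target layer
# (stand-in for Foursquare) and a correlated auxiliary layer (Twitter).
A_fsq = sym(np.triu(rng.random((n, n)) < 0.08, 1))
A_twt = sym(np.triu((A_fsq & (rng.random((n, n)) < 0.7)) |
                    (rng.random((n, n)) < 0.05), 1))

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

def feats(i, j):
    # within-layer common-neighbour counts plus the cross-layer edge flag
    return [int((A_fsq[i] & A_fsq[j]).sum()),
            int((A_twt[i] & A_twt[j]).sum()),
            int(A_twt[i, j])]

X = np.array([feats(i, j) for i, j in pairs])
y = np.array([int(A_fsq[i, j]) for i, j in pairs])
acc = cross_val_score(SVC(), X, y, cv=5).mean()
print(round(float(acc), 3))
```

Dropping the third (cross-layer) feature and re-running the cross-validation is one way to gauge, on data like this, how much the auxiliary layer contributes.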

  1. Link prediction in multiplex online social networks.

    PubMed

    Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž

    2017-02-01

    Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.

  2. An Analysis of Document Category Prediction Responses to Classifier Model Parameter Treatment Permutations within the Software Design Patterns Subject Domain

    ERIC Educational Resources Information Center

    Pankau, Brian L.

    2009-01-01

    This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings and trained and tested against textual corpora of 2300 Gang-Of-Four (GOF) design pattern documents. Analysis of the experiment's…

  3. A deep learning approach for fetal QRS complex detection.

    PubMed

    Zhong, Wei; Liao, Lijuan; Guo, Xuemei; Wang, Guoli

    2018-04-20

    Non-invasive foetal electrocardiography (NI-FECG) has the potential to provide additional clinical information for detecting and diagnosing fetal diseases. We propose and demonstrate a deep learning approach for fetal QRS complex detection from raw NI-FECG signals using a convolutional neural network (CNN) model. The main objective is to investigate whether reliable fetal QRS complex detection performance can still be obtained from features of single-channel NI-FECG signals, without canceling maternal ECG (MECG) signals. A deep learning method is proposed for recognizing fetal QRS complexes. Firstly, we collect data from set-a of the PhysioNet/Computing in Cardiology Challenge database. The sample entropy method is used for signal quality assessment, and some of the bad-quality signals are excluded from further analysis. Secondly, in the proposed method, the features of raw NI-FECG signals are normalized before they are fed to a CNN classifier to perform fetal QRS complex detection. We use precision, recall, F-measure and accuracy as the evaluation metrics to assess the performance of fetal QRS complex detection. The proposed deep learning method achieves relatively high precision (75.33%), recall (80.54%), and F-measure scores (77.85%) compared with three other well-known pattern classification methods, namely KNN, naive Bayes and SVM. The proposed deep learning method can thus attain reliable fetal QRS complex detection performance from raw NI-FECG signals without canceling MECG signals. In addition, the influence of different activation functions and of signal quality assessment on classification performance is evaluated, and results show that ReLU outperforms Sigmoid and Tanh on this particular task, and that better classification performance is obtained with the signal quality assessment step.

  4. Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets.

    PubMed

    McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne

    2018-04-01

    Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogLeNet convolutional neural networks (CNNs), initially trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with the MatConvNet package, to extract features from four food image datasets: Food-5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from the CNNs and used to train machine learning classifiers including an artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with an SVM with RBF kernel can accurately detect food items with 99.4% accuracy on the Food-5K validation food image dataset, and 98.8% on the Food-5K evaluation dataset using the ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, the ANN achieves 91.34% and 99.28% accuracy when applied to the Food-11 and RawFooT-DB food image datasets respectively, and an SVM with RBF kernel achieves 64.98% on the Food-101 image dataset. From this research it is clear that deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. Creating Diverse Ensemble Classifiers to Reduce Supervision

    DTIC Science & Technology

    2005-12-01

    artificial examples. Quite often training with noise improves network generalization (Bishop, 1995; Raviv & Intrator, 1996). Adding noise to training...full training set, as seen by comparing to the to- tal dataset sizes. Hence, improving on the data utilization of DECORATE is a fairly difficult task...prohibitively expensive, except (perhaps) with an incremen- tal learner such as Naive Bayes. Our AFA framework is significantly more efficient because

  6. Construction accident narrative classification: An evaluation of text mining techniques.

    PubMed

    Goh, Yang Miang; Ubeynarayana, C U

    2017-11-01

    Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
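
The winning configuration in this study, a linear SVM over unigram tokens with hyperparameters chosen by grid search, maps directly onto a standard pipeline. A hedged sketch with invented toy narratives (the real study used 1000 OSHA reports and 11 cause labels):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Invented accident narratives for illustration, three cause labels.
narratives = ["worker fell from ladder", "struck by falling object",
              "electrocuted while wiring panel", "fell from scaffold edge",
              "object dropped struck worker", "contact with live wire"] * 5
causes = ["fall", "struck", "electrocution"] * 10

pipe = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 1))),  # unigrams
                 ("svm", SVC())])
# Grid search over kernel and C, as in the paper's hyperparameter tuning.
grid = GridSearchCV(pipe, {"svm__kernel": ["linear", "rbf"],
                           "svm__C": [0.1, 1, 10]}, cv=3)
grid.fit(narratives, causes)
print(grid.best_params_, round(grid.best_score_, 2))
```

On real narratives, per-label precision/recall (as the abstract reports) matters more than overall accuracy because cause labels are imbalanced.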

  7. Classification of earth terrain using polarimetric synthetic aperture radar images

    NASA Technical Reports Server (NTRS)

    Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.

    1989-01-01

    Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminants such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polarization states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.

  8. Computer-aided diagnosis with potential application to rapid detection of disease outbreaks.

    PubMed

    Burr, Tom; Koster, Frederick; Picard, Rick; Forslund, Dave; Wokoun, Doug; Joyce, Ed; Brillman, Judith; Froman, Phil; Lee, Jack

    2007-04-15

    Our objectives are to quickly interpret symptoms of emergency patients to identify likely syndromes and to improve population-wide disease outbreak detection. We constructed a database of 248 syndromes, each syndrome having an estimated probability of producing any of 85 symptoms, with some two-way, three-way, and five-way probabilities reflecting correlations among symptoms. Using these multi-way probabilities in conjunction with an iterative proportional fitting algorithm allows estimation of full conditional probabilities. Combining these conditional probabilities with misdiagnosis error rates and incidence rates via Bayes' theorem, the probability of each syndrome is estimated. We tested a prototype of computer-aided differential diagnosis (CADDY) on simulated data and on more than 100 real cases, including West Nile Virus, Q fever, SARS, anthrax, plague, tularaemia and toxic shock cases. We conclude that: (1) it is important to determine whether the unrecorded positive status of a symptom means that the status is negative or that the status is unknown; (2) inclusion of misdiagnosis error rates produces more realistic results; (3) the naive Bayes classifier, which assumes all symptoms behave independently, is slightly outperformed by CADDY, which includes available multi-symptom information on correlations; as more information regarding symptom correlations becomes available, the advantage of CADDY over the naive Bayes classifier should increase; (4) overlooking low-probability, high-consequence events is less likely if the standard output summary is augmented with a list of rare syndromes that are consistent with observed symptoms, and (5) accumulating patient-level probabilities across a larger population can aid in biosurveillance for disease outbreaks. Copyright © 2007 John Wiley & Sons, Ltd.
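
The naive Bayes baseline this record compares against computes, for each syndrome, a posterior from its incidence (prior) and independent per-symptom probabilities. A simplified sketch with invented numbers (CADDY itself additionally uses multi-way symptom probabilities and misdiagnosis rates, which are omitted here):

```python
# P(symptom | syndrome) -- invented illustrative values, not real data.
p_symptom = {
    "flu":     {"fever": 0.9, "cough": 0.8, "rash": 0.01},
    "measles": {"fever": 0.8, "cough": 0.5, "rash": 0.95},
}
incidence = {"flu": 0.05, "measles": 0.001}  # prior P(syndrome)

def posterior(observed):
    """Posterior over syndromes via Bayes' theorem with a naive
    (independence) likelihood over symptom presence/absence."""
    scores = {}
    for s, probs in p_symptom.items():
        like = incidence[s]
        for sym, p in probs.items():
            like *= p if sym in observed else (1 - p)
        scores[s] = like
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

post = posterior({"fever", "rash"})
print(max(post, key=post.get))  # → measles
```

Note how the rare-but-rash-producing syndrome overtakes the common one once the discriminative symptom is observed; conclusion (4) in the abstract is about surfacing exactly such low-prior candidates.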

  9. Short-lived brain state after cued motor imagery in naive subjects.

    PubMed

    Pfurtscheller, G; Scherer, R; Müller-Putz, G R; Lopes da Silva, F H

    2008-10-01

    Multi-channel electroencephalography recordings have shown that a visual cue, indicating right hand, left hand or foot motor imagery, can induce a short-lived brain state in the order of about 500 ms. In the present study, 10 able-bodied subjects without any motor imagery experience (naive subjects) were asked to imagine the indicated limb movement for some seconds. Common spatial filtering and linear single-trial classification was applied to discriminate between two conditions (two brain states: right hand vs. left hand, left hand vs. foot and right hand vs. foot). The corresponding classification accuracies (mean +/- SD) were 80.0 +/- 10.6%, 83.3 +/- 10.2% and 83.6 +/- 8.8%, respectively. Inspection of central mu and beta rhythms revealed a short-lasting somatotopically specific event-related desynchronization (ERD) in the upper mu and/or beta bands starting approximately 300 ms after the cue onset and lasting for less than 1 s.

  10. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information

    PubMed Central

    2013-01-01

    Background The vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin-binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin-interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin-interacting residues in a protein from its primary structure. Results In this study, we first compared protein-interacting residues of vitamins with those of other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for interaction with proteins. Furthermore, compositional information on preferred and non-preferred residues, along with pattern specificity, was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that the protein-binding patterns of vitamins differ from those of other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs), have been developed. We applied various classifiers, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected the best performing SVM modules and obtained the highest MCC of 0.53, 0.48, 0.61 and 0.81 for VIRs, VAIRs, VBIRs and PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using the five-fold cross-validation technique. The performances were also evaluated on balanced and different independent datasets. Conclusions This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from the evolutionary information of a protein sequence. In order to provide a service to the scientific community, we have developed the web-server and standalone software VitaPred (http://crdd.osdd.net/raghava/vitapred/). PMID:23387468

  11. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information.

    PubMed

    Panwar, Bharat; Gupta, Sudheer; Raghava, Gajendra P S

    2013-02-07

    Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin-binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin-interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin-interacting residues in a protein from its primary structure. In this study, we first compared protein-interacting residues of vitamins with those of other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for interaction with proteins. Furthermore, compositional information on preferred and non-preferred residues, along with pattern specificity, was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that the protein-binding patterns of vitamins differ from those of other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four different prediction modules, (i) vitamin-interacting residues (VIRs), (ii) vitamin-A-interacting residues (VAIRs), (iii) vitamin-B-interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs), have been developed. We applied various machine learning techniques, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected the best-performing SVM modules and obtained the highest MCCs of 0.53, 0.48, 0.61 and 0.81 for VIRs, VAIRs, VBIRs and PLPIRs respectively, using PSSM-based evolutionary information. All modules developed in this study were trained and tested on non-redundant datasets and evaluated using the five-fold cross-validation technique. Their performance was also evaluated on balanced and on different independent datasets. This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from the evolutionary information of a protein sequence. To serve the scientific community, we have developed a web server and standalone software, VitaPred (http://crdd.osdd.net/raghava/vitapred/).

  12. Object-Oriented Approach to Integrating Database Semantics. Volume 4.

    DTIC Science & Technology

    1987-12-01

    schemata for: 1. Object Classification Schema -- Entities 2. Object Structure and Relationship Schema -- Relations 3. Operation Classification and... relationships are represented in a database is non-intuitive for naive users. It is difficult to access and combine information in multiple databases. In this... from the CURRENT-CLASSES table. Choosing a selected item de-selects it. Choose 0 to exit. 1. STUDENTS 2. CURRENT-CLASSES 3. MANAGEMENT-CLASS

  13. Remote Sensing Image Classification Applied to the First National Geographical Information Census of China

    NASA Astrophysics Data System (ADS)

    Yu, Xin; Wen, Zongyong; Zhu, Zhaorong; Xia, Qiang; Shun, Lan

    2016-06-01

    Image classification still has a long way to go, even though it has been studied for almost half a century. Researchers have in fact produced many results in the image classification domain, but a long distance remains between theory and practice. However, new methods from the artificial intelligence domain will be absorbed into image classification, each field drawing on the strengths of the other to offset its weaknesses, which will open up new prospects. Networks often play the role of a high-level language, as seen in artificial intelligence and statistics, because they are used to build complex models from simple components. In recent years, Bayesian networks, a family of probabilistic networks, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian networks (TAN) to texture classification of high-resolution remote sensing images and propose a new method to construct the network topology in terms of training accuracy based on the training samples. Since 2013, the Chinese government has been carrying out the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper applies Bayesian networks to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we chose some remote sensing images of Beijing. Experimental results demonstrate that TAN outperforms the Naive Bayesian Classifier (NBC) and the Maximum Likelihood Classification method (MLC) in overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.
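    The structure-learning step that distinguishes TAN from plain naive Bayes (due to Friedman et al.) can be sketched as follows: weight each attribute pair by the class-conditional mutual information I(Xi; Xj | C) and keep a maximum spanning tree, so each attribute ends up with the class plus at most one other attribute as parents. The discrete toy data and all names below are invented for illustration, not taken from the paper:

```python
import math
from collections import Counter
from itertools import combinations

data = [  # (x0, x1, x2, class) -- made-up discrete records
    (0, 0, 1, "a"), (0, 0, 0, "a"), (0, 1, 1, "a"), (1, 1, 0, "b"),
    (1, 1, 1, "b"), (1, 0, 0, "b"), (0, 0, 1, "a"), (1, 1, 0, "b"),
]

def cmi(i, j, rows):
    """Plug-in estimate of I(Xi; Xj | C) from joint counts."""
    n = len(rows)
    p_xyc = Counter((r[i], r[j], r[-1]) for r in rows)
    p_xc = Counter((r[i], r[-1]) for r in rows)
    p_yc = Counter((r[j], r[-1]) for r in rows)
    p_c = Counter(r[-1] for r in rows)
    total = 0.0
    for (x, y, c), nxyc in p_xyc.items():
        # ratio n(x,y,c) * n(c) / (n(x,c) * n(y,c)); the 1/n factors cancel
        total += (nxyc / n) * math.log((nxyc * p_c[c]) / (p_xc[(x, c)] * p_yc[(y, c)]))
    return total

n_attrs = 3
weights = {(i, j): cmi(i, j, data) for i, j in combinations(range(n_attrs), 2)}

# Prim's algorithm for the maximum spanning tree over the attributes.
in_tree, edges = {0}, []
while len(in_tree) < n_attrs:
    best = max(
        ((i, j) for (i, j) in weights if (i in in_tree) != (j in in_tree)),
        key=weights.get,
    )
    edges.append(best)
    in_tree |= set(best)

print(edges)  # tree edges; orienting them away from a root yields each attribute's parent
```

Orienting the tree away from any chosen root and adding the class node as a parent of every attribute completes the TAN topology; per-node conditional probability tables are then estimated from counts as in ordinary naive Bayes.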

  14. Earthquake Damage Assessment over Port-au-Prince (Haiti) by Fusing Optical and SAR Data

    NASA Astrophysics Data System (ADS)

    Romaniello, V.; Piscini, A.; Bignami, C.; Anniballe, R.; Pierdicca, N.; Stramondo, S.

    2016-08-01

    This work proposes methodologies aiming at evaluating the sensitivity of optical and SAR change features obtained from satellite images with respect to the damage grade. The proposed methods are derived from the literature ([1], [2], [3], [4]) and the main novelty concerns the estimation of these change features at object scale. The test case is the Mw 7.0 earthquake that hit Haiti on January 12, 2010. The analysis of change detection indicators is based on ground truth information collected during a post-earthquake survey. We have generated the damage map of Port-au-Prince by considering a set of polygons extracted from the open source Open Street Map geo-database. The resulting damage map was calculated in terms of collapse ratio [5]. We selected some features having a good sensitivity to damage at object scale [6]: the Normalised Difference Index, the Kullback-Leibler Divergence, the Mutual Information and the Intensity Correlation Difference. The Naive Bayes and Support Vector Machine classifiers were used to evaluate the goodness of these features. The classification results demonstrate that the simultaneous use of several change features from EO observations can improve damage estimation at object scale.

  15. Rating prediction using textual reviews

    NASA Astrophysics Data System (ADS)

    NithyaKalyani, A.; Ushasukhanya, S.; Nagamalleswari, TYJ; Girija, S.

    2018-04-01

    Information today is present in the form of opinions. Two and a half quintillion bytes are exchanged on the Internet every day, and a large portion consists of people's speculation and reflection on issues. Being able to mine the information presented to us is the need of the hour. Sentiment analysis refers to mining this raw information to make sense of it. The discipline of opinion mining has seen much encouragement in the past few years, augmented by the involvement of social media such as Instagram, Facebook and Twitter. The hidden message in this web of information is useful in several fields such as marketing, political polls, product reviews, forecasting market movement, and identifying detractors and promoters. In this endeavor, we introduce a sentiment rating system for a particular text or paragraph to determine the opinion's polarity. First, we resolve the searching problem, tokenization, classification, and reliable content identification. Second, we extract the probabilities of both positive and negative sentiment for a given text or paragraph using a naive Bayes classifier. Finally, we use a sentiment dictionary (SD), sentiment degree dictionary (SDD) and negation dictionary (ND) for more accuracy, and blend all the above factors into a formula to find the rating for the review.
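    The naive Bayes probability step described above can be sketched with a multinomial model and Laplace smoothing. The toy corpus, labels and function names below are illustrative assumptions, not the paper's actual system:

```python
import math
from collections import Counter

# Hypothetical toy corpus of labeled reviews (illustrative only).
train = [
    ("good great excellent phone", "pos"),
    ("great battery good screen", "pos"),
    ("bad poor terrible battery", "neg"),
    ("terrible screen bad sound", "neg"),
]

def fit(docs):
    class_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for text, label in docs:
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / total)  # log prior
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in text.split():
            if w in vocab:  # ignore out-of-vocabulary words
                lp += math.log((word_counts[c][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = fit(train)
print(predict("good battery", *model))    # → pos
print(predict("terrible sound", *model))  # → neg
```

In the paper's pipeline these per-class probabilities would then be adjusted by the dictionary lookups (SD, SDD, ND) before the final rating formula is applied.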

  16. Content-based image retrieval for interstitial lung diseases using classification confidence

    NASA Astrophysics Data System (ADS)

    Dash, Jatindra Kumar; Mukhopadhyay, Sudipta; Prabhakar, Nidhi; Garg, Mandeep; Khandelwal, Niranjan

    2013-02-01

    A Content-Based Image Retrieval (CBIR) system could exploit the wealth of High-Resolution Computed Tomography (HRCT) data stored in the archive by finding similar images to assist radiologists in self-learning and in the differential diagnosis of Interstitial Lung Diseases (ILDs). HRCT findings of ILDs are classified into several categories (e.g., consolidation, emphysema, ground glass, nodular) based on their texture-like appearance. Therefore, analysis of ILDs is considered a texture analysis problem. Many approaches have been proposed for CBIR of lung images using texture as the primitive visual content. This paper presents a new approach to CBIR for ILDs. The proposed approach makes use of a trained neural network (NN) to find the output class label of the query image. The degree of confidence of the NN classifier is analyzed using a Naive Bayes classifier that dynamically decides the size of the search space to be used for retrieval. The proposed approach is compared with three simple distance-based and one classifier-based texture retrieval approaches. Experimental results show that the proposed technique achieved the highest average precision of 92.60% with the lowest standard deviation of 20.82%.

  17. Computer-aided diagnosis of melanoma using border and wavelet-based texture analysis.

    PubMed

    Garnavi, Rahil; Aldeen, Mohammad; Bailey, James

    2012-11-01

    This paper presents a novel computer-aided diagnosis system for melanoma. The novelty lies in the optimised selection and integration of features derived from textural, border-based and geometrical properties of the melanoma lesion. The texture features are derived using wavelet decomposition, the border features are derived by constructing a boundary-series model of the lesion border and analysing it in the spatial and frequency domains, and the geometry features are derived from shape indexes. The optimised selection of features is achieved using the Gain-Ratio method, which is shown to be computationally efficient for the melanoma diagnosis application. Classification is done with four classifiers, namely Support Vector Machine, Random Forest, Logistic Model Tree and Hidden Naive Bayes. The proposed diagnostic system is applied to a set of 289 dermoscopy images (114 malignant, 175 benign) partitioned into train, validation and test image sets. The system achieves an accuracy of 91.26% and an AUC value of 0.937 when 23 features are used. Other important findings include (i) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and (ii) the higher contribution of texture features than border-based features in the optimised feature set.

  18. Data mining for dengue hemorrhagic fever (DHF) prediction with naive Bayes method

    NASA Astrophysics Data System (ADS)

    Arafiyah, Ria; Hermin, Fariani

    2018-01-01

    The handling of infectious diseases is determined by the accuracy and speed of diagnosis. Through the Regulation of the Minister of Health of the Republic of Indonesia No. 82 of 2014 on the Control of Communicable Diseases, the government has made Dengue Hemorrhagic Fever (DHF) prevention a national priority. Various attempts have been made to overcome misdiagnosis. Treatment and diagnosis of DHF using ANFIS has resulted in an application program that can decide whether a patient has dengue fever or not [1]. An expert system for dengue prevention using ANFIS has predicted the weather and the number of sufferers [2]. The large amount of DHF data often cannot, by itself, help a person make decisions. Data mining methods can build decision support for diagnosing DHF [3]. This study predicts DHF with the naive Bayes method. The input variables are the patient's medical data (temperature, spotting, bleeding, and tourniquet test), and the output is a diagnosis of whether the patient suffers from DHF or not. Model testing using the Orange 3.4.5 toolkit obtained a model precision of 77.3%.
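    Bayes' rule over a handful of categorical symptom features, as in the study above, can be sketched as follows. The records, value encodings and class labels are invented for illustration; nothing here reproduces the paper's data or its 77.3% precision figure:

```python
from collections import Counter, defaultdict

# Hypothetical records: (temperature class, spotting, bleeding, tourniquet test) -> diagnosis.
records = [
    (("high", "yes", "yes", "pos"), "dhf"),
    (("high", "yes", "no",  "pos"), "dhf"),
    (("high", "no",  "yes", "pos"), "dhf"),
    (("normal", "no",  "no", "neg"), "healthy"),
    (("normal", "yes", "no", "neg"), "healthy"),
    (("high",   "no",  "no", "neg"), "healthy"),
]

priors = Counter(label for _, label in records)
# cond[label][i][value] = count of feature i taking `value` within class `label`
cond = defaultdict(lambda: defaultdict(Counter))
for feats, label in records:
    for i, v in enumerate(feats):
        cond[label][i][v] += 1

def posterior(feats):
    """Class scores via Bayes' rule, normalised to probabilities."""
    scores = {}
    for label, n in priors.items():
        p = n / len(records)  # prior P(class)
        for i, v in enumerate(feats):
            # Laplace smoothing, assuming two possible values per feature in this toy data
            p *= (cond[label][i][v] + 1) / (n + 2)
        scores[label] = p
    z = sum(scores.values())
    return {label: p / z for label, p in scores.items()}

probs = posterior(("high", "yes", "yes", "pos"))
print(max(probs, key=probs.get))  # most probable diagnosis for this toy input
```

Returning the full normalised posterior, rather than just the winning label, lets a clinician see how confident the model is in its diagnosis.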

  19. Bayes-LQAS: classifying the prevalence of global acute malnutrition

    PubMed Central

    2010-01-01

    Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classification schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications. PMID:20534159

  20. Bayes-LQAS: classifying the prevalence of global acute malnutrition.

    PubMed

    Olives, Casey; Pagano, Marcello

    2010-06-09

    Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classification schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.

  1. Diagnosis of combined faults in Rotary Machinery by Non-Naive Bayesian approach

    NASA Astrophysics Data System (ADS)

    Asr, Mahsa Yazdanian; Ettefagh, Mir Mohammad; Hassannejad, Reza; Razavi, Seyed Naser

    2017-02-01

    When combined faults occur in different parts of rotating machines, their features are profoundly dependent. Experts are completely familiar with the characteristics of individual faults, and enough data are available for single faults, but the problem arises when faults are combined and the separation of characteristics becomes complex. The experts therefore cannot give exact information about the symptoms of a combined fault and its quality. In this paper, a novel method is proposed to overcome this drawback. The core idea of the method is to identify a combined fault without using combined-fault features as the training data set; only individual-fault features are applied in the training step. For this purpose, after data acquisition and resampling of the obtained vibration signals, Empirical Mode Decomposition (EMD) is utilized to decompose the multi-component signals into Intrinsic Mode Functions (IMFs). Using the correlation coefficient, proper IMFs are selected for feature extraction. In the feature extraction step, the Shannon energy entropy of the IMFs was extracted as well as statistical features. Most of the extracted features are strongly dependent. To account for this, a Non-Naive Bayesian Classifier (NNBC) is adopted, which relaxes the fundamental assumption of naive Bayes, i.e., independence among features. To demonstrate the superiority of NNBC, counterpart methods, including the normal naive Bayesian classifier, the kernel naive Bayesian classifier and back-propagation neural networks, were applied and the classification results compared. Experimental vibration signals, collected from an automobile gearbox, were used to verify the effectiveness of the proposed method. During the classification process, only the features related individually to the healthy state, bearing failure and gear failures were used for training the classifier, while combined-fault features (combined gear and bearing failures) were examined as test data. The achieved probabilities for the test data show that the combined fault can be identified with a high success rate.

  2. From genus to phylum: large-subunit and internal transcribed spacer rRNA operon regions show similar classification accuracies influenced by database composition.

    PubMed

    Porras-Alfaro, Andrea; Liu, Kuan-Liang; Kuske, Cheryl R; Xie, Gary

    2014-02-01

    We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and the 5' section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but the ITS1 and ITS2 showed higher accuracy when smaller fragment sizes of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar classification accuracy for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets.
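    The bootstrap cutoff used by such classifiers can be sketched generically: classify many random subsamples of a query's k-mers and accept a genus-level call only when enough trials agree. The `bootstrap_confidence` helper, the stand-in classifier and the k-mer list below are all illustrative assumptions, not the classifier's actual code:

```python
import random

def bootstrap_confidence(words, classify, trials=100, frac=0.125, seed=0):
    """Classify `trials` random subsamples of the query's k-mers (`words`);
    return the winning label and the fraction of trials that agreed with it."""
    rng = random.Random(seed)
    n = max(1, int(len(words) * frac))
    votes = {}
    for _ in range(trials):
        sample = [rng.choice(words) for _ in range(n)]  # sample with replacement
        label = classify(sample)
        votes[label] = votes.get(label, 0) + 1
    best = max(votes, key=votes.get)
    return best, votes[best] / trials

# Toy stand-in classifier: label by majority of marker k-mers (illustrative only).
def toy_classify(sample):
    return "GenusA" if sum(w.startswith("a") for w in sample) >= len(sample) / 2 else "GenusB"

words = ["aaac", "aacg", "attg", "gggt", "acgt", "aagg"]  # mostly GenusA-like k-mers
label, support = bootstrap_confidence(words, toy_classify, trials=200)
print(label, support)  # accept the genus-level call only if support meets the cutoff (e.g. 0.5)
```

Shorter query fragments yield fewer k-mers per subsample and hence noisier votes, which is why the bootstrap support (and the chosen cutoff) matters most for small fragment sizes.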

  3. From Genus to Phylum: Large-Subunit and Internal Transcribed Spacer rRNA Operon Regions Show Similar Classification Accuracies Influenced by Database Composition

    PubMed Central

    Liu, Kuan-Liang; Kuske, Cheryl R.

    2014-01-01

    We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and the 5′ section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but the ITS1 and ITS2 showed higher accuracy when smaller fragment sizes of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar classification accuracy for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets. PMID:24242255

  4. Web-Enabled Distributed Health-Care Framework for Automated Malaria Parasite Classification: an E-Health Approach.

    PubMed

    Maity, Maitreya; Dhane, Dhiraj; Mungle, Tushar; Maiti, A K; Chakraborty, Chandan

    2017-10-26

    A web-enabled e-healthcare system, or computer-assisted disease diagnosis, has the potential to improve the quality and service of the conventional healthcare delivery approach. This article describes the design and development of a web-based distributed healthcare management system for medical information and the quantitative evaluation of microscopic images using a machine learning approach for malaria. In the proposed study, all health-care centres are connected in a distributed computer network. Each peripheral centre manages its own health-care service independently and communicates with the central server for remote assistance. The proposed methodology for automated evaluation of parasites includes pre-processing of blood smear microscopic images followed by erythrocyte segmentation. To differentiate between parasites, a total of 138 quantitative features characterising colour, morphology and texture are extracted from segmented erythrocytes. An integrated pattern classification framework is designed in which four feature selection methods, viz. Correlation-based Feature Selection (CFS), Chi-square, Information Gain, and RELIEF, are employed with three different classifiers, i.e. Naive Bayes, C4.5, and Instance-Based Learning (IB1), individually. The optimal feature subset with the best classifier is selected to achieve maximum diagnostic precision. The proposed method achieved 99.2% sensitivity and 99.6% specificity by combining CFS and C4.5, outperforming the other methods. Moreover, the web-based tool is entirely designed using open standards, such as Java for the web application, ImageJ for image processing, and WEKA for data mining, considering its feasibility in rural places with minimal health-care facilities.

  5. Quantitative CT analysis for the preoperative prediction of pathologic grade in pancreatic neuroendocrine tumors

    NASA Astrophysics Data System (ADS)

    Chakraborty, Jayasree; Pulvirenti, Alessandra; Yamashita, Rikiya; Midya, Abhishek; Gönen, Mithat; Klimstra, David S.; Reidy, Diane L.; Allen, Peter J.; Do, Richard K. G.; Simpson, Amber L.

    2018-02-01

    Pancreatic neuroendocrine tumors (PanNETs) account for approximately 5% of all pancreatic tumors, affecting one individual per million each year [1]. PanNETs are difficult to treat due to biological variability, from benign to highly malignant, indolent to very aggressive. The World Health Organization classifies PanNETs into three categories based on cell proliferative rate, usually detected using the Ki67 index, and cell morphology: low-grade (G1), intermediate-grade (G2) and high-grade (G3) tumors. Knowledge of grade prior to treatment would select patients for optimal therapy: G1/G2 tumors respond well to somatostatin analogs and targeted or cytotoxic drugs, whereas G3 tumors would be targeted with platinum or alkylating agents [2, 3]. Grade assessment is based on pathologic examination of the surgical specimen, biopsy or fine-needle aspiration; however, heterogeneity in the proliferative index can lead to sampling errors [4]. Based on studies relating qualitatively assessed shape and enhancement characteristics on CT imaging to tumor grade in PanNET [5], we propose objective classification of PanNET grade with quantitative analysis of CT images. Fifty-five patients were included in our retrospective analysis. A pathologist graded the tumors. Texture and shape-based features were extracted from CT. Random forest and naive Bayes classifiers were compared for the classification of G1/G2 and G3 PanNETs. The best area under the receiver operating characteristic curve (AUC) of 0.74 and accuracy of 71.64% were achieved with texture features. The shape-based features achieved an AUC of 0.70 and accuracy of 78.73%.

  6. A Learning-Based Approach for IP Geolocation

    NASA Astrophysics Data System (ADS)

    Eriksson, Brian; Barford, Paul; Sommers, Joel; Nowak, Robert

    The ability to pinpoint the geographic location of IP hosts is compelling for applications such as on-line advertising and network attack diagnosis. While prior methods can accurately identify the location of hosts in some regions of the Internet, they produce erroneous results when the delay or topology measurement on which they are based is limited. The hypothesis of our work is that the accuracy of IP geolocation can be improved through the creation of a flexible analytic framework that accommodates different types of geolocation information. In this paper, we describe a new framework for IP geolocation that reduces it to a machine-learning classification problem. Our methodology considers a set of lightweight measurements from a set of known monitors to a target, and then classifies the location of that target based on the most probable geographic region given probability densities learned from a training set. For this study, we employ a Naive Bayes framework that has low computational complexity and enables additional environmental information to be easily added to enhance the classification process. To demonstrate the feasibility and accuracy of our approach, we test IP geolocation on over 16,000 routers given ping measurements from 78 monitors with known geographic placement. Our results show that the simple application of our method improves geolocation accuracy for over 96% of the nodes identified in our data set, with accuracy on average 70 miles closer to the true geographic location than prior constraint-based geolocation. These results highlight the promise of our method and indicate how future expansion of the classifier can lead to further improvements in geolocation accuracy.
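    The classification step described above can be illustrated with a Gaussian naive Bayes sketch: one delay mean and variance per monitor per region, with the most probable region winning. The regions, delay values and function names below are made up for illustration; this is not the authors' implementation:

```python
import math

# Made-up training data: per-region lists of (delay to monitor 1, 2, 3) in ms.
train = {
    "us-east": [(10.0, 45.0, 80.0), (12.0, 43.0, 78.0), (11.0, 46.0, 82.0)],
    "us-west": [(80.0, 42.0, 12.0), (78.0, 40.0, 10.0), (83.0, 44.0, 11.0)],
}

def fit(data):
    """Estimate a per-monitor Gaussian (mean, variance) for each region."""
    params = {}
    for region, rows in data.items():
        stats = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            var = sum((x - mu) ** 2 for x in col) / len(col) + 1e-6  # avoid zero variance
            stats.append((mu, var))
        params[region] = (len(rows), stats)
    return params

def log_gauss(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def predict(delays, params):
    """Most probable region for a target's measured delay vector."""
    total = sum(n for n, _ in params.values())
    scores = {}
    for region, (n, stats) in params.items():
        lp = math.log(n / total)  # log prior from training counts
        for x, (mu, var) in zip(delays, stats):
            lp += log_gauss(x, mu, var)  # independent per-monitor likelihoods
        scores[region] = lp
    return max(scores, key=scores.get)

params = fit(train)
print(predict((11.5, 44.0, 79.0), params))  # → us-east
```

Because each monitor's likelihood is a separate factor, extra environmental features (e.g. hop counts) can be added to the product without restructuring the model, which is the flexibility the paper exploits.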

  7. A Pairwise Naïve Bayes Approach to Bayesian Classification.

    PubMed

    Asafu-Adjei, Josephine K; Betensky, Rebecca A

    2015-10-01

    Despite the relatively high accuracy of the naïve Bayes (NB) classifier, there may be several instances where it is not optimal, i.e. does not have the same classification performance as the Bayes classifier utilizing the joint distribution of the examined attributes. However, the Bayes classifier can be computationally intractable due to its required knowledge of the joint distribution. Therefore, we introduce a "pairwise naïve" Bayes (PNB) classifier that incorporates all pairwise relationships among the examined attributes, but does not require specification of the joint distribution. In this paper, we first describe the necessary and sufficient conditions under which the PNB classifier is optimal. We then discuss sufficient conditions for which the PNB classifier, and not NB, is optimal for normal attributes. Through simulation and actual studies, we evaluate the performance of our proposed classifier relative to the Bayes and NB classifiers, along with the HNB, AODE, LBR and TAN classifiers, using normal density and empirical estimation methods. Our applications show that the PNB classifier using normal density estimation yields the highest accuracy for data sets containing continuous attributes. We conclude that it offers a useful compromise between the Bayes and NB classifiers.

  8. Problems of stock definition in estimating relative contributions of Atlantic striped bass to the coastal fishery

    USGS Publications Warehouse

    Waldman, John R.; Fabrizio, Mary C.

    1994-01-01

    Stock contribution studies of mixed-stock fisheries rely on the application of classification algorithms to samples of unknown origin. Although the performance of these algorithms can be assessed, there are no guidelines regarding decisions about including minor stocks, pooling stocks into regional groups, or sampling discrete substocks to adequately characterize a stock. We examined these questions for striped bass Morone saxatilis of the U.S. Atlantic coast by applying linear discriminant functions to meristic and morphometric data from fish collected from spawning areas. Some of our samples were from the Hudson and Roanoke rivers and four tributaries of the Chesapeake Bay. We also collected fish of mixed-stock origin from the Atlantic Ocean near Montauk, New York. Inclusion of the minor stock from the Roanoke River in the classification algorithm decreased the correct-classification rate, whereas grouping of the Roanoke River and Chesapeake Bay stock into a regional (''southern'') group increased the overall resolution. The increased resolution was offset by our inability to obtain separate contribution estimates of the groups that were pooled. Although multivariate analysis of variance indicated significant differences among Chesapeake Bay substocks, increasing the number of substocks in the discriminant analysis decreased the overall correct-classification rate. Although the inclusion of one, two, three, or four substocks in the classification algorithm did not greatly affect the overall correct-classification rates, the specific combination of substocks significantly affected the relative contribution estimates derived from the mixed-stock sample. Future studies of this kind must balance the costs and benefits of including minor stocks and would profit from examination of the variation in discriminant characters among all Chesapeake Bay substocks.

  9. Persistence of the Intuitive Conception of Living Things in Adolescence

    ERIC Educational Resources Information Center

    Babai, Reuven; Sekal, Rachel; Stavy, Ruth

    2010-01-01

    This study investigated whether intuitive, naive conceptions of "living things" based on objects' mobility (movement = alive) persist into adolescence and affect 10th graders' accuracy of responses and reaction times during object classification. Most of the 58 students classified the test objects correctly as living/nonliving, yet they…

  10. Hyperspectral Biofilm Classification Analysis for Carrying Capacity of Migratory Birds in the South Bay Salt Ponds

    NASA Technical Reports Server (NTRS)

    Hsu, Wei-Chen; Kuss, Amber Jean; Ketron, Tyler; Nguyen, Andrew; Remar, Alex Covello; Newcomer, Michelle; Fleming, Erich; Debout, Leslie; Debout, Brad; Detweiler, Angela; hide

    2011-01-01

    Tidal marshes are highly productive ecosystems that support migratory birds as roosting and over-wintering habitats on the Pacific Flyway. Microphytobenthos, more commonly known as 'biofilms', contribute significantly to the primary productivity of wetland ecosystems and provide a substantial food source for macroinvertebrates and avian communities. In this study, biofilms were characterized based on taxonomic classification, density differences, and spectral signatures. These techniques were then applied to remotely sensed images to map biofilm densities and distributions in the South Bay Salt Ponds and predict the carrying capacity of these newly restored ponds for migratory birds. The GER-1500 spectroradiometer was used to obtain in situ spectral signatures for each density class of biofilm. The spectral variation and taxonomic classification between high, medium, and low density biofilm cover types were mapped using in-situ spectral measurements and classification of EO-1 Hyperion and Landsat TM 5 images. Biofilm samples were also collected in the field for laboratory analyses, including chlorophyll-a, taxonomic classification, and energy content. Comparison of the spectral signatures between the three density groups shows distinct variations useful for classification. Also, analysis of chlorophyll-a concentrations shows statistically significant differences between each density group, using the Tukey-Kramer test at an alpha level of 0.05. The potential carrying capacity in the South Bay Salt Ponds is estimated to be 250,000 birds.
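    The Tukey-Kramer comparison mentioned above can be sketched with statsmodels' Tukey HSD implementation. The chlorophyll-a values below are invented for illustration; only the three-density-class design follows the abstract.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
# Simulated chlorophyll-a concentrations (hypothetical units) for the
# three biofilm density classes.
chl = np.concatenate([
    rng.normal(5.0, 1.0, 30),   # low density
    rng.normal(10.0, 1.0, 30),  # medium density
    rng.normal(15.0, 1.0, 30),  # high density
])
groups = ["low"] * 30 + ["medium"] * 30 + ["high"] * 30

# Pairwise comparisons at alpha = 0.05, as in the abstract.
res = pairwise_tukeyhsd(chl, groups, alpha=0.05)
print(res.summary())
```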

  11. Non-invasive Fetal ECG Signal Quality Assessment for Multichannel Heart Rate Estimation.

    PubMed

    Andreotti, Fernando; Graser, Felix; Malberg, Hagen; Zaunseder, Sebastian

    2017-12-01

    The noninvasive fetal ECG (NI-FECG) from abdominal recordings offers novel prospects for prenatal monitoring. However, NI-FECG signals are corrupted by various nonstationary noise sources, making the processing of abdominal recordings a challenging task. In this paper, we present an online approach that dynamically assesses the quality of NI-FECG to improve fetal heart rate (FHR) estimation. Using a naive Bayes classifier, state-of-the-art and novel signal quality indices (SQIs), and an existing adaptive Kalman filter, FHR estimation was improved. For the purpose of training and validating the proposed methods, a large annotated private clinical dataset was used. The suggested classification scheme demonstrated an accuracy of Krippendorff's alpha in determining the overall quality of NI-FECG signals. The proposed Kalman filter outperformed alternative methods for FHR estimation achieving accuracy. The proposed algorithm was able to reliably reflect changes of signal quality and can be used in improving FHR estimation. NI-FECG signal quality estimation and multichannel information fusion are largely unexplored topics. Based on previous works, multichannel FHR estimation is a field that could strongly benefit from such methods. The developed SQI algorithms as well as the resulting classifier were made available under a GNU GPL open-source license and contributed to the FECGSYN toolbox.

  12. Personal recognition using hand shape and texture.

    PubMed

    Kumar, Ajay; Zhang, David

    2006-08-01

    This paper proposes a new bimodal biometric system using feature-level fusion of hand shape and palm texture. The proposed combination is of significance since both the palmprint and hand-shape images are extracted from a single hand image acquired with a digital camera. Several new hand-shape features that can be used to represent the hand shape and improve the performance are investigated. A new approach for palmprint recognition using discrete cosine transform coefficients, which can be directly obtained from the camera hardware, is demonstrated. None of the prior work on hand-shape or palmprint recognition has given any attention to the critical issue of feature selection. Our experimental results demonstrate that while the majority of palmprint or hand-shape features are useful in predicting the subject's identity, only a small subset of these features is necessary in practice for building an accurate model for identification. The comparison and combination of the proposed features is evaluated on diverse classification schemes: naive Bayes (normal, estimated, multinomial), decision trees (C4.5, LMT), k-NN, SVM, and FFN. Although more work remains to be done, our results to date indicate that the combination of selected hand-shape and palmprint features constitutes a promising addition to biometrics-based personal recognition systems.
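    The DCT-coefficient feature extraction mentioned above can be sketched as follows. This is a generic low-frequency-block DCT feature, a common scheme assumed for illustration rather than the paper's exact method, and the "palmprint" is a random stand-in image.

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(2)
palm = rng.random((64, 64))        # stand-in for a palmprint image patch
coeffs = dctn(palm, norm="ortho")  # 2-D discrete cosine transform

# Keep a small block of low-frequency coefficients as the feature vector;
# these carry most of the image energy for natural images.
feature = coeffs[:8, :8].flatten()
print(feature.shape)
```

With `norm="ortho"` the transform is orthonormal, so total energy is preserved between the image and its coefficients (Parseval's relation).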

  13. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    PubMed

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    The contiguous sequences of terms (N-grams) in documents are symmetrically distributed among different classes. This symmetrical distribution of the N-grams raises uncertainty about which class an N-gram belongs to. In this paper, we focus on the selection of the most discriminating N-grams by reducing the effects of the symmetrical distribution. In this context, a new text feature selection method, named the symmetrical strength of N-grams (SSNG), is proposed using a two pass filtering based feature selection (TPF) approach. Initially, in the first pass of TPF, the SSNG method chooses various informative N-grams from the entire set of N-grams extracted from the corpus. Subsequently, in the second pass, the well-known Chi Square (χ(2)) method is used to select the few most informative N-grams. Further, to classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, were applied to ten standard text datasets. In most of the datasets, the experimental results show that the performance and success rate of the SSNG method using the TPF approach are superior to state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ(2).

  14. Toward More Accurate Iris Recognition Using Cross-Spectral Matching.

    PubMed

    Nalla, Pattabhi Ramaiah; Kumar, Ajay

    2017-01-01

    Iris recognition systems are increasingly deployed for large-scale applications such as national ID programs, which continue to acquire millions of iris images to establish identity among billions. However, with the availability of a variety of iris sensors deployed for iris imaging under different illumination/environments, significant performance degradation is expected when matching iris images acquired under two different domains (either sensor-specific or wavelength-specific). This paper develops a domain adaptation framework to address this problem and introduces a new algorithm using a Markov random fields model to significantly improve cross-domain iris recognition. The proposed domain adaptation framework, based on naive Bayes nearest neighbor classification, uses a real-valued feature representation which is capable of learning domain knowledge. Our approach, which estimates corresponding visible iris patterns from the synthesis of iris patches in the near-infrared iris images, achieves outperforming results for cross-spectral iris recognition. In this paper, a new class of bi-spectral iris recognition system that can simultaneously acquire visible and near-infrared images with pixel-to-pixel correspondences is proposed and evaluated. This paper presents experimental results from three publicly available databases (the PolyU cross-spectral iris image database, IIITD CLI, and the UND database) and achieves outperforming results for cross-sensor and cross-spectral iris matching.

  15. Machine vision based quality inspection of flat glass products

    NASA Astrophysics Data System (ADS)

    Zauner, G.; Schagerl, M.

    2014-03-01

    This application paper presents a machine vision solution for the quality inspection of flat glass products. A contact image sensor (CIS) is used to generate digital images of the glass surfaces. The presented machine vision based quality inspection at the end of the production line aims to classify five different glass defect types. The defect images are usually characterized by very little `image structure', i.e. homogeneous regions without distinct image texture. Additionally, these defect images usually consist of only a few pixels. At the same time the appearance of certain defect classes can be very diverse (e.g. water drops). We used simple state-of-the-art image features like histogram-based features (std. deviation, kurtosis, skewness), geometric features (form factor/elongation, eccentricity, Hu-moments) and texture features (grey level run length matrix, co-occurrence matrix) to extract defect information. The main contribution of this work lies in the systematic evaluation of various machine learning algorithms to identify appropriate classification approaches for this specific class of images. In this way, the following machine learning algorithms were compared: decision tree (J48), random forest, JRip rules, naive Bayes, Support Vector Machine (multi class), neural network (multilayer perceptron) and k-Nearest Neighbour. We used a representative image database of 2300 defect images and applied cross validation for evaluation purposes.
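    The histogram-based features named above (standard deviation, kurtosis, skewness) can be computed directly with SciPy. The defect "image" here is a random stand-in patch, not real inspection data.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(5)
defect = rng.random((12, 12))   # stand-in for a small defect image patch
pixels = defect.ravel()

# Histogram-based features of the kind listed in the abstract.
features = [pixels.std(), kurtosis(pixels), skew(pixels)]
print([round(f, 3) for f in features])
```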

  16. Hydrologic Landscape Classification to Estimate Bristol Bay Watershed Hydrology

    EPA Science Inventory

    The use of hydrologic landscapes has proven to be a useful tool for broad scale assessment and classification of landscapes across the United States. These classification systems help organize larger geographical areas into areas of similar hydrologic characteristics based on cl...

  17. Beyond where to how: a machine learning approach for sensing mobility contexts using smartphone sensors.

    PubMed

    Guinness, Robert E

    2015-04-28

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
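    The ten-fold cross-validated comparison of classifier families described above can be sketched in scikit-learn. The synthetic features and activity labels below are invented stand-ins for the study's smartphone data; only the evaluation protocol (10-fold CV, accuracy) follows the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for accelerometer/GPS-derived features labelled
# with four mobility activities (e.g. walking, running, driving, transit).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

scores = {}
for clf in (RandomForestClassifier(random_state=0), GaussianNB()):
    # Ten-fold cross-validated correct-classification rate.
    scores[type(clf).__name__] = cross_val_score(clf, X, y, cv=10).mean()
print(scores)
```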

  18. Beyond Where to How: A Machine Learning Approach for Sensing Mobility Contexts Using Smartphone Sensors †

    PubMed Central

    Guinness, Robert E.

    2015-01-01

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). 
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060

  19. Minimum Bayes risk image correlation

    NASA Technical Reports Server (NTRS)

    Minter, T. C., Jr.

    1980-01-01

    In this paper, the problem of designing a matched filter for image correlation will be treated as a statistical pattern recognition problem. It is shown that, by minimizing a suitable criterion, a matched filter can be estimated which approximates the optimum Bayes discriminant function in a least-squares sense. It is well known that the use of the Bayes discriminant function in target classification minimizes the Bayes risk, which in turn directly minimizes the probability of a false fix. A fast Fourier implementation of the minimum Bayes risk correlation procedure is described.
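    The fast Fourier implementation of image correlation mentioned above can be sketched numerically: correlate a scene with a template by multiplying the scene spectrum with the conjugate template spectrum. This shows only the generic FFT correlation step, not the paper's minimum-Bayes-risk filter design; the scene and embedded target are random data.

```python
import numpy as np

rng = np.random.default_rng(3)
scene = rng.normal(size=(128, 128))
template = scene[40:56, 70:86].copy()   # "target" patch taken from the scene

# FFT-based cross-correlation: multiply the scene spectrum by the
# conjugate of the (zero-padded) template spectrum and invert.
F_scene = np.fft.fft2(scene)
F_templ = np.fft.fft2(template, s=scene.shape)
corr = np.real(np.fft.ifft2(F_scene * np.conj(F_templ)))

# The correlation peak marks the best match location (the "fix").
peak = np.unravel_index(np.argmax(corr), corr.shape)
print(peak)
```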

  20. Linear dimension reduction and Bayes classification

    NASA Technical Reports Server (NTRS)

    Decell, H. P., Jr.; Odell, P. L.; Coberly, W. A.

    1978-01-01

    An explicit expression was developed for a compression matrix T of smallest possible left dimension K consistent with preserving the n-variate normal Bayes assignment of X to a given one of a finite number of populations and the K-variate Bayes assignment of TX to that population. The Bayes population assignments of X and TX were shown to be equivalent for a compression matrix T explicitly calculated as a function of the means and covariances of the given populations.
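    The idea of a Bayes-assignment-preserving linear compression can be illustrated with Fisher's discriminant subspace, which plays this role in the equal-covariance Gaussian case: projecting to K = (number of populations) - 1 dimensions and re-running the Bayes (LDA) rule reproduces the full-dimensional assignments. This is an illustrative sketch with simulated populations, not the paper's explicit construction of T.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
# Three 5-variate Gaussian populations sharing a common covariance.
means = [np.zeros(5), np.r_[3.0, 0, 0, 0, 0], np.r_[0, 3.0, 0, 0, 0]]
X = np.vstack([rng.normal(m, 1.0, size=(200, 5)) for m in means])
y = np.repeat([0, 1, 2], 200)

full = LinearDiscriminantAnalysis().fit(X, y)
TX = full.transform(X)   # linear compression to K = 2 dimensions

# Refit the Bayes (LDA) rule in the compressed space; assignments of X
# and TX should agree, as in the equivalence result described above.
compressed = LinearDiscriminantAnalysis().fit(TX, y)
agree = np.mean(full.predict(X) == compressed.predict(TX))
print(TX.shape[1], round(float(agree), 3))
```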

  1. Fuzzy Naive Bayesian model for medical diagnostic decision support.

    PubMed

    Wagholikar, Kavishwar B; Vijayraghavan, Sundararajan; Deshpande, Ashok W

    2009-01-01

    This work relates to the development of computational algorithms to provide decision support to physicians. The authors propose a Fuzzy Naive Bayesian (FNB) model for medical diagnosis, which extends the Fuzzy Bayesian approach proposed by Okuda. A physician-interview-based method is described for defining the orthogonal fuzzy symptom information system required to apply the model. For the purpose of elaboration and elicitation of characteristics, the algorithm is applied to a simple simulated dataset and compared with the conventional Naive Bayes (NB) approach. As a preliminary evaluation of FNB in a real-world scenario, the comparison is repeated on a real fuzzy dataset of 81 patients diagnosed with infectious diseases. The case study on the simulated dataset elucidates that FNB can be optimal over NB for diagnosing patients with imprecise, fuzzy information, on account of the following characteristics: 1) it can model the information that values of some attributes are semantically closer than values of other attributes, and 2) it offers a mechanism to temper exaggerations in patient information. Although the algorithm requires precise training data, its utility for fuzzy training data is argued for. This is supported by the case study on the infectious disease dataset, which indicates optimality of FNB over NB for the infectious disease domain. Further case studies on large datasets are required to establish the utility of FNB.

  2. Treatment effect heterogeneity for univariate subgroups in clinical trials: Shrinkage, standardization, or else

    PubMed Central

    Varadhan, Ravi; Wang, Sue-Jane

    2016-01-01

    Treatment effect heterogeneity is a well-recognized phenomenon in randomized controlled clinical trials. In this paper, we discuss subgroup analyses with prespecified subgroups of clinical or biological importance. We explore various alternatives to the naive (the traditional univariate) subgroup analyses to address the issues of multiplicity and confounding. Specifically, we consider a model-based Bayesian shrinkage (Bayes-DS) and a nonparametric, empirical Bayes shrinkage approach (Emp-Bayes) to temper the optimism of traditional univariate subgroup analyses; a standardization approach (standardization) that accounts for correlation between baseline covariates; and a model-based maximum likelihood estimation (MLE) approach. The Bayes-DS and Emp-Bayes methods model the variation in subgroup-specific treatment effect rather than testing the null hypothesis of no difference between subgroups. The standardization approach addresses the issue of confounding in subgroup analyses. The MLE approach is considered only for comparison in simulation studies as the “truth” since the data were generated from the same model. Using the characteristics of a hypothetical large outcome trial, we perform simulation studies and articulate the utilities and potential limitations of these estimators. Simulation results indicate that Bayes-DS and Emp-Bayes can protect against optimism present in the naïve approach. Due to its simplicity, the naïve approach should be the reference for reporting univariate subgroup-specific treatment effect estimates from exploratory subgroup analyses. Standardization, although it tends to have a larger variance, is suggested when it is important to address the confounding of univariate subgroup effects due to correlation between baseline covariates. The Bayes-DS approach is available as an R package (DSBayes). PMID:26485117
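    The shrinkage idea behind the Bayes-DS and Emp-Bayes approaches can be illustrated with a toy empirical-Bayes estimator that pulls naive subgroup estimates toward the overall mean. This is a generic method-of-moments sketch with invented numbers, not the paper's DSBayes estimator.

```python
import numpy as np

# Naive (univariate) subgroup treatment-effect estimates and their
# assumed sampling variances -- all values hypothetical.
effects = np.array([0.10, 0.45, 0.30, 0.05])
se2 = np.full(4, 0.02)

overall = effects.mean()
# Method-of-moments estimate of between-subgroup variance.
tau2 = max(effects.var(ddof=1) - se2.mean(), 0.0)
# Shrinkage weights in [0, 1]: noisier estimates are pulled harder
# toward the overall mean, tempering the optimism of the naive analysis.
shrink = tau2 / (tau2 + se2)
eb = overall + shrink * (effects - overall)
print(np.round(eb, 3))
```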

  3. Elementary School Students' Understandings of Technology Concepts.

    ERIC Educational Resources Information Center

    Davis, Robert S.; Ginns, Ian S.; McRobbie, Campbell J.

    2002-01-01

    Students in grades 2 (n=27), 4 (n=37), and 6 (n=28) were asked questions about artifacts or pictures. Their explanations, which revealed their understanding of such technological concepts as material properties and stability, were classified as naive, artifact related, or not artifact related. Explanations tended to cluster in a classification at…

  4. Data Exploration and Analysis of Alternative Learning System Accreditation and Equivalency Test Result Using Data Mining

    NASA Astrophysics Data System (ADS)

    Talingdan, J. A.; Trinidad, J. T., Jr.; Palaoag, T. D.

    2018-03-01

    Alternative Learning System (ALS) is a subsystem of the Department of Education (DepEd) that serves as an option for learners who cannot afford formal education. This research focuses on the exploration and analysis of ALS accreditation and equivalency (A & E) test results using data mining. The ALS 2014 to 2016 A & E test results at the secondary level were used as the datasets in the study. The A & E test results revealed that the passing rate doubled each year. The results were clustered using the k-means clustering algorithm and grouped into good-, medium-, and low-standard learners to identify students who need additional support for enhancement. From the clustered data, it was found that the strand they are weakest in is strand 4, the Development of Self and a Sense of Community, with a general average of 84.23. It was also revealed that the essay type of exam got the lowest score, with a general average of 2.14, compared to the multiple-choice type of exam that covers the five learning strands. Furthermore, decision tree and naive Bayes algorithms were also employed in the study to predict the performance of the learners in the A & E test and determine which is better for prediction. It was concluded that naive Bayes performs better because its accuracy rate is higher than that of the decision tree algorithm.
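    The k-means grouping of learners into performance bands can be sketched with scikit-learn. The score distributions below are invented; only the three-cluster design follows the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Simulated A & E general averages for three hypothetical performance bands.
scores = np.concatenate([
    rng.normal(70, 2, 40),   # low-standard learners
    rng.normal(80, 2, 40),   # medium-standard learners
    rng.normal(90, 2, 40),   # good-standard learners
]).reshape(-1, 1)

# k-means with k = 3 recovers the three performance bands.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
centers = sorted(float(c[0]) for c in km.cluster_centers_)
print([round(c, 1) for c in centers])
```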

  5. Applying data mining techniques to improve diagnosis in neonatal jaundice.

    PubMed

    Ferreira, Duarte; Oliveira, Abílio; Freitas, Alberto

    2012-12-07

    Hyperbilirubinemia is emerging as an increasingly common problem in newborns due to a decreasing hospital length of stay after birth. Jaundice is the most common disease of the newborn and, although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has contributed to improve the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques. This study followed the different phases of the Cross Industry Standard Process for Data Mining model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa--EPE), from February to March of 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. Also, transcutaneous bilirubin levels were measured from birth to hospital discharge with maximum time intervals of 8 hours between measurements, using a noninvasive bilirubinometer. Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with the traditional methods for prediction of hyperbilirubinemia. The application of different classification algorithms to the collected data allowed predicting subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life of newborns, the accuracy for the prediction of hyperbilirubinemia was 89%. The best results were obtained using the following algorithms: naive Bayes, multilayer perceptron and simple logistic. The findings of our study sustain that new approaches, such as data mining, may support medical decision making, contributing to improved diagnosis in neonatal jaundice.

  6. Sequentially distant but structurally similar proteins exhibit fold specific patterns based on their biophysical properties.

    PubMed

    Rajendran, Senthilnathan; Jothi, Arunachalam

    2018-05-16

    The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions are due to the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity less than 40% were selected for ten different subfolds from three different main folds (according to the CATH classification) and were used for this analysis. We used the normalized values of 49 physicochemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed fold-specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes, NB; Support Vector Machines, SVM; and Bayesian Generalized Linear Model, BGLM) which could discriminate main folds based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences in the all-beta category with 81.43% accuracy and all-alpha, alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Multi-temporal Land Use Mapping of Coastal Wetlands Area using Machine Learning in Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Farda, N. M.

    2017-12-01

    Coastal wetlands provide ecosystem services essential to people and the environment. Changes in coastal wetlands, especially in land use, are important to monitor by utilizing multi-temporal imagery. The Google Earth Engine (GEE) provides many machine learning algorithms (10 algorithms) that are very useful for extracting land use from imagery. The research objective is to explore machine learning in Google Earth Engine and its accuracy for multi-temporal land use mapping of a coastal wetland area. Landsat 3 MSS (1978), Landsat 5 TM (1991), Landsat 7 ETM+ (2001), and Landsat 8 OLI (2014) images located in the Segara Anakan lagoon were selected to represent multi-temporal images. The inputs for machine learning are visible and near-infrared bands, PCA bands, inverse PCA bands, bare soil index, vegetation index, wetness index, elevation from ASTER GDEM, and GLCM (Haralick) texture, as well as polygon samples in 140 locations. Ten machine learning algorithms were applied to extract coastal wetland land use from the Landsat imagery: Fast Naive Bayes, CART (Classification and Regression Tree), Random Forests, GMO Max Entropy, Perceptron (Multi Class Perceptron), Winnow, Voting SVM, Margin SVM, Pegasos (Primal Estimated sub-GrAdient SOlver for SVM), and IKPamir (Intersection Kernel Passive Aggressive Method for Information Retrieval, SVM). Machine learning in Google Earth Engine is very helpful for multi-temporal land use mapping; the highest accuracy for land use mapping of the coastal wetland is CART with 96.98% overall accuracy using K-fold cross validation (K = 10). GEE is particularly useful for multi-temporal land use mapping with ready-to-use imagery and classification algorithms, and is also very challenging for other applications.

  8. New software methods in radar ornithology using WSR-88D weather data and potential application to monitoring effects of climate change on bird migration

    USGS Publications Warehouse

    Mead, Reginald; Paxton, John; Sojda, Richard S.; Swayne, David A.; Yang, Wanhong; Voinov, A.A.; Rizzoli, A.; Filatova, T.

    2010-01-01

    Radar ornithology has provided tools for studying the movement of birds, especially related to migration. Researchers have presented qualitative evidence suggesting that birds, or at least migration events, can be identified using large broad scale radars such as the WSR-88D used in the NEXRAD weather surveillance system. This is potentially a boon for ornithologists because such data cover a large portion of the United States, are constantly being produced, are freely available, and have been archived since the early 1990s. A major obstacle to this research, however, has been that identifying birds in NEXRAD data has required a trained technician to manually inspect a graphically rendered radar sweep. A single site completes one volume scan every five to ten minutes, producing over 52,000 volume scans in one year. This is an immense amount of data, and manual classification is infeasible. We have developed a system that identifies biological echoes using machine learning techniques. This approach begins with training data using scans that have been classified by experts, or uses bird data collected in the field. The data are preprocessed to ensure quality and to emphasize relevant features. A classifier is then trained using this data and cross validation is used to measure performance. We compared neural networks, naive Bayes, and k-nearest neighbor classifiers. Empirical evidence is provided showing that this system can achieve classification accuracies in the 80th to 90th percentile. We propose to apply these methods to studying bird migration phenology and how it is affected by climate variability and change over multiple temporal scales.

  9. The Performance Analysis of the Naive Bayes and SSVM Methods to Determine Pattern Groups of Disease

    NASA Astrophysics Data System (ADS)

    Sitanggang, Rianto; Tulus; Situmorang, Zakarias

    2017-12-01

    Information is a very important element of daily needs, and obtaining precise and accurate information is not easy; this research can help decision makers by providing a comparison. The researchers apply data mining techniques to analyze the performance of the naïve Bayes and Smooth Support Vector Machine (SSVM) methods in grouping diseases. The patterns of disease that people in an area often suffer can be detected in the collection of information contained in medical records. Medical records contain information on patients' diseases, coded according to the WHO standard. Processing the medical record data to find the patterns of disease groups that often occur in the community uses the attributes address, sex, type of disease, and age; the subsequent analysis groups these four attributes. From the results of research conducted on a fever and diabetes mellitus dataset, the naïve Bayes method produced an average accuracy of 99% and the SSVM method produced an average accuracy of 93%.

  10. [Naïve Bayes classification for classifying injury-cause groups from Emergency Room data in the Friuli Venezia Giulia region (Northern Italy)].

    PubMed

    Valent, Francesca; Clagnan, Elena; Zanier, Loris

    2014-01-01

    to assess whether Naïve Bayes Classification could be used to classify injury causes from the Emergency Room (ER) database; in the Friuli Venezia Giulia Region (Northern Italy), the electronic ER data have never been used to study the epidemiology of injuries because the proportion of generic "accidental" causes is much higher than that of injuries with a specific cause. application of the Naïve Bayes Classification method to the regional ER database. sensitivity, specificity, positive and negative predictive values, agreement, and the kappa statistic were calculated for the training dataset, and the distribution of causes of injury for the test dataset. on 22,248 records with known cause, the classifications assigned by the model agreed moderately (kappa = 0.53) with those assigned by ER personnel. The model was then used on 76,660 unclassified cases. Although the sensitivity and positive predictive value of the method were generally poor, mainly due to limitations in the ER data, it allowed estimating for the first time the frequency of specific injury causes in the Region. the model was useful to provide the "big picture" of non-fatal injuries in the Region. To improve the collection of injury data at the ER, the options available for injury classification in the ER software are being revised to make the categories exhaustive and mutually exclusive.
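    The kappa statistic used above to measure agreement between model-assigned and ER-assigned causes can be computed with scikit-learn. The eight toy labels below are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical injury-cause categories assigned by ER personnel and by
# the classifier for the same eight records.
er_labels    = ["fall", "fall", "traffic", "sport",
                "traffic", "fall", "sport", "fall"]
model_labels = ["fall", "fall", "traffic", "fall",
                "traffic", "fall", "sport", "sport"]

# Cohen's kappa corrects raw agreement (here 6/8) for chance agreement.
kappa = cohen_kappa_score(er_labels, model_labels)
print(round(kappa, 2))
```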

  11. A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

    PubMed

    Gao, Xiang; Lin, Huaiying; Dong, Qunfeng

    2017-01-01

    Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
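A minimal sketch of the classification rule DMBC describes: score a count vector under per-class Dirichlet-multinomial distributions and combine with a prior via Bayes' theorem. The parameters below are hypothetical (in DMBC they would be maximum-likelihood estimates from training microbiome data), and the multinomial coefficient is dropped since it is identical across classes:

```python
import math

def dm_loglik(x, alpha):
    """Dirichlet-multinomial log-likelihood of count vector x given parameters alpha
    (the multinomial coefficient is omitted: it cancels across classes)."""
    A, N = sum(alpha), sum(x)
    ll = math.lgamma(A) - math.lgamma(N + A)
    for xi, ai in zip(x, alpha):
        ll += math.lgamma(xi + ai) - math.lgamma(ai)
    return ll

def posterior(x, params, priors):
    """Posterior class probabilities via Bayes' theorem (log-sum-exp normalized)."""
    logp = {c: math.log(priors[c]) + dm_loglik(x, a) for c, a in params.items()}
    m = max(logp.values())
    w = {c: math.exp(v - m) for c, v in logp.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}

# Hypothetical per-class parameters for three taxa and a 50/50 prevalence prior
params = {"healthy": [8.0, 1.0, 1.0], "disease": [1.0, 8.0, 1.0]}
priors = {"healthy": 0.5, "disease": 0.5}
post = posterior([30, 5, 5], params, priors)
```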

  12. Automatic Fault Characterization via Abnormality-Enhanced Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Laguna, I; de Supinski, B R

    Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help to identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.

  13. Predicting healthcare associated infections using patients' experiences

    NASA Astrophysics Data System (ADS)

    Pratt, Michael A.; Chu, Henry

    2016-05-01

    Healthcare associated infections (HAI) are a major threat to patient safety and are costly to health systems. Our goal is to predict the HAI performance of a hospital using the patients' experience responses as input. We use four classifiers, viz. random forest, naive Bayes, artificial feedforward neural networks, and the support vector machine, to perform the prediction of six types of HAI. The six types include blood stream, urinary tract, surgical site, and intestinal infections. Experiments show that the random forest and support vector machine perform well across the six types of HAI.

  14. Comparing K-mer based methods for improved classification of 16S sequences.

    PubMed

    Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars

    2015-07-01

    The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We based our study on the well-known and widely used naïve Bayes classifier from the RDP project; four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read length. The differences in classification error obtained by the methods seemed small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data are needed to improve classification further. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal, and robust training data set is crucial.
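The K-mer-by-sliding-window representation that all five methods share can be sketched in a few lines; the resulting count vectors are what, for example, the naïve Bayes Multinomial method models:

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping K-mers with a sliding window of width k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# A short toy sequence; real 16S rRNA sequences are ~1,500 bp
counts = kmer_counts("ACGTACGT", 3)
```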

  15. Improving Hospital-Wide Early Resource Allocation through Machine Learning.

    PubMed

    Gartner, Daniel; Padman, Rema

    2015-01-01

    The objective of this paper is to evaluate the extent to which early determination of diagnosis-related groups (DRGs) can be used for better allocation of scarce hospital resources. When elective patients seek admission, the true DRG, currently determined only at discharge, is unknown. We approach the problem of early DRG determination in three stages: (1) test how much a Naïve Bayes classifier can improve classification accuracy as compared to a hospital's current approach; (2) develop a statistical program that makes admission and scheduling decisions based on the patients' clinical pathways and scarce hospital resources; and (3) feed the DRG as classified by the Naïve Bayes classifier and the hospital's baseline approach into the model (which we evaluate in simulation). Our results reveal that the DRG grouper performs poorly in classifying the DRG correctly before admission, while the Naïve Bayes approach substantially improves the classification task. The results from connecting the classification method with the mathematical program also reveal that resource allocation decisions can be more effective and efficient with the hybrid approach.

  16. Classification of patients by severity grades during triage in the emergency department using data mining methods.

    PubMed

    Zmiri, Dror; Shahar, Yuval; Taieb-Maimon, Meirav

    2012-04-01

    To test the feasibility of classifying emergency department patients into severity grades using data mining methods. Emergency department records of 402 patients were classified into five severity grades by two expert physicians. The Naïve Bayes and C4.5 algorithms were applied to produce classifiers from patient data into severity grades. The classifiers' results over several subsets of the data were compared with the physicians' assessments, with a random classifier, and with a classifier that selects the maximal-prevalence class. Outcome measures were positive predictive value, multiple-class extensions of sensitivity and specificity combinations, and entropy change. The mean accuracy of the data mining classifiers was 52.94 ± 5.89%, significantly better (P < 0.05) than the mean accuracy of a random classifier (34.60 ± 2.40%). The entropy of the input data sets was reduced through classification by a mean of 10.1%. Allowing for classification deviations of one severity grade led to a mean accuracy of 85.42 ± 1.42%. The classifiers' accuracy in that case was similar to the physicians' consensus rate. Learning from consensus records led to better performance. Reducing the number of severity grades improved results in certain cases. The performance of the Naïve Bayes and C4.5 algorithms was similar; on unbalanced data sets, Naïve Bayes performed better. It is possible to produce a computerized classification model for the severity grade of triage patients using data mining methods. Learning from patient records on which several physicians agree is preferable to learning from each physician's patients. Either Naïve Bayes or C4.5 can be used; Naïve Bayes is preferable for unbalanced data sets. An ambiguity in the intermediate severity grades seems to hamper both the physicians' agreement and the classifiers' accuracy. © 2010 Blackwell Publishing Ltd.
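The entropy-change measure reported above (a mean reduction of 10.1%) can be illustrated with the Shannon entropy of the severity-grade distribution before classification versus the weighted entropy within predicted groups; the grade labels and groupings below are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label distribution."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(groups):
    """Weighted mean entropy of the labels within each predicted group."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy(g) for g in groups)

# Hypothetical severity grades: 4 equiprobable grades before classification
before = entropy([1, 1, 2, 2, 3, 3, 4, 4])                 # 2.0 bits
after = conditional_entropy([[1, 1, 2, 2], [3, 3, 4, 4]])  # 1.0 bit
reduction = (before - after) / before                      # 50% in this toy case
```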

  17. Producing a satellite-derived map and modelling Spartina alterniflora expansion for Willapa Bay in Washington State

    NASA Astrophysics Data System (ADS)

    Berlin, Cynthia Jane

    1998-12-01

    This research addresses the identification of the areal extent of the intertidal wetlands of Willapa Bay, Washington, and the evaluation of the potential for exotic Spartina alterniflora (smooth cordgrass) expansion in the bay using a spatial geographic approach. It is hoped that the results will not only address the management needs of the study area but also provide a research design that may be applied to studies of other coastal wetlands. Four satellite images, three Landsat Multi-Spectral (MSS) and one Thematic Mapper (TM), are used to derive a map showing areas of water, low, middle and high intertidal, and upland. Two multi-date remote sensing mapping techniques are assessed: a supervised classification using density-slicing and an unsupervised classification using an ISODATA algorithm. Statistical comparisons are made between the resultant derived maps and the U.S.G.S. topographic maps for the Willapa Bay area. The potential for Spartina expansion in the bay is assessed using a sigmoidal (logistic) growth model and a spatial modelling procedure for four possible growth scenarios: without management controls (Business-as-Usual), with moderate management controls (e.g. harvesting to eliminate seed setting), under a hypothetical increase in the growth rate that may reflect favorable environmental changes, and under a hypothetical decrease in the growth rate that may reflect aggressive management controls. Comparisons of the statistics for the two mapping techniques suggest that although the unsupervised classification method performed satisfactorily, the supervised classification (density-slicing) method provided more satisfactory results.
Results from the modelling of potential Spartina expansion suggest that Spartina expansion will proceed rapidly for the Business-as-Usual and hypothetical increase in the growth rate scenario, and at a slower rate for the elimination of seed setting and hypothetical decrease in the growth rate scenarios, until all potential habitat is filled.
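The sigmoidal (logistic) growth model behind the four scenarios can be sketched in closed form; the initial area, carrying capacity, and growth rates below are hypothetical, with the management scenarios represented simply as different values of the intrinsic rate r:

```python
import math

def logistic_area(t, n0, k, r):
    """Closed-form logistic (sigmoidal) growth: occupied area at time t,
    from initial area n0, carrying capacity k, intrinsic growth rate r."""
    return k / (1 + (k / n0 - 1) * math.exp(-r * t))

# Hypothetical scenarios: business-as-usual vs. aggressive control (lower r)
bau = [logistic_area(t, n0=10, k=1000, r=0.8) for t in range(0, 21, 5)]
control = [logistic_area(t, n0=10, k=1000, r=0.3) for t in range(0, 21, 5)]
```

Both trajectories approach the same carrying capacity; only the time to fill the potential habitat differs, which mirrors the paper's conclusion.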

  18. Chinese Sentence Classification Based on Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Gu, Chengwei; Wu, Ming; Zhang, Chuang

    2017-10-01

    Sentence classification is one of the significant issues in Natural Language Processing (NLP). Feature extraction is often regarded as the key point for natural language processing. Traditional approaches based on machine learning, such as the Naive Bayes model, cannot take high-level features into consideration. A neural network for sentence classification can make use of contextual information to achieve better results in sentence classification tasks. In this paper, we focus on classifying Chinese sentences. Most importantly, we propose a novel Convolutional Neural Network (CNN) architecture for Chinese sentence classification. In particular, while most previous methods use a softmax classifier for prediction, we embed a linear support vector machine in place of softmax in the deep neural network model, minimizing a margin-based loss to get a better result. We also use tanh as the activation function instead of ReLU. The CNN model improves the results of Chinese sentence classification tasks. Experimental results on a Chinese news title database validate the effectiveness of our model.

  19. A HYBRID HIGH RESOLUTION IMAGE CLASSIFICATION METHOD FOR MAPPING EELGRASS DISTRIBUTIONS IN YAQUINA BAY ESTUARY, OREGON

    EPA Science Inventory

    False-color infrared aerial photography of the Yaquina Bay Estuary, Oregon was acquired at extreme low tides and digitally orthorectified with a ground pixel resolution of 20 cm to provide data for intertidal vegetation mapping. Submerged, semi-exposed and exposed eelgrass mead...

  20. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    PubMed Central

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2016-01-01

    Modern microbial mats are potential analogues of some of Earth's earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted for the first time shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. The microbial community across taxonomic classifications using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla Proteobacteria, Cyanobacteria and Bacteriodetes dominate all marine mats. However, the microbial community structure between Shark Bay and Highbourne Cay (Bahamas) marine systems appears to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities were also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-rubisco-based carbon metabolism including reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways is highly represented in Shark Bay metagenomes while not represented in Highbourne Cay microbial mats or any other mat forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats. PMID:26023869

  1. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types, but due to the large number of uncharacterized protein sequences in databases, these methods are very time consuming, expensive, and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced and large datasets are often handled well by decision tree classifiers. Since the datasets considered are imbalanced, the performance of various decision tree classifiers, such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, and REP (Reduced Error Pruning) tree, and of ensemble methods, such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest, and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well, with a good accuracy of 96.35% in less time. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, while the other classifiers, such as DT, Adaboost, Rotation forest, and Random forest, are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Development of a triage engine enabling behavior recognition and lethal arrhythmia detection for remote health care system.

    PubMed

    Sugano, Hiroto; Hara, Shinsuke; Tsujioka, Tetsuo; Inoue, Tadayuki; Nakajima, Shigeyoshi; Kozaki, Takaaki; Namkamura, Hajime; Takeuchi, Kazuhide

    2011-01-01

    For ubiquitous health care systems that continuously and wirelessly monitor a person's vital signs, such as electrocardiogram (ECG), body surface temperature, and three-dimensional (3D) acceleration, it is important to accurately detect the occurrence of an abnormal event in the data and immediately inform a medical doctor of its details. In this paper, we introduce a remote health care system composed of a wireless vital sensor, multiple receivers, and a triage engine installed on a desktop personal computer (PC). The middleware installed in the receiver, developed in C++, supports reliable handling of vital data to the Ethernet port. The human interface of the triage engine, developed in Java, displays graphs of the ECG data, 3D acceleration data, body surface temperature data, and behavior status on the desktop PC, and sends an urgent e-mail containing the displayed data to a pre-registered medical doctor when it detects the occurrence of an abnormal event. In the triage engine, the lethal arrhythmia detection algorithm based on short-time Fourier transform (STFT) analysis achieves 100% sensitivity and 99.99% specificity, and the behavior recognition algorithm based on the combination of the nearest neighbor method and the Naive Bayes method achieves more than 71% classification accuracy.
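The STFT analysis underlying the arrhythmia detector is not specified in detail in the abstract, but its per-frame spectral step can be sketched with a direct DFT over one sliding window (stdlib only; the frame length and test tone below are hypothetical):

```python
import cmath
import math

def frame_spectrum(frame):
    """Magnitude spectrum of one STFT frame via a direct DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

# A clean tone completing 5 cycles per 32-sample frame should peak in bin 5
frame = [math.sin(2 * math.pi * 5 * t / 32) for t in range(32)]
spec = frame_spectrum(frame)
```

In a full STFT the ECG stream would be split into overlapping windows and this spectrum computed per window, with abnormal spectral patterns flagged downstream.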

  3. Analysis and prediction of presynaptic and postsynaptic neurotoxins by Chou's general pseudo amino acid composition and motif features.

    PubMed

    Mei, Juan; Zhao, Ji

    2018-06-14

    Presynaptic neurotoxins and postsynaptic neurotoxins are two important classes of neurotoxins isolated from the venoms of venomous animals and have been proven potentially effective in neuroscience and pharmacology. With the growing number of toxin sequences in public databases, there is a need to develop a computational method for fast and accurate identification and classification of novel presynaptic and postsynaptic neurotoxins in large databases. In this study, a Multinomial Naive Bayes Classifier (MNBC) was developed to discriminate presynaptic from postsynaptic neurotoxins based on different kinds of features. The Minimum Redundancy Maximum Relevance (MRMR) feature selection method was used to rank 400 pseudo amino acid (PseAA) compositions, and the 50 top-ranked PseAA compositions were selected to improve the prediction results. The motif features, the 400 PseAA compositions, and the 50 PseAA compositions were combined and selected as the input parameters of the MNBC. The best correlation coefficient (CC) value of 0.8213 was obtained when the prediction quality was evaluated by the jackknife test. It is anticipated that the algorithm presented in this study may become a useful tool for identifying presynaptic and postsynaptic neurotoxin sequences and may help in-depth investigation of the biological mechanisms of presynaptic and postsynaptic neurotoxins. Copyright © 2018 Elsevier Ltd. All rights reserved.
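The correlation coefficient used to score the jackknife predictions is not defined in the abstract; assuming it is the Matthews correlation coefficient commonly reported for binary classifiers, it can be computed from the confusion counts as follows (the counts below are hypothetical):

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical jackknife tallies for a two-class toxin prediction
cc_perfect = mcc(50, 0, 0, 50)
cc_noisy = mcc(40, 10, 10, 40)
```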

  4. Gene selection using hybrid binary black hole algorithm and modified binary particle swarm optimization.

    PubMed

    Pashaei, Elnaz; Pashaei, Elham; Aydin, Nizamettin

    2018-04-14

    In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate the exploration and exploitation of the BPSO (4-2) algorithm to further improve the performance. This model has been associated with Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers which are evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA); k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. Land cover in the Guayas Basin using SAR images from low resolution ASAR Global mode to high resolution Sentinel-1 images

    NASA Astrophysics Data System (ADS)

    Bourrel, Luc; Brodu, Nicolas; Frappart, Frédéric

    2016-04-01

    Remotely sensed images allow frequent monitoring of land cover variations at regional and global scales. The recently launched Sentinel-1 satellite offers global coverage of land areas at an unprecedented spatial (20 m) and temporal (6 days at the Equator) resolution. We propose here to compare the performance of commonly used supervised classification techniques (i.e., k-nearest neighbors, linear and Gaussian support vector machines, naive Bayes, linear and quadratic discriminant analyses, adaptive boosting, logistic regression, ridge regression with one-vs-one voting, random forest, and extremely randomized trees) for land cover applications in the Guayas Basin, the largest river basin of the Pacific coast of Ecuador (area ~32,000 km²). The reason for this choice is the importance of this region in the Ecuadorian economy: its watershed represents 13% of the total area of Ecuador and is home to 40% of the Ecuadorian population. It is also the most productive region of Ecuador for agriculture and aquaculture. Fifty percent of the country's shrimp farming production comes from this watershed, which together with agriculture represents the largest source of revenue for the country. Similar comparisons are also performed using ENVISAT ASAR images acquired in global mode (1 km spatial resolution). Accuracy of the results will be assessed using a land cover map derived from multi-spectral images.

  6. Streamlining machine learning in mobile devices for remote sensing

    NASA Astrophysics Data System (ADS)

    Coronel, Andrei D.; Estuar, Ma. Regina E.; Garcia, Kyle Kristopher P.; Dela Cruz, Bon Lemuel T.; Torrijos, Jose Emmanuel; Lim, Hadrian Paulo M.; Abu, Patricia Angela R.; Victorino, John Noel C.

    2017-09-01

    Mobile devices have been at the forefront of intelligent farming because of their ubiquitous nature. Precision-farming applications have been developed on smartphones to allow small farms to monitor environmental parameters surrounding crops. Mobile devices are used for most of these applications, collecting data to be sent to the cloud for storage, analysis, modeling, and visualization. However, given the weak and intermittent connectivity in geographically challenged areas of the Philippines, the solution is to perform the analysis on the phone itself, so that the farmer gets a real-time response after data submission. Though machine learning is promising, hardware constraints in mobile devices limit the computational capabilities, making model development on the phone restricted and challenging. This study discusses the development of a machine-learning-based mobile application using OpenCV libraries. The objective is to enable the detection of Fusarium oxysporum cubense (Foc) in juvenile and asymptomatic bananas using images of plant parts and microscopic samples as input. Image datasets of attached, unattached, dorsal, and ventral views of leaves were acquired through sampling protocols. Images of raw and stained specimens, from the soil surrounding the plant and from plant sap, yielded stained and unstained samples respectively. Segmentation and feature extraction techniques were applied to all images. Initial findings show no significant differences among the different feature extraction techniques. For differentiating infected from non-infected leaves, KNN yields the highest average accuracy, as opposed to Naive Bayes and SVM. For microscopic images using MSER feature extraction, KNN again showed better accuracy than SVM or Naive Bayes.


  8. Using empirical Bayes predictors from generalized linear mixed models to test and visualize associations among longitudinal outcomes.

    PubMed

    Mikulich-Gilbertson, Susan K; Wagner, Brandie D; Grunwald, Gary K; Riggs, Paula D; Zerbe, Gary O

    2018-01-01

    Medical research is often designed to investigate changes in a collection of response variables that are measured repeatedly on the same subjects. The multivariate generalized linear mixed model (MGLMM) can be used to evaluate random coefficient associations (e.g. simple correlations, partial regression coefficients) among outcomes that may be non-normal and differently distributed by specifying a multivariate normal distribution for their random effects and then evaluating the latent relationship between them. Empirical Bayes predictors are readily available for each subject from any mixed model and are observable and hence plottable. Here, we evaluate whether second-stage association analyses of empirical Bayes predictors from an MGLMM provide a good approximation and visual representation of these latent association analyses, using medical examples and simulations. Additionally, we compare these results with association analyses of empirical Bayes predictors generated from separate mixed models for each outcome, a procedure that could circumvent computational problems that arise when the dimension of the joint covariance matrix of random effects is large and prohibits estimation of latent associations. As has been shown in other analytic contexts, the p-values for all second-stage coefficients that were determined by naively assuming normality of empirical Bayes predictors provide a good approximation to p-values determined via permutation analysis. Analyzing outcomes that are interrelated with separate models in the first stage and then associating the resulting empirical Bayes predictors in a second stage results in different mean and covariance parameter estimates from the maximum likelihood estimates generated by an MGLMM. The potential for erroneous inference from using results from these separate models increases as the magnitude of the association among the outcomes increases.
Thus if computable, scatterplots of the conditionally independent empirical Bayes predictors from a MGLMM are always preferable to scatterplots of empirical Bayes predictors generated by separate models, unless the true association between outcomes is zero.

  9. Using Loss Functions for DIF Detection: An Empirical Bayes Approach.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Thayer, Dorothy; Lewis, Charles

    2000-01-01

    Studied a method for flagging differential item functioning (DIF) based on loss functions. Builds on earlier research that led to the development of an empirical Bayes enhancement to the Mantel-Haenszel DIF analysis. Tested the method through simulation and found its performance better than some commonly used DIF classification systems. (SLD)

  10. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases, based on three medical datasets: Arrhythmia, Breast cancer, and Hepatitis. This methodology, called K-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses K-means clustering with ANOVA statistics to preprocess the data and select the significant features, and Support Vector Machines in the classification process. To compare and evaluate performance, we chose three classification algorithms, decision tree, Naïve Bayes, and Support Vector Machines, and applied the medical datasets directly to these algorithms. Our methodology gave much better classification accuracy, of 98% on the Arrhythmia dataset, 92% on the Breast cancer dataset and 88% on the Hepatitis dataset, compared to using the medical data directly with decision tree, Naïve Bayes, and Support Vector Machines. The ROC curve and precision of K-ANOVA-SVM also achieved better results than the other algorithms.
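    The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dataset is synthetic, and ANOVA-based feature selection (sklearn's `SelectKBest` with `f_classif`, standing in for the K-means/ANOVA preprocessing) feeds an SVM, which is compared against Naive Bayes and a decision tree trained on the raw features.

```python
# Sketch: ANOVA feature selection + SVM vs. baselines on raw features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the medical datasets used in the paper.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# ANOVA F-test keeps the k features most associated with the class label.
anova_svm = make_pipeline(SelectKBest(f_classif, k=10), SVC(kernel="rbf"))

scores = {
    "ANOVA+SVM": cross_val_score(anova_svm, X, y, cv=5).mean(),
    "NaiveBayes": cross_val_score(GaussianNB(), X, y, cv=5).mean(),
    "DecisionTree": cross_val_score(DecisionTreeClassifier(random_state=0),
                                    X, y, cv=5).mean(),
}
print(scores)
```

On data where only a few features carry signal, the selection step typically lifts the SVM above the baselines trained on all 40 features.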

  11. Comparison of MSS and TM Data for Landcover Classification in the Chesapeake Bay Area: a Preliminary Report. [Taylor's Island, Maryland

    NASA Technical Reports Server (NTRS)

    Mulligan, P. J.; Gervin, J. C.; Lu, Y. C.

    1985-01-01

    An area bordering the Eastern Shore of the Chesapeake Bay was selected for study and classified using unsupervised techniques applied to LANDSAT-2 MSS data and several band combinations of LANDSAT-4 TM data. The accuracies of these Level I land cover classifications were verified using the Taylor's Island USGS 7.5-minute topographic map, which was photointerpreted, digitized and rasterized. For the Taylor's Island map, comparing the MSS and TM three-band (2 3 4) classifications, the increased resolution of TM produced a small improvement in overall accuracy of 1% correct, due primarily to small improvements, of 1% and 3%, in areas such as water and woodland. This was expected, as the MSS data typically produce high accuracies for categories which cover large contiguous areas. However, in the categories covering smaller areas within the map there was generally an improvement of at least 10%. Classification of the important residential category improved 12%, and wetlands were mapped with 11% greater accuracy.

  12. Using an Integrated Naive Bayes Classifier for Crawling Relevant Data on the Web

    NASA Astrophysics Data System (ADS)

    Mihsra, A.

    2015-12-01

    In our experiments (at JPL, NASA) for the DARPA Memex project, we wanted to crawl a large amount of data for various domains. A big challenge was data relevancy in the crawled data: more than 50% of the data was irrelevant to the domain at hand. One immediate solution was to use good seeds (seeds are the initial URLs from where the program starts to crawl) and make sure that the crawl remains within the original host URLs. This, although a very efficient technique, fails under two conditions: one, when you aim to reach deeper into the web, into new hosts (not in the seed list), and two, when the website hosts myriad content types, e.g. a news website. The relevancy calculation used to be a post-processing step, i.e. once we had finished crawling, we trained a Naive Bayes classifier and used it to find a rough relevancy of the web pages that we had. Integrating the relevancy into the crawling rather than after it was very important because crawling takes resources and time. To save both, we needed to get an idea of the relevancy of the whole crawl during run time and be able to steer its course accordingly. We use Apache Nutch as the crawler, which uses a plugin system to incorporate any new implementations, and hence we built a plugin for Nutch. The Naive Bayes parse plugin works in the following way. It parses every page and decides, using a trained model (which is built in situ only once, using the positive and negative examples given by the user in a very simple format), whether the page is relevant. If so, it allows all the outlinks from that page to go to the next round of crawling; if not, it gives the URLs a second chance to prove themselves by checking for some commonly expected words in the URL relevant to that domain. This two-tier system is very intuitive and efficient in focusing the crawl. In our initial test experiments over 100 seed URLs, the results were astonishingly good, with a recall of 98%. The same technique can be applied to geo-informatics.
This will help scientists gather data that is relevant to their specific domain. As a proof of concept we also crawled nsidc.org and some similar websites and were very efficiently able to keep the crawler from going into hub websites like Yahoo, commercial/advertising portals and irrelevant content pages. It is a strong start towards focused crawling using Nutch, one of the most scalable and ever-evolving crawlers available today.
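    The two-tier relevance check described above can be sketched as follows. This is a toy illustration only: the Nutch plugin machinery is omitted, and the training texts, URL keyword list and `follow_outlinks` helper are hypothetical stand-ins.

```python
# Tier 1: a Naive Bayes text model decides whether a fetched page's outlinks
# enter the next crawl round. Tier 2: rejected pages get a second chance via
# domain keywords expected in the URL.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# In-situ training set: positive and negative example texts from the user.
positive = ["sea ice extent data", "glacier mass balance measurements"]
negative = ["celebrity gossip news deals", "discount shopping deals online"]

vec = CountVectorizer()
X = vec.fit_transform(positive + negative)
model = MultinomialNB().fit(X, [1, 1, 0, 0])

URL_KEYWORDS = ("ice", "snow", "glacier")  # illustrative fallback terms

def follow_outlinks(page_text: str, url: str) -> bool:
    """Tier 1: NB relevance on page text; tier 2: keyword check on the URL."""
    if model.predict(vec.transform([page_text]))[0] == 1:
        return True
    return any(word in url for word in URL_KEYWORDS)

print(follow_outlinks("new sea ice extent data released", "http://nsidc.org/data"))
print(follow_outlinks("celebrity gossip shopping deals", "http://ads.example.com"))
```

In the real plugin the same decision gates which outlinks are handed back to the crawler's fetch queue.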

  13. Bayes Error Rate Estimation Using Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2003-01-01

    The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two benchmark problems. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
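    The first, averaging-based idea can be sketched as follows. This is a rough plug-in illustration on synthetic data, not the article's exact estimator, and it omits the correlation correction of the second method.

```python
# Several probabilistic classifiers each estimate the posteriors; averaging
# them and taking mean(1 - max posterior) over a test set gives a plug-in
# estimate of the Bayes error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# flip_y injects label noise, so the true Bayes error is strictly positive.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

ensemble = [GaussianNB(), LogisticRegression(max_iter=1000),
            KNeighborsClassifier(n_neighbors=15)]
posteriors = np.mean([m.fit(Xtr, ytr).predict_proba(Xte) for m in ensemble],
                     axis=0)

# If the averaged posteriors were exact, the Bayes rule would err with
# probability 1 - max posterior at each point; average that over the test set.
bayes_error_estimate = float(np.mean(1.0 - posteriors.max(axis=1)))
print(round(bayes_error_estimate, 3))
```

Averaging tends to cancel the individual estimators' variance, which is why the ensemble posterior is a better plug-in than any single model's.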

  14. Improving data retrieval quality: Evidence based medicine perspective.

    PubMed

    Kamalov, M; Dobrynin, V; Balykina, J; Kolbin, A; Verbitskaya, E; Kasimova, M

    2015-01-01

    The actively developing approach in modern medicine is one focused on the principles of evidence-based medicine, in which the quality and reliability of studies must be assessed. However, in some cases studies corresponding to the first level of evidence may contain errors in randomized controlled trials (RCTs). A solution to this problem is the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system. Studies in both medicine and information retrieval are being conducted to develop search engines for the MEDLINE database [1]; combined techniques for summarization and information retrieval, targeted at finding the best medication based on levels of evidence, are also being developed [2]. Given the relevance of and demand for studies in both medicine and information retrieval, it was decided to start the development of a search engine for the MEDLINE database at Saint-Petersburg State University, with the support of Pavlov First Saint-Petersburg State Medical University and the Tashkent Institute of Postgraduate Medical Education. The novelty and value of the proposed system lie in its method of ranking relevant abstracts. It is intended that the system will be able to rank studies by their level of evidence and to apply GRADE criteria for system evaluation. The task falls within the domains of information retrieval and machine learning.
Based on the results of previous work [3], in which the main goal was to cluster abstracts from the MEDLINE database by subtypes of medical interventions, a set of clustering algorithms was selected for this study: K-means, K-means++ and EM from the sklearn (http://scikit-learn.org) and WEKA (http://www.cs.waikato.ac.nz/~ml/weka/) libraries, together with Latent Semantic Analysis (LSA) [4], keeping the first 210 factors, and the "bag of words" model [5] to represent the clustered documents. For abstract classification, several algorithms were tested, including Complement Naive Bayes [6], Sequential Minimal Optimization (SMO) [7] and non-linear SVM from the WEKA library. The first step of this study was to mark up abstracts of articles from MEDLINE as containing or not containing a medical intervention. For this purpose, based on our previous work [8], a web crawler was modified to perform the necessary markup. The next step was to evaluate the clustering algorithms on the marked-up abstracts. Clustering the abstracts into two groups with LSA, keeping the first 210 factors, gave the following results: (1) K-means: purity = 0.5598, normalized entropy = 0.5994; (2) K-means++: purity = 0.6743, normalized entropy = 0.4996; (3) EM: purity = 0.5443, normalized entropy = 0.6344. With the "bag of words" model: (1) K-means: purity = 0.5134, normalized entropy = 0.6254; (2) K-means++: purity = 0.5645, normalized entropy = 0.5299; (3) EM: purity = 0.5247, normalized entropy = 0.6345. Then, studies containing a medical intervention were classified by subtype of medical intervention. For this classification, abstracts were represented with the "bag of words" model with stop words removed.
The classification results were: (1) Complement Naive Bayes: macro F-measure = 0.6934, micro F-measure = 0.7234; (2) Sequential Minimal Optimization: macro F-measure = 0.6543, micro F-measure = 0.7042; (3) non-linear SVM: macro F-measure = 0.6835, micro F-measure = 0.7642. Based on these computational experiments, the best clustering of abstracts by the presence of a medical intervention was obtained using the K-means++ algorithm together with LSA, keeping the first 210 factors. The quality of classifying abstracts by subtypes of medical interventions was improved over existing results [8] using the non-linear SVM algorithm with the "bag of words" model and stop-word removal. The clustering results obtained in this study will help in grouping abstracts by levels of evidence, using the classification by subtypes of medical interventions, and it will be possible to extract information from the abstracts on specific types of interventions.
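    The purity and normalized-entropy figures quoted above can be computed from a contingency table of cluster assignments against gold labels. A minimal sketch with toy labels follows; one common definition of normalized entropy is assumed, which may differ in detail from the one used in the study.

```python
# Purity and normalized entropy of a clustering against gold labels.
import numpy as np
from sklearn.metrics.cluster import contingency_matrix

def purity(y_true, y_pred):
    m = contingency_matrix(y_true, y_pred)
    # Each cluster votes for its majority class; purity is the fraction right.
    return m.max(axis=0).sum() / m.sum()

def normalized_entropy(y_true, y_pred):
    m = contingency_matrix(y_true, y_pred).astype(float)
    n, n_classes = m.sum(), m.shape[0]
    h = 0.0
    for col in m.T:                        # per-cluster class distribution
        p = col[col > 0] / col.sum()
        h += (col.sum() / n) * (-(p * np.log(p)).sum() / np.log(n_classes))
    return h

y_true = [0, 0, 0, 1, 1, 1]   # toy gold labels
y_pred = [0, 0, 1, 1, 1, 1]   # toy cluster assignments
print(round(purity(y_true, y_pred), 4),
      round(normalized_entropy(y_true, y_pred), 4))
```

Higher purity and lower normalized entropy both indicate purer clusters, matching the direction of the comparisons in the abstract.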

  15. Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.

    PubMed

    Jiménez, Fernando; Sánchez, Gracia; Juárez, José M

    2014-03-01

    This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severely burnt patients. Due to the ethical aspects involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given. Therefore, any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization of a patient data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker. If no classifier is satisfactory for the decision maker, the process starts again in step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, the niched pre-selection multi-objective algorithm, the elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA) and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient data set from an intensive care burn unit and a standard data set from a machine learning repository. The results are compared using the hypervolume multi-objective metric. In addition, the results have been compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique.
Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case-based reasoning), obtaining with ENORA a classification rate of 0.9298, specificity of 0.9385, and sensitivity of 0.9364, with 14.2 interpretable fuzzy rules on average. Our proposal improves the accuracy and interpretability of the classifiers, compared with other non-evolutionary techniques. We also conclude that ENORA outperforms the niched pre-selection and NSGA-II algorithms. Moreover, given that our multi-objective evolutionary methodology is non-combinatorial and based on real-parameter optimization, the time cost is significantly reduced compared with other evolutionary approaches in the literature based on combinatorial optimization. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    2016-01-01

    In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…

  17. Sentimental analysis of Amazon reviews using naïve bayes on laptop products with MongoDB and R

    NASA Astrophysics Data System (ADS)

    Kamal Hassan, Mohan; Prasanth Shakthi, Sana; Sasikala, R.

    2017-11-01

    In today’s era, e-commerce has been developing rapidly; buying products online has become more and more fashionable owing to its variety of options, low cost (high discounts) and quick supply systems, so many people intend to shop online. In the meantime, the standard and delivery of merchandise is uneven, and fake branded products are delivered. We use product users’ review comments about products, and reviews about retailers, from Amazon as the data set, and classify review text by subjectivity/objectivity and the negative/positive attitude of the buyer. Such reviews are helpful to some extent, benefiting both shoppers and product makers. This paper presents an empirical study of the efficacy of classifying product reviews by tagging keywords. In the present study, we analyse the fundamentals of determining positive and negative attitudes towards the product. We hereby propose different approaches: removing the unstructured data and then classifying comments employing the Naive Bayes algorithm.
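    The classification step described above can be sketched as follows. Toy reviews stand in for the Amazon/MongoDB data, and the preprocessing is reduced to sklearn's default tokenization.

```python
# Naive Bayes sentiment classification over bag-of-words review text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative labeled reviews (the paper uses Amazon laptop reviews).
reviews = ["great laptop fast delivery", "excellent build quality",
           "fake product terrible quality", "awful battery waste of money"]
labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(reviews, labels)

print(clf.predict(["terrible fake battery", "fast excellent laptop"]))
```

In practice the unstructured parts of each review (HTML, emoji, stop words) would be stripped before vectorization, as the abstract describes.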

  18. Automatic topic identification of health-related messages in online health community using text classification.

    PubMed

    Lu, Yingjie

    2013-01-01

    To facilitate patient involvement in online health communities and help patients obtain the informative and emotional support they need, a topic identification approach is proposed in this paper for automatically identifying the topics of health-related messages in an online health community, thus assisting patients in reaching the most relevant messages for their queries efficiently. A feature-based classification framework is presented for automatic topic identification in our study. We first collected messages related to some predefined topics in an online health community. Then we combined three different types of features, n-gram-based features, domain-specific features and sentiment features, to build four feature sets for health-related text representation. Finally, three different text classification techniques, C4.5, Naïve Bayes and SVM, were adopted to evaluate our topic classification model. By comparing different feature sets and different classification techniques, we found that n-gram-based features, domain-specific features and sentiment features were all effective in distinguishing different types of health-related topics. In addition, feature reduction based on information gain was also effective in improving topic classification performance. In terms of classification techniques, SVM outperformed C4.5 and Naïve Bayes significantly. The experimental results demonstrate that the proposed approach can identify the topics of online health-related messages efficiently.
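    The n-gram feature extraction and SVM classification described above can be sketched as follows. Toy messages stand in for the community data; the domain-specific and sentiment features and the information-gain reduction step are omitted from this sketch.

```python
# Word unigram + bigram features feed a linear SVM, the best-performing
# technique in the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Illustrative health-community messages with topic labels.
messages = ["what dose of this drug is safe",
            "side effects after taking the medication",
            "feeling anxious and alone tonight",
            "thank you all for the emotional support"]
topics = ["treatment", "treatment", "emotional", "emotional"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(messages, topics)
print(clf.predict(["any side effects of this drug"]))
```

The real framework would concatenate the n-gram vectors with domain-specific and sentiment features before training.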

  19. The nearest neighbor and the bayes error rates.

    PubMed

    Loizou, G; Maybank, S J

    1987-02-01

    The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities E_{k,l+1} ≤ E*(λ) ≤ E_{k,l} ≤ dE*(λ), where d is a function of k, l, and the number of pattern classes, and λ is the reject threshold for the Bayes method. An explicit expression for d is given which is optimal in the sense that for some probability distributions E_{k,l} and dE*(λ) are equal.

  20. Application of Bayes' to the prediction of referral decisions made by specialist optometrists in relation to chronic open angle glaucoma.

    PubMed

    Gurney, J C; Ansari, E; Harle, D; O'Kane, N; Sagar, R V; Dunne, M C M

    2018-02-09

    To determine the accuracy of a Bayesian learning scheme (Bayes') applied to the prediction of clinical decisions made by specialist optometrists in relation to the referral refinement of chronic open angle glaucoma. This cross-sectional observational study involved collection of data from the worst affected or right eyes of a consecutive sample of cases (n = 1,006) referred into the West Kent Clinical Commissioning Group Community Ophthalmology Team (COT) by high street optometrists. Multilevel classification of each case was based on race, sex, age, family history of chronic open angle glaucoma, reason for referral, Goldmann Applanation Tonometry (intraocular pressure and interocular asymmetry), optic nerve head assessment (vertical size, cup disc ratio and interocular asymmetry), central corneal thickness and visual field analysis (Hodapp-Parrish-Anderson classification). Randomised stratified tenfold cross-validation was applied to determine the accuracy of Bayes' by comparing its output to the clinical decisions of three COT specialist optometrists; namely, the decision to discharge, follow-up or refer each case. Outcomes of cross-validation, expressed as means and standard deviations, showed that the accuracy of Bayes' was high (95%, 2.0%) but that it falsely discharged (3.4%, 1.6%) or referred (3.1%, 1.5%) some cases. The results indicate that Bayes' has the potential to augment the decisions of specialist optometrists.
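    The validation scheme described above can be sketched as follows. Synthetic multilevel case features stand in for the COT records, and sklearn's `CategoricalNB` is an assumed stand-in for the paper's Bayesian learning scheme; the three outcome classes mirror discharge / follow-up / refer.

```python
# Randomised stratified ten-fold cross-validation of a categorical Naive
# Bayes model on multilevel (categorical) clinical features.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
n = 300
X = rng.integers(0, 3, size=(n, 6))        # 6 multilevel clinical features
# Make the outcome depend on two features so the model has signal to learn;
# y in {0, 1, 2} stands for discharge / follow-up / refer.
y = (X[:, 0] + X[:, 1] > 2).astype(int) + (X[:, 0] == 2).astype(int)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc = cross_val_score(CategoricalNB(), X, y, cv=cv)
print(f"accuracy: {acc.mean():.3f} (sd {acc.std():.3f})")
```

Reporting the fold mean and standard deviation matches how the study summarizes its cross-validation outcomes.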

  1. Theory and analysis of statistical discriminant techniques as applied to remote sensing data

    NASA Technical Reports Server (NTRS)

    Odell, P. L.

    1973-01-01

    Classification of remote earth resources sensing data according to normed exponential density statistics is reported. The use of density models appropriate for several physical situations provides an exact solution for the probabilities of classifications associated with the Bayes discriminant procedure even when the covariance matrices are unequal.

  2. Identification of Phragmites australis and Spartina alterniflora in the Yangtze Estuary between Bayes and BP neural network using hyper-spectral data

    NASA Astrophysics Data System (ADS)

    Liu, Pudong; Zhou, Jiayuan; Shi, Runhe; Zhang, Chao; Liu, Chaoshun; Sun, Zhibin; Gao, Wei

    2016-09-01

    The aim of this work was to compare the Bayes method and a BP neural network for identifying coastal wetland plants using hyperspectral data, in order to optimize the classification method. For this purpose, we chose two dominant plants (invasive S. alterniflora and native P. australis) in the Yangtze Estuary; the leaf spectral reflectance of P. australis and S. alterniflora was measured with an ASD field spectrometer. We tested the Bayes method and the BP neural network for the identification of these two species. Results showed that three different bands (i.e., 555 nm, 711 nm and 920 nm) could be identified as the sensitive bands to use as input parameters for the two methods. The Bayes method and the BP neural network prediction model both performed well (88.57% accuracy for the Bayes prediction, about 80% accuracy for the BP neural network prediction), but the Bayes method gave higher accuracy and stability.

  3. Delineation of marsh types from Corpus Christi Bay, Texas, to Perdido Bay, Alabama, in 2010

    USGS Publications Warehouse

    Enwright, Nicholas M.; Hartley, Stephen B.; Couvillion, Brady R.; Brasher, Michael G.; Visser, Jenneke M.; Mitchell, Michael K.; Ballard, Bart M.; Parr, Mark W.; Wilson, Barry C.

    2015-07-23

    This study incorporates about 9,800 ground reference locations collected via helicopter surveys in coastal wetland areas. Decision-tree analyses were used to classify emergent marsh vegetation types by using ground reference data from helicopter vegetation surveys and independent variables such as multitemporal satellite-based multispectral imagery from 2009 to 2011, bare-earth digital elevation models based on airborne light detection and ranging (lidar), alternative contemporary land cover classifications, and other spatially explicit variables. Image objects were created from 2010 National Agriculture Imagery Program color-infrared aerial photography. The final classification is a 10-meter raster dataset that was produced by using a majority filter to classify image objects according to the marsh vegetation type covering the majority of each image object. The classification is dated 2010 because the year is both the midpoint of the classified multitemporal satellite-based imagery (2009–11) and the date of the high-resolution airborne imagery that was used to develop image objects. The seamless classification produced through this work can be used to help develop and refine conservation efforts for priority natural resources.

  4. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of 0.76, compared to the theoretically optimum 0.79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.

  5. Implementation of mutual information and bayes theorem for classification microarray data

    NASA Astrophysics Data System (ADS)

    Dwifebri Purbolaksono, Mahendra; Widiastuti, Kurnia C.; Syahrul Mubarok, Mohamad; Adiwijaya; Aminy Ma’ruf, Firda

    2018-03-01

    Microarray technology is able to read gene structure, and analysis of such data is important for deciding which attributes are more important than others. Microarray technology can provide cancer information for diagnosis from a person's genes. Preparation of microarray data is a major problem and takes a long time, because microarray data contain a high number of insignificant and irrelevant attributes. A method is therefore needed to reduce the dimension of microarray data without eliminating the important information in each attribute. This research uses mutual information to reduce the dimension. The system is built with a machine learning approach, specifically Bayes' theorem, which uses a statistical and probability approach. Combining both methods is powerful for microarray data classification. The experimental results show that the system classifies microarray data well, with the highest F1-scores of 91.06% using a Bayesian network and 88.85% using Naïve Bayes.
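    The two-stage pipeline described above can be sketched as follows. Synthetic high-dimensional data stands in for the microarray sets, `GaussianNB` stands in for the Bayes-theorem classifier, and the choice of k is illustrative.

```python
# Mutual information ranks the "genes"; the top-k are kept; a Naive Bayes
# classifier is trained on the reduced data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Few samples, many features: the shape that makes reduction necessary.
X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=30),  # keep 30 most informative genes
    GaussianNB(),
)
f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
print(f"F1 with MI reduction: {f1:.3f}")
```

Doing the selection inside the pipeline ensures it is refit on each training fold, avoiding selection bias in the cross-validated score.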

  6. A comparative study of nonparametric methods for pattern recognition

    NASA Technical Reports Server (NTRS)

    Hahn, S. F.; Nelson, G. D.

    1972-01-01

    The applied research discussed in this report determines and compares the correct classification percentage of the nonparametric sign test, Wilcoxon's signed rank test, and K-class classifier with the performance of the Bayes classifier. The performance is determined for data which have Gaussian, Laplacian and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in modes and/or means of the probability density functions for four, eight and sixteen samples. The K-class classifier performed very well with respect to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier which assumes the data to be Gaussian even though it may not be. The K-class classifier has the advantage over the Bayes in that it works well with non-Gaussian data without having to determine the probability density function of the data. It should be noted that the data in this experiment was always unimodal.

  7. A review and experimental study on the application of classifiers and evolutionary algorithms in EEG-based brain-machine interface systems

    NASA Astrophysics Data System (ADS)

    Tahernezhad-Javazm, Farajollah; Azimirad, Vahid; Shoaran, Maryam

    2018-04-01

    Objective. Considering the importance and the near-future development of noninvasive brain-machine interface (BMI) systems, this paper presents a comprehensive theoretical-experimental survey on the classification and evolutionary methods for BMI-based systems in which EEG signals are used. Approach. The paper is divided into two main parts. In the first part, a wide range of different types of the base and combinatorial classifiers including boosting and bagging classifiers and evolutionary algorithms are reviewed and investigated. In the second part, these classifiers and evolutionary algorithms are assessed and compared based on two types of relatively widely used BMI systems, sensory motor rhythm-BMI and event-related potentials-BMI. Moreover, in the second part, some of the improved evolutionary algorithms as well as bi-objective algorithms are experimentally assessed and compared. Main results. In this study two databases are used, and cross-validation accuracy (CVA) and stability to data volume (SDV) are considered as the evaluation criteria for the classifiers. According to the experimental results on both databases, regarding the base classifiers, linear discriminant analysis and support vector machines with respect to CVA evaluation metric, and naive Bayes with respect to SDV demonstrated the best performances. Among the combinatorial classifiers, four classifiers, Bagg-DT (bagging decision tree), LogitBoost, and GentleBoost with respect to CVA, and Bagging-LR (bagging logistic regression) and AdaBoost (adaptive boosting) with respect to SDV had the best performances. Finally, regarding the evolutionary algorithms, single-objective invasive weed optimization (IWO) and bi-objective nondominated sorting IWO algorithms demonstrated the best performances. Significance. 
We present a general survey on the base and the combinatorial classification methods for EEG signals (sensory motor rhythm and event-related potentials) as well as their optimization methods through the evolutionary algorithms. In addition, experimental and statistical significance tests are carried out to study the applicability and effectiveness of the reviewed methods.

  8. Spatial big data for disaster management

    NASA Astrophysics Data System (ADS)

    Shalini, R.; Jayapratha, K.; Ayeshabanu, S.; Chemmalar Selvi, G.

    2017-11-01

    Big data refers to data sets so large and complex that traditional data processing applications are inadequate to deal with them. Big data is now a widely known domain used in research, academia, and industry. It is used to store huge amounts of information in a single centralized place. Challenges include capture, storage, analysis, data curation, visualization, sharing, transfer, querying, updating, and information privacy. In this digital world, storing and retrieving data is an enormous task for large organizations, and data can sometimes be lost due to distributed data storage. To address this, organizations implement big data so that all data related to the organization is stored in one large database. Remote sensing is the science of acquiring information to detect objects or analyze an area from a distance; it is easy to find objects with such sensors. Remote sensing produces geographic information from satellite and sensor data, so this paper analyzes which architectures are used for remote sensing in big data, how these architectures differ from each other, and how they relate to our studies. This paper describes how disasters occur and computes results on a data set, applying a seismic data set to assess earthquake disasters by means of classification and clustering. The classical data mining algorithms used for classification are k-nearest neighbor, naive Bayes and decision table, and for clustering are hierarchical, make-density-based and simple k-means clustering, using the XLMINER and WEKA tools.
This paper also shows how to make predictions on the spatial dataset by applying the XLMINER and WEKA tools, and thus how big spatial data can be handled with this approach.

  9. Texture analysis for survival prediction of pancreatic ductal adenocarcinoma patients with neoadjuvant chemotherapy

    NASA Astrophysics Data System (ADS)

    Chakraborty, Jayasree; Langdon-Embry, Liana; Escalon, Joanna G.; Allen, Peter J.; Lowery, Maeve A.; O'Reilly, Eileen M.; Do, Richard K. G.; Simpson, Amber L.

    2016-03-01

    Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death in the United States. The five-year survival rate for all stages is approximately 6%, and approximately 2% when presenting with distant disease. Only 10-20% of all patients present with resectable disease, but recurrence rates are high, with only 5 to 15% remaining free of disease at 5 years. At this time, we are unable to distinguish resectable PDAC patients with occult metastatic disease from those with potentially curable disease. Early classification of these tumor types may eventually lead to changes in initial management, including the use of neoadjuvant chemotherapy or radiation, or in the choice of postoperative adjuvant treatments. Texture analysis is an emerging methodology in oncologic imaging for quantitatively assessing tumor heterogeneity that could potentially aid in the stratification of these patients. The present study derives several texture-based features from CT images of PDAC patients, acquired prior to neoadjuvant chemotherapy, and analyzes their performance, individually as well as in combination, as prognostic markers. A fuzzy minimum redundancy maximum relevance method with a leave-one-image-out technique is included to select discriminating features from the set of extracted features. With a naive Bayes classifier, the proposed method predicts the 5-year overall survival of PDAC patients prior to neoadjuvant therapy and achieves the best results in terms of an area under the receiver operating characteristic curve of 0.858 and an accuracy of 83.0% with four-fold cross-validation.

  10. Fast Compressive Tracking.

    PubMed

    Zhang, Kaihua; Zhang, Lei; Yang, Ming-Hsuan

    2014-10-01

    It is a challenging task to develop effective and efficient appearance models for robust object tracking due to factors such as pose variation, illumination change, occlusion, and motion blur. Existing online tracking algorithms often update models with samples from observations in recent frames. Although much success has been demonstrated, numerous issues remain to be addressed. First, while these adaptive appearance models are data-dependent, there does not exist a sufficient amount of data for online algorithms to learn from at the outset. Second, online tracking algorithms often encounter the drift problem: as a result of self-taught learning, misaligned samples are likely to be added and to degrade the appearance models. In this paper, we propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from a multiscale image feature space with a data-independent basis. The proposed appearance model employs non-adaptive random projections that preserve the structure of the image feature space of objects. A very sparse measurement matrix is constructed to efficiently extract the features for the appearance model. We compress sample images of the foreground target and the background using the same sparse measurement matrix. The tracking task is formulated as binary classification via a naive Bayes classifier with online update in the compressed domain. A coarse-to-fine search strategy is adopted to further reduce the computational complexity of the detection procedure. The proposed compressive tracking algorithm runs in real time and performs favorably against state-of-the-art methods on challenging sequences in terms of efficiency, accuracy, and robustness.
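The compressed-domain representation above rests on a very sparse random measurement matrix. The sketch below illustrates the general idea with an Achlioptas-style sparse random projection; the sparsity parameter `s`, function names, and scaling are illustrative assumptions here, and the paper's exact matrix construction may differ.

```python
import random

def sparse_measurement_matrix(n_rows, n_cols, s=3, seed=0):
    """Very sparse random matrix: entries are +sqrt(s), 0, -sqrt(s)
    with probabilities 1/(2s), 1 - 1/s, 1/(2s)."""
    rng = random.Random(seed)
    scale = s ** 0.5
    R = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        R.append(row)
    return R

def compress(R, x):
    """Project a high-dimensional feature vector into the compressed domain,
    skipping the (many) zero entries for efficiency."""
    return [sum(r_ij * x_j for r_ij, x_j in zip(row, x) if r_ij) for row in R]
```

Because roughly 1 - 1/s of the entries are zero, each compressed feature touches only a small subset of the original features, which is what makes the projection cheap enough for real-time tracking.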

  11. Application of recurrence quantification analysis for the automated identification of epileptic EEG signals.

    PubMed

    Acharya, U Rajendra; Sree, S Vinitha; Chattopadhyay, Subhagata; Yu, Wenwei; Ang, Peng Chuan Alvin

    2011-06-01

    Epilepsy is a common neurological disorder characterized by the recurrence of seizures. Electroencephalogram (EEG) signals are widely used to diagnose seizures. Because of the non-linear and dynamic nature of EEG signals, it is difficult to decipher the subtle changes in these signals by visual inspection or with linear techniques; therefore, non-linear methods are being researched to analyze them. In this work, we represent the recorded EEG signals as Recurrence Plots (RP) and extract Recurrence Quantification Analysis (RQA) parameters from the RP in order to classify the EEG signals into normal, ictal, and interictal classes. A Recurrence Plot is a graph that shows all the times at which a state of the dynamical system recurs. Studies have reported significantly different RQA parameters for the three classes; however, more studies are needed to develop classifiers that use these promising features and provide good classification accuracy in differentiating the three types of EEG segments. Therefore, in this work, we have used ten RQA parameters to quantify the important features in the EEG signals. These features were fed to seven different classifiers: Support Vector Machine (SVM), Gaussian Mixture Model (GMM), Fuzzy Sugeno Classifier, K-Nearest Neighbor (KNN), Naive Bayes Classifier (NBC), Decision Tree (DT), and Radial Basis Probabilistic Neural Network (RBPNN). Our results show that the SVM classifier was able to identify the EEG class with an average efficiency of 95.6%, and sensitivity and specificity of 98.9% and 97.8%, respectively.
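The RP/RQA machinery in this record can be illustrated in miniature. The sketch below builds a recurrence matrix for a one-dimensional signal and computes the recurrence rate, one of the simplest RQA parameters; real EEG analysis works on delay-embedded state vectors with carefully chosen thresholds, so the names and threshold here are purely illustrative.

```python
def recurrence_matrix(signal, eps):
    """R[i][j] = 1 when states i and j are within eps of each other."""
    n = len(signal)
    return [[1 if abs(signal[i] - signal[j]) < eps else 0 for j in range(n)]
            for i in range(n)]

def recurrence_rate(R):
    """Fraction of recurrent points in the plot: the simplest RQA parameter."""
    n = len(R)
    return sum(sum(row) for row in R) / (n * n)
```

Other RQA parameters (determinism, laminarity, trapping time, and so on) are likewise counting statistics over diagonal and vertical line structures in this same matrix.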

  12. Preliminary study of tumor heterogeneity in imaging predicts two year survival in pancreatic cancer patients.

    PubMed

    Chakraborty, Jayasree; Langdon-Embry, Liana; Cunanan, Kristen M; Escalon, Joanna G; Allen, Peter J; Lowery, Maeve A; O'Reilly, Eileen M; Gönen, Mithat; Do, Richard G; Simpson, Amber L

    2017-01-01

    Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers in the United States with a five-year survival rate of 7.2% for all stages. Although surgical resection is the only curative treatment, currently we are unable to differentiate resectable patients with occult metastatic disease from those with potentially curable disease. Identification of patients with poor prognosis via early classification would help in initial management including the use of neoadjuvant chemotherapy or radiation, or in the choice of postoperative adjuvant therapy. PDAC ranges in appearance from homogeneously isoattenuating masses to heterogeneously hypovascular tumors on CT images; hence, we hypothesize that heterogeneity reflects underlying differences at the histologic or genetic level and will therefore correlate with patient outcome. We quantify heterogeneity of PDAC with texture analysis to predict 2-year survival. Using fuzzy minimum-redundancy maximum-relevance feature selection and a naive Bayes classifier, the proposed features achieve an area under receiver operating characteristic curve (AUC) of 0.90 and accuracy (Ac) of 82.86% with the leave-one-image-out technique and an AUC of 0.80 and Ac of 75.0% with three-fold cross-validation. We conclude that texture analysis can be used to quantify heterogeneity in CT images to accurately predict 2-year survival in patients with pancreatic cancer. From these data, we infer differences in the biological evolution of pancreatic cancer subtypes measurable in imaging and identify opportunities for optimized patient selection for therapy.
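Minimum-redundancy maximum-relevance (mRMR) selection, used in this record and in record 9, greedily adds the feature most correlated with the label while penalizing correlation with features already chosen. The sketch below is a plain (non-fuzzy) variant using Pearson correlation; the study's fuzzy formulation differs, and all names and toy data here are illustrative assumptions.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / ((va * vb) ** 0.5 + 1e-12)

def mrmr_select(features, labels, k):
    """Greedy mRMR: maximise |corr(feature, label)| minus the mean
    |corr| with already-selected features.
    features: dict name -> list of values; labels: list of 0/1."""
    y = [float(v) for v in labels]
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(features[name], y))
            if not selected:
                return relevance
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what distinguishes mRMR from simple univariate ranking: an exact duplicate of an already-selected feature scores poorly even though its standalone relevance is high.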

  13. A geometrical interpretation of the 2n-th central difference

    NASA Technical Reports Server (NTRS)

    Tapia, R. A.

    1972-01-01

    Many algorithms used for data smoothing, data classification and error detection require the calculation of the distance from a point to the polynomial interpolating its 2n neighbors (n on each side). This computation, if performed naively, would require the solution of a system of equations and could create numerical problems. This note shows that if the data is equally spaced, then this calculation can be performed using a simple recursion formula.
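The recursion alluded to in this note can be sketched as repeated second central differences: for 2n+1 equally spaced samples, n applications of d2 f[i] = f[i-1] - 2 f[i] + f[i+1] yield the 2n-th central difference at the middle point, to which the deviation of that point from the polynomial interpolating its 2n neighbors is proportional. The function name below is an illustrative assumption; the exact normalization is the subject of the note itself.

```python
def central_difference_2n(values):
    """2n-th central difference of 2n+1 equally spaced samples,
    evaluated at the middle point, via repeated second differences."""
    v = list(values)
    assert len(v) % 2 == 1, "need an odd number of samples (2n+1)"
    while len(v) > 1:
        # each pass shortens the list by two
        v = [v[i - 1] - 2 * v[i] + v[i + 1] for i in range(1, len(v) - 1)]
    return v[0]
```

For n=1 this reduces to the familiar check: the middle point's deviation from the line through its two neighbors is minus half the second difference, and no linear system ever needs to be solved.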

  14. A parametric multiclass Bayes error estimator for the multispectral scanner spatial model performance evaluation

    NASA Technical Reports Server (NTRS)

    Mobasseri, B. G.; Mcgillem, C. D.; Anuta, P. E. (Principal Investigator)

    1978-01-01

    The author has identified the following significant results. The probability of correct classification of various populations in the data was defined as the primary performance index. The multispectral data, being multiclass in nature as well, required a Bayes error estimation procedure that depended on a set of class statistics alone. The classification error was expressed in terms of an N-dimensional integral, where N was the dimensionality of the feature space. The multispectral scanner spatial model was represented by a linear shift-invariant multiple-port system in which the N spectral bands comprised the input processes. The scanner characteristic function, the relationship governing the transformation of the input spatial, and hence spectral, correlation matrices through the system, was developed.

  15. Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Hong, Yuan; Deng, Weiling

    2010-01-01

    To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…

  16. Bayes classification of interferometric TOPSAR data

    NASA Technical Reports Server (NTRS)

    Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.

    1995-01-01

    We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied on multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites, and subsequent assignment of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the set of features that was chosen. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single-polarization interferometric data classification is contrasted against classification schemes based on polarimetric SAR data.

  17. In silico prediction of drug-induced myelotoxicity by using Naïve Bayes method.

    PubMed

    Zhang, Hui; Yu, Peng; Zhang, Teng-Guo; Kang, Yan-Li; Zhao, Xiao; Li, Yuan-Yuan; He, Jia-Hui; Zhang, Ji

    2015-11-01

    Drug-induced myelotoxicity usually decreases the production of platelets, red cells, and white cells. Thus, early identification and characterization of myelotoxicity hazard in drug development is essential. The purpose of this investigation was to develop a prediction model of drug-induced myelotoxicity by using a Naïve Bayes classifier. For comparison, other prediction models based on support vector machine and single-hidden-layer feed-forward neural network methods were also established. Among all the prediction models, the Naïve Bayes classification model showed the best prediction performance, which offered an average overall prediction accuracy of [Formula: see text] for the training set and [Formula: see text] for the external test set. The significant contribution of this study is that we developed the first Naïve Bayes classification model of the drug-induced myelotoxicity adverse effect using a larger-scale dataset, which could be employed for the prediction of drug-induced myelotoxicity. In addition, several important molecular descriptors and substructures of myelotoxic compounds have been identified, which should be taken into consideration in the design of new candidate compounds to produce safer and more effective drugs, ultimately reducing the attrition rate in later stages of drug development.

  18. [Study on the classification of dominant pathogens related to febrile respiratory syndrome, based on the method of Bayes discriminant analysis].

    PubMed

    Li, X C; Li, J S; Meng, L; Bai, Y N; Yu, D S; Liu, X N; Liu, X F; Jiang, X J; Ren, X W; Yang, X T; Shen, X P; Zhang, J W

    2017-08-10

    Objective: To understand the dominant pathogens of febrile respiratory syndrome (FRS) patients in Gansu province and to establish a Bayes discriminant function to identify patients infected with the dominant pathogens. Methods: FRS patients were recruited in various sentinel hospitals of Gansu province from 2009 to 2015, and the dominant pathogens were determined by describing the composition of the pathogenic profile. Significant clinical variables were selected by stepwise discriminant analysis to establish the Bayes discriminant function. Results: Among the pathogens detected for FRS, influenza virus and rhinovirus showed higher positive rates than the other viruses (13.79% and 8.63%), accounting for 54.38% and 13.73% of the virus-positive patients. The most frequently detected bacteria were Streptococcus pneumoniae and Haemophilus influenzae (44.41% and 18.07%), accounting for 66.21% and 24.55% of the bacteria-positive patients. The original-validated rate of the discriminant function, established with 11 clinical variables, was 73.1%, with a cross-validated rate of 70.6%. Conclusion: Influenza virus, rhinovirus, Streptococcus pneumoniae, and Haemophilus influenzae were the dominant pathogens of FRS in Gansu province. The Bayes discriminant analysis showed high accuracy in classifying the dominant pathogens and has practical value for FRS.

  19. Extracting Information from Electronic Medical Records to Identify the Obesity Status of a Patient Based on Comorbidities and Bodyweight Measures.

    PubMed

    Figueroa, Rosa L; Flores, Christopher A

    2016-08-01

    Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method of identifying obesity automatically using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records that contain labels for two classification problems. The first classification problem distinguishes between obesity, overweight, normal weight, and underweight. The second classification problem differentiates between obesity types: super obesity, morbid obesity, severe obesity and moderate obesity. We used a Bag of Words approach to represent the records together with unigram and bigram representations of the features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes together with ten-fold cross validation to evaluate and compare performances. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, our results show that Support Vector Machine obtains better performances than Naïve Bayes for both classification problems. We also observed that bigram representation improves performance compared with unigram representation.
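The Bag of Words pipeline described above, unigram and bigram counts fed to a Naïve Bayes classifier, can be sketched compactly. Below is a minimal Laplace-smoothed multinomial naive Bayes over n-gram counts; the toy documents, labels, and function names are illustrative assumptions, not data from the study.

```python
import math
from collections import Counter

def ngrams(tokens):
    """Unigram and bigram features from a token list."""
    feats = list(tokens)
    feats += [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
    return feats

def train_multinomial_nb(docs, labels):
    """Collect class priors, per-class n-gram counts, and the vocabulary."""
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    vocab = set()
    for doc, c in zip(docs, labels):
        feats = ngrams(doc.split())
        counts[c].update(feats)
        vocab.update(feats)
    return counts, priors, vocab, len(docs)

def classify(model, doc):
    """Laplace-smoothed multinomial naive Bayes decision rule."""
    counts, priors, vocab, n_docs = model
    best, best_score = None, -math.inf
    for c in counts:
        total = sum(counts[c].values())
        score = math.log(priors[c] / n_docs)
        for f in ngrams(doc.split()):
            score += math.log((counts[c][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Adding bigrams to the feature set, as the record reports, lets the classifier exploit short word sequences (e.g. weight-related modifiers preceding a noun) that unigrams alone cannot capture.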

  20. Movement imagery classification in EMOTIV cap based system by Naïve Bayes.

    PubMed

    Stock, Vinicius N; Balbinot, Alexandre

    2016-08-01

    Brain-computer interfaces (BCI) provide means of communication and control, in assistive technology, that do not require motor activity from the user. The goal of this study is to classify two types of imagined movements, of the left and right hands, in an EMOTIV cap based system, using the Naïve Bayes classifier. A preliminary analysis with respect to results obtained by other experiments in this field is also conducted. Processing of the electroencephalography (EEG) signals is done by applying Common Spatial Pattern filters. The EPOC electrode cap is used for EEG acquisition, in two test subjects, for two distinct trial formats. The channels picked are FC5, FC6, P7 and P8 of the 10-20 system, and a discussion of the differences of using the C3, C4, P3 and P4 positions is proposed. Dataset 3 of the BCI Competition II is also analyzed using the implemented algorithms. The maximum classification results for the proposed experiment and for the BCI Competition dataset were, respectively, 79% and 85%. The conclusion of this study is that the picked electrode positions may be applied in BCI systems with satisfactory classification rates.

  1. Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations.

    PubMed

    Torbati, Mahbaneh Eshaghzadeh; Mitreva, Makedonka; Gopalakrishnan, Vanathi

    2016-12-01

    Human microbiome data from genomic sequencing technologies is fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in the detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling and has been dealt with previously, using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. In this paper, we test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods.
In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data actually results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery.

  2. "When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.

    PubMed

    Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit

    2016-10-24

    To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. 
One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
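The evaluation protocol in this record, 5-fold cross-validation with average F-scores, rests on two small pieces that are easy to sketch. The fold assignment and function names below are illustrative assumptions; the study's exact folding scheme may differ.

```python
def f1_score(tp, fp, fn):
    """F-score: harmonic mean of precision and recall,
    computed from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def k_fold(items, k=5):
    """Round-robin split into k folds; each fold in turn serves as the
    test set while the remaining folds train the classifier."""
    return [items[i::k] for i in range(k)]
```

Averaging the F-score over the k held-out folds, as the record does, gives a less optimistic estimate than scoring on the training data itself.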

  3. Comparison of Hyperspectral and Multispectral Satellites for Forest Alliance Classification in the San Francisco Bay Area

    NASA Astrophysics Data System (ADS)

    Clark, M. L.

    2016-12-01

    The goal of this study was to assess multi-temporal, Hyperspectral Infrared Imager (HyspIRI) satellite imagery for improved forest class mapping relative to multispectral satellites. The study area was the western San Francisco Bay Area, California and forest alliances (e.g., forest communities defined by dominant or co-dominant trees) were defined using the U.S. National Vegetation Classification System. Simulated 30-m HyspIRI, Landsat 8 and Sentinel-2 imagery were processed from image data acquired by NASA's AVIRIS airborne sensor in year 2015, with summer and multi-temporal (spring, summer, fall) data analyzed separately. HyspIRI reflectance was used to generate a suite of hyperspectral metrics that targeted key spectral features related to chemical and structural properties. The Random Forests classifier was applied to the simulated images and overall accuracies (OA) were compared to those from real Landsat 8 images. For each image group, broad land cover (e.g., Needle-leaf Trees, Broad-leaf Trees, Annual agriculture, Herbaceous, Built-up) was classified first, followed by a finer-detail forest alliance classification for pixels mapped as closed-canopy forest. There were 5 needle-leaf tree alliances and 16 broad-leaf tree alliances, including 7 Quercus (oak) alliance types. No forest alliance classification exceeded 50% OA, indicating that there was broad spectral similarity among alliances, most of which were not spectrally pure but rather a mix of tree species. In general, needle-leaf (Pine, Redwood, Douglas Fir) alliances had better class accuracies than broad-leaf alliances (Oaks, Madrone, Bay Laurel, Buckeye, etc). Multi-temporal data classifications all had 5-6% greater OA than with comparable summer data. For simulated data, HyspIRI metrics had 4-5% greater OA than Landsat 8 and Sentinel-2 multispectral imagery and 3-4% greater OA than HyspIRI reflectance. Finally, HyspIRI metrics had 8% greater OA than real Landsat 8 imagery. 
In conclusion, forest alliance classification was found to be a difficult remote sensing application with moderate resolution (30 m) satellite imagery; however, of the data tested, HyspIRI spectral metrics had the best performance relative to multispectral satellites.

  4. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. 
On synthetic data, the supervised machine learning methods obtain superior classification performance with respect to the traditional DNA Barcode classification methods. On empirical data, their classification performance is comparable to that of the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcoding species classification problem, obtaining excellent performance. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333

  5. Bayesian cloud detection for MERIS, AATSR, and their combination

    NASA Astrophysics Data System (ADS)

    Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.

    2014-11-01

    A broad range of different Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud masks were designed to be numerically efficient and suited to the processing of large amounts of data. Results from the classical and naive approaches to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized, and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and setting up new classification schemes based on manually classified scenes.
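The classical Bayesian masking scheme described here, class-conditional histograms post-processed by Gaussian smoothing and then inverted via Bayes' rule, can be illustrated in one dimension. In the sketch below the feature, bin count, smoothing width, priors, and all names are illustrative assumptions.

```python
import math

def histogram(values, n_bins, lo, hi):
    """Counts per bin over [lo, hi)."""
    h = [0.0] * n_bins
    w = (hi - lo) / n_bins
    for v in values:
        h[min(int((v - lo) / w), n_bins - 1)] += 1.0
    return h

def gaussian_smooth(h, sigma=1.0):
    """Post-process a histogram with a truncated, renormalized Gaussian kernel."""
    r = int(3 * sigma)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-r, r + 1)]
    out = []
    for i in range(len(h)):
        num = den = 0.0
        for k, kv in zip(range(-r, r + 1), kernel):
            if 0 <= i + k < len(h):
                num += kv * h[i + k]
                den += kv
        out.append(num / den)
    return out

def posterior_cloud(x, h_cloud, h_clear, prior=0.5, lo=0.0, hi=1.0):
    """Classical Bayes: P(cloud | x) from class-conditional histograms."""
    n_bins = len(h_cloud)
    i = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
    p_c = (h_cloud[i] + 1e-9) / (sum(h_cloud) + 1e-9)
    p_k = (h_clear[i] + 1e-9) / (sum(h_clear) + 1e-9)
    return p_c * prior / (p_c * prior + p_k * (1 - prior))
```

The smoothing step is what makes sparse truth data usable: it spreads the few observed counts into neighboring bins, so bins never seen in training still receive a sensible class-conditional probability.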

  6. Bayesian cloud detection for MERIS, AATSR, and their combination

    NASA Astrophysics Data System (ADS)

    Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.

    2015-04-01

    A broad range of different Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud detection schemes were designed to be numerically efficient and suited to the processing of large amounts of data. Results from the classical and naive approaches to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized, and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and setting up new classification schemes based on manually classified scenes.

  7. Sensitivity and specificity of machine learning classifiers for glaucoma diagnosis using Spectral Domain OCT and standard automated perimetry.

    PubMed

    Silva, Fabrício R; Vidotti, Vanessa G; Cremasco, Fernanda; Dias, Marcelo; Gomi, Edson S; Costa, Vital P

    2013-01-01

    To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using Spectral Domain OCT (SD-OCT) and standard automated perimetry (SAP). Observational cross-sectional study. Sixty-two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP), and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from the SD-OCT and SAP: Bagging (BAG), Naive Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), AdaBoost M1 (ADA), Support Vector Machine Linear (SVML), and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with MLCs using OCT+SAP data. Combining OCT and SAP data, MLCs' aROCs varied from 0.777 (CTREE) to 0.946 (RAN). The best OCT+SAP aROC obtained with RAN (0.946) was significantly larger than the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.

  8. A Mobile Health Application to Predict Postpartum Depression Based on Machine Learning.

    PubMed

    Jiménez-Serrano, Santiago; Tortajada, Salvador; García-Gómez, Juan Miguel

    2015-07-01

    Postpartum depression (PPD) is a disorder that often goes undiagnosed. The development of a screening program requires considerable and careful effort, where evidence-based decisions have to be taken in order to obtain an effective test with a high level of sensitivity and an acceptable specificity that is quick to perform, easy to interpret, culturally sensitive, and cost-effective. The purpose of this article is twofold: first, to develop classification models for detecting the risk of PPD during the first week after childbirth, thus enabling early intervention; and second, to develop a mobile health (m-health) application (app) for the Android(®) (Google, Mountain View, CA) platform based on the model with best performance for both mothers who have just given birth and clinicians who want to monitor their patient's test. A set of predictive models for estimating the risk of PPD was trained using machine learning techniques and data about postpartum women collected from seven Spanish hospitals. An internal evaluation was carried out using a hold-out strategy. An easy flowchart and architecture for designing the graphical user interface of the m-health app was followed. Naive Bayes showed the best balance between sensitivity and specificity as a predictive model for PPD during the first week after delivery. It was integrated into the clinical decision support system for Android mobile apps. This approach can enable the early prediction and detection of PPD because it fulfills the conditions of an effective screening test with a high level of sensitivity and specificity that is quick to perform, easy to interpret, culturally sensitive, and cost-effective.

  9. New method for predicting estrogen receptor status utilizing breast MRI texture kinetic analysis

    NASA Astrophysics Data System (ADS)

    Chaudhury, Baishali; Hall, Lawrence O.; Goldgof, Dmitry B.; Gatenby, Robert A.; Gillies, Robert; Drukteinis, Jennifer S.

    2014-03-01

    Magnetic Resonance Imaging (MRI) of breast cancer typically shows that tumors are heterogeneous with spatial variations in blood flow and cell density. Here, we examine the potential link between clinical tumor imaging and the underlying evolutionary dynamics behind heterogeneity in the cellular expression of estrogen receptors (ER) in breast cancer. We assume, in an evolutionary environment, that ER expression will only occur in the presence of significant concentrations of estrogen, which is delivered via the blood stream. Thus, we hypothesize, the expression of ER in breast cancer cells will correlate with blood flow on gadolinium-enhanced breast MRI. To test this hypothesis, we performed quantitative analysis of blood flow on dynamic contrast enhanced MRI (DCE-MRI) and correlated it with the ER status of the tumor. Here we present our analytic methods, which utilize a novel algorithm to analyze 20 volumetric DCE-MRI breast cancer tumors. The algorithm generates post-initial enhancement (PIE) maps from DCE-MRI and then performs texture feature extraction from the PIE map, feature selection, and finally classification of tumors into ER-positive and ER-negative status. The combined gray level co-occurrence matrix, gray level run length matrix, and local binary pattern histogram features allow quantification of breast tumor heterogeneity. The algorithm predicted ER expression with an accuracy of 85% using a Naive Bayes classifier in leave-one-out cross-validation. Hence, we conclude that our data support the hypothesis that imaging characteristics can, through application of evolutionary principles, provide insights into the cellular and molecular properties of cancer cells.
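
    The evaluation protocol above (Gaussian naive Bayes scored by leave-one-out cross-validation) can be sketched compactly in numpy; this is a generic sketch of the technique, not the authors' implementation, and the texture features here are just placeholder columns:

    ```python
    import numpy as np

    def gaussian_nb_fit(X, y):
        """Per-class feature means, variances, and priors."""
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
        return params

    def gaussian_nb_predict(params, x):
        """Pick the class with the largest Gaussian log-posterior."""
        best, best_lp = None, -np.inf
        for c, (mu, var, prior) in params.items():
            lp = np.log(prior) - 0.5 * np.sum(
                np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    def loo_accuracy(X, y):
        """Leave-one-out cross-validation: refit with one sample held out."""
        hits = 0
        for i in range(len(X)):
            mask = np.arange(len(X)) != i
            params = gaussian_nb_fit(X[mask], y[mask])
            hits += gaussian_nb_predict(params, X[i]) == y[i]
        return hits / len(X)
    ```

    With only 20 tumors, leave-one-out is a natural choice: every fold trains on 19 cases and tests on the held-out one.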

  10. Automated classification of neurological disorders of gait using spatio-temporal gait parameters.

    PubMed

    Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman

    2015-04-01

    Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as the evaluation of treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. 8 measurements with 136 variables using a GAITRite(®) sensor carpet were obtained from each subject. Subjects were randomly divided into two groups (training cases and validation cases). Sensitivity and specificity of k-nearest neighbor (KNN), naive Bayes classifier (NB), artificial neural network (ANN), and support vector machine (SVM) in classifying the validation cases were calculated. ANN and SVM had the highest overall sensitivity with 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%); while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish clinical diagnosis with good accuracy. SVM and ANN in particular differentiate gait patterns of several distinct oto-neurological disorders of gait with high sensitivity and specificity compared to KNN and NB. Both SVM and ANN appear to be reliable diagnostic and management tools for disorders of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

    PubMed

    Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

    2014-09-01

    This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
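
    The core of the decision-tree approach is a recursive search for the best (element, threshold) split; a one-node sketch of that search, under the assumption of a binary label and a small element-concentration matrix (the data here are invented, not the paper's measurements):

    ```python
    import numpy as np

    def best_stump(X, y):
        """One-node decision tree: exhaustively pick the (feature, threshold)
        pair that minimizes misclassifications on a binary label."""
        best = (None, None, len(y) + 1)  # (feature index, threshold, errors)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                pred = X[:, j] > t
                # min over both split polarities (> t vs <= t)
                err = min(int(np.sum(pred != y)), int(np.sum(pred == y)))
                if err < best[2]:
                    best = (j, t, err)
        return best
    ```

    A full decision tree repeats this search recursively on each side of the chosen split; NB, by contrast, uses all 15 element concentrations at once.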

  12. Wood identification of Dalbergia nigra (CITES Appendix I) using quantitative wood anatomy, principal components analysis and naïve Bayes classification

    PubMed Central

    Gasson, Peter; Miller, Regis; Stekel, Dov J.; Whinder, Frances; Ziemińska, Kasia

    2010-01-01

    Background and Aims Dalbergia nigra is one of the most valuable timber species of its genus, having been traded for over 300 years. Due to over-exploitation it is facing extinction and trade has been banned under CITES Appendix I since 1992. Current methods, primarily comparative wood anatomy, are inadequate for conclusive species identification. This study aims to find a set of anatomical characters that distinguish the wood of D. nigra from other commercially important species of Dalbergia from Latin America. Methods Qualitative and quantitative wood anatomy, principal components analysis and naïve Bayes classification were conducted on 43 specimens of Dalbergia, eight D. nigra and 35 from six other Latin American species. Key Results Dalbergia cearensis and D. miscolobium can be distinguished from D. nigra on the basis of vessel frequency for the former, and ray frequency for the latter. Principal components analysis was unable to provide any further basis for separating the species. Naïve Bayes classification using the four characters: minimum vessel diameter; frequency of solitary vessels; mean ray width; and frequency of axially fused rays, classified all eight D. nigra correctly with no false negatives, but there was a false positive rate of 36.36 %. Conclusions Wood anatomy alone cannot distinguish D. nigra from all other commercially important Dalbergia species likely to be encountered by customs officials, but can be used to reduce the number of specimens that would need further study. PMID:19884155

  13. Analysis of calibrated seafloor backscatter for habitat classification methodology and case study of 158 spots in the Bay of Biscay and Celtic Sea

    NASA Astrophysics Data System (ADS)

    Fezzani, Ridha; Berger, Laurent

    2018-06-01

    An automated signal-based method was developed in order to analyse the seafloor backscatter data logged by a calibrated multibeam echosounder. The processing consists, first, of clustering each survey sub-area into a small number of homogeneous sediment types, based on the backscatter average level at one or several incidence angles. Second, it uses their local average angular response to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France, and the Celtic Sea, Ireland. It was applied for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for demersal and benthic fauna study and monitoring. Qualitative analyses and classified clusters using extracted parameters show a good discriminatory potential, indicating the robustness of this approach.

  14. Geological sampling data and benthic biota classification: Buzzards Bay and Vineyard Sound, Massachusetts

    USGS Publications Warehouse

    Ackerman, Seth D.; Pappal, Adrienne L.; Huntley, Emily C.; Blackwood, Dann S.; Schwab, William C.

    2015-01-01

    Sea-floor sample collection is an important component of a statewide cooperative mapping effort between the U.S. Geological Survey (USGS) and the Massachusetts Office of Coastal Zone Management (CZM). Sediment grab samples, bottom photographs, and video transects were collected within Vineyard Sound and Buzzards Bay in 2010 aboard the research vessel Connecticut. This report contains sample data and related information, including analyses of surficial-sediment grab samples, locations and images of sea-floor photography, survey lines along which sea-floor video was collected, and a classification of benthic biota observed in sea-floor photographs and based on the Coastal and Marine Ecological Classification Standard (CMECS). These sample data and analyses are used to verify interpretations of geophysical data and are an essential part of geologic maps of the sea floor. These data also provide a valuable inventory of benthic habitat and resources. Geographic information system (GIS) data, maps, and interpretations, produced through the USGS and CZM mapping cooperative, are intended to aid efforts to manage coastal and marine resources and to provide baseline information for research focused on coastal evolution and environmental change.

  15. Remote sensing of submerged aquatic vegetation in lower Chesapeake Bay - A comparison of Landsat MSS to TM imagery

    NASA Technical Reports Server (NTRS)

    Ackleson, S. G.; Klemas, V.

    1987-01-01

    Landsat MSS and TM imagery, obtained simultaneously over Guinea Marsh, VA, was analyzed and compared for its ability to detect submerged aquatic vegetation (SAV). An unsupervised clustering algorithm was applied to each image, where the input classification parameters are defined as functions of apparent sensor noise. Class confidence and accuracy were computed for all water areas by comparing the classified images, pixel-by-pixel, to rasterized SAV distributions derived from color aerial photography. To illustrate the effect of water depth on classification error, areas of depth greater than 1.9 m were masked, and class confidence and accuracy recalculated. A single-scattering radiative-transfer model is used to illustrate how percent canopy cover and water depth affect the volume reflectance from a water column containing SAV. For a submerged canopy that is morphologically and optically similar to Zostera marina inhabiting Lower Chesapeake Bay, dense canopies may be isolated by masking optically deep water. For less dense canopies, the effect of increasing water depth is to increase the apparent percent crown cover, which may result in classification error.

  16. Multivariate spline methods in surface fitting

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr. (Principal Investigator); Schumaker, L. L.

    1984-01-01

    The use of spline functions in the development of classification algorithms is examined. In particular, a method is formulated for producing spline approximations to bivariate density functions, where the density function is described by a histogram of measurements. The resulting approximations are then incorporated into a Bayesian classification procedure for which the Bayes decision regions and the probability of misclassification are readily computed. Some preliminary numerical results are presented to illustrate the method.

  17. Section 14 Detailed Project Report, Emergency Shoreline Protection, Portersville Bay Mobile County, Alabama

    DTIC Science & Technology

    1990-05-01

    Report documentation page (scanned form; largely illegible). SECURITY CLASSIFICATION: UNCLASSIFIED. Detailed project report, May 1990, Portersville Bay, Mobile County, Alabama. Author: Johnny L. Grandison. Performing organization: U.S. Army Engineer District, Mobile.

  18. Sentiment analysis: a comparison of deep learning neural network algorithm with SVM and naïve Bayes for Indonesian text

    NASA Astrophysics Data System (ADS)

    Calvin Frans Mariel, Wahyu; Mariyah, Siti; Pramana, Setia

    2018-03-01

    Deep learning is a new era of machine learning techniques that essentially imitate the structure and function of the human brain. It is a development of deeper Artificial Neural Networks (ANNs) that use more than one hidden layer. Deep learning neural networks have a great ability to recognize patterns in various data types such as pictures, audio, and text. In this paper, the authors try to measure that ability by applying the algorithm to text classification. The classification task herein considers the sentiment expressed in a text, a task also known as sentiment analysis. Using several combinations of text preprocessing and feature extraction techniques, we compare the modelling results of the Deep Learning Neural Network with two other commonly used algorithms, Naïve Bayes and the Support Vector Machine (SVM). The comparison uses Indonesian text data with balanced and unbalanced sentiment compositions. Based on the experimental simulation, the Deep Learning Neural Network clearly outperforms Naïve Bayes and the SVM and offers a better F-1 score, while the feature extraction technique that most improves the modelling results is the use of bigrams.
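
    A bigram naïve Bayes baseline of the kind compared above can be sketched in a few lines of stdlib Python; this is a generic sketch, not the paper's pipeline, and the Indonesian snippets below are invented examples ("bagus" = good, "buruk" = bad):

    ```python
    import math
    from collections import Counter

    def bigrams(text):
        """Adjacent word pairs, the feature that most improved results."""
        toks = text.lower().split()
        return list(zip(toks, toks[1:]))

    def train_nb(docs, labels):
        """Count bigrams per class; return counts, priors, vocabulary, size."""
        counts = {c: Counter() for c in set(labels)}
        priors = Counter(labels)
        for doc, c in zip(docs, labels):
            counts[c].update(bigrams(doc))
        vocab = {b for cnt in counts.values() for b in cnt}
        return counts, priors, vocab, len(docs)

    def predict_nb(model, text):
        """Class with the largest smoothed multinomial log-posterior."""
        counts, priors, vocab, n_docs = model
        best, best_lp = None, -math.inf
        for c, cnt in counts.items():
            total = sum(cnt.values())
            lp = math.log(priors[c] / n_docs)
            for b in bigrams(text):
                lp += math.log((cnt[b] + 1) / (total + len(vocab)))  # Laplace
            if lp > best_lp:
                best, best_lp = c, lp
        return best
    ```

    Laplace (add-one) smoothing keeps unseen bigrams from zeroing out a class; the deep learning and SVM models in the comparison consume the same bigram features in vectorized form.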

  19. Marine benthic habitat mapping of the West Arm, Glacier Bay National Park and Preserve, Alaska

    USGS Publications Warehouse

    Hodson, Timothy O.; Cochrane, Guy R.; Powell, Ross D.

    2013-01-01

    Seafloor geology and potential benthic habitats were mapped in West Arm, Glacier Bay National Park and Preserve, Alaska, using multibeam sonar, groundtruthed observations, and geological interpretations. The West Arm of Glacier Bay is a recently deglaciated fjord system under the influence of glacial and paraglacial marine processes. High glacially derived sediment and meltwater fluxes, slope instabilities, and variable bathymetry result in a highly dynamic estuarine environment and benthic ecosystem. We characterize the fjord seafloor and potential benthic habitats using the recently developed Coastal and Marine Ecological Classification Standard (CMECS) by the National Oceanic and Atmospheric Administration (NOAA) and NatureServe. Due to the high flux of glacially sourced fines, mud is the dominant substrate within the West Arm. Water-column characteristics are addressed using a combination of CTD and circulation model results. We also present sediment accumulation data derived from differential bathymetry. These data show the West Arm is divided into two contrasting environments: a dynamic upper fjord and a relatively static lower fjord. The results of these analyses serve as a test of the CMECS classification scheme and as a baseline for ongoing and future mapping efforts and correlations between seafloor substrate, benthic habitats, and glacimarine processes.

  20. Habitat Mapping and Classification of the Grand Bay National Estuarine Research Reserve using AISA Hyperspectral Imagery

    NASA Astrophysics Data System (ADS)

    Rose, K.

    2012-12-01

    Habitat mapping and classification provides essential information for land use planning and ecosystem research, monitoring and management. At the Grand Bay National Estuarine Research Reserve (GRDNERR), Mississippi, habitat characterization of the Grand Bay watershed will also be used to develop a decision-support tool for the NERR's managers and state and local partners. Grand Bay NERR habitat units were identified using a combination of remotely sensed imagery, aerial photography and elevation data. Airborne Imaging Spectrometer for Applications (AISA) hyperspectral data, acquired 5 and 6 May 2010, was analyzed and classified using ENVI v4.8 and v5.0 software. The AISA system was configured to return 63 bands of digital imagery data with a spectral range of 400 to 970 nm (VNIR), spectral resolution (bandwidth) at 8.76 nm, and 1 m spatial resolution. Minimum Noise Fraction (MNF) and Inverse Minimum Noise Fraction were applied to the data prior to using Spectral Angle Mapper ([SAM] supervised) and ISODATA (unsupervised) classification techniques. The resulting class image was exported to ArcGIS 10.0 and visually inspected and compared with the original imagery as well as auxiliary datasets to assist in the attribution of habitat characteristics to the spectral classes, including: National Agricultural Imagery Program (NAIP) aerial photography, Jackson County, MS, 2010; USFWS National Wetlands Inventory, 2007; an existing GRDNERR habitat map (2004), SAV (2009) and salt panne (2002-2003) GIS produced by GRDNERR; and USACE lidar topo-bathymetry, 2005. A field survey to validate the map's accuracy will take place during the 2012 summer season. ENVI's Random Sample generator was used to generate GIS points for a ground-truth survey. 
The broad range of coastal estuarine habitats and geomorphological features- many of which are transitional and vulnerable to environmental stressors- that have been identified within the GRDNERR point to the value of the Reserve for continued coastal research.

  1. Use of the Coastal and Marine Ecological Classification Standard (CMECS) for Geological Studies in Glacier Bay, Alaska

    NASA Astrophysics Data System (ADS)

    Cochrane, G. R.; Hodson, T. O.; Allee, R.; Cicchetti, G.; Finkbeiner, M.; Goodin, K.; Handley, L.; Madden, C.; Mayer, G.; Shumchenia, E.

    2012-12-01

    The U.S. Geological Survey (USGS) is one of four primary organizations (along with the National Oceanic and Atmospheric Administration, the Environmental Protection Agency, and NatureServe) responsible for the development of the Coastal and Marine Ecological Classification Standard (CMECS) over the past decade. In June 2012 the Federal Geographic Data Committee approved CMECS as the first-ever comprehensive federal standard for classifying and describing coastal and marine ecosystems. The USGS has pioneered the application of CMECS in Glacier Bay, Alaska as part of its Seafloor Mapping and Benthic Habitat Studies Project. This presentation briefly describes the standard and its application as part of geological survey studies in the Western Arm of Glacier Bay. CMECS offers a simple, standard framework and common terminology for describing natural and human-influenced ecosystems from the upper tidal reaches of estuaries to the deepest portions of the ocean. The framework is organized into two settings, biogeographic and aquatic, and four components, water column, geoform, substrate, and biotic. Each describes a separate aspect of the environment and biota. Settings and components can be used in combination or independently to describe ecosystem features. The hierarchical arrangement of units of the settings and components allows users to apply CMECS to the scale and specificity that best suits their needs. Modifiers allow users to customize the classification to meet specific needs. Biotopes can be described when there is a need for more detailed information on the biota and their environment. USGS efforts focused primarily on the substrate and geoform components. Previous research has demonstrated three classes of bottom type that can be derived from multibeam data and that in part determine the distribution of benthic organisms: soft, flat bottom; mixed bottom, including coarse sediment and low-relief rock with low to moderate rugosity; and rugose, hard bottom. 
The West Arm of Glacier Bay has all of these habitats, with the greatest abundance being soft, flat bottom. In Glacier Bay, species associated with soft, flat bottom habitats include gastropods, algae, flatfish, Tanner crabs, shrimp, sea pens, and other crustaceans; soft corals and sponges dominate areas of boulder and rock substrate. Video observations in the West Arm suggest that geological-biological associations found in central Glacier Bay are at least partially analogous to associations in the West Arm. Given that soft, mud substrate is the most prevalent habitat in the West Arm, it is expected that the species associated with a soft bottom in the bay proper are the most abundant types of species within the West Arm. While mud is the dominant substrate throughout the fjord, the upper and lower West Arm are potentially very different environments due to the spatially and temporally heterogeneous influence of glaciation and associated effects on fjord hydrologic and oceanographic conditions. Therefore, we expect variations in the distribution of species, and the development of biotopes for Glacier Bay will require data applicable to the full spectrum of CMECS components.

  2. Comparative study of classification algorithms for damage classification in smart composite laminates

    NASA Astrophysics Data System (ADS)

    Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo

    2017-04-01

    This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through a surface-bonded piezoelectric sensor and actuator are analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation, in terms of classification accuracy, precision, recall, kappa statistic, and ROC area.

  3. Classification accuracies of physical activities using smartphone motion sensors.

    PubMed

    Wu, Wanmin; Dasgupta, Sanjoy; Ramirez, Ernesto E; Peterson, Carlyn; Norman, Gregory J

    2012-10-05

    Over the past few years, the world has witnessed an unprecedented growth in smartphone use. With sensors such as accelerometers and gyroscopes on board, smartphones have the potential to enhance our understanding of health behavior, in particular physical activity or the lack thereof. However, reliable and valid activity measurement using only a smartphone in situ has not been realized. To examine the validity of the iPod Touch (Apple, Inc.) and particularly to understand the value of using gyroscopes for classifying types of physical activity, with the goal of creating a measurement and feedback system that easily integrates into individuals' daily living. We collected accelerometer and gyroscope data for 16 participants on 13 activities with an iPod Touch, a device that has essentially the same sensors and computing platform as an iPhone. The 13 activities were sitting, walking, jogging, and going upstairs and downstairs at different paces. We extracted time and frequency features, including mean and variance of acceleration and gyroscope on each axis, vector magnitude of acceleration, and fast Fourier transform magnitude for each axis of acceleration. Different classifiers were compared using the Waikato Environment for Knowledge Analysis (WEKA) toolkit, including C4.5 (J48) decision tree, multilayer perceptron, naive Bayes, logistic, k-nearest neighbor (kNN), and meta-algorithms such as boosting and bagging. The 10-fold cross-validation protocol was used. Overall, the kNN classifier achieved the best accuracies: 52.3%-79.4% for up and down stair walking, 91.7% for jogging, 90.1%-94.1% for walking on level ground, and 100% for sitting. A 2-second sliding window size with a 1-second overlap worked the best. Adding gyroscope measurements proved to be more beneficial than relying solely on accelerometer readings for all activities (with improvement ranging from 3.1% to 13.4%). 
Common categories of physical activity and sedentary behavior (walking, jogging, and sitting) can be recognized with high accuracies using both the accelerometer and gyroscope onboard the iPod touch or iPhone. This suggests the potential of developing just-in-time classification and feedback tools on smartphones.
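
    The winning recipe above (mean/variance features over 2-second windows with 1-second overlap, classified by kNN) can be sketched as follows; this is an illustrative reconstruction with synthetic signals, not the study's code:

    ```python
    import numpy as np

    def window_features(signal, fs=30, win_s=2.0, overlap_s=1.0):
        """Per-axis mean and variance over 2-s windows with 1-s overlap."""
        win = int(fs * win_s)
        step = int(fs * (win_s - overlap_s))
        feats = []
        for start in range(0, signal.shape[0] - win + 1, step):
            seg = signal[start:start + win]
            feats.append(np.concatenate([seg.mean(axis=0), seg.var(axis=0)]))
        return np.array(feats)

    def knn_predict(train_X, train_y, x, k=3):
        """Majority vote among the k nearest training windows."""
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        labels, votes = np.unique(nearest, return_counts=True)
        return labels[np.argmax(votes)]
    ```

    Concatenating gyroscope windows alongside the accelerometer ones simply widens each feature vector, which is how the reported gyroscope gains would enter this scheme.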

  4. Machine learning models in breast cancer survival prediction.

    PubMed

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers, with a high mortality rate among women. With early diagnosis, breast cancer survival increases from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is a combination of rules and different machine learning techniques. Machine learning models can help physicians reduce the number of false decisions. They exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of breast cancer survival. We use a dataset with eight attributes that includes the records of 900 patients, of whom 876 (97.3%) were female and 24 (2.7%) were male. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-fold cross-validation were used with the proposed model for the prediction of breast cancer survival. The performance of the machine learning techniques was evaluated with accuracy, precision, sensitivity, specificity, and area under the ROC curve. Out of 900 patients, 803 were alive and 97 were dead. In this study, the Trees Random Forest (TRF) technique showed better results in comparison to the other techniques (NB, 1NN, AD, SVM, RBFN, and MLP). The accuracy, sensitivity, and area under the ROC curve of TRF are 96%, 96%, and 93%, respectively. However, the 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91%, and area under the ROC curve 78%). 
This study demonstrates that the Trees Random Forest (TRF) model, a rule-based classification model, was the best model with the highest level of accuracy. Therefore, this model is recommended as a useful tool for breast cancer survival prediction as well as medical decision making.

  5. Integrating Entropy-Based Naïve Bayes and GIS for Spatial Evaluation of Flood Hazard.

    PubMed

    Liu, Rui; Chen, Yun; Wu, Jianping; Gao, Lei; Barrett, Damian; Xu, Tingbao; Li, Xiaojuan; Li, Linyi; Huang, Chang; Yu, Jia

    2017-04-01

    Regional flood risk caused by intensive rainfall under extreme climate conditions has increasingly attracted global attention. Mapping and evaluation of flood hazard are vital parts of flood risk assessment. This study develops an integrated framework for estimating the spatial likelihood of flood hazard by coupling weighted naïve Bayes (WNB), geographic information systems, and remote sensing. The north part of the Fitzroy River Basin in Queensland, Australia, was selected as a case study site. The environmental indices, including extreme rainfall, evapotranspiration, net-water index, soil water retention, elevation, slope, and drainage proximity and density, were generated from spatial data representing climate, soil, vegetation, hydrology, and topography. These indices were weighted using the statistics-based entropy method. The weighted indices were input into the WNB-based model to delineate a regional flood risk map that indicates the likelihood of flood occurrence. The resultant map was validated by the maximum inundation extent extracted from moderate resolution imaging spectroradiometer (MODIS) imagery. The evaluation results, including mapping and evaluation of the distribution of flood hazard, are helpful in guiding flood inundation disaster responses for the region. The novel approach presented consists of weighted grid data, image-based sampling and validation, cell-by-cell probability inference, and spatial mapping. It is superior to an existing spatial naive Bayes (NB) method for regional flood hazard assessment. It can also be extended to other likelihood-related environmental hazard studies. © 2016 Society for Risk Analysis.
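
    The statistics-based entropy weighting step can be sketched as follows: each environmental index is a column of values over the grid cells, and indices whose distributions are far from uniform (low Shannon entropy) receive larger weights, which then scale the per-feature terms of the weighted naïve Bayes likelihood. A minimal numpy sketch of the standard entropy-weight method, not the paper's implementation:

    ```python
    import numpy as np

    def entropy_weights(M):
        """Entropy weighting of an (n_cells, n_indices) matrix of positive
        index values: columns that vary more across cells carry more
        information (lower entropy) and receive larger weights."""
        P = M / M.sum(axis=0)                     # normalize each index column
        k = 1.0 / np.log(M.shape[0])              # scales entropy into [0, 1]
        e = -k * np.sum(np.where(P > 0, P * np.log(P), 0.0), axis=0)
        d = 1.0 - e                               # degree of diversification
        return d / d.sum()                        # weights summing to 1
    ```

    In a WNB model, a weight w_i typically exponentiates the i-th feature likelihood, p(x|c) ∝ ∏ p(x_i|c)^{w_i}, so an index that is constant over the region (weight near zero) drops out of the classification.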

  6. Simultaneous learning and filtering without delusions: a Bayes-optimal combination of Predictive Inference and Adaptive Filtering.

    PubMed

    Kneissler, Jan; Drugowitsch, Jan; Friston, Karl; Butz, Martin V

    2015-01-01

    Predictive coding appears to be one of the fundamental working principles of brain processing. Amongst other aspects, brains often predict the sensory consequences of their own actions. Predictive coding resembles Kalman filtering, where incoming sensory information is filtered to produce prediction errors for subsequent adaptation and learning. However, to generate prediction errors given motor commands, a suitable temporal forward model is required to generate predictions. While in engineering applications, it is usually assumed that this forward model is known, the brain has to learn it. When filtering sensory input and learning from the residual signal in parallel, a fundamental problem arises: the system can enter a delusional loop when filtering the sensory information using an overly trusted forward model. In this case, learning stalls before accurate convergence because uncertainty about the forward model is not properly accommodated. We present a Bayes-optimal solution to this generic and pernicious problem for the case of linear forward models, which we call Predictive Inference and Adaptive Filtering (PIAF). PIAF filters incoming sensory information and learns the forward model simultaneously. We show that PIAF is formally related to Kalman filtering and to the Recursive Least Squares linear approximation method, but combines these procedures in a Bayes optimal fashion. Numerical evaluations confirm that the delusional loop is precluded and that the learning of the forward model is more than 10-times faster when compared to a naive combination of Kalman filtering and Recursive Least Squares.
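The Recursive Least Squares component that PIAF builds on can be illustrated with a scalar sketch that learns a linear forward model y ≈ a·u one observation at a time. The data, noise level, and function names below are hypothetical; this shows the plain RLS baseline, not the Bayes-optimal combination the paper derives.

```python
import random

def rls_scalar(pairs, lam=1.0, p0=1000.0):
    """Recursive least squares for a scalar forward model y ~ a * u:
    refines the estimate of `a` from one (input, observation) pair
    at a time, using the prediction error as the learning signal."""
    a_hat, p = 0.0, p0                  # initial estimate and uncertainty
    for u, y in pairs:
        k = p * u / (lam + u * p * u)   # gain
        a_hat += k * (y - a_hat * u)    # correct with prediction error
        p = (p - k * u * p) / lam       # shrink uncertainty
    return a_hat

random.seed(0)
true_a = 2.5
data = [(u, true_a * u + random.gauss(0, 0.05))
        for u in (random.uniform(-1, 1) for _ in range(500))]
a_est = rls_scalar(data)   # converges toward true_a
```

The delusional loop the paper describes arises when the filtered (rather than raw) observation is fed back into such an update while the model's own uncertainty is ignored.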

  7. Application of otolith shape analysis for stock discrimination and species identification of five goby species (Perciformes: Gobiidae) in the northern Chinese coastal waters

    NASA Astrophysics Data System (ADS)

    Yu, Xin; Cao, Liang; Liu, Jinhu; Zhao, Bo; Shan, Xiujuan; Dou, Shuozeng

    2014-09-01

We tested the use of otolith shape analysis to discriminate between species and stocks of five goby species (Ctenotrypauchen chinensis, Odontamblyopus lacepedii, Amblychaeturichthys hexanema, Chaeturichthys stigmatias, and Acanthogobius hasta) found in northern Chinese coastal waters. The five species were well differentiated with high overall classification success using shape indices (83.7%), elliptic Fourier coefficients (98.6%), or the combination of both methods (94.9%). However, shape analysis alone was only moderately successful at discriminating among the four stocks (Liaodong Bay, LD; Bohai Bay, BH; Huanghe (Yellow) River estuary, HRE; and Jiaozhou Bay, JZ) of A. hasta (50%-54%) and C. stigmatias (65.7%-75.8%). For these two species, shape analysis was moderately successful at discriminating the HRE or JZ stocks from other stocks, but failed to effectively identify the LD and BH stocks. A large number of otoliths were misclassified between the HRE and JZ stocks, which are geographically well separated. The classification success for stock discrimination was higher using elliptic Fourier coefficients alone (70.2%) or in combination with shape indices (75.8%) than using only shape indices (65.7%) in C. stigmatias, whereas there was little difference among the three methods for A. hasta. Our results supported the common belief that otolith shape analysis is generally more effective for interspecific identification than intraspecific discrimination. Moreover, compared with shape indices analysis, Fourier analysis improves classification success during inter- and intra-species discrimination by otolith shape analysis, although this did not necessarily always occur in all fish species.

  8. Geologic characteristics of benthic habitats in Glacier Bay, southeast Alaska

    USGS Publications Warehouse

    Harney, Jodi N.; Cochrane, Guy R.; Etherington, Lisa L.; Dartnell, Pete; Golden, Nadine E.; Chezar, Hank

    2006-01-01

In April 2004, more than 40 hours of georeferenced submarine digital video was collected in water depths of 15-370 m in Glacier Bay to (1) ground-truth existing geophysical data (bathymetry and acoustic reflectance), (2) examine and record geologic characteristics of the sea floor, (3) investigate the relation between substrate types and benthic communities, and (4) construct predictive maps of seafloor geomorphology and habitat distribution. Common substrates observed include rock, boulders, cobbles, rippled sand, bioturbated mud, and extensive beds of living horse mussels and scallops. Four principal sea-floor geomorphic types are distinguished by using video observations. Their distribution in lower and central Glacier Bay is predicted using a supervised, hierarchical decision-tree statistical classification of geophysical data.

  9. Implicit structured sequence learning: an fMRI study of the structural mere-exposure effect

    PubMed Central

    Folia, Vasiliki; Petersson, Karl Magnus

    2014-01-01

In this event-related fMRI study we investigated the effect of 5 days of implicit acquisition on preference classification by means of an artificial grammar learning (AGL) paradigm based on the structural mere-exposure effect and preference classification using a simple right-linear unification grammar. This allowed us to investigate implicit AGL in a proper learning design by including baseline measurements prior to grammar exposure. After 5 days of implicit acquisition, the fMRI results showed activations in a network of brain regions including the inferior frontal (centered on BA 44/45) and the medial prefrontal regions (centered on BA 8/32). Importantly, and central to this study, the inclusion of a naive preference fMRI baseline measurement allowed us to conclude that these fMRI findings were the intrinsic outcomes of the learning process itself and not a reflection of a preexisting functionality recruited during classification, independent of acquisition. Support for the implicit nature of the knowledge utilized during preference classification on day 5 comes from the fact that the basal ganglia, associated with implicit procedural learning, were activated during classification, while the medial temporal lobe system, associated with explicit declarative memory, was consistently deactivated. Thus, preference classification in combination with structural mere-exposure can be used to investigate structural sequence processing (syntax) in unsupervised AGL paradigms with proper learning designs. PMID:24550865

  11. Admiralty Bay Benthos Diversity—A census of a complex polar ecosystem

    NASA Astrophysics Data System (ADS)

    Siciński, Jacek; Jażdżewski, Krzysztof; Broyer, Claude De; Presler, Piotr; Ligowski, Ryszard; Nonato, Edmundo F.; Corbisier, Thais N.; Petti, Monica A. V.; Brito, Tania A. S.; Lavrado, Helena P.; BŁażewicz-Paszkowycz, Magdalena; Pabis, Krzysztof; Jażdżewska, Anna; Campos, Lucia S.

    2011-03-01

    A thorough census of Admiralty Bay benthic biodiversity was completed through the synthesis of data, acquired from more than 30 years of observations. Most of the available records arise from successive Polish and Brazilian Antarctic expeditions organized since 1977 and 1982, respectively, but also include new data from joint collecting efforts during the International Polar Year (2007-2009). Geological and hydrological characteristics of Admiralty Bay and a comprehensive species checklist with detailed data on the distribution and nature of the benthic communities are provided. Approximately 1300 species of benthic organisms (excluding bacteria, fungi and parasites) were recorded from the bay's entire depth range (0-500 m). Generalized classifications and the descriptions of soft-bottom and hard-bottom invertebrate communities are presented. A time-series analysis showed seasonal and interannual changes in the shallow benthic communities, likely to be related to ice formation and ice melt within the bay. As one of the best studied regions in the maritime Antarctic Admiralty Bay represents a legacy site, where continued, systematically integrated data sampling can evaluate the effects of climate change on marine life. Both high species richness and high assemblage diversity of the Admiralty Bay shelf benthic community have been documented against the background of habitat heterogeneity.

  12. Prediction of outcome in multiorgan resections for cancer using a bayes-network.

    PubMed

Udelnow, Andrej; Leinung, Steffen; Grochola, Lukasz Filipp; Henne-Bruns, Doris; Würl, Peter

    2013-01-01

The long-term success of multivisceral resections for cancer is difficult to forecast due to the complexity of factors influencing the prognosis. The aim of our study was to assess the predictive power of a Bayes network for postoperative outcome and survival. We included every oncologic patient undergoing resection of 4 or more organs from 2002 to 2005 at Ulm University Hospital. Preoperative data were assessed, as well as the tumour classification, the resected organs, intra- and postoperative complications, and overall survival. Using the GeNIe 2.0 software, we developed a Bayes network. Multivisceral tumour resections were performed in 22 patients. The areas under the receiver operating curves for the variables "survival >12 months" and "hospitalisation >28 days" as predicted by the Bayes network were 0.81 and 0.77 and differed significantly from 0.5 (p = 0.019 and 0.028, respectively). The positive predictive values of the Bayes network for these variables were 1 and 0.8, and the negative ones 0.71 and 0.88, respectively. Bayes networks are useful for estimating the prognosis of individual patients and can help in deciding whether to perform a multivisceral resection for cancer.
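The predictive values reported above follow directly from a 2x2 confusion table. A minimal worked sketch with hypothetical counts (not the study's actual confusion table):

```python
def predictive_values(tp, fp, tn, fn):
    """Positive/negative predictive values from a 2x2 confusion table:
    PPV = TP / (TP + FP), NPV = TN / (TN + FN)."""
    ppv = tp / (tp + fp) if tp + fp else float('nan')
    npv = tn / (tn + fn) if tn + fn else float('nan')
    return ppv, npv

# hypothetical counts for a small cohort of comparable size (n = 22)
ppv, npv = predictive_values(tp=8, fp=2, tn=7, fn=5)
```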

  13. Automatic Identification of Messages Related to Adverse Drug Reactions from Online User Reviews using Feature-based Classification.

    PubMed

    Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie

    2014-11-01

User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are known as valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to identify messages related to ADRs automatically from online user reviews. We conducted experiments on online user reviews using different feature sets and different classification techniques. First, messages were collected from three communities, an allergy community, a schizophrenia community, and a pain management community, and 3000 messages were annotated. Second, an N-gram-based feature set and a medical domain-specific feature set were generated. Third, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform the classification tasks separately. Finally, we evaluated the performance of each combination of feature set and classification technique by comparing metrics including accuracy and F-measure. In terms of accuracy, the SVM classifier exceeded 0.8, whereas the C4.5 and Naïve Bayes classifiers remained below 0.8; meanwhile, the combined feature set, comprising the n-gram-based and domain-specific feature sets, consistently outperformed either single feature set. In terms of F-measure, the highest value, 0.895, was achieved by using the combined feature sets with an SVM classifier. Overall, combining both feature sets with an SVM classifier gave the best performance and yields an effective method to identify ADR-related messages automatically from online user reviews.
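A minimal multinomial naive Bayes classifier with unigram-plus-bigram features, in the spirit of the n-gram feature set described above. The toy messages and labels are invented for illustration, and the Laplace smoothing choice is an assumption; the study's best-performing classifier was an SVM, for which this naive Bayes stands in as the simpler baseline.

```python
import math
from collections import Counter

def ngrams(text, n=2):
    """Word unigrams plus word n-grams (here bigrams) as features."""
    words = text.lower().split()
    return words + [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(ngrams(doc))
        self.vocab = {f for cnt in self.counts.values() for f in cnt}
        return self

    def predict(self, doc):
        def score(c):
            total = sum(self.counts[c].values())
            return self.priors[c] + sum(
                math.log((self.counts[c][f] + 1) / (total + len(self.vocab)))
                for f in ngrams(doc))
        return max(self.classes, key=score)

# toy stand-in for the annotated review messages (hypothetical examples)
docs = ["severe headache after taking the drug",
        "felt dizzy and nauseous since starting medication",
        "great product fast shipping",
        "the pharmacy service was friendly"]
labels = ["ADR", "ADR", "other", "other"]
clf = NaiveBayes().fit(docs, labels)
pred = clf.predict("headache and dizzy after medication")  # classified as ADR
```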

  14. A Step Towards EEG-based Brain Computer Interface for Autism Intervention*

    PubMed Central

    Fan, Jing; Wade, Joshua W.; Bian, Dayi; Key, Alexandra P.; Warren, Zachary E.; Mion, Lorraine C.; Sarkar, Nilanjan

    2017-01-01

    Autism Spectrum Disorder (ASD) is a prevalent and costly neurodevelopmental disorder. Individuals with ASD often have deficits in social communication skills as well as adaptive behavior skills related to daily activities. We have recently designed a novel virtual reality (VR) based driving simulator for driving skill training for individuals with ASD. In this paper, we explored the feasibility of detecting engagement level, emotional states, and mental workload during VR-based driving using EEG as a first step towards a potential EEG-based Brain Computer Interface (BCI) for assisting autism intervention. We used spectral features of EEG signals from a 14-channel EEG neuroheadset, together with therapist ratings of behavioral engagement, enjoyment, frustration, boredom, and difficulty to train a group of classification models. Seven classification methods were applied and compared including Bayes network, naïve Bayes, Support Vector Machine (SVM), multilayer perceptron, K-nearest neighbors (KNN), random forest, and J48. The classification results were promising, with over 80% accuracy in classifying engagement and mental workload, and over 75% accuracy in classifying emotional states. Such results may lead to an adaptive closed-loop VR-based skill training system for use in autism intervention. PMID:26737113

  15. Early Remission Is a Realistic Target in a Majority of Patients with DMARD-naive Rheumatoid Arthritis.

    PubMed

    Rannio, Tuomas; Asikainen, Juha; Kokko, Arto; Hannonen, Pekka; Sokka, Tuulikki

    2016-04-01

    We analyzed remission rates at 3 and 12 months in patients with rheumatoid arthritis (RA) who were naive for disease-modifying antirheumatic drugs (DMARD) and who were treated in a Finnish rheumatology clinic from 2008 to 2011. We compared remission rates and drug treatments between patients with RA and patients with undifferentiated arthritis (UA). Data from all DMARD-naive RA and UA patients from the healthcare district were collected using software that includes demographic and clinical characteristics, disease activity, medications, and patient-reported outcomes. Our rheumatology clinic applies the treat-to-target principle, electronic monitoring of patients, and multidisciplinary care. Out of 409 patients, 406 had data for classification by the 2010 RA criteria of the American College of Rheumatology/European League Against Rheumatism. A total of 68% were female, and mean age (SD) was 58 (16) years. Respectively, 56%, 60%, and 68% were positive for anticyclic citrullinated peptide antibodies (anti-CCP), rheumatoid factor (RF), and RF/anti-CCP, and 19% had erosive disease. The median (interquartile range) duration of symptoms was 6 (4-12) months. A total of 310 were classified as RA and 96 as UA. The patients with UA were younger, had better functional status and lower disease activity, and were more often seronegative than the patients with RA. The 28-joint Disease Activity Score (3 variables) remission rates of RA and UA patients at 3 months were 67% and 58% (p = 0.13), and at 12 months, 71% and 79%, respectively (p = 0.16). Sustained remission was observed in 57%/56% of RA/UA patients. Patients with RA used more conventional synthetic DMARD combinations than did patients with UA. None used biological DMARD at 3 months, and only 2.7%/1.1% of the patients (RA/UA) used them at 12 months (p = 0.36). Remarkably high remission rates are achievable in real-world DMARD-naive patients with RA or UA.

  16. Use of machine-learning classifiers to predict requests for preoperative acute pain service consultation.

    PubMed

    Tighe, Patrick J; Lucas, Stephen D; Edwards, David A; Boezaart, André P; Aytug, Haldun; Bihorac, Azra

    2012-10-01

  Objective: The purpose of this project was to determine whether machine-learning classifiers could predict which patients would require a preoperative acute pain service (APS) consultation. Design: Retrospective cohort. Setting: University teaching hospital. Patients: The records of 9,860 surgical patients posted between January 1 and June 30, 2010 were reviewed. Outcome Measures: Request for APS consultation. Methods: A cohort of machine-learning classifiers was compared according to its ability or inability to classify surgical cases as requiring a request for a preoperative APS consultation. Classifiers were then optimized utilizing ensemble techniques. Computational efficiency was measured with the central processing unit processing times required for model training. Classifiers were tested using the full feature set, as well as the reduced feature set that was optimized using a merit-based dimensional reduction strategy. Results: Machine-learning classifiers correctly predicted preoperative requests for APS consultations in 92.3% (95% confidence intervals [CI], 91.8-92.8) of all surgical cases. Bayesian methods yielded the highest area under the receiver operating curve (0.87, 95% CI 0.84-0.89) and lowest training times (0.0018 seconds, 95% CI, 0.0017-0.0019 for the NaiveBayesUpdateable algorithm). An ensemble of high-performing machine-learning classifiers did not yield a higher area under the receiver operating curve than its component classifiers. Dimensional reduction decreased the computational requirements for multiple classifiers, but did not adversely affect classification performance. Conclusions: Using historical data, machine-learning classifiers can predict which surgical cases should prompt a preoperative request for an APS consultation. Dimensional reduction improved computational efficiency and preserved predictive performance. Wiley Periodicals, Inc.
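The areas under the receiver operating curve reported above can be computed nonparametrically as the probability that a randomly chosen positive case outscores a randomly chosen negative one (the Mann-Whitney formulation). A sketch with hypothetical classifier scores, not the study's data:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs where the positive
    outranks the negative, counting ties as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# hypothetical scores for consult vs. no-consult surgical cases
a = auc(pos_scores=[0.9, 0.8, 0.7, 0.6],
        neg_scores=[0.5, 0.65, 0.3, 0.2])   # 15 of 16 pairs ranked correctly
```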

  17. Analyzing a Lung Cancer Patient Dataset with the Focus on Predicting Survival Rate One Year after Thoracic Surgery

    PubMed

    Rezaei Hachesu, Peyman; Moftian, Nazila; Dehghani, Mahsa; Samad Soltani, Taha

    2017-06-25

Background: Data mining, a new concept introduced in the mid-1990s, can help researchers to gain new, profound insights and facilitate access to unanticipated knowledge sources in biomedical datasets. Many issues in the medical field are concerned with the diagnosis of diseases based on tests conducted on individuals at risk. Early diagnosis and treatment can provide a better outcome regarding the survival of lung cancer patients. Researchers can use data mining techniques to create effective diagnostic models. The aim of this study was to evaluate patterns existing in risk factor data for mortality one year after thoracic surgery for lung cancer. Methods: The dataset used in this study contained 470 records and 17 features. First, the most important variables involved in the incidence of lung cancer were extracted using knowledge discovery and data mining algorithms such as naive Bayes and expectation maximization; then, using a regression analysis algorithm, a questionnaire was developed to predict the risk of death one year after lung surgery. Outliers in the data were excluded and reported using a clustering algorithm. Finally, a calculator was designed to estimate the risk of one-year post-operative mortality based on a scorecard algorithm. Results: The results revealed the most important factor involved in increased mortality to be large tumor size. Roles for type II diabetes and preoperative dyspnea in lower survival were also identified. The feature most useful for classifying patients was forced expiratory volume in the first second (FEV1), on the basis of which patients could be divided into different categories. Conclusion: Development of a calculation-based questionnaire to diagnose disease can be used to identify and fill knowledge gaps in clinical practice guidelines.

  18. Sensitivity and specificity of machine learning classifiers and spectral domain OCT for the diagnosis of glaucoma.

    PubMed

    Vidotti, Vanessa G; Costa, Vital P; Silva, Fabrício R; Resende, Graziela M; Cremasco, Fernanda; Dias, Marcelo; Gomi, Edson S

    2012-06-15

Purpose. To investigate the sensitivity and specificity of machine learning classifiers (MLC) and spectral domain optical coherence tomography (SD-OCT) for the diagnosis of glaucoma. Methods. Sixty-two patients with early to moderate glaucomatous visual field damage and 48 healthy individuals were included. All subjects underwent a complete ophthalmologic examination, achromatic standard automated perimetry, and RNFL imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec, Inc., Dublin, California, USA). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters. Subsequently, the following MLCs were tested: Classification Tree (CTREE), Random Forest (RAN), Bagging (BAG), AdaBoost M1 (ADA), Ensemble Selection (ENS), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Naive-Bayes (NB), and Support Vector Machine (SVM). Areas under the ROC curves (aROCs) obtained for each parameter and each MLC were compared. Results. The mean age was 57.0±9.2 years for healthy individuals and 59.9±9.0 years for glaucoma patients (p=0.103). Mean deviation values were -4.1±2.4 dB for glaucoma patients and -1.5±1.6 dB for healthy individuals (p<0.001). The SD-OCT parameters with the greatest aROCs were inferior quadrant (0.813), average thickness (0.807), 7 o'clock position (0.765), and 6 o'clock position (0.754). The aROCs from classifiers varied from 0.785 (ADA) to 0.818 (BAG). The aROC obtained with BAG was not significantly different from the aROC obtained with the best single SD-OCT parameter (p=0.93). Conclusions. The SD-OCT showed good diagnostic accuracy in a group of patients with early glaucoma. In this series, MLCs did not improve the sensitivity and specificity of SD-OCT for the diagnosis of glaucoma.

  19. Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

    PubMed Central

    Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant

    2015-01-01

    Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
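The interquartile ranges used above to quantify error-rate variability are simply Q3 minus Q1 over the repeated-sampling runs. A small sketch with hypothetical error rates; the linear-interpolation quantile rule is an assumption (several conventions exist):

```python
def iqr(values):
    """Interquartile range (Q3 - Q1) with linear interpolation
    between order statistics."""
    xs = sorted(values)
    def quantile(q):
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
    return quantile(0.75) - quantile(0.25)

# hypothetical error rates from repeated random sampling (RRS) runs
spread = iqr([0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.13])
```

A smaller IQR across runs, as reported for the cross-validated RRS variant, indicates a more stable extrapolated error estimate.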

  20. Scaling-laws of human broadcast communication enable distinction between human, corporate and robot Twitter users.

    PubMed

    Tavares, Gabriela; Faisal, Aldo

    2013-01-01

Human behaviour is highly individual by nature, yet statistical structures are emerging which seem to govern the actions of human beings collectively. Here we search for universal statistical laws dictating the timing of human actions in communication decisions. We focus on the distribution of the time interval between messages in human broadcast communication, as documented in Twitter, and study a collection of over 160,000 tweets for three user categories: personal (controlled by one person), managed (typically PR agency controlled) and bot-controlled (automated system). To test our hypothesis, we investigate whether it is possible to differentiate between user types based on tweet timing behaviour, independently of the content in messages. For this purpose, we developed a system to process a large amount of tweets for reality mining and implemented two simple probabilistic inference algorithms: 1. a naive Bayes classifier, which distinguishes between two and three account categories with classification performance of 84.6% and 75.8%, respectively, and 2. a prediction algorithm to estimate the time of a user's next tweet with an R^2 ≈ 0.7. Our results show that we can reliably distinguish between the three user categories as well as predict the distribution of a user's inter-message time with reasonable accuracy. More importantly, we identify a characteristic power-law decrease in the tail of inter-message time distribution by human users which is different from that obtained for managed and automated accounts. This result is evidence of a universal law that permeates the timing of human decisions in broadcast communication and extends the findings of several previous studies of peer-to-peer communication.
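The idea of classifying accounts from timing alone can be sketched with a one-feature Gaussian naive Bayes over log inter-message times. The distributions, parameters, and class names below are invented for illustration (real human inter-tweet times are heavy-tailed rather than log-normal, which is the power-law signature the study exploits):

```python
import math, random

class GaussianNB1D:
    """One-feature Gaussian naive Bayes (equivalently a simple Bayes
    classifier) on the log of the inter-message interval."""
    def fit(self, xs, ys):
        self.stats = {}
        for c in set(ys):
            vals = [x for x, y in zip(xs, ys) if y == c]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals)
            self.stats[c] = (mu, var, len(vals) / len(xs))
        return self

    def predict(self, x):
        def logp(c):  # log prior + log Gaussian likelihood
            mu, var, prior = self.stats[c]
            return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                    - (x - mu) ** 2 / (2 * var))
        return max(self.stats, key=logp)

random.seed(1)
# hypothetical log inter-tweet times: humans bursty (short gaps, wide
# spread), bots regular (tight spread around a fixed schedule)
human = [random.gauss(1.0, 1.0) for _ in range(200)]
bot = [random.gauss(3.0, 0.2) for _ in range(200)]
clf = GaussianNB1D().fit(human + bot, ["human"] * 200 + ["bot"] * 200)
```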

  2. Nearest Neighbor Algorithms for Pattern Classification

    NASA Technical Reports Server (NTRS)

    Barrios, J. O.

    1972-01-01

A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x_i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required for categorization. The Bayes risk serves merely as a reference: the limit of excellence beyond which it is not possible to go. The NN risk is bounded below by the Bayes risk and above by twice the Bayes risk.
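The k-NN rule described above can be sketched in a few lines; the 2-D observations and category labels are hypothetical:

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """k-NN rule: assign the query vector to the majority category
    among the k nearest training observations (squared Euclidean
    distance, so no square root is needed for ranking)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# hypothetical 2-D observation vectors for two categories
train = [((0.0, 0.0), 'F'), ((0.2, 0.1), 'F'), ((0.1, 0.3), 'F'),
         ((1.0, 1.0), 'G'), ((0.9, 1.2), 'G'), ((1.1, 0.8), 'G')]
label = knn_classify(train, (0.15, 0.1), k=3)   # majority of neighbors: 'F'
```

The CNN rule would prune `train` down to a consistent subset (one that still classifies every training point correctly) before applying the same procedure.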

  3. On the Discriminant Analysis in the 2-Populations Case

    NASA Astrophysics Data System (ADS)

    Rublík, František

    2008-01-01

    The empirical Bayes Gaussian rule, which in the normal case yields good values of the probability of total error, may yield high values of the maximum probability error. From this point of view the presented modified version of the classification rule of Broffitt, Randles and Hogg appears to be superior. The modification included in this paper is termed as a WR method, and the choice of its weights is discussed. The mentioned methods are also compared with the K nearest neighbours classification rule.

  4. Segmentation of white blood cells and comparison of cell morphology by linear and naïve Bayes classifiers.

    PubMed

    Prinyakupt, Jaroonrut; Pluempitiwiriyawej, Charnchai

    2015-06-30

    Blood smear microscopic images are routinely investigated by haematologists to diagnose most blood diseases. However, the task is quite tedious and time consuming. An automatic detection and classification of white blood cells within such images can accelerate the process tremendously. In this paper we propose a system to locate white blood cells within microscopic blood smear images, segment them into nucleus and cytoplasm regions, extract suitable features and finally, classify them into five types: basophil, eosinophil, neutrophil, lymphocyte and monocyte. Two sets of blood smear images were used in this study's experiments. Dataset 1, collected from Rangsit University, were normal peripheral blood slides under light microscope with 100× magnification; 555 images with 601 white blood cells were captured by a Nikon DS-Fi2 high-definition color camera and saved in JPG format of size 960 × 1,280 pixels at 15 pixels per 1 μm resolution. In dataset 2, 477 cropped white blood cell images were downloaded from CellaVision.com. They are in JPG format of size 360 × 363 pixels. The resolution is estimated to be 10 pixels per 1 μm. The proposed system comprises a pre-processing step, nucleus segmentation, cell segmentation, feature extraction, feature selection and classification. The main concept of the segmentation algorithm employed uses white blood cell's morphological properties and the calibrated size of a real cell relative to image resolution. The segmentation process combined thresholding, morphological operation and ellipse curve fitting. Consequently, several features were extracted from the segmented nucleus and cytoplasm regions. Prominent features were then chosen by a greedy search algorithm called sequential forward selection. Finally, with a set of selected prominent features, both linear and naïve Bayes classifiers were applied for performance comparison. This system was tested on normal peripheral blood smear slide images from two datasets. 
Two sets of comparisons were performed: segmentation and classification. The automatically segmented results were compared to those obtained manually by a haematologist. The proposed method was found to be consistent and coherent across both datasets, with Dice similarity of 98.9% and 91.6% for the average segmented nucleus and cell regions, respectively. Furthermore, the overall correct classification rate is about 98% and 94% for the linear and naïve Bayes models, respectively. The proposed system, based on normal white blood cell morphology and its characteristics, was applied to two different datasets. The calibrated segmentation process is fast, robust, efficient and coherent on both datasets, while the classification of normal white blood cells into five types shows high sensitivity in both the linear and naïve Bayes models, with slightly better results for the linear classifier.
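
The greedy sequential forward selection step described above can be sketched as follows; the feature names and the toy scoring function are illustrative stand-ins for a real classifier-accuracy criterion, not the paper's actual evaluator.

```python
# Illustrative sketch of sequential forward selection (SFS), the greedy
# feature-search strategy described above. The scoring function is a
# stand-in: any classifier accuracy estimate could be plugged in.

def sequential_forward_selection(features, score, k):
    """Greedily add the feature that most improves `score` until k are chosen."""
    selected = []
    while len(selected) < k:
        best_feat, best_score = None, float("-inf")
        for f in features:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best_score:
                best_feat, best_score = f, s
        selected.append(best_feat)
    return selected

# Toy score: the subset {"area", "perimeter"} is optimal in this mock-up.
def toy_score(subset):
    target = {"area", "perimeter"}
    return len(target & set(subset)) - 0.1 * len(set(subset) - target)

print(sequential_forward_selection(["area", "color", "perimeter", "texture"], toy_score, 2))
```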

  5. Decadal Trend in Agricultural Abandonment and Woodland Expansion in an Agro-Pastoral Transition Band in Northern China.

    PubMed

    Wang, Chao; Gao, Qiong; Wang, Xian; Yu, Mei

    2015-01-01

    Land use land cover (LULC) changes frequently in ecotones due to the large climate and soil gradients and the complex landscape composition and configuration. Accurate mapping of LULC changes in ecotones is of great importance for assessing ecosystem functions/services and for policy-decision support. Decadal or sub-decadal mapping of LULC provides scenarios for modeling biogeochemical processes and their feedbacks to climate, and for evaluating the effectiveness of land-use policies, e.g. forest conversion. However, it remains a great challenge to produce reliable LULC maps at moderate resolution and to evaluate their uncertainties over large areas with complex landscapes. In this study we developed a robust LULC classification system using multiple classifiers based on MODIS (Moderate Resolution Imaging Spectroradiometer) data and posterior data fusion. Not only does the system create LULC maps with high statistical accuracy, but it also provides pixel-level uncertainties that are essential for subsequent analyses and applications. We applied the classification system to the Agro-pasture transition band in northern China (APTBNC) to detect decadal changes in LULC during 2003-2013 and to evaluate the effectiveness of the implementation of major Key Forestry Programs (KFPs). In our study, the random forest (RF), support vector machine (SVM), and weighted k-nearest neighbors (WKNN) classifiers outperformed artificial neural networks (ANN) and naive Bayes (NB), with higher classification accuracy and lower sensitivity to training sample size. Bayesian-average data fusion based on the results of RF, SVM, and WKNN achieved a Kappa statistic of 87.5%, higher than any individual classifier and than majority-vote integration. The pixel-level uncertainty map agreed with the traditional accuracy assessment but additionally conveys the spatial variation of uncertainty.
Specifically, it pinpoints that the southwestern area of APTBNC has higher uncertainty than other parts of the region, and that open shrubland is likely to be misclassified as bare ground in some locations. Forests, closed shrublands, and grasslands in APTBNC expanded by 23%, 50%, and 9%, respectively, during 2003-2013. The expansion of these land cover types was compensated by shrinkage in croplands (20%), bare ground (15%), and open shrublands (30%). The significant decline in agricultural lands is primarily attributed to the KFPs implemented at the end of the last century and to the nationwide urbanization of the recent decade. The increased coverage of grass and woody plants would largely reduce soil erosion, improve mitigation of climate change, and enhance carbon sequestration in this region.
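
The Bayesian-average fusion and its contrast with majority voting can be illustrated with a minimal sketch; the classifier names and posterior values below are invented for illustration, not taken from the study.

```python
import numpy as np

# Hypothetical per-pixel class posteriors from three classifiers (RF, SVM, WKNN)
# over three classes [forest, shrubland, grassland]; all numbers are made up.
posteriors = np.array([
    [0.6, 0.3, 0.1],   # "RF"
    [0.2, 0.5, 0.3],   # "SVM"
    [0.5, 0.4, 0.1],   # "WKNN"
])

# Bayesian-average fusion: average the posteriors, pick the argmax, and keep
# the averaged distribution as a pixel-level uncertainty measure.
fused = posteriors.mean(axis=0)
label = int(np.argmax(fused))            # class 0 (forest)

# Majority vote, by contrast, discards the probabilities entirely:
votes = np.argmax(posteriors, axis=1)    # one hard vote per classifier
majority = int(np.bincount(votes).argmax())
print(label, majority, fused)
```

Note that fusion keeps the full averaged distribution, which is exactly what a pixel-level uncertainty map needs, whereas majority voting retains only the winning label.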

  6. Occupancy estimation and modeling with multiple states and state uncertainty

    USGS Publications Warehouse

    Nichols, J.D.; Hines, J.E.; MacKenzie, D.I.; Seamans, M.E.; Gutierrez, R.J.

    2007-01-01

    The distribution of a species over space is of central interest in ecology, but species occurrence does not provide all of the information needed to characterize either the well-being of a population or the suitability of occupied habitat. Recent methodological development has focused on drawing inferences about species occurrence in the face of imperfect detection. Here we extend those methods by characterizing occupied locations by some additional state variable (e.g., as producing young or not). Our modeling approach deals with both detection probabilities <1 and uncertainty in state classification. We then use the approach with occupancy and reproductive rate data from California Spotted Owls (Strix occidentalis occidentalis) collected in the central Sierra Nevada during the breeding season of 2004 to illustrate the utility of the modeling approach. Estimates of owl reproductive rate were larger than naive estimates, indicating the importance of appropriately accounting for uncertainty in detection and state classification.

  7. Spatial patterning of water quality in Biscayne Bay, Florida as a function of land use and water management.

    PubMed

    Caccia, Valentina G; Boyer, Joseph N

    2005-11-01

    An objective classification analysis was performed on a water quality data set from 25 sites collected monthly during 1994-2003. The water quality parameters measured included: TN, TON, DIN, NH4+, NO3-, NO2-, TP, SRP, TN:TP ratio, TOC, DO, CHL A, turbidity, salinity and temperature. Based on this spatial analysis, Biscayne Bay was divided into five zones having similar water quality characteristics. A robust nutrient gradient, driven mostly by dissolved inorganic nitrogen, from alongshore to offshore in the main Bay, was a large determinant in the spatial clustering. Two of these zones (Alongshore and Inshore) were heavily influenced by freshwater input from four canals which drain the South Dade agricultural area, Black Point Landfill, and sewage treatment plant. The North Bay zone, with high turbidity, phytoplankton biomass, total phosphorus, and low DO, was affected by runoff from five canals, the Munisport Landfill, and the urban landscape. The South Bay zone, an embayment surrounded by mangrove wetlands with little urban development, was high in dissolved organic constituents but low in inorganic nutrients. The Main Bay was the area most influenced by water exchange with the Atlantic Ocean and showed the lowest nutrient concentrations. The water quality in Biscayne Bay is therefore highly dependent on land use and influence from the watershed.

  8. A study and evaluation of image analysis techniques applied to remotely sensed data

    NASA Technical Reports Server (NTRS)

    Atkinson, R. J.; Dasarathy, B. V.; Lybanon, M.; Ramapriyan, H. K.

    1976-01-01

    An analysis of phenomena causing nonlinearities in the transformation from Landsat multispectral scanner coordinates to ground coordinates is presented. Experimental results comparing rms errors at ground control points indicated a slight improvement when a nonlinear (8-parameter) transformation was used instead of an affine (6-parameter) transformation. Using a preliminary ground truth map of a test site in Alabama covering the Mobile Bay area and six Landsat images of the same scene, several classification methods were assessed. A methodology was developed for automatic change detection using classification/cluster maps. A coding scheme was employed for generation of change depiction maps indicating specific types of changes. Inter- and intraseasonal data of the Mobile Bay test area were compared to illustrate the method. A beginning was made in the study of data compression by applying a Karhunen-Loeve transform technique to a small section of the test data set. The second part of the report provides a formal documentation of the several programs developed for the analysis and assessments presented.

  9. Habitat suitability index model of the sea cucumber Apostichopus japonicus (Selenka): A case study of Shandong Peninsula, China.

    PubMed

    Zhang, Zhipeng; Zhou, Jian; Song, Jingjing; Wang, Qixiang; Liu, Hongjun; Tang, Xuexi

    2017-09-15

    A habitat suitability index (HSI) model for the sea cucumber Apostichopus japonicus (Selenka) was established in the present study. Based on geographic information systems, the HSI model was used to identify potential sites around the Shandong Peninsula suitable for restoration of immature (<25g) and mature (>25g) A. japonicus. Six habitat factors were used as input variables for the HSI model: sediment classification, water temperature, salinity, water depth, pH and dissolved oxygen. The weighting of each habitat factor was defined through the Delphi method. Sediment classification was the most important condition affecting the HSI of A. japonicus in the different study areas, while water temperature was the most important condition in different seasons. The HSI of Western Laizhou Bay was relatively low, meaning the site was not suitable for aquaculture-based restoration of A. japonicus. In contrast, Xiaoheishan Island, Rongcheng Bay and Qingdao were preferable sites, suitable as habitats for restoration efforts. Copyright © 2017 Elsevier Ltd. All rights reserved.
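
The weighted combination of habitat factors can be sketched as follows. The factor list matches the abstract, but the weights and suitability scores are invented, and the weighted arithmetic mean is only one common HSI formulation, not necessarily the one used in this study.

```python
# Hedged sketch of a weighted habitat suitability index: each factor's
# suitability (0-1) is combined using Delphi-style weights that sum to 1.
# Weights and scores below are illustrative placeholders.
factors = {
    "sediment": (0.30, 0.9),          # (weight, suitability score)
    "temperature": (0.25, 0.8),
    "salinity": (0.15, 0.7),
    "depth": (0.12, 0.6),
    "pH": (0.10, 0.9),
    "dissolved_oxygen": (0.08, 0.8),
}

# Weighted arithmetic mean form of the HSI.
hsi = sum(w * s for w, s in factors.values())
print(round(hsi, 3))
```

A site would then be ranked as more or less suitable by comparing its HSI value against thresholds chosen for the restoration program.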

  10. Multivariate geometry as an approach to algal community analysis

    USGS Publications Warehouse

    Allen, T.F.H.; Skagen, S.

    1973-01-01

    Multivariate analyses are put in the context of more usual approaches to phycological investigations. The intuitive common sense involved in methods of ordination, classification and discrimination is emphasised by simple geometric accounts which avoid jargon and matrix algebra. Warnings are given that artifacts result from technique abuses by the naive or over-enthusiastic. An analysis of a simple periphyton data set is presented as an example of the approach. Suggestions are made as to situations in phycological investigations where the techniques could be appropriate. The discipline is reprimanded for its neglect of the multivariate approach.

  11. NASA Ames DEVELOP Interns Collaborate with the South Bay Salt Pond Restoration Project to Monitor and Study Restoration Efforts using NASA's Satellites

    NASA Technical Reports Server (NTRS)

    Newcomer, Michelle E.; Kuss, Amber Jean; Nguyen, Andrew; Schmidt, Cynthia L.

    2012-01-01

    In the past, natural tidal marshes in the south bay were segmented by levees and converted into ponds for use in salt production. In an effort to provide habitat for migratory birds and other native plants and animals, as well as to rebuild natural capital, the South Bay Salt Pond Restoration Project (SBSPRP) is focused on restoring a portion of the over 15,000 acres of wetlands in California's South San Francisco Bay. The process of restoration begins when a levee is breached; the bay water and sediment flow into the ponds and eventually restore natural tidal marshes. Since the spring of 2010 the NASA Ames Research Center (ARC) DEVELOP student internship program has collaborated with the South Bay Salt Pond Restoration Project (SBSPRP) to study the effects of these restoration efforts and to provide valuable information to assist in habitat management and ecological forecasting. All of the studies were based on remote sensing techniques -- NASA's area of expertise in the field of Earth Science, and used various analytical techniques such as predictive modeling, flora and fauna classification, and spectral detection, to name a few. Each study was conducted by a team of aspiring scientists as a part of the DEVELOP program at Ames.

  12. Detection of Cardiovascular Disease Risk's Level for Adults Using Naive Bayes Classifier.

    PubMed

    Miranda, Eka; Irwansyah, Edy; Amelga, Alowisius Y; Maribondang, Marco M; Salim, Mulyadi

    2016-07-01

    The number of deaths caused by cardiovascular disease and stroke is predicted to reach 23.3 million in 2030. As a contribution to the prevention of this phenomenon, this paper proposes a mining model using a naïve Bayes classifier that could detect cardiovascular disease and identify its risk level for adults. The process of designing the method began by identifying the knowledge related to the cardiovascular disease profile and the level of cardiovascular disease risk factors for adults based on the medical record, and then designing a mining technique model using a naïve Bayes classifier. Evaluation of this research employed two methods: calculation of accuracy, sensitivity, and specificity, as well as an evaluation session with cardiologists and internists. The characteristics of cardiovascular disease are identified by its primary risk factors. Those factors are diabetes mellitus, the level of lipids in the blood, coronary artery function, and kidney function. Class labels were assigned according to the values of these factors: risk level 1, risk level 2 and risk level 3. The evaluation of classifier performance (accuracy, sensitivity, and specificity) in this research showed that the proposed model predicted the class label of tuples correctly (above 80%). More than eighty percent of respondents (including cardiologists and internists) who participated in the evaluation session agreed or strongly agreed that this research followed medical procedures and that the result can support medical analysis related to cardiovascular disease. The research showed that the proposed model achieves good performance for risk level detection of cardiovascular disease.
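
The three performance measures named in the evaluation are standard confusion-matrix quantities; a minimal sketch, with made-up counts rather than the study's data:

```python
# Binary confusion-matrix counts (illustrative, not from the paper):
# tp = sick correctly flagged, fn = sick missed,
# fp = healthy wrongly flagged, tn = healthy correctly cleared.
tp, fn, fp, tn = 42, 8, 5, 45

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of tuples labeled correctly
sensitivity = tp / (tp + fn)                 # true-positive rate (recall)
specificity = tn / (tn + fp)                 # true-negative rate

print(round(accuracy, 3), round(sensitivity, 3), round(specificity, 3))
```

Reporting all three matters for risk screening: accuracy alone can look high even when sensitivity (the ability to catch actual cases) is poor.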

  13. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover several algorithmic aspects, including the construction of the model and the classification of parameters. The data set is the typical CB513 protein data set. Accuracy is assessed by 3-fold cross validation, from which the Q3 accuracy is obtained. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.

  14. Optimized Motor Imagery Paradigm Based on Imagining Chinese Characters Writing Movement.

    PubMed

    Qiu, Zhaoyang; Allison, Brendan Z; Jin, Jing; Zhang, Yu; Wang, Xingyu; Li, Wei; Cichocki, Andrzej

    2017-07-01

    Motor imagery (MI) is a mental representation of motor behavior. MI-based brain computer interfaces (BCIs) can provide communication for the physically impaired. The performance of an MI-based BCI mainly depends on the subject's ability to self-modulate electroencephalogram signals. Proper training can help naive subjects learn to modulate brain activity proficiently. However, training typically involves abstract motor tasks and is time-consuming. To improve the performance of naive subjects during motor imagery, a novel paradigm was presented that would guide naive subjects to modulate brain activity effectively. In this new paradigm, pictures of the left or right hand were used as cues for subjects to finish the motor imagery task. Fourteen healthy subjects (11 male, aged 22-25 years, mean 23.6±1.16) participated in this study. The task was to imagine writing a Chinese character. Specifically, subjects could imagine hand movements corresponding to the sequence of writing strokes in the Chinese character. This paradigm was meant to find an effective and familiar action for most Chinese people, to provide them with a specific, extensively practiced task and help them modulate brain activity. Results showed that the writing task paradigm yielded significantly better performance than the traditional arrow paradigm (p < 0.001). Questionnaire replies indicated that most subjects thought the new paradigm was easier. The proposed new motor imagery paradigm could guide subjects to help them modulate brain activity effectively. Results showed significant improvements using the new paradigm, both in classification accuracy and usability.

  15. Single-accelerometer-based daily physical activity classification.

    PubMed

    Long, Xi; Yin, Bin; Aarts, Ronald M

    2009-01-01

    In this study, a single tri-axial accelerometer placed on the waist was used to record acceleration data for human physical activity classification. Data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared Bayesian classification with a Decision Tree based approach. A Bayes classifier has the advantage of being more extensible, requiring little effort in classifier retraining and software updates upon further expansion or modification of the target activities. Principal components analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross validation protocols revealed a classification accuracy of approximately 80%, comparable with that obtained by a Decision Tree classifier.

  16. Multinomial mixture model with heterogeneous classification probabilities

    USGS Publications Warehouse

    Holland, M.D.; Gray, B.R.

    2011-01-01

    Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial parameters and correct classification probabilities when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.

  17. Geothermal Potential of Adak Island, Alaska

    DTIC Science & Technology

    1985-10-01

    alteration of the Andrew Bay Hot Springs is essentially propylitic, with the introduction of pyrite and the conversion of magnetite to pyrite. This pyritic...features: Goethite coats the walls of a 1-mm fracture in this rock. Classification: Propylitically altered andesite porphyry breccia.

  18. Cell of origin associated classification of B-cell malignancies by gene signatures of the normal B-cell hierarchy.

    PubMed

    Johnsen, Hans Erik; Bergkvist, Kim Steve; Schmitz, Alexander; Kjeldsen, Malene Krag; Hansen, Steen Møller; Gaihede, Michael; Nørgaard, Martin Agge; Bæch, John; Grønholdt, Marie-Louise; Jensen, Frank Svendsen; Johansen, Preben; Bødker, Julie Støve; Bøgsted, Martin; Dybkær, Karen

    2014-06-01

    Recent findings have suggested biological classification of B-cell malignancies as exemplified by the "activated B-cell-like" (ABC), the "germinal-center B-cell-like" (GCB) and primary mediastinal B-cell lymphoma (PMBL) subtypes of diffuse large B-cell lymphoma and "recurrent translocation and cyclin D" (TC) classification of multiple myeloma. Biological classification of B-cell derived cancers may be refined by a direct and systematic strategy where identification and characterization of normal B-cell differentiation subsets are used to define the cancer cell of origin phenotype. Here we propose a strategy combining multiparametric flow cytometry, global gene expression profiling and biostatistical modeling to generate B-cell subset specific gene signatures from sorted normal human immature, naive, germinal centrocytes and centroblasts, post-germinal memory B-cells, plasmablasts and plasma cells from available lymphoid tissues including lymph nodes, tonsils, thymus, peripheral blood and bone marrow. This strategy will provide an accurate image of the stage of differentiation, which prospectively can be used to classify any B-cell malignancy and eventually purify tumor cells. This report briefly describes the current models of the normal B-cell subset differentiation in multiple tissues and the pathogenesis of malignancies originating from the normal germinal B-cell hierarchy.

  19. Classifying environmentally significant urban land uses with satellite imagery.

    PubMed

    Park, Mi-Hyun; Stenstrom, Michael K

    2008-01-01

    We investigated Bayesian networks to classify urban land use from satellite imagery. Landsat Enhanced Thematic Mapper Plus (ETM+) images were used for the classification in two study areas: (1) Marina del Rey and its vicinity in the Santa Monica Bay Watershed, CA and (2) drainage basins adjacent to the Sweetwater Reservoir in San Diego, CA. Bayesian networks provided 80-95% classification accuracy for urban land use using four different classification systems. The classifications were robust with small training data sets, at both normal and reduced radiometric resolution. The networks needed only 5% of the total data (i.e., 1500 pixels) for sample size and only 5- or 6-bit information for accurate classification. The network explicitly showed the relationship among variables from its structure and was also capable of utilizing information from non-spectral data. The classification can be used to provide timely and inexpensive land use information over large areas for environmental purposes such as estimating stormwater pollutant loads.

  20. Big data analytics for early detection of breast cancer based on machine learning

    NASA Astrophysics Data System (ADS)

    Ivanova, Desislava

    2017-12-01

    This paper presents the concept of and modern advances in personalized medicine that rely on technology, and reviews the existing tools for early detection of breast cancer. Breast cancer types and their worldwide distribution are discussed. Time is spent explaining the importance of identifying normality and specifying the main classes in breast cancer, benign or malignant. The main purpose of the paper is to propose a conceptual model for early detection of breast cancer based on machine learning for processing and analysis of medical big data and further knowledge discovery for personalized treatment. The proposed conceptual model is realized using a Naive Bayes classifier. The software is written in the Python programming language, and for the experiments the Wisconsin breast cancer database is used. Finally, the experimental results are presented and discussed.
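
A from-scratch Gaussian naive Bayes in the spirit of the paper's Python pipeline can be sketched as follows; the toy two-feature data below is an invented stand-in for the Wisconsin database, and the class labels (0 = benign, 1 = malignant) are only illustrative.

```python
import numpy as np

# Minimal Gaussian naive Bayes: per-class feature means/variances plus priors.
def fit(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # Small variance floor avoids division by zero for constant features.
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return params

def predict(params, x):
    def log_post(c):
        mu, var, prior = params[c]
        # Log-likelihood under independent Gaussians, plus log prior.
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return ll + np.log(prior)
    return max(params, key=log_post)

# Two well-separated toy classes ("benign" = 0, "malignant" = 1).
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
              [5.0, 5.1], [4.9, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit(X, y)
print(predict(model, np.array([1.0, 1.0])), predict(model, np.array([5.0, 5.0])))
```

Working in log space keeps the product of many per-feature likelihoods numerically stable, which matters once the feature count grows to the thirty-odd attributes of the Wisconsin data.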

  1. Classification of wetlands vegetation using small scale color infrared imagery

    NASA Technical Reports Server (NTRS)

    Williamson, F. S. L.

    1975-01-01

    A classification system for Chesapeake Bay wetlands was derived from the correlation of film density classes and actual vegetation classes. The data processing programs used were developed by the Laboratory for the Applications of Remote Sensing. These programs were tested for their value in classifying natural vegetation, using digitized data from small scale aerial photography. Existing imagery and the vegetation map of Farm Creek Marsh were used to determine the optimal number of classes, and to aid in determining if the computer maps were a believable product.

  2. Consistent latent position estimation and vertex classification for random dot product graphs.

    PubMed

    Sussman, Daniel L; Tang, Minh; Priebe, Carey E

    2014-01-01

    In this work, we show that using the eigen-decomposition of the adjacency matrix, we can consistently estimate latent positions for random dot product graphs provided the latent positions are i.i.d. from some distribution. If class labels are observed for a number of vertices tending to infinity, then we show that the remaining vertices can be classified with error converging to Bayes optimal using the $k$-nearest-neighbors classification rule. We evaluate the proposed methods on simulated data and a graph derived from Wikipedia.
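
The two-step procedure (adjacency spectral embedding, then k-nearest-neighbor classification) can be sketched on a toy graph; the two-clique graph below and the choices of embedding dimension and k are illustrative, not the paper's experimental setup.

```python
import numpy as np

# Adjacency spectral embedding: scaled top eigenvectors of the adjacency
# matrix serve as estimated latent positions.
def spectral_embed(A, d):
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]          # top-d by magnitude
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))  # scaled latent positions

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.bincount(y_train[nearest]).argmax())

# Toy graph: two 4-cliques joined by a single edge (vertices 0-3 vs 4-7).
A = np.zeros((8, 8))
for block in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1

X = spectral_embed(A, d=2)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Hold out vertex 0 and classify it from the remaining labeled vertices.
pred = knn_predict(X[1:], labels[1:], X[0], k=3)
print(pred)
```

Vertices in the same clique land close together in the embedded space, so the held-out vertex is recovered by its nearest labeled neighbors.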

  3. Forward for book entitled "Estuaries: Classification, Ecology, and Human Impacts"

    EPA Science Inventory

    The author was introduced to the science of estuaries as a graduate student in the early 1980s, studying the ecology of oyster populations in Chesapeake Bay. To undertake this research, he needed to learn not only about oyster biology, but also about the unique physical and chemi...

  4. Question analysis for Indonesian comparative question

    NASA Astrophysics Data System (ADS)

    Saelan, A.; Purwarianti, A.; Widyantoro, D. H.

    2017-01-01

    Information seeking is one of today's human needs. Comparing things using a search engine surely takes more time than searching for a single thing. In this paper, we analyze comparative questions for a comparative question answering system. A comparative question is a question that compares two or more entities. We grouped comparative questions into 5 types: selection between mentioned entities, selection between unmentioned entities, selection between any entities, comparison, and yes-or-no questions. Then we extracted 4 types of information from comparative questions: entity, aspect, comparison, and constraint. We built classifiers for the classification task and the information extraction task. The features used for the classification task are bag of words, whereas for information extraction we used the word's lexical form, the lexical forms of the two previous and following words, and the previous label as features. We tried 2 scenarios: classification first and extraction first. For classification first, we used the classification result as a feature for extraction. Conversely, for extraction first, we used the extraction results as features for classification. We found that the result is better if we do extraction first before classification. For the extraction task, classification using SMO gave the best result (88.78%), while for classification, it is better to use naïve Bayes (82.35%).
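
A bag-of-words naive Bayes classifier of the kind used for the classification task can be sketched as follows; the tiny English training set and the two question-type labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy training data: (question, question type). Purely illustrative.
train = [
    ("which phone is better A or B", "selection"),
    ("which laptop is better X or Y", "selection"),
    ("compare A and B on price", "comparison"),
    ("compare X and Y on speed", "comparison"),
]

class_docs = defaultdict(list)
for text, label in train:
    class_docs[label].extend(text.split())

vocab = {w for words in class_docs.values() for w in words}

def predict(text):
    words = text.split()
    def log_post(label):
        counts = Counter(class_docs[label])
        total = len(class_docs[label])
        prior = math.log(sum(1 for _, l in train if l == label) / len(train))
        # Laplace smoothing keeps unseen words from zeroing the posterior.
        return prior + sum(
            math.log((counts[w] + 1) / (total + len(vocab))) for w in words
        )
    return max(class_docs, key=log_post)

print(predict("which tablet is better A or B"))
print(predict("compare A and B on weight"))
```

Even with unseen words ("tablet", "weight"), the smoothed word likelihoods let the class with the most shared vocabulary win.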

  5. Hybrid analysis for indicating patients with breast cancer using temperature time series.

    PubMed

    Silva, Lincoln F; Santos, Alair Augusto S M D; Bravo, Renato S; Silva, Aristófanes C; Muchaluat-Saade, Débora C; Conci, Aura

    2016-07-01

    Breast cancer is the most common cancer among women worldwide. Diagnosis and treatment in early stages increase cure chances. The temperature of cancerous tissue is generally higher than that of healthy surrounding tissues, making thermography an option to be considered in screening strategies for this cancer type. This paper proposes a hybrid methodology for analyzing dynamic infrared thermography in order to indicate patients with risk of breast cancer, using unsupervised and supervised machine learning techniques, which characterizes the methodology as hybrid. Dynamic infrared thermography monitors, or quantitatively measures, temperature changes on the examined surface after a thermal stress. During the dynamic infrared thermography procedure, a sequence of breast thermograms is generated. In the proposed methodology, this sequence is processed and analyzed by several techniques. First, the region of the breasts is segmented and the thermograms of the sequence are registered. Then, temperature time series are built and the k-means algorithm is applied to these series using various values of k. The clustering formed by the k-means algorithm for each value of k is evaluated using clustering validation indices, generating values that are treated as features in the classification model construction step. A data mining tool was used to solve the combined algorithm selection and hyperparameter optimization (CASH) problem in classification tasks. Besides the classification algorithm recommended by the data mining tool, classifiers based on Bayesian networks, neural networks, decision rules and decision trees were executed on the data set used for evaluation. Test results support that the proposed analysis methodology is able to indicate patients with breast cancer. Among 39 tested classification algorithms, K-Star and Bayes Net presented 100% classification accuracy.
Furthermore, among the Bayes Net, multi-layer perceptron, decision table and random forest classification algorithms, an average accuracy of 95.38% was obtained. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
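
The idea of turning clustering-validity values across several k into features can be sketched as follows; within-cluster sum of squares stands in for the clustering validation indices, the short synthetic "temperature series" replace real thermogram data, and the simple first-points initialization is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Plain Lloyd-style k-means with deterministic evenly spaced initialization.
def kmeans(X, k, iters=50):
    centers = X[::len(X) // k][:k].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two clearly separated groups of short synthetic series (10 each, length 5).
X = np.vstack([rng.normal(0, 0.1, (10, 5)), rng.normal(3, 0.1, (10, 5))])

# One validity value per k becomes one entry of the feature vector.
features = []
for k in (2, 3, 4):
    labels, centers = kmeans(X, k)
    wcss = ((X - centers[labels]) ** 2).sum()  # within-cluster sum of squares
    features.append(wcss)
print(features)
```

The resulting per-k feature vector is what a downstream supervised classifier would consume, one vector per patient sequence.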

  6. A Landsat-Based Assessment of Mobile Bay Land Use and Land Cover Change from 1974 to 2008

    NASA Technical Reports Server (NTRS)

    Spruce, Joseph; Ellis, Jean; Smoot, James; Swann, Roberta; Graham, William

    2009-01-01

    The Mobile Bay region has experienced noteworthy land use and land cover (LULC) change in the latter half of the 20th century. Accompanying this change has been urban expansion and a reduction of rural land uses. Much of this LULC change has reportedly occurred since the landfall of Hurricane Frederic in 1979. The Mobile Bay region provides great economic and ecologic benefits to the Nation, including important coastal habitat for a broad diversity of fisheries and wildlife. Regional urbanization threatens the estuary's water quality and aquatic-habitat dependent biota, including commercial fisheries and avian wildlife. Coastal conservation and urban land use planners require additional information on historical LULC change to support coastal habitat restoration and resiliency management efforts. This presentation discusses results of a Gulf of Mexico Application Pilot project that was conducted in 2008 to quantify and assess LULC change from 1974 to 2008. This project was led by NASA Stennis Space Center and involved multiple Gulf of Mexico Alliance (GOMA) partners, including the Mobile Bay National Estuary Program (NEP), the U.S. Army Corps of Engineers, the National Oceanic and Atmospheric Administration's (NOAA's) National Coastal Data Development Center (NCDDC), and the NOAA Coastal Services Center. Nine Landsat images were employed to compute LULC products because of their availability and suitability for the application. The project also used Landsat-based national LULC products, including coastal LULC products from NOAA's Coastal Change & Analysis Program (C-CAP), available at 5-year intervals since 1995. Our study was initiated in part because C-CAP LULC products were not available to assess the region's urbanization prior to 1995 and subsequent to post Hurricane Katrina in 2006. This project assessed LULC change across the 34-year time frame and at decadal and mid-decadal scales. 
The study area included the majority of Mobile and Baldwin counties that encompass Mobile Bay. In doing so, each date of Landsat data was classified using an end-user defined modified Anderson level 1 classification scheme. LULC classifications were refined using a decision rule approach in conjunction with available C-CAP products. Individual dates of LULC classifications were validated by image interpretation of stratified random locations on raw Landsat color composite imagery in combination with higher resolution remote sensing and in-situ reference data. The results indicate that during the 34-year study period, urban areas increased from 96,688 to 150,227 acres, representing a 55.37% increase, or 1.63% per annum. Most of the identified urban expansion results from conversion of rural forest and agriculture to urban cover types. Final LULC mapping and metadata products were produced for the entire study area as well as watersheds of concern within the study area. Final project products, including LULC trend information, were incorporated into the Mobile Bay NEP State of the Bay report. Products and metadata were transferred to NOAA NCDDC to allow free online accessibility and use by GOMA partners and by the public.

  7. Logical Differential Prediction Bayes Net, improving breast cancer diagnosis for older women.

    PubMed

    Nassif, Houssam; Wu, Yirong; Page, David; Burnside, Elizabeth

    2012-01-01

    Overdiagnosis is a phenomenon in which screening identifies cancer that may not go on to cause symptoms or death. Women over 65 who develop breast cancer bear the heaviest burden of overdiagnosis. This work introduces novel machine learning algorithms to improve the diagnostic accuracy of breast cancer in aging populations. At the same time, we aim to minimize unnecessary invasive procedures (thus decreasing false positives) and concomitantly address overdiagnosis. We develop a novel algorithm, Logical Differential Prediction Bayes Net (LDP-BN), that calculates the risk of breast disease based on mammography findings. LDP-BN uses Inductive Logic Programming (ILP) to learn relational rules, selects older-specific differentially predictive rules, and incorporates them into a Bayes Net, significantly improving its performance. In addition, LDP-BN offers valuable insight into the classification process, revealing novel older-specific rules that link mass presence to invasive disease, and calcification presence with lack of a detectable mass to DCIS.

  8. Association of serum brain derived neurotrophic factor with duration of drug-naive period and positive-negative symptom scores in drug naive schizophrenia.

    PubMed

    Bakirhan, Abdurrahim; Yalcin Sahiner, Safak; Sahiner, Ismail Volkan; Safak, Yasir; Goka, Erol

    2017-01-01

    The aim of this study was to compare the serum brain derived neurotrophic factor (BDNF) levels of patients with schizophrenia who had never received antipsychotic treatment with those of a control group, and to analyze the relationship between the Positive and Negative Symptom Scale (PANSS) scores and BDNF levels of the patients during the period they were drug-naive. The sample of the study comprised patients who presented to the Psychiatry Clinic and were admitted after a definitive schizophrenia diagnosis was made in accordance with the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) diagnostic classification and who were not using, and had never used, any antipsychotic medicine. A total of 160 participants were included in the study, 80 of whom were patients with schizophrenia and 80 of whom constituted the age- and sex-matched healthy control group. Before the start of treatment, serum samples to be checked for BDNF levels were collected from the patients. The difference between the average BDNF levels of the groups was statistically significant (t = -5.25; p˂.001). An analysis as to whether there was a relation between BDNF levels and the drug-naïve duration indicated no correlations. An examination of the relationship between PANSS scores and BDNF levels of the patients likewise yielded no correlations. Serum BDNF levels seem to be one of the indicators of schizophrenia and its progress; nevertheless, we still do not have sufficient information about this neurotrophic factor. In light of our study, the neurodevelopmental changes that occur at the onset of the illness prominently affect its progress, which highlights the importance of treatment in the early stages.

  9. Robust through-the-wall radar image classification using a target-model alignment procedure.

    PubMed

    Smith, Graeme E; Mobasseri, Bijan G

    2012-02-01

    A through-the-wall radar image (TWRI) bears little resemblance to the equivalent optical image, making it difficult to interpret. To maximize the intelligence that may be obtained, it is desirable to automate the classification of targets in the image to support human operators. This paper presents a technique for classifying stationary targets based on the high-range-resolution profile (HRRP) extracted from 3-D TWRIs. The dependence of the image on the target location is discussed using a system point spread function (PSF) approach. It is shown that the position dependence will cause a classifier to fail unless the image to be classified is aligned to a classifier-training location. A target image alignment technique based on deconvolution of the image with the system PSF is proposed. Comparison of the aligned target images with measured images shows that the alignment process introduces a normalized mean squared error (NMSE) of ≤ 9%. The HRRPs extracted from aligned target images are classified using a naive Bayesian classifier supported by principal component analysis. The classifier is tested using real TWRIs of canonical targets behind a concrete wall and shown to obtain correct classification rates ≥ 97%. © 2011 IEEE

  10. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
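As a rough illustration of the Relief-family feature selection used above, the following is a minimal single-nearest-neighbor Relief sketch (the study used the full ReliefF method; the data and variable names here are invented for illustration). Features that separate a sample from its nearest miss (other class) gain weight; features that separate it from its nearest hit (same class) lose weight.

```python
import random
import math

def relief_weights(X, y, n_iter=200, seed=0):
    """Basic Relief feature weighting on numeric features: reward
    features with large sample-to-nearest-miss differences and
    penalize large sample-to-nearest-hit differences."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature value ranges, used to normalize differences.
    spans = [max(row[f] for row in X) - min(row[f] for row in X) or 1.0
             for f in range(d)]

    def dist(a, b):
        return math.sqrt(sum(((a[f] - b[f]) / spans[f]) ** 2 for f in range(d)))

    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f]) -
                     abs(X[i][f] - X[hit][f])) / (spans[f] * n_iter)
    return w
```

Ranking features by these weights and keeping the top-k is the analogue of selecting the "most important 45 features" described above.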

  11. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

    PubMed

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

  12. Optimal linear and nonlinear feature extraction based on the minimization of the increased risk of misclassification. [Bayes theorem - statistical analysis/data processing

    NASA Technical Reports Server (NTRS)

    Defigueiredo, R. J. P.

    1974-01-01

    General classes of nonlinear and linear transformations were investigated for the reduction of the dimensionality of the classification (feature) space so that, for a prescribed dimension m of this space, the increase of the misclassification risk is minimized.

  13. A comparison of LANDSAT TM to MSS imagery for detecting submerged aquatic vegetation in lower Chesapeake Bay

    NASA Technical Reports Server (NTRS)

    Ackleson, S. G.; Klemas, V.

    1985-01-01

    LANDSAT Thematic Mapper (TM) and Multispectral Scanner (MSS) imagery generated simultaneously over Guinea Marsh, Virginia, are assessed for their ability to detect submerged, bottom-adhering aquatic plant canopies (SAV). An unsupervised clustering algorithm is applied to both image types and the resulting classifications compared to SAV distributions derived from color aerial photography. Class confidence and accuracy are first computed for all water areas and then only for shallow areas where water depth is less than 6 feet. In both the TM and MSS imagery, masking water areas deeper than 6 ft resulted in greater classification accuracy at confidence levels greater than 50%. Both systems perform poorly in detecting SAV with crown cover densities less than 70%. On the basis of the spectral resolution, radiometric sensitivity, and location of visible bands, TM imagery does not offer a significant advantage over MSS data for detecting SAV in Lower Chesapeake Bay. However, because the TM imagery represents a higher spatial resolution, smaller SAV canopies may be detected than is possible with MSS data.
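The unsupervised clustering step above can be illustrated with plain Lloyd's k-means over per-pixel band values; this is a minimal sketch, not the study's actual algorithm, and the pixel values below are invented:

```python
def kmeans(points, k=2, n_iter=20):
    """Lloyd's algorithm: assign each point (e.g. a pixel's band
    reflectances) to its nearest centroid, then move each centroid
    to the mean of its members."""
    # Greedy farthest-point initialization keeps the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)))
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster empties.
        centroids = [[sum(vals) / len(vals) for vals in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: sum(
        (a - b) ** 2 for a, b in zip(p, centroids[i]))) for p in points]
```

In an SAV study the resulting clusters would then be labeled (e.g. vegetated vs. unvegetated bottom) by comparison with reference data such as aerial photography.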

  14. Toward extending terrestrial laser scanning applications in forestry: a case study of broad- and needle-leaf tree classification

    NASA Astrophysics Data System (ADS)

    Lin, Yi; Jiang, Miao

    2017-01-01

    Tree species information is essential for forest research and management purposes, which in turn require approaches for accurate and precise classification of tree species. One such remote sensing technology, terrestrial laser scanning (TLS), has proved to be capable of characterizing detailed tree structures, such as tree stem geometry. Can TLS further differentiate between broad- and needle-leaves? If the answer is positive, TLS data can be used for classification of taxonomic tree groups by directly examining their differences in leaf morphology. An analysis was proposed to assess TLS-represented broad- and needle-leaf structures, followed by a Bayes classifier to perform the classification. Tests indicated that the proposed method can accomplish the task, with an overall accuracy of 77.78%. This study demonstrates a way of classifying the two major broad- and needle-leaf taxonomic groups measured by TLS in accordance with their literal definitions, and shows the potential of extending TLS applications in forestry.
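The abstract does not specify the form of the Bayes classifier; assuming a Gaussian naive Bayes over per-tree structural features (the feature values below are invented), the classification step can be sketched as:

```python
import math
from collections import defaultdict

def fit_gaussian_bayes(X, y):
    """Estimate per-class feature means/variances and class priors."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        stats = []
        for d in zip(*rows):
            m = sum(d) / len(d)
            # Floor the variance to avoid degenerate zero-variance features.
            stats.append((m, max(sum((v - m) ** 2 for v in d) / len(d), 1e-6)))
        model[label] = (stats, len(rows) / len(X))
    return model

def predict(model, x):
    """Pick the class with the highest log-posterior under the
    naive (feature-independence) Gaussian likelihood."""
    def log_post(label):
        stats, prior = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, (m, v) in zip(x, stats))
    return max(model, key=log_post)
```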

  15. Exploring the CAESAR database using dimensionality reduction techniques

    NASA Astrophysics Data System (ADS)

    Mendoza-Schrock, Olga; Raymer, Michael L.

    2012-06-01

    The Civilian American and European Surface Anthropometry Resource (CAESAR) database containing over 40 anthropometric measurements on over 4000 humans has been extensively explored for pattern recognition and classification purposes using the raw, original data [1-4]. However, some of the anthropometric variables would be impossible to collect in an uncontrolled environment. Here, we explore the use of dimensionality reduction methods in concert with a variety of classification algorithms for gender classification using only those variables that are readily observable in an uncontrolled environment. Several dimensionality reduction techniques are employed to learn the underlying structure of the data. These techniques include linear projections such as the classical Principal Components Analysis (PCA) and non-linear (manifold learning) techniques, such as Diffusion Maps and the Isomap technique. This paper briefly describes all three techniques, and compares three different classifiers, Naïve Bayes, Adaboost, and Support Vector Machines (SVM), for gender classification in conjunction with each of these three dimensionality reduction approaches.
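Classical PCA, the linear projection mentioned above, reduces to an eigendecomposition of the covariance matrix; a compact numpy sketch on synthetic data (not the study's code):

```python
import numpy as np

def pca(X, n_components=2):
    """Center the data, eigendecompose its covariance matrix, and
    project onto the top eigenvectors (principal components)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return Xc @ components, eigvals[order]
```

The projected coordinates would then be fed to a downstream classifier (e.g. Naïve Bayes or SVM) in place of the raw measurements.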

  16. Improved, ACMG-Compliant, in silico prediction of pathogenicity for missense substitutions encoded by TP53 variants.

    PubMed

    Fortuno, Cristina; James, Paul A; Young, Erin L; Feng, Bing; Olivier, Magali; Pesaran, Tina; Tavtigian, Sean V; Spurdle, Amanda B

    2018-05-18

    Clinical interpretation of germline missense variants represents a major challenge, including those in the TP53 Li-Fraumeni syndrome gene. Bioinformatic prediction is a key part of variant classification strategies. We aimed to optimize the performance of the Align-GVGD tool used for p53 missense variant prediction, and compare its performance to other bioinformatic tools (SIFT, PolyPhen-2) and ensemble methods (REVEL, BayesDel). Reference sets of assumed pathogenic and assumed benign variants were defined using functional and/or clinical data. Area under the curve and Matthews correlation coefficient (MCC) values were used as objective functions to select an optimized protein multi-sequence alignment with best performance for Align-GVGD. MCC comparison of tools using binary categories showed optimized Align-GVGD (C15 cut-off) combined with BayesDel (0.16 cut-off), or with REVEL (0.5 cut-off), to have the best overall performance. Further, a semi-quantitative approach using multiple tiers of bioinformatic prediction, validated using an independent set of non-functional and functional variants, supported use of Align-GVGD and BayesDel prediction for different strength of evidence levels in ACMG/AMP rules. We provide rationale for bioinformatic tool selection for TP53 variant classification, and have also computed relevant bioinformatic predictions for every possible p53 missense variant to facilitate their use by the scientific and medical community. This article is protected by copyright. All rights reserved.
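The Matthews correlation coefficient used above as an objective function is computed directly from the 2x2 confusion matrix; a minimal sketch:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient: +1 is perfect prediction,
    0 is chance level, -1 is total disagreement. Returns 0 when a
    marginal is empty, a common convention."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike plain accuracy, MCC stays informative when the pathogenic and benign reference sets are of very different sizes, which is why it suits this kind of tool comparison.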

  17. Assessment of various supervised learning algorithms using different performance metrics

    NASA Astrophysics Data System (ADS)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    This work compares the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms taken into consideration are Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbour (KNN), Naïve Bayes (NB) and Random Forest (RF). This paper focuses on comparing the performance of the above-mentioned algorithms on one binary classification task by analysing metrics such as accuracy, F-measure, G-measure, precision, misclassification rate, false positive rate, true positive rate, specificity, and prevalence.
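All of the metrics listed above derive from the 2x2 confusion matrix; a sketch (taking G-measure here as the geometric mean of sensitivity and specificity, one common convention; the paper may define it differently):

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive standard binary-classification metrics from
    confusion-matrix counts. Assumes no marginal is zero."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn)                 # recall / sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": tpr,
        "f_measure": 2 * precision * tpr / (precision + tpr),
        "g_measure": (tpr * specificity) ** 0.5,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),
        "misclassification_rate": (fp + fn) / total,
        "prevalence": (tp + fn) / total,
    }
```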

  18. Social sensing of floods in the UK

    PubMed Central

    Williams, Hywel T. P.

    2018-01-01

    “Social sensing” is a form of crowd-sourcing that involves systematic analysis of digital communications to detect real-world events. Here we consider the use of social sensing for observing natural hazards. In particular, we present a case study that uses data from a popular social media platform (Twitter) to detect and locate flood events in the UK. In order to improve data quality we apply a number of filters (timezone, simple text filters and a naive Bayes ‘relevance’ filter) to the data. We then use place names in the user profile and message text to infer the location of the tweets. These two steps remove most of the irrelevant tweets and yield orders of magnitude more located tweets than relying on geo-tagged data alone. We demonstrate that high-resolution social sensing of floods is feasible and we can produce high-quality historical and real-time maps of floods using Twitter. PMID:29385132
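A naive Bayes relevance filter of the kind described above can be sketched as a multinomial naive Bayes with Laplace smoothing over bag-of-words features; the tiny training set below is invented for illustration and is far smaller than anything usable in practice:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Count word occurrences per class and record class frequencies."""
    vocab = {w for d in docs for w in d.lower().split()}
    counts = {c: Counter() for c in set(labels)}
    class_totals = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(d.lower().split())
    return vocab, counts, class_totals, len(docs)

def classify(model, doc):
    """Score each class by log prior plus smoothed log likelihoods."""
    vocab, counts, class_totals, n_docs = model
    best, best_lp = None, float("-inf")
    for c, word_counts in counts.items():
        lp = math.log(class_totals[c] / n_docs)
        denom = sum(word_counts.values()) + len(vocab)
        for w in doc.lower().split():
            lp += math.log((word_counts[w] + 1) / denom)   # Laplace smoothing
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Tweets classified as irrelevant (e.g. metaphorical uses of "flood") would be dropped before the location-inference step.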

  19. Social sensing of floods in the UK.

    PubMed

    Arthur, Rudy; Boulton, Chris A; Shotton, Humphrey; Williams, Hywel T P

    2018-01-01

    "Social sensing" is a form of crowd-sourcing that involves systematic analysis of digital communications to detect real-world events. Here we consider the use of social sensing for observing natural hazards. In particular, we present a case study that uses data from a popular social media platform (Twitter) to detect and locate flood events in the UK. In order to improve data quality we apply a number of filters (timezone, simple text filters and a naive Bayes 'relevance' filter) to the data. We then use place names in the user profile and message text to infer the location of the tweets. These two steps remove most of the irrelevant tweets and yield orders of magnitude more located tweets than relying on geo-tagged data alone. We demonstrate that high-resolution social sensing of floods is feasible and we can produce high-quality historical and real-time maps of floods using Twitter.

  20. A qualitative and quantitative assessment for a bone marrow harvest simulator.

    PubMed

    Machado, Liliane S; Moraes, Ronei M

    2009-01-01

    Several approaches to perform assessment in training simulators based on virtual reality have been proposed. There are two kinds of assessment methods: offline and online. The main requirements for online training assessment methodologies applied to virtual reality systems are low computational complexity and high accuracy. In the literature, several approaches for the general case can satisfy such requirements. A drawback of those approaches is that they handle poorly specific cases, such as some medical procedures, where both quantitative and qualitative information are available to perform the assessment. In this paper, we present an approach to online training assessment based on a Modified Naive Bayes which can manipulate qualitative and quantitative variables simultaneously. A special medical case was simulated in a bone marrow harvest simulator. The results obtained were satisfactory and evidenced the applicability of the method.
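The abstract does not detail the Modified Naive Bayes, but the core idea of scoring quantitative and qualitative variables in a single posterior can be sketched by mixing Gaussian and smoothed-frequency likelihood terms (feature names and data below are invented, not the simulator's actual variables):

```python
import math
from collections import Counter

def fit_hybrid_nb(X_num, X_cat, y):
    """Per class: Gaussian stats for numeric features, category
    counts for qualitative features, plus the class frequency."""
    model = {}
    for c in set(y):
        rows_n = [r for r, l in zip(X_num, y) if l == c]
        rows_c = [r for r, l in zip(X_cat, y) if l == c]
        stats = []
        for d in zip(*rows_n):
            m = sum(d) / len(d)
            stats.append((m, max(sum((x - m) ** 2 for x in d) / len(d), 1e-6)))
        cat_counts = [Counter(col) for col in zip(*rows_c)]
        model[c] = (stats, cat_counts, len(rows_n), len(y))
    return model

def hybrid_predict(model, x_num, x_cat):
    """Log posterior = log prior + Gaussian log likelihoods for
    numeric features + smoothed log frequencies for categorical ones."""
    def log_post(c):
        stats, cat_counts, n_c, n = model[c]
        lp = math.log(n_c / n)
        for xi, (m, v) in zip(x_num, stats):
            lp += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, counts in zip(x_cat, cat_counts):
            lp += math.log((counts[xi] + 1) / (n_c + len(counts) + 1))
        return lp
    return max(model, key=log_post)
```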

  1. Monitoring eating habits using a piezoelectric sensor-based necklace.

    PubMed

    Kalantarian, Haik; Alshurafa, Nabil; Le, Tuan; Sarrafzadeh, Majid

    2015-03-01

    Maintaining appropriate levels of food intake and developing regularity in eating habits is crucial to weight loss and the preservation of a healthy lifestyle. Moreover, awareness of eating habits is an important step towards portion control and weight loss. In this paper, we introduce a novel food-intake monitoring system based around a wearable wireless-enabled necklace. The proposed necklace includes an embedded piezoelectric sensor, small Arduino-compatible microcontroller, Bluetooth LE transceiver, and Lithium-Polymer battery. Motion in the throat is captured and transmitted to a mobile application for processing and user guidance. Results from data collected from 30 subjects indicate that it is possible to detect solid and liquid foods, with an F-measure of 0.837 and 0.864, respectively, using a naive Bayes classifier. Furthermore, identification of extraneous motions such as head turns and walking is shown to significantly reduce the false positive rate of swallow detection. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. An Individual Finger Gesture Recognition System Based on Motion-Intent Analysis Using Mechanomyogram Signal

    PubMed Central

    Ding, Huijun; He, Qing; Zhou, Yongjin; Dan, Guo; Cui, Song

    2017-01-01

    Motion-intent-based finger gesture recognition systems are crucial for many applications such as prosthesis control, sign language recognition, wearable rehabilitation systems, and human–computer interaction. In this article, a motion-intent-based finger gesture recognition system is designed to correctly identify the tapping of every individual finger for the first time. Two auto-event annotation algorithms are first applied and evaluated for detecting the finger tapping frame. Based on the truncated signals, wavelet packet transform (WPT) coefficients are calculated and compressed as features, followed by a feature selection method that improves performance by optimizing the feature set. Finally, three popular classifiers, naive Bayes (NBC), K-nearest neighbor (KNN), and support vector machine (SVM), are applied and evaluated. A recognition accuracy of up to 94% is achieved. The design and architecture of the system are presented with full system characterization results. PMID:29167655

  3. Human Naive T Cells Express Functional CXCL8 and Promote Tumorigenesis.

    PubMed

    Crespo, Joel; Wu, Ke; Li, Wei; Kryczek, Ilona; Maj, Tomasz; Vatan, Linda; Wei, Shuang; Opipari, Anthony W; Zou, Weiping

    2018-05-25

    Naive T cells are thought to be functionally quiescent. In this study, we characterized and compared the phenotype, cytokine profile, and potential function of human naive CD4+ T cells in umbilical cord and peripheral blood. We found that naive CD4+ T cells, but not memory T cells, expressed high levels of the chemokine CXCL8. CXCL8+ naive T cells were preferentially enriched in CD31+ T cells and did not express T cell activation markers or typical Th effector cytokines, including IFN-γ, IL-4, IL-17, and IL-22. In addition, upon activation, naive T cells retained high levels of CXCL8 expression. Furthermore, we showed that naive T cell-derived CXCL8 mediated neutrophil migration in an in vitro migration assay, supported tumor sphere formation, and promoted tumor growth in an in vivo human xenograft model. Thus, human naive T cells are phenotypically and functionally heterogeneous and can carry out active functions in immune responses. Copyright © 2018 by The American Association of Immunologists, Inc.

  4. The application of remote sensing image sea ice monitoring method in Bohai Bay based on C4.5 decision tree algorithm

    NASA Astrophysics Data System (ADS)

    Ye, Wei; Song, Wei

    2018-02-01

    In this paper, the problem of remote sensing monitoring of sea ice was cast as a classification problem in data mining. Based on statistics of the relevant band data of HJ1B remote sensing images, the main bands of HJ1B images related to the reflectance of seawater and sea ice were identified. On this basis, decision tree rules for sea ice monitoring were constructed from those bands, and the rules were then applied to the Liaodong Bay area, which is heavily covered by sea ice, for sea ice monitoring. The results showed that the method is effective.
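The fitted decision tree rules are not given in the abstract; purely to illustrate the form such band-threshold rules take once a C4.5-style tree is flattened into if/else tests (the thresholds, band choices, and class names below are invented, not the paper's values):

```python
def classify_pixel(visible, nir):
    """Toy band-threshold rules in the spirit of a learned decision
    tree: sea ice reflects strongly in both the visible and
    near-infrared bands, while open water stays dark in the NIR.
    Thresholds are illustrative only."""
    if visible > 0.4:        # bright pixel in the visible band
        if nir > 0.25:       # also bright in NIR -> likely ice
            return "sea_ice"
        return "turbid_water"
    return "seawater"
```

A real workflow would apply such a function per pixel across the scene and validate the resulting ice map against reference observations.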

  5. Classification and Sequential Pattern Analysis for Improving Managerial Efficiency and Providing Better Medical Service in Public Healthcare Centers

    PubMed Central

    Chung, Sukhoon; Rhee, Hyunsill; Suh, Yongmoo

    2010-01-01

    Objectives This study sought to answer the following questions: 1) Can we predict whether a patient will revisit a healthcare center? 2) Can we anticipate the diseases of patients who revisit the center? Methods For the first question, we applied 5 classification algorithms (decision tree, artificial neural network, logistic regression, Bayesian networks, and Naïve Bayes) and the stacking-bagging method for building classification models. To address the second question, we performed sequential pattern analysis. Results We determined that: 1) In general, the most influential variables affecting whether a patient will revisit a public healthcare center are personal burden, insurance bill, period of prescription, age, systolic pressure, name of disease, and postal code. 2) The best plain classification model depends on the dataset. 3) Based on average classification accuracy, the proposed stacking-bagging method outperformed all traditional classification models, and our sequential pattern analysis revealed 16 sequential patterns. Conclusions Classification models and sequential patterns can help public healthcare centers plan and implement healthcare service programs and businesses that are more appropriate to local residents, encouraging them to revisit public health centers. PMID:21818426

  6. Predicting the need for CT imaging in children with minor head injury using an ensemble of Naive Bayes classifiers.

    PubMed

    Klement, William; Wilk, Szymon; Michalowski, Wojtek; Farion, Ken J; Osmond, Martin H; Verter, Vedat

    2012-03-01

    Using an automatic data-driven approach, this paper develops a prediction model that achieves more balanced performance (in terms of sensitivity and specificity) than the Canadian Assessment of Tomography for Childhood Head Injury (CATCH) rule, when predicting the need for computed tomography (CT) imaging of children after a minor head injury. CT is widely considered an effective tool for evaluating patients with minor head trauma who have potentially suffered serious intracranial injury. However, its use poses possible harmful effects, particularly for children, due to exposure to radiation. Safety concerns, along with issues of cost and practice variability, have led to calls for the development of effective methods to decide when CT imaging is needed. Clinical decision rules represent such methods and are normally derived from the analysis of large prospectively collected patient data sets. The CATCH rule was created by a group of Canadian pediatric emergency physicians to support the decision of referring children with minor head injury to CT imaging. The goal of the CATCH rule was to maximize the sensitivity of predictions of potential intracranial lesion while keeping specificity at a reasonable level. After extensive analysis of the CATCH data set, characterized by severe class imbalance, and after a thorough evaluation of several data mining methods, we derived an ensemble of multiple Naive Bayes classifiers as the prediction model for CT imaging decisions. In the first phase of the experiment we compared the proposed ensemble model to other ensemble models employing rule-, tree- and instance-based member classifiers. Our prediction model demonstrated the best performance in terms of AUC, G-mean and sensitivity measures. 
In the second phase, using a bootstrapping experiment similar to that reported by the CATCH investigators, we showed that the proposed ensemble model achieved a more balanced predictive performance than the CATCH rule with an average sensitivity of 82.8% and an average specificity of 74.4% (vs. 98.1% and 50.0% for the CATCH rule respectively). Automatically derived prediction models cannot replace a physician's acumen. However, they help establish reference performance indicators for the purpose of developing clinical decision rules so the trade-off between prediction sensitivity and specificity is better understood. Copyright © 2011 Elsevier B.V. All rights reserved.
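An ensemble of naive Bayes members trained on class-balanced bootstrap samples is one standard way to obtain the more balanced sensitivity/specificity trade-off described above on severely imbalanced data; the sketch below illustrates the idea and is not the authors' exact procedure (data and class names are invented):

```python
import math
import random
from collections import defaultdict

def fit_gnb(X, y):
    """Tiny Gaussian naive Bayes used as the ensemble member."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        stats = []
        for d in zip(*rows):
            m = sum(d) / len(d)
            stats.append((m, max(sum((v - m) ** 2 for v in d) / len(d), 1e-6)))
        model[label] = (stats, len(rows) / len(X))
    return model

def predict_gnb(model, x):
    def lp(label):
        stats, prior = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, (m, v) in zip(x, stats))
    return max(model, key=lp)

def fit_balanced_ensemble(X, y, n_members=15, seed=0):
    """Each member trains on a bootstrap that samples every class
    equally (sized to the minority class), which pushes sensitivity
    up on imbalanced data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    size = min(len(rows) for rows in by_class.values())
    members = []
    for _ in range(n_members):
        Xb, yb = [], []
        for label, rows in by_class.items():
            for _ in range(size):
                Xb.append(rng.choice(rows))
                yb.append(label)
        members.append(fit_gnb(Xb, yb))
    return members

def vote(members, x):
    """Majority vote over the ensemble members' predictions."""
    preds = [predict_gnb(m, x) for m in members]
    return max(set(preds), key=preds.count)
```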

  7. [Diversity and antimicrobial activities of cultivable bacteria isolated from Jiaozhou Bay].

    PubMed

    Wang, Yiting; Zhang, Chuanbo; Qi, Lin; Jia, Xiaoqiang; Lu, Wenyu

    2016-12-04

    Marine microorganisms have great potential for producing biologically active secondary metabolites. To study their diversity and antimicrobial activity, we examined 9 sediment samples from different observation sites in Jiaozhou Bay. We used YPD and Z2216E culture media to isolate bacteria from the sediments; 16S rRNA was sequenced for classification and identification of the isolates. We then used the Oxford cup method to detect antimicrobial activities of the isolated bacteria against 7 test strains. Lastly, we selected 16 representatives to detect secondary-metabolite biosynthesis genes (PKSI, NRPS, CYP, PhzE, dTGD) by specific PCR amplification. A total of 76 bacterial strains were isolated from Jiaozhou Bay; according to 16S rRNA gene sequence analysis, these strains could be sorted into 11 genera belonging to 8 different families: Aneurinibacillus, Brevibacillus, Microbacterium, Oceanisphae, Bacillus, Marinomonas, Staphylococcus, Kocuria, Arthrobacter, Micrococcus and Pseudoalteromonas. Of them, 34 strains showed antimicrobial activity against at least one of the tested strains. All 16 selected strains carried at least one of the function genes, and 5 strains possessed more than three. The Jiaozhou Bay area is rich in microbial resources with the potential to provide useful secondary metabolites.

  8. Blocking the recruitment of naive CD4+ T cells reverses immunosuppression in breast cancer

    PubMed Central

    Su, Shicheng; Liao, Jianyou; Liu, Jiang; Huang, Di; He, Chonghua; Chen, Fei; Yang, LinBing; Wu, Wei; Chen, Jianing; Lin, Ling; Zeng, Yunjie; Ouyang, Nengtai; Cui, Xiuying; Yao, Herui; Su, Fengxi; Huang, Jian-dong; Lieberman, Judy; Liu, Qiang; Song, Erwei

    2017-01-01

    The origin of tumor-infiltrating Tregs, critical mediators of tumor immunosuppression, is unclear. Here, we show that tumor-infiltrating naive CD4+ T cells and Tregs in human breast cancer have overlapping TCR repertoires, while hardly overlapping with circulating Tregs, suggesting that intratumoral Tregs mainly develop from naive T cells in situ rather than from recruited Tregs. Furthermore, the abundance of naive CD4+ T cells and Tregs is closely correlated, both indicating poor prognosis for breast cancer patients. Naive CD4+ T cells adhere to tumor slices in proportion to the abundance of CCL18-producing macrophages. Moreover, adoptively transferred human naive CD4+ T cells infiltrate human breast cancer orthotopic xenografts in a CCL18-dependent manner. In human breast cancer xenografts in humanized mice, blocking the recruitment of naive CD4+ T cells into the tumor by knocking down the expression of PITPNM3, a CCL18 receptor, significantly reduces intratumoral Tregs and inhibits tumor progression. These findings suggest that breast tumor-infiltrating Tregs arise from chemotaxis of circulating naive CD4+ T cells that differentiate into Tregs in situ. Inhibiting naive CD4+ T cell recruitment into tumors by interfering with PITPNM3 recognition of CCL18 may be an attractive strategy for anticancer immunotherapy. PMID:28290464

  9. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    PubMed

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usually located at the N-terminus or C-terminus of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated protein sorting motif discovery as a classification problem and proposed a Bayesian-classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false-positive removal procedure iteratively removes sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted-motif and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets, and that the false-positive removal procedure helps identify true motifs even when only 20% of the input sequences contain true motif instances. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less conserved motifs with short, highly conserved anchors. It also has the advantage of easily incorporating additional meta-sequence features, such as hydrophobicity or charge of the motifs, which may help overcome the limitations of the PWM (position weight matrix) motif model.
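    The anchored-motif idea can be illustrated with a minimal sketch (not the authors' code): a per-position residue model is trained from aligned motif instances with pseudocounts, and candidate windows are scored by their Bayesian log-odds against a uniform background. The toy "KKDEL"-style motif and all names below are illustrative assumptions.

```python
from math import log

AA = "ACDEFGHIKLMNPQRSTVWY"

def train_motif_model(instances, pseudo=1.0):
    """Per-position residue probabilities for aligned motif instances."""
    width = len(instances[0])
    model = []
    for pos in range(width):
        counts = {a: pseudo for a in AA}   # pseudocounts avoid zero probabilities
        for seq in instances:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        model.append({a: c / total for a, c in counts.items()})
    return model

def log_odds(window, model, background=1.0 / len(AA)):
    """Bayesian log-odds that a window is a motif vs. a uniform background."""
    return sum(log(model[i][a] / background) for i, a in enumerate(window))

def best_window(sequence, model):
    """Slide the model along a sequence; return the best-scoring window."""
    width = len(model)
    scored = [(log_odds(sequence[i:i + width], model), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)
```

    A window containing the conserved pattern scores well above zero, while background windows score near or below zero; an iterative false-positive removal step would drop sequences whose best window stays below a threshold.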

  10. Identification and Characteristics of Signature Whistles in Wild Bottlenose Dolphins (Tursiops truncatus) from Namibia

    PubMed Central

    Elwen, Simon Harvey; Nastasi, Aurora

    2014-01-01

    A signature whistle type is a learned, individually distinctive whistle type in a dolphin's acoustic repertoire that broadcasts the identity of the whistle owner. The acquisition and use of signature whistles indicates complex cognitive functioning that requires wider investigation in wild dolphin populations. Here we identify signature whistle types from a population of approximately 100 wild common bottlenose dolphins (Tursiops truncatus) inhabiting Walvis Bay, and describe signature whistle occurrence, acoustic parameters and temporal production. A catalogue of 43 repeatedly emitted whistle types (REWTs) was generated by analysing 79 hrs of acoustic recordings. From this, 28 signature whistle types were identified using a method based on the temporal patterns in whistle sequences. A visual classification task conducted by 5 naïve judges showed high levels of agreement in classification of whistles (Fleiss-Kappa statistic, κ = 0.848, Z = 55.3, P<0.001) and supported our categorisation. Signature whistle structure remained stable over time and location, with most types (82%) recorded in 2 or more years, and 4 identified at Walvis Bay and a second field site approximately 450 km away. Whistle acoustic parameters were consistent with those of signature whistles documented in Sarasota Bay (Florida, USA). We provide evidence of possible two-voice signature whistle production by a common bottlenose dolphin. Although signature whistle types have potential use as a marker for studying individual habitat use, we only identified approximately 28% of those from the Walvis Bay population, despite considerable recording effort. We found that signature whistle type diversity was higher in larger dolphin groups and groups with calves present. This is the first study describing signature whistles in a wild free-ranging T. truncatus population inhabiting African waters and it provides a baseline on which more in depth behavioural studies can be based. PMID:25203814
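    The inter-rater agreement statistic reported above (Fleiss' kappa) can be computed directly from a table of rating counts; a minimal NumPy sketch, not tied to the study's data:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (items x categories) matrix of rating counts.
    Every row must sum to the same number of raters n."""
    counts = np.asarray(counts, dtype=float)
    N, _ = counts.shape
    n = counts[0].sum()
    # per-item agreement: fraction of rater pairs that agree on the item
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()
    p_j = counts.sum(axis=0) / (N * n)     # category marginals
    P_e = np.square(p_j).sum()             # expected chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

    Perfect agreement gives kappa = 1; values near 0.848, as in the whistle-classification task, indicate strong agreement beyond chance.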

  11. Persistence of the Intuitive Conception of Living Things in Adolescence

    NASA Astrophysics Data System (ADS)

    Babai, Reuven; Sekal, Rachel; Stavy, Ruth

    2010-02-01

    This study investigated whether intuitive, naive conceptions of "living things" based on objects' mobility (movement = alive) persist into adolescence and affect 10th graders' accuracy of responses and reaction times during object classification. Most of the 58 students classified the test objects correctly as living/nonliving, yet they demonstrated significantly longer reaction times for classifying plants compared to animals and for classifying dynamic objects compared to static inanimate objects. Findings indicated that, despite prior learning in biology, the intuitive conception of living things persists up to age 15-16 years, affecting related reasoning processes. Consideration of these findings may help educators in their decisions about the nature of examples they use in their classrooms.

  12. Hyperspectral analysis of seagrass in Redfish Bay, Texas

    NASA Astrophysics Data System (ADS)

    Wood, John S.

    Remote sensing using multi- and hyperspectral imaging and analysis has been used in resource management for quite some time, and for a variety of purposes. In the studies that follow, hyperspectral imagery of Redfish Bay is used to discriminate between species of seagrasses found below the water surface. Water attenuates and reflects light and energy from the electromagnetic spectrum, and as a result, subsurface analysis can be more complex than that performed in the terrestrial world. In the following studies, an iterative process is developed using ENVI image processing software and ArcGIS software. Band selection was based on recommendations developed empirically in conjunction with ongoing research into depth corrections, which were applied to the imagery bands (a default depth of 65 cm was used). Polygons generated, classified, and aggregated within ENVI are reclassified in ArcGIS using field site data that was randomly selected for that purpose. After the first iteration, polygons that remain classified as 'Mixed' are subjected to another iteration of classification in ENVI, then brought into ArcGIS and reclassified. Finally, when that classification scheme is exhausted, a supervised classification is performed using a 'Maximum Likelihood' classification technique, which assigned the remaining polygons to the classification most like the training polygons, by digital number value. Producer's Accuracy by classification ranged from 23.33% for the 'MixedMono' class to 66.67% for the 'Bare' class; User's Accuracy by classification ranged from 22.58% for the 'MixedMono' class to 69.57% for the 'Bare' classification. An overall accuracy of 37.93% was achieved. Producer's and User's Accuracies for Halodule were 29% and 39%, respectively; for Thalassia, they were 46% and 40%. Cohen's Kappa Coefficient was calculated at 0.2988.
We then returned to the field and collected spectral signatures of monotypic stands of seagrass at varying depths and at three sensor levels: above the water surface, just below the air/water interface, and at the canopy position, when it differed from the subsurface position. Analysis of plots of these spectral curves, after applying depth corrections and Multiplicative Scatter Correction, indicates that there are detectable spectral differences between Halodule and Thalassia species at all three positions. Further analysis indicated that only above-surface spectral signals could reliably be used to discriminate between species, because there was an overlap of the standard deviations in the other two positions. A recommendation for wavelengths that would produce increased accuracy in hyperspectral image analysis was made, based on areas where there is a significant amount of difference between the mean spectral signatures, and no overlap of the standard deviations in our samples. The original hyperspectral imagery was reprocessed, using the bands recommended from the research above (approximately 535, 600, 620, 638, and 656 nm). A depth raster was developed from various available sources, which was resampled and reclassified to reflect values for water absorption and water scattering, which were then applied to each band using the depth correction algorithm. Processing followed the iterative classification methods described above. Accuracy for this round of processing improved; overall accuracy increased from 38% to 57%. Improvements were noted in Producer's Accuracy, with the 'Bare' classification increasing from 67% to 73%, Halodule increasing from 29% to 63%, Thalassia increasing slightly, from 46% to 50%, and 'MixedMono' improving from 23% to 42%. User's Accuracy also improved, with the 'Bare' class increasing from 69% to 70%, Halodule increasing from 39% to 67%, Thalassia increasing from 40% to 7%, and 'MixedMono' increasing from 22.5% to 35%. 
A very recent report shows the mean percent cover of seagrasses in Redfish Bay and Corpus Christi Bay combined for all species at 68.6%, and individually by species: Halodule 39.8%, Thalassia 23.7%, Syringodium 4%, Ruppia 1% and Halophila 0.1%. Our study classifies 15% as 'Bare', 23% Halodule, 18% Thalassia, and 2% Ruppia. In addition, we classify 5% as 'Mixed', 22% as 'MixedMono', 12% as 'Bare/Halodule Mix', and 3% 'Bare/Thalassia Mix'. Aggregating the 'Bare' and 'Bare/species' classes would equate to approximately 30%, very close to what this new study produces. Other classes are quite similar, when considering that their study includes no 'Mixed' classifications. This series of research studies illustrates the application and utility of hyperspectral imagery and associated processing to mapping shallow benthic habitats. It also demonstrates that the technology is rapidly changing and adapting, which will lead to even further increases in accuracy. Future studies with hyperspectral imaging should include extensive spectral field collection, and the application of a depth correction.
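    The accuracy statistics used throughout this study (Producer's and User's Accuracy, overall accuracy, Cohen's kappa) all derive from a single confusion matrix; a minimal sketch with an invented 2x2 matrix:

```python
import numpy as np

def accuracy_metrics(cm):
    """Thematic-map accuracy from a confusion matrix
    (rows = reference / ground truth, cols = classified)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    producers = diag / cm.sum(axis=1)   # Producer's Accuracy (omission errors)
    users = diag / cm.sum(axis=0)       # User's Accuracy (commission errors)
    overall = diag.sum() / total
    # Cohen's kappa: agreement beyond what the marginals would give by chance
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
    kappa = (overall - expected) / (1 - expected)
    return producers, users, overall, kappa
```

    For instance, a matrix [[40, 10], [20, 30]] gives an overall accuracy of 0.7 but a kappa of only 0.4, illustrating why this study reports both.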

  13. Evaluation for the ecological quality status of coastal waters in East China Sea using fuzzy integrated assessment method.

    PubMed

    Wu, H Y; Chen, K L; Chen, Z H; Chen, Q H; Qiu, Y P; Wu, J C; Zhang, J F

    2012-03-01

    This research presented an evaluation of the ecological quality status (EcoQS) of three semi-enclosed coastal areas using a fuzzy integrated assessment method (FIAM). With this method, the hierarchy structure was clarified by an index system of 11 indicators selected from biotic and physicochemical elements, and the weight vector of the index system was calculated with a Delphi-Analytic Hierarchy Process (AHP) procedure. Then, the FIAM was used to achieve an EcoQS assessment. As a result, most of the sampling stations demonstrated a clear gradient in EcoQS, ranging from high to poor status. Among the four statuses, high and good, accounting for 55.9% and 26.5% of stations respectively, were the two dominant statuses for the three bays, especially for Sansha Bay and Luoyuan Bay. The assessment results were found to be consistent with the pressure information and parameters obtained at most stations. In addition, the sources of uncertainty in the classification of EcoQS are also discussed.
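    The fuzzy integrated assessment step can be sketched as a weighted composition of an indicator-by-class membership matrix with the AHP weight vector; the memberships and weights below are invented for illustration:

```python
import numpy as np

def fuzzy_assess(memberships, weights):
    """Fuzzy integrated assessment: compose an (indicators x classes)
    membership matrix with a weight vector into one class-membership
    vector, and pick the class with maximal membership."""
    R = np.asarray(memberships, float)
    w = np.asarray(weights, float)
    w = w / w.sum()                 # normalize the AHP-derived weights
    b = w @ R                       # composite membership per status class
    return b, int(b.argmax())
```

    With status classes ordered [high, good, moderate, poor], a station whose indicators mostly belong to the "good" class is assigned status index 1 by the maximum-membership rule.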

  14. Adaptive classifier for steel strip surface defects

    NASA Astrophysics Data System (ADS)

    Jiang, Mingming; Li, Guangyao; Xie, Li; Xiao, Mang; Yi, Li

    2017-01-01

    Surface defect detection systems have received increasing attention for their precision, speed, and low cost. One of the greatest challenges is coping with accuracy deterioration over time caused by aging equipment and changing processes. These variables make only a tiny change to the real-world model but have a big impact on the classification result. In this paper, we propose a new adaptive classifier with a Bayes kernel (BYEC) that updates the model with small samples so that it adapts to accuracy deterioration. First, abundant features were introduced to cover a large amount of information about the defects. Second, we constructed a series of SVMs on random subspaces of the features. Then, a Bayes classifier was trained as an evolutionary kernel to fuse the results from the base SVMs. Finally, we proposed a method to update the Bayes evolutionary kernel. The proposed algorithm is experimentally compared with different algorithms; the results demonstrate that the proposed method can be updated with small samples and fits the changed model well. The experiments also demonstrate its robustness, low sample requirements, and adaptivity.
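    The fusion step of a BYEC-style ensemble, a Bayes classifier trained over base-classifier outputs, can be sketched as a Bernoulli naive Bayes over hypothetical 0/1 SVM votes; this is an illustrative reconstruction, not the paper's implementation:

```python
import numpy as np

def train_bayes_kernel(votes, labels, alpha=1.0):
    """Bernoulli naive Bayes over base-classifier votes (0/1 per SVM).
    votes: (n_samples, n_base_classifiers). Retraining on a few new
    labeled samples is how such a kernel adapts to drift."""
    votes, labels = np.asarray(votes), np.asarray(labels)
    classes = np.unique(labels)
    priors, cond = {}, {}
    for c in classes:
        v = votes[labels == c]
        priors[c] = (len(v) + alpha) / (len(votes) + alpha * len(classes))
        cond[c] = (v.sum(axis=0) + alpha) / (len(v) + 2 * alpha)  # P(vote=1|c)
    return classes, priors, cond

def fuse(votes, model):
    """Fuse one vector of base votes into a final class decision."""
    classes, priors, cond = model
    votes = np.asarray(votes)
    scores = []
    for c in classes:
        p = cond[c]
        scores.append(np.log(priors[c]) +
                      np.sum(votes * np.log(p) + (1 - votes) * np.log(1 - p)))
    return classes[int(np.argmax(scores))]
```

    Because the fusion layer is tiny, it can be retrained from a handful of fresh samples while the expensive base SVMs stay fixed, which is the adaptivity argument the abstract makes.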

  15. Risk forewarning model for rice grain Cd pollution based on Bayes theory.

    PubMed

    Wu, Bo; Guo, Shuhai; Zhang, Lingyan; Li, Fengmei

    2018-03-15

    Cadmium (Cd) pollution of rice grain caused by Cd-contaminated soils is a common problem in southwest and central south China. In this study, utilizing the advantages of the Bayes classification statistical method, we established a risk forewarning model for rice grain Cd pollution and put forward two parameters (the prior probability factor and the data variability factor). Sensitivity analysis of the model parameters illustrated that sample size and standard deviation influenced the accuracy and applicable range of the model. The accuracy of the model was improved by self-renewal, adding the posterior data to the prior data. Furthermore, this method can be used to predict the risk probability of rice grain Cd pollution under similar soil environment, tillage, and rice varietal conditions. The Bayes approach thus represents a feasible method for risk forewarning of heavy metal pollution of agricultural products caused by contaminated soils.
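    The core of such a forewarning model is Bayes' rule over two classes (grain exceeds the Cd limit or not) given a soil indicator; a minimal sketch assuming Gaussian class likelihoods, with all parameter values invented for illustration:

```python
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    """Normal density, used as the class-conditional likelihood."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def pollution_risk(x, prior_polluted, mu_p, sd_p, mu_c, sd_c):
    """Posterior probability that grain Cd exceeds the limit given a
    soil indicator x, via Bayes' rule over polluted vs. clean classes."""
    lp = gaussian(x, mu_p, sd_p) * prior_polluted        # polluted class
    lc = gaussian(x, mu_c, sd_c) * (1 - prior_polluted)  # clean class
    return lp / (lp + lc)
```

    The "self-renewal" described in the abstract corresponds to re-estimating the prior and the class means/standard deviations after new posterior samples are folded into the training data.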

  16. Total nutrient and sediment loads, trends, yields, and nontidal water-quality indicators for selected nontidal stations, Chesapeake Bay Watershed, 1985–2011

    USGS Publications Warehouse

    Langland, Michael J.; Blomquist, Joel D.; Moyer, Douglas; Hyer, Kenneth; Chanat, Jeffrey G.

    2013-01-01

    The U.S. Geological Survey, in cooperation with Chesapeake Bay Program (CBP) partners, routinely reports long-term concentration trends and monthly and annual constituent loads for stream water-quality monitoring stations across the Chesapeake Bay watershed. This report documents flow-adjusted trends in sediment and total nitrogen and phosphorus concentrations for 31 stations in the years 1985–2011 and for 32 stations in the years 2002–2011. Sediment and total nitrogen and phosphorus yields for 65 stations are presented for the years 2006–2011. A combined nontidal water-quality indicator (based on both trends and yields) indicates there are more stations classified as “improving water-quality trend and a low yield” than “degrading water-quality trend and a high yield” for total nitrogen. The same type of 2-way classification for total phosphorus and sediment results in equal numbers of stations in each indicator class.

  17. Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.

    PubMed

    Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo

    2016-01-01

    Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three machine learning classifiers, Logistic Regression, Naïve Bayes and Random Forest, with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970s and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important measure proved to be betweenness centrality, while for the Caviar Network it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only generates accurate classification models but also helps quantify the importance of social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.

  18. A comparative assessment of GIS-based data mining models and a novel ensemble model in groundwater well potential mapping

    NASA Astrophysics Data System (ADS)

    Naghibi, Seyed Amir; Moghaddam, Davood Davoodi; Kalantar, Bahareh; Pradhan, Biswajeet; Kisi, Ozgur

    2017-05-01

    In recent years, the application of ensemble models has increased tremendously in various types of natural hazard assessment, such as landslides and floods. However, application of such robust models in groundwater potential mapping is relatively new. This study applied four data mining algorithms, AdaBoost, Bagging, generalized additive model (GAM), and Naive Bayes (NB), to map groundwater potential. Then, a novel frequency ratio data mining ensemble model (FREM) was introduced and evaluated. For this purpose, eleven groundwater conditioning factors (GCFs) were mapped: altitude, slope aspect, slope angle, plan curvature, stream power index (SPI), river density, distance from rivers, topographic wetness index (TWI), land use, normalized difference vegetation index (NDVI), and lithology. A total of 281 well locations with high potential were selected. Wells were randomly partitioned into two classes for training the models (70%, or 197) and validating them (30%, or 84). The AdaBoost, Bagging, GAM, and NB algorithms were employed to produce groundwater potential maps (GPMs). The GPMs were categorized into potential classes using the natural breaks classification scheme. In the next stage, frequency ratio (FR) values were calculated for the outputs of the four aforementioned models and summed, and finally a GPM was produced using FREM. For validating the models, the area under the receiver operating characteristic (ROC) curve was calculated. The area under the curve for the prediction dataset was 94.8, 93.5, 92.6, 92.0, and 84.4% for the FREM, Bagging, AdaBoost, GAM, and NB models, respectively. The results indicated that FREM had the best performance among all the models. The better performance of the FREM model could be related to reduction of overfitting and possible errors. The other models, AdaBoost, Bagging, GAM, and NB, also produced acceptable performance in groundwater modelling. 
The GPMs produced in the current study may facilitate groundwater exploitation by determining high and very high groundwater potential zones.
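    The FREM construction described above, computing a frequency ratio per potential class of each model's map and summing the ratios cell by cell, can be sketched on toy rasters; the class maps and well mask below are invented:

```python
import numpy as np

def frequency_ratio(class_map, well_mask):
    """FR per potential class: the share of wells falling in the class
    divided by the share of map area occupied by the class."""
    fr = {}
    total_cells = class_map.size
    total_wells = well_mask.sum()
    for c in np.unique(class_map):
        in_c = class_map == c
        area_share = in_c.sum() / total_cells
        well_share = well_mask[in_c].sum() / total_wells
        fr[c] = well_share / area_share
    return fr

def frem(class_maps, well_mask):
    """Sum each model's per-class FR at every cell: the FREM map."""
    out = np.zeros(class_maps[0].shape)
    for cmap in class_maps:
        fr = frequency_ratio(cmap, well_mask)
        out += np.vectorize(fr.get)(cmap)
    return out
```

    A class in which wells are over-represented relative to its area gets FR > 1, so cells that several models place in well-rich classes accumulate high FREM scores.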

  19. Machine-learning model observer for detection and localization tasks in clinical SPECT-MPI

    NASA Astrophysics Data System (ADS)

    Parages, Felipe M.; O'Connor, J. Michael; Pretorius, P. Hendrik; Brankov, Jovan G.

    2016-03-01

    In this work we propose a machine-learning MO based on Naive-Bayes classification (NB-MO) for the diagnostic tasks of detection, localization and assessment of perfusion defects in clinical SPECT Myocardial Perfusion Imaging (MPI), with the goal of evaluating several image reconstruction methods used in clinical practice. NB-MO uses image features extracted from polar maps to predict the lesion detection, localization and severity scores given by human readers in a series of 3D SPECT-MPI studies. The population used to tune (i.e. train) the NB-MO consisted of simulated SPECT-MPI cases, divided into normal cases and cases with lesions of variable size and location, reconstructed using the filtered backprojection (FBP) method. An ensemble of five human specialists (physicians) read a subset of simulated reconstructed images and assigned a perfusion score for each region of the left ventricle (LV). Polar maps generated from the simulated volumes, along with their corresponding human scores, were used to train five NB-MOs (one per human reader), which were subsequently applied (i.e. tested) on three sets of clinical SPECT-MPI polar maps in order to predict human detection and localization scores. The clinical "testing" population comprises healthy individuals and patients suffering from coronary artery disease (CAD) in three possible regions, namely LAD, LCx and RCA. Each clinical case was reconstructed using three reconstruction strategies: FBP with no scatter compensation (SC), OSEM with the Triple Energy Window (TEW) SC method, and OSEM with Effective Source Scatter Estimation (ESSE) SC. Alternative Free-Response (AFROC) analysis of perfusion scores shows that NB-MO predicts a higher human performance for scatter-compensated reconstructions, in agreement with what has been reported in the published literature. 
These results suggest that NB-MO has good potential to generalize well to reconstruction methods not used during training, even for reasonably dissimilar datasets (i.e. simulated vs. clinical).

  20. Effects of changes in reservoir operations on water quality and trophic state indicators in Voyageurs National Park, northern Minnesota, 2001-03

    USGS Publications Warehouse

    Christensen, Victoria G.; Payne, G.A.; Kallemeyn, Larry W.

    2004-01-01

    Implementation of an order by the International Joint Commission in January 2000 changed operating procedures for dams that regulate two large reservoirs in Voyageurs National Park in northern Minnesota. These new procedures were expected to restore a more natural water regime and affect water levels, water quality, and trophic status. Results of laboratory analyses and field measurements of chemical and physical properties from May 2001 through September 2003 were compared to similar data collected prior to the change in operating procedures. Rank sum tests showed significant decreases in chlorophyll-a concentrations and trophic state indices for Kabetogama Lake (p=0.021) and Black Bay (p=0.007). There were no significant decreases in total phosphorus concentration, however, perhaps due to internal cycling of phosphorus. No sites had significant trends in seasonal total phosphorus concentrations, with the exception of May samples from Sand Point Lake, which had a significant decreasing trend (tau=-0.056, probability=0.03). May chlorophyll-a concentrations for Kabetogama Lake showed a significant decreasing trend (tau=-0.42, probability=0.05). Based on mean chlorophyll trophic-state indices (2001-03), Sand Point, Namakan, and Rainy Lakes would be classified as oligotrophic to mesotrophic, and Kabetogama Lake and Rainy Lake at Black Bay would be classified as mesotrophic. The classifications of Sand Point, Namakan, and Rainy Lakes remain the same as for data collected prior to the change in operating procedures. In contrast, the trophic classification of Kabetogama Lake and Rainy Lake at Black Bay has changed from eutrophic to mesotrophic.

  1. MAPPING NON-INDIGENOUS EELGRASS ZOSTERA JAPONICA, ASSOCIATED MACROALGAE AND EMERGENT AQUATIC VEGETATION HABITATS IN A PACIFIC NORTHWEST ESTUARY USING NEAR-INFRARED COLOR AERIAL PHOTOGRAPHY AND A HYBRID IMAGE CLASSIFICATION TECHNIQUE

    EPA Science Inventory

    We conducted aerial photographic surveys of Oregon's Yaquina Bay estuary during consecutive summers from 1997 through 2001. Imagery was obtained during low tide exposures of intertidal mudflats, allowing use of near-infrared color film to detect and discriminate plant communitie...

  2. MAPPING EELGRASS SPECIES ZOSTERA JAPONICA AND Z. MARINA, ASSOCIATED MACROALGAE AND EMERGENT AQUATIC VEGETATION HABITATS IN PACIFIC NORTHWEST ESTUARIES USING NEAR-INFRARED COLOR AERIAL PHOTOGRAPHY AND A HYBRID IMAGE CLASSIFICATION TECHNIQUE

    EPA Science Inventory

    Aerial photographic surveys of Oregon's Yaquina Bay estuary were conducted during consecutive summers from 1997 through 2000. Imagery was obtained during low tide exposures of intertidal mudflats, allowing use of near-infrared color film to detect and discriminate plant communit...

  3. Humboldt Bay Wetlands Review and Baylands Analysis. Volume III. Habitat Classification and Mapping and Appendices.

    DTIC Science & Technology

    1980-08-01

    also a mobile substrate habitat type, but not the massive dunes described previously; some vegetation is established. Most foredunes along the coastal...

  4. Binary image classification

    NASA Technical Reports Server (NTRS)

    Morris, Carl N.

    1987-01-01

    Motivated by the LANDSAT problem of estimating the probability of crop or geological types based on multi-channel satellite imagery data, Morris and Kostal (1983), Hill, Hinkley, Kostal, and Morris (1984), and Morris, Hinkley, and Johnston (1985) developed an empirical Bayes approach to this problem. Here, researchers return to those developments, making certain improvements and extensions, but restricting attention to the binary case of only two attributes.

  5. Mycofier: a new machine learning-based classifier for fungal ITS sequences.

    PubMed

    Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel

    2016-08-11

    The taxonomic and phylogenetic classification of fungi based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Currently, there is no accurate alignment-free classification tool for fungal ITS1 sequences suitable for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomic assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data, and training and test sets were generated from it. A Naïve Bayes classifier was built using features from the primary sequence, achieving an accuracy of 87% in classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted Mycofier, provides classification accuracy similar to BLASTN, but the database used for classification contains curated data, and the tool, being alignment-independent, is more efficient and contributes to the field given the lack of an accurate classification tool for large volumes of fungal ITS1 sequence data. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git.
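    An alignment-free naive Bayes sequence classifier in the spirit of Mycofier (though not its actual feature set or code) can be sketched with k-mer counts and Laplace smoothing; the toy sequences and genus labels below are invented:

```python
from collections import Counter
from math import log

def kmers(seq, k=3):
    """Overlapping k-mers of a sequence: the alignment-free features."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

class KmerNB:
    """Multinomial naive Bayes over k-mer counts."""
    def fit(self, seqs, labels, k=3, alpha=1.0):
        self.k, self.alpha = k, alpha
        self.vocab = {m for s in seqs for m in kmers(s, k)}
        self.counts, self.totals, self.priors = {}, {}, {}
        n = len(seqs)
        for c in set(labels):
            members = [s for s, y in zip(seqs, labels) if y == c]
            cnt = Counter(m for s in members for m in kmers(s, k))
            self.counts[c] = cnt
            self.totals[c] = sum(cnt.values())
            self.priors[c] = len(members) / n
        return self

    def predict(self, seq):
        V = len(self.vocab)
        def score(c):
            # log prior + smoothed log likelihood of each observed k-mer
            return log(self.priors[c]) + sum(
                log((self.counts[c][m] + self.alpha) /
                    (self.totals[c] + self.alpha * V))
                for m in kmers(seq, self.k))
        return max(self.priors, key=score)
```

    Because scoring needs only k-mer counting, classification cost grows linearly with sequence length, which is the efficiency argument for alignment-free tools over BLASTN-style search.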

  6. A Theoretical Analysis of Why Hybrid Ensembles Work.

    PubMed

    Hsu, Kuo-Wei

    2017-01-01

    Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose using a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains open. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created with the decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm often used to create non-hybrid ensembles. Through this paper, we thereby provide a complement to the theoretical foundation for creating and using hybrid ensembles.
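    Why diversity pays can be made concrete with the classical majority-vote calculation: if n classifiers err independently, each with accuracy p > 0.5, the ensemble's accuracy exceeds p. A small sketch (full independence is an idealization that hybrid ensembles approximate by mixing different algorithms):

```python
from math import comb

def majority_accuracy(p, n=3):
    """Probability that a majority of n independent classifiers,
    each correct with probability p, produces the correct label (odd n)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n // 2) + 1, n + 1))
```

    For three independent classifiers at p = 0.7, the majority vote is correct with probability p^3 + 3p^2(1 - p) = 0.784, and the gain grows with n; correlated errors (low diversity) shrink it, which is the intuition the theoretical analysis formalizes.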

  7. Wavelet-based energy features for glaucomatous image classification.

    PubMed

    Dua, Sumeet; Acharya, U Rajendra; Chowriappa, Pradeep; Sree, S Vinitha

    2012-01-01

    Texture features within images are actively pursued for accurate and efficient glaucoma classification. Energy distribution over wavelet subbands is applied to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the Daubechies (db3), Symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures obtained using the 2-D discrete wavelet transform, and subject these signatures to different feature ranking and feature selection strategies. We gauged the effectiveness of the resultant ranked and selected subsets of features using support vector machine, sequential minimal optimization, random forest, and naïve Bayes classification strategies. We observed an accuracy of around 93% using tenfold cross-validation, demonstrating the effectiveness of these methods.

  8. The ERTS-1 investigation (ER-600). Volume 2: ERTS-1 coastal/estuarine analysis. [Galveston Bay, Texas

    NASA Technical Reports Server (NTRS)

    Erb, R. B.

    1974-01-01

    The Coastal Analysis Team of the Johnson Space Center conducted a 1-year investigation of ERTS-1 MSS data to determine its usefulness in coastal zone management. Galveston Bay, Texas, was the study area for evaluating both conventional image interpretation and computer-aided techniques. There was limited success in detecting, identifying, and measuring the areal extent of water bodies, turbidity zones, phytoplankton blooms, salt marshes, grasslands, swamps, and low wetlands using image interpretation techniques. Computer-aided techniques were generally successful in identifying these features. Areal measurement accuracies for salt marshes ranged from 89 to 99 percent. Overall classification accuracy of all study sites was 89 percent for Level 1 and 75 percent for Level 2.

  9. Wavelet Packet Entropy for Heart Murmurs Classification

    PubMed Central

    Safara, Fatemeh; Doraisamy, Shyamala; Azman, Azreen; Jantan, Azrul; Ranga, Sri

    2012-01-01

    Heart murmurs are the first signs of cardiac valve disorders. Several studies have been conducted in recent years to automatically differentiate normal heart sounds from heart sounds with murmurs using various types of audio features. Entropy has been used successfully as a feature to distinguish different heart sounds. In this paper, a new entropy measure, previously introduced to analyze mammograms, was applied to heart sounds, and the feasibility of using it to classify five types of heart sounds and murmurs was shown. Four common murmurs were considered: aortic regurgitation, mitral regurgitation, aortic stenosis, and mitral stenosis. The wavelet packet transform was employed for heart sound analysis, and the entropy was calculated to derive feature vectors. Five classifiers were evaluated to assess the discriminatory power of the generated features. The best results were achieved by BayesNet, with 96.94% accuracy. The promising results substantiate the effectiveness of the proposed wavelet packet entropy for heart sound classification. PMID:23227043
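    The feature pipeline, a wavelet packet decomposition followed by an entropy computed on the coefficients, can be sketched with a Haar filter and the Shannon entropy of normalized subband energies; this illustrates the general idea, not the paper's specific entropy measure:

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: (approximation, detail) half-bands."""
    x = np.asarray(x, float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_packet(x, levels=2):
    """Full wavelet-packet tree: split every subband at every level
    (signal length must be divisible by 2**levels)."""
    bands = [np.asarray(x, float)]
    for _ in range(levels):
        bands = [b for band in bands for b in haar_step(band)]
    return bands

def packet_entropy(x, levels=2):
    """Shannon entropy of the normalized subband energies."""
    e = np.array([np.sum(b**2) for b in wavelet_packet(x, levels)])
    p = e / e.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```

    A constant signal concentrates all energy in the lowest subband and has zero entropy, while a murmur-like signal spreads energy across subbands, so its entropy rises; those scalar entropies per subband tree form the feature vector fed to the classifier.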

  10. Gender classification from video under challenging operating conditions

    NASA Astrophysics Data System (ADS)

    Mendoza-Schrock, Olga; Dong, Guozhu

    2014-06-01

    The literature is abundant with papers on gender classification research. However, the majority of such research assumes that there is enough resolution for the subject's face to be resolved; hence, most of the work actually falls in the face recognition and facial feature area. A gap exists for gender classification under challenging operating conditions—different seasonal conditions, different clothing, etc.—and when the subject's face cannot be resolved due to lack of resolution. The Seasonal Weather and Gender (SWAG) Database is a novel database that contains subjects walking through a scene under operating conditions that span a calendar year. This paper exploits a subset of that database—the SWAG One dataset—using data mining techniques, traditional classifiers (e.g., Naïve Bayes, Support Vector Machine, etc.), and both traditional (Canny edge detection, etc.) and non-traditional (height/width ratios, etc.) feature extractors to achieve high correct gender classification rates (greater than 85%). Another novelty is the exploitation of frame differentials.

  11. Classification Algorithms for Big Data Analysis, a Map Reduce Approach

    NASA Astrophysics Data System (ADS)

    Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.

    2015-03-01

    For many years the scientific community has been concerned with increasing the accuracy of different classification methods, and major achievements have been made so far. Beyond this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes for different cluster configurations demonstrate the potential of the tool, as well as aspects that affect its performance.

  12. Evaluation of the impact of chitosan/DNA nanoparticles on the differentiation of human naive CD4+ T cells

    NASA Astrophysics Data System (ADS)

    Liu, Lanxia; Bai, Yuanyuan; Zhu, Dunwan; Song, Liping; Wang, Hai; Dong, Xia; Zhang, Hailing; Leng, Xigang

    2011-06-01

    Chitosan (CS) is one of the most widely studied polymers in non-viral gene delivery, since it is a cationic polysaccharide that forms nanoparticles with DNA and hence protects the DNA against digestion by DNase. However, the impact of CS/DNA nanoparticles on the immune system remains poorly understood. Previous investigations did not find that CS/DNA nanoparticles had any significant impact on the function of human and murine macrophages, and to date, little is known about the interaction between CS/DNA nanoparticles and naive CD4+ T cells. This study was designed to investigate whether CS/DNA nanoparticles affect the initial differentiation direction of human naive CD4+ T cells. The indirect impact of CS/DNA nanoparticles on naive CD4+ T cell differentiation was investigated by incubating the nanoparticles with human macrophage THP-1 cells in one chamber of a transwell co-incubation system, with the enriched human naive CD4+ T cells being placed in the other chamber of the transwell. The nanoparticles were also co-incubated with the naive CD4+ T cells to explore their direct impact on naive CD4+ T cell differentiation by measuring the release of IL-4 and IFN-γ from the cells. It was demonstrated that CS/DNA nanoparticles induced slightly elevated production of IL-12 by THP-1 cells, possibly owing to the presence of CpG motifs in the plasmid. However, this macrophage-stimulating activity was much less significant than that of lipopolysaccharide and did not affect the differentiation of the naive CD4+ T cells. It was also demonstrated that, when directly exposed to the naive CD4+ T cells, the nanoparticles induced neither the activation of the naive CD4+ T cells in the absence of recombinant cytokines (recombinant human IL-4 or IFN-γ) that induce naive CD4+ T cell polarization, nor any changes in the differentiation direction of naive CD4+ T cells in the presence of the corresponding cytokines.

  13. LORETA functional imaging in antipsychotic-naive and olanzapine-, clozapine- and risperidone-treated patients with schizophrenia.

    PubMed

    Tislerova, Barbora; Brunovsky, Martin; Horacek, Jiri; Novak, Tomas; Kopecek, Miloslav; Mohr, Pavel; Krajca, Vladimír

    2008-01-01

    The aim of our study was to detect changes in the distribution of electrical brain activity in schizophrenic patients who were antipsychotic naive and those who received treatment with clozapine, olanzapine or risperidone. We included 41 subjects with schizophrenia (antipsychotic naive = 11; clozapine = 8; olanzapine = 10; risperidone = 12) and 20 healthy controls. Low-resolution brain electromagnetic tomography was computed from 19-channel electroencephalography for the frequency bands delta, theta, alpha-1, alpha-2, beta-1, beta-2 and beta-3. We compared antipsychotic-naive subjects with healthy controls and medicated patients. (1) Comparing antipsychotic-naive subjects and controls we found a general increase in the slow delta and theta frequencies over the fronto-temporo-occipital cortex, particularly in the temporolimbic structures, an increase in alpha-1 and alpha-2 in the temporal cortex and an increase in beta-1 and beta-2 in the temporo-occipital and posterior limbic structures. (2) Comparing patients who received clozapine and those who were antipsychotic naive, we found an increase in delta and theta frequencies in the anterior cingulate and medial frontal cortex, and a decrease in alpha-1 and beta-2 in the occipital structures. (3) Comparing patients taking olanzapine with those who were antipsychotic naive, there was an increase in theta frequencies in the anterior cingulum, a decrease in alpha-1, beta-2 and beta-3 in the occipital cortex and posterior limbic structures, and a decrease in beta-3 in the frontotemporal cortex and anterior cingulum. (4) In patients taking risperidone, we found no significant changes from those who were antipsychotic naive. Our results in antipsychotic-naive patients are in agreement with existing functional findings. Changes in those taking clozapine and olanzapine versus those who were antipsychotic naive suggest a compensatory mechanism in the neurobiological substrate for schizophrenia. 
The lack of difference in risperidone patients versus antipsychotic-naive subjects may relate to risperidone's different pharmacodynamic mechanism. Copyright 2008 S. Karger AG, Basel.

  14. Does the cost function matter in Bayes decision rule?

    PubMed

    Schlüter, Ralf; Nussbaum-Thom, Markus; Ney, Hermann

    2012-02-01

    In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
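
    The core contrast the paper analyzes, the Bayes decision rule under a 0-1 cost versus an integer-valued metric cost, can be made concrete with a tiny numeric example. The posterior and cost matrix below are invented for illustration; they simply show that the two rules can pick different classes.

```python
import numpy as np

# Invented posterior over four classes for one observation
posterior = np.array([0.35, 0.30, 0.20, 0.15])

# cost[c, k] = cost of deciding class c when class k is true
# (0 on the diagonal; off-diagonal values mimic a metric, edit-distance-like cost)
cost = np.array([[0, 3, 3, 3],
                 [3, 0, 1, 1],
                 [3, 1, 0, 1],
                 [3, 1, 1, 0]])

# 0-1 cost: minimizing error probability means taking the posterior maximum
decision_01 = int(np.argmax(posterior))

# metric cost: minimize the expected cost of each possible decision
expected_cost = cost @ posterior
decision_metric = int(np.argmin(expected_cost))
```

    Here the 0-1 rule selects class 0 (the posterior maximum), while the metric rule selects class 1, because classes 1-3 are mutually close under the cost matrix and class 1 has the lowest expected cost (1.40 vs. 1.95 for class 0).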

  15. Novel naïve Bayes classification models for predicting the carcinogenicity of chemicals.

    PubMed

    Zhang, Hui; Cao, Zhi-Xing; Li, Meng; Li, Yu-Zhi; Peng, Cheng

    2016-11-01

    Carcinogenicity prediction has become a significant issue for the pharmaceutical industry. The purpose of this investigation was to develop a novel model for predicting the carcinogenicity of chemicals using a naïve Bayes classifier. The established model was validated by internal 5-fold cross validation and an external test set. The naïve Bayes classifier gave an average overall prediction accuracy of 90 ± 0.8% for the training set and 68 ± 1.9% for the external test set. Moreover, five simple molecular descriptors considered important for the carcinogenicity of chemicals were identified (AlogP, molecular weight (MW), number of H donors, Apol, and Wiener index), and some substructures related to carcinogenicity were obtained. Thus, we hope the established naïve Bayes prediction model can be applied to filter early-stage molecules for potential carcinogenicity; the five identified molecular descriptors and the substructures of carcinogens should give a better understanding of the carcinogenicity of chemicals and further provide guidance for medicinal chemists in the design of new candidate drugs and lead optimization, ultimately reducing the attrition rate in later stages of drug development. Copyright © 2016 Elsevier Ltd. All rights reserved.
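
    As a rough sketch of the kind of model described, here is a Gaussian naïve Bayes classifier written from scratch over five descriptor columns. The descriptor names match the abstract (AlogP, MW, H donors, Apol, Wiener), but the data is synthetic and the Gaussian likelihood is an assumption; the paper does not specify its event model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic descriptor table: AlogP, MW, H donors, Apol, Wiener (invented values)
n = 200
X_pos = rng.normal([3.0, 350.0, 2.0, 40.0, 900.0], [1.0, 60.0, 1.0, 8.0, 200.0], (n, 5))
X_neg = rng.normal([1.5, 280.0, 4.0, 30.0, 600.0], [1.0, 60.0, 1.0, 8.0, 200.0], (n, 5))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * n + [0] * n)   # 1 = carcinogen, 0 = non-carcinogen (synthetic)

def fit_gnb(X, y):
    """Per-class feature means, variances, and priors."""
    return {c: (X[y == c].mean(0), X[y == c].var(0) + 1e-9, (y == c).mean())
            for c in np.unique(y)}

def predict_gnb(stats, X):
    """Class maximizing log prior plus summed per-feature Gaussian log densities."""
    classes = sorted(stats)
    scores = []
    for c in classes:
        mu, var, prior = stats[c]
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        scores.append(ll + np.log(prior))
    return np.array(classes)[np.argmax(np.stack(scores), axis=0)]

stats = fit_gnb(X, y)
acc = float((predict_gnb(stats, X) == y).mean())
```

    The "naïve" step is the per-feature sum of log densities, which assumes the descriptors are conditionally independent given the class.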

  16. Predictions of BuChE inhibitors using support vector machine and naive Bayesian classification techniques in drug discovery.

    PubMed

    Fang, Jiansong; Yang, Ranyao; Gao, Li; Zhou, Dan; Yang, Shengqian; Liu, Ai-Lin; Du, Guan-hua

    2013-11-25

    Butyrylcholinesterase (BuChE, EC 3.1.1.8) is an important pharmacological target for Alzheimer's disease (AD) treatment. However, the currently available BuChE inhibitor screening assays are expensive, labor-intensive, and compound-dependent. It is necessary to develop robust in silico methods to predict the activities of BuChE inhibitors for lead identification. In this investigation, support vector machine (SVM) models and naive Bayesian models were built to discriminate BuChE inhibitors (BuChEIs) from noninhibitors. Each molecule was initially represented by 1870 structural descriptors (1235 from ADRIANA.Code, 334 from MOE, and 301 from Discovery Studio). Correlation analysis and a stepwise variable selection method were applied to identify activity-related descriptors for the prediction models. Additionally, structural fingerprint descriptors were added to improve the predictive ability of the models, which was measured by cross-validation, a test set validation with 1001 compounds, and an external test set validation with 317 diverse chemicals. The best two models gave Matthews correlation coefficients of 0.9551 and 0.9550 for the test set and 0.9132 and 0.9221 for the external test set. To demonstrate the practical applicability of the models in virtual screening, we screened an in-house data set of 3601 compounds, and 30 compounds were selected for a further bioactivity assay. The assay results showed that 10 of the 30 compounds exerted significant BuChE inhibitory activity, with IC50 values ranging from 0.32 to 22.22 μM; among them, three new scaffolds for BuChE inhibitors were identified for the first time. To the best of our knowledge, this is the first report on predicting BuChE inhibitors using machine learning approaches. The models generated from the SVM and naive Bayesian approaches successfully predicted BuChE inhibitors. The study proved the feasibility of a new method for predicting the bioactivities of ligands and discovering novel lead compounds.

  17. Recognition of pornographic web pages by classifying texts and images.

    PubMed

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed, so it is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, contour-based object features are extracted to recognize pornographic images. In the text and image fusion algorithm, Bayes' theorem is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those of either individual classifier, and our framework can be adapted to different categories of Web pages.
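
    The discrete text classifier is described as applying the naive Bayes rule to word probabilities. A minimal multinomial naive Bayes with Laplace smoothing, trained on an invented four-document corpus, illustrates the computation; the words and labels are placeholders, not the paper's data.

```python
import math
from collections import Counter

# Invented four-document training corpus: 1 = objectionable, 0 = benign
docs = [("buy pills cheap offer", 1),
        ("family photo album", 0),
        ("cheap offer now", 1),
        ("holiday photo gallery", 0)]

def train(docs):
    """Word counts per class, class counts, and the shared vocabulary."""
    counts = {0: Counter(), 1: Counter()}
    labels = Counter()
    for text, label in docs:
        counts[label].update(text.split())
        labels[label] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, labels, vocab

def log_posterior(model, text, label, alpha=1.0):
    """log P(label) + sum over words of log P(word | label), Laplace-smoothed."""
    counts, labels, vocab = model
    total = sum(counts[label].values())
    lp = math.log(labels[label] / sum(labels.values()))
    for w in text.split():
        lp += math.log((counts[label][w] + alpha) / (total + alpha * len(vocab)))
    return lp

model = train(docs)

def classify(text):
    return int(log_posterior(model, text, 1) > log_posterior(model, text, 0))
```

    Comparing the two class log posteriors for a new word list is exactly the "probability that a discrete text is pornographic" decision described in the abstract, up to normalization.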

  18. Ice Water Classification Using Statistical Distribution Based Conditional Random Fields in RADARSAT-2 Dual Polarization Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.

    2017-09-01

    In this paper, a Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited to improve marginal ice-water classification. Pixel-level ice concentration is presented for the comparison of the CRF-based methods. Furthermore, in order to find an effective statistical distribution model to integrate into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on two scenes around Prydz Bay and the Adélie Depression, which contain a variety of ice types during the melt season. Experimental results indicate that the proposed method can resolve the sea ice edge well in the Marginal Ice Zone (MIZ) and shows a robust distinction between ice and water.

  19. Development and application of operational techniques for the inventory and monitoring of resources and uses for the Texas coastal zone. [Galveston Bay and San Antonio test sites

    NASA Technical Reports Server (NTRS)

    Jones, R. (Principal Investigator); Harwood, P.; Finley, R.; Clements, G.; Lodwick, L.; Mcculloch, S.; Marphy, D.

    1976-01-01

    The author has identified the following significant results. The most significant ADP result was the modification of the DAM package to produce classified printouts, scaled and registered to U.S.G.S. 7 1/2-minute topographic maps, from LARSYS-type classification files. With this modification, all the powerful scaling and registration capabilities of DAM become available for multiclass classification files. The most significant results with respect to image interpretation were the application of mapping techniques to a new, more complex area, and the refinement of an image interpretation procedure which should yield the best results.

  20. Sound Classification in Hearing Aids Inspired by Auditory Scene Analysis

    NASA Astrophysics Data System (ADS)

    Büchler, Michael; Allegro, Silvia; Launer, Stefan; Dillier, Norbert

    2005-12-01

    A sound classification system for the automatic recognition of the acoustic environment in a hearing aid is discussed. The system distinguishes the four sound classes "clean speech," "speech in noise," "noise," and "music." A number of features that are inspired by auditory scene analysis are extracted from the sound signal. These features describe amplitude modulations, spectral profile, harmonicity, amplitude onsets, and rhythm. They are evaluated together with different pattern classifiers. Simple classifiers, such as rule-based and minimum-distance classifiers, are compared with more complex approaches, such as Bayes classifier, neural network, and hidden Markov model. Sounds from a large database are employed for both training and testing of the system. The achieved recognition rates are very high except for the class "speech in noise." Problems arise in the classification of compressed pop music, strongly reverberated speech, and tonal or fluctuating noises.

  1. A Theoretical Analysis of Why Hybrid Ensembles Work

    PubMed Central

    2017-01-01

    Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created by the decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Through this paper, we thus provide a complement to the theoretical foundation of creating and using hybrid ensembles. PMID:28255296
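
    A hybrid ensemble in the paper's sense, mixing decision tree and naïve Bayes base learners, can be sketched with scikit-learn's soft-voting combiner on synthetic data. The dataset, estimator settings, and voting scheme are assumptions for illustration, not the paper's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset (the paper's benchmarks are not reproduced here)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hybrid ensemble: two *different* base algorithms, combined by soft voting
hybrid = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft",
)

acc = cross_val_score(hybrid, X, y, cv=5).mean()
```

    Soft voting averages the two models' class probabilities, so the diversity between a high-variance tree and a high-bias naïve Bayes model is what the combination exploits.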

  2. Fault detection and diagnosis of diesel engine valve trains

    NASA Astrophysics Data System (ADS)

    Flett, Justin; Bone, Gary M.

    2016-05-01

    This paper presents the development of a fault detection and diagnosis (FDD) system for use with a diesel internal combustion engine (ICE) valve train. A novel feature is generated for each of the valve closing and combustion impacts. Deformed valve spring faults and abnormal valve clearance faults were seeded on a diesel engine instrumented with one accelerometer. Five classification methods were implemented experimentally and compared. The FDD system using the Naïve-Bayes classification method produced the best overall performance, with a lowest detection accuracy (DA) of 99.95% and a lowest classification accuracy (CA) of 99.95% for the spring faults occurring on individual valves. The lowest DA and CA values for multiple faults occurring simultaneously were 99.95% and 92.45%, respectively. The DA and CA results demonstrate the accuracy of our FDD system for diesel ICE valve train fault scenarios not previously addressed in the literature.

  3. Top predators affect the composition of naive protist communities, but only in their early-successional stage.

    PubMed

    Zander, Axel; Gravel, Dominique; Bersier, Louis-Félix; Gray, Sarah M

    2016-02-01

    Introduced top predators have the potential to disrupt community dynamics when prey species are naive to predation. The impact of introduced predators may also vary depending on the stage of community development. Early-succession communities are likely to have small-bodied and fast-growing species, but are not necessarily good at defending against predators. In contrast, late-succession communities are typically composed of larger-bodied species that are more predator resistant relative to small-bodied species. Yet, these aspects are greatly neglected in invasion studies. We therefore tested the effect of top predator presence on early- and late-succession communities that were either naive or non-naive to top predators. We used the aquatic community held within the leaves of Sarracenia purpurea. In North America, communities have experienced the S. purpurea top predator and are therefore non-naive. In Europe, this predator is not present and its niche has not been filled, making these communities top-predator naive. We collected early- and late-succession communities from two non-naive and two naive sites, which are climatically similar. We then conducted a common-garden experiment, with and without the presence of the top predator, in which we recorded changes in community composition, body size spectra, bacterial density, and respiration. We found that the top predator had no statistical effect on global measures of community structure and functioning. However, it significantly altered protist composition, but only in naive, early-succession communities, highlighting that the state of community development is important for understanding the impact of invasion.

  4. DEXAMETHASONE IMPLANT FOR DIABETIC MACULAR EDEMA IN NAIVE COMPARED WITH REFRACTORY EYES: The International Retina Group Real-Life 24-Month Multicenter Study. The IRGREL-DEX Study.

    PubMed

    Iglicki, Matias; Busch, Catharina; Zur, Dinah; Okada, Mali; Mariussi, Miriana; Chhablani, Jay Kumar; Cebeci, Zafer; Fraser-Bell, Samantha; Chaikitmongkol, Voraporn; Couturier, Aude; Giancipoli, Ermete; Lupidi, Marco; Rodríguez-Valdés, Patricio J; Rehak, Matus; Fung, Adrian Tien-Chin; Goldstein, Michaella; Loewenstein, Anat

    2018-04-24

    To investigate efficacy and safety of repeated dexamethasone (DEX) implants over 24 months, in diabetic macular edema (DME) eyes that were treatment naive compared with eyes refractory to anti-vascular endothelial growth factor treatment, in a real-life environment. This multicenter international retrospective study assessed best-corrected visual acuity and central subfield thickness (CST) of naive and refractory eyes to anti-vascular endothelial growth factor injections treated with dexamethasone implants. Safety data (intraocular pressure rise and cataract surgery) were recorded. A total of 130 eyes from 125 patients were included. Baseline best-corrected visual acuity and CST were similar for naive (n = 71) and refractory eyes (n = 59). Both groups improved significantly in vision after 24 months (P < 0.001). However, naive eyes gained statistically significantly more vision than refractory eyes (+11.3 ± 10.0 vs. 7.3 ± 2.7 letters, P = 0.01) and were more likely to gain ≥10 letters (OR 3.31, 95% CI 1.19-9.24, P = 0.02). At 6, 12, and 24 months, CST was significantly decreased compared with baseline in both naive and refractory eyes; however, CST was higher in refractory eyes than in naive eyes (CST 279 ± 61 vs. 313 ± 125 μm, P = 0.10). Over a follow-up of 24 months, vision improved in diabetic macular edema eyes after treatment with dexamethasone implants, both in eyes that were treatment naive and eyes refractory to anti-vascular endothelial growth factor treatment; however, improvement was greater in naive eyes.

  5. Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases

    PubMed Central

    2016-01-01

    Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers (Logistic Regression, Naïve Bayes, and Random Forest), with a range of social network measures and the necessary databases, to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970s and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality, while for the Caviar Network it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance of social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures. PMID:26824351
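
    The workflow described, fitting a Random Forest on per-actor network measures and then reading off feature importances, can be sketched as follows. The data is synthetic, generated so that a "betweenness" column drives the label; the measure names are borrowed from the abstract for readability only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300

# Synthetic per-actor network measures (invented, not the study's data)
betweenness = rng.random(n)
degree = rng.random(n)
effective_size = rng.random(n)
X = np.column_stack([betweenness, degree, effective_size])

# Construct the verdict label so that betweenness carries most of the signal
y = (betweenness + 0.1 * rng.standard_normal(n) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(["betweenness", "degree", "effective_size"],
                    rf.feature_importances_),
                key=lambda t: -t[1])
```

    Because the label is driven by betweenness, `ranked` places it first; the same importance ranking is how the study identified betweenness centrality (Watergate) and effective size (Caviar) as the dominant measures.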

  6. Evaluating the Impact of Land Use Change on Submerged Aquatic Vegetation Stressors in Mobile Bay

    NASA Technical Reports Server (NTRS)

    Al-Hamdan, Mohammad; Estes, Maurice G., Jr.; Quattrochi, Dale; Thom, Ronald; Woodruff, Dana; Judd, Chaeli; Ellis, Jean; Watson, Brian; Rodriquez, Hugo; Johnson, Hoyt

    2009-01-01

    Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land use change in Mobile and Baldwin counties on SAV stressors and controlling factors (temperature, salinity, and sediment) in Mobile Bay. Watershed modeling using the Loading Simulation Program in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for land use scenarios in 1948, 1992, 2001, and 2030. Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 land use scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the Bay. These results were input into the Environmental Fluid Dynamics Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid with four vertical profiles throughout Mobile Bay. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This key product for Mobile Bay coastal environmental managers integrates the influences of temperature, salinity, and sediment due to land-use-driven flow changes with the restoration potential of SAV.

  7. Seafloor habitat mapping and classification in Glacier Bay, Alaska: Phase 1 & 2 1996-2004

    USGS Publications Warehouse

    Hooge, Philip N.; Carlson, Paul R.; Mondragon, Jennifer; Etherington, Lisa L.; Cochran, G.R.

    2004-01-01

    Glacier Bay is a diverse fjord ecosystem with multiple sills, numerous tidewater glaciers and a highly complex oceanographic system. The Bay was completely glaciated prior to the 1700s and subsequently experienced the fastest glacial retreat recorded in historical times. Currently, some of the highest sedimentation rates ever observed occur in the Bay, along with rapid uplift (up to 2.5 cm/year) due to a combination of plate tectonics and isostatic rebound. Glacier Bay is the second deepest fjord in Alaska, with depths over 500 meters. This variety of physical processes and bathymetry creates many diverse habitats within a relatively small area (1,255 km²). Habitat can be defined as the locality, including resources and environmental conditions, occupied by a species or population of organisms (Morrison et al. 1992). Mapping and characterization of benthic habitat is crucial to an understanding of marine species and can serve a variety of purposes including: understanding species distributions and improving stock assessments, designing special management areas and marine protected areas, monitoring and protecting important habitats, and assessing habitat change due to natural or human impacts. In 1996, Congress recognized the importance of understanding benthic habitat for fisheries management by reauthorizing the Magnuson-Stevens Fishery Conservation and Management Act and amending it with the Sustainable Fisheries Act (SFA). This amendment emphasizes the importance of habitat protection to healthy fisheries and requires identification of essential fish habitat in management decisions. Recently, the National Park Service’s Ocean Stewardship Strategy identified the creation of benthic habitat maps and sediment maps as crucial components to complete basic ocean park resource inventories (Davis 2003). 
Glacier Bay National Park managers currently have very limited knowledge about the bathymetry, sediment types, and various marine habitats of ecological importance in the Park. Ocean floor bathymetry and sediment type are the building blocks of marine communities. Bottom type and shape affect the kinds of benthic communities that develop in a particular environment as well as the oceanographic conditions that communities are subject to. Accurate mapping of the ocean floor is essential for park managers' understanding of existing marine communities and will be important in assessing human-induced changes (e.g., vessel traffic and commercial fishing), biological change (e.g., rapid sea otter recolonization), and geological processes of change (e.g., deglaciation). Information on animal-habitat relationships, particularly within a marine reserve framework, will be valuable to agencies making decisions about critical habitats, marine reserve design, as well as fishery management. Identification and mapping of benthic habitat provides National Park Service managers with tools to increase the effectiveness of resource management. The primary objective of this project is to investigate the geological characteristics of the biological habitats of halibut, Dungeness crab, king crab, and Tanner crab within Glacier Bay National Park. Additionally, habitat classification of shallow water regions of Glacier Bay will provide crucial information on the relationship between benthic habitat features and the abundance of benthic prey items for a variety of marine predators, including sea ducks, the rapidly increasing population of sea otters, and other marine mammals. 

  8. Students academic performance based on behavior

    NASA Astrophysics Data System (ADS)

    Maulida, Juwita Dien; Kariyam

    2017-12-01

    Data collected in an information system can be used for decision making: an existing data warehouse helps mine useful information so that decisions are made correctly and accurately. The Experience API (xAPI) is one of the enabling technologies for collecting such data, so xAPI can serve as a data warehouse for various needs. One software application whose data is collected in xAPI is an LMS. An LMS is software used in an electronic learning process that can handle all aspects of learning; with an LMS, one can also examine how the learning process unfolds and which aspects can affect learning achievement. One such aspect is each student's background, although a student with a good background is not necessarily an outstanding student, and vice versa. Therefore, action is needed to anticipate this problem. Prediction of student academic performance using the Naive Bayes algorithm obtained an accuracy of 67.7983% and an error of 32.2917%.

  9. Syndrome diagnosis: human intuition or machine intelligence?

    PubMed

    Braaten, Oivind; Friestad, Johannes

    2008-01-01

    The aim of this study was to investigate whether artificial intelligence methods can provide the objective methods that are essential in syndrome diagnosis. Most syndromes have no external criterion standard of diagnosis. The predictive value of a clinical sign used in diagnosis depends on the prior probability of the syndrome diagnosis, and clinicians often misjudge the probabilities involved. Syndromology needs objective methods to ensure diagnostic consistency and to take prior probabilities into account. We applied two basic artificial intelligence methods to a database of machine-generated patients: a 'vector method' and a set method. As reference methods we ran an ID3 algorithm, a cluster analysis, and a naive Bayes calculation on the same patient series. The overall diagnostic error rate was 0.93% for the vector algorithm and 0.97% for ID3. For the clinical signs found by the set method, the predictive values varied between 0.71 and 1.0. The artificial intelligence methods that we used proved simple, robust, and powerful, and represent objective diagnostic methods.

  10. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification

    PubMed Central

    Wen, Tingxi; Zhang, Zhongnan

    2017-01-01

    In this paper, a genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance to intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in extracting features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789

  11. Automating document classification for the Immune Epitope Database

    PubMed Central

    Wang, Peng; Morgan, Alexander A; Zhang, Qing; Sette, Alessandro; Peters, Bjoern

    2007-01-01

    Background The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose. Results We here report our experience in automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. Improvements on the basic classifier performance were made by a) utilizing information stored in PubMed beyond the abstract itself, b) applying standard feature selection criteria, and c) extracting domain-specific feature patterns that, e.g., identify peptide sequences. We have implemented the classifier into the curation process, determining whether abstracts are clearly relevant, clearly irrelevant, or if no certain classification can be made, in which case the abstracts are manually classified. Testing this classification scheme on an independent dataset, we achieve 95% sensitivity and specificity in the 51.1% of abstracts that were automatically classified. Conclusion By implementing text classification, we have sped up the reference selection process without sacrificing the sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools, as well as a large dataset which can serve as a benchmark for tool developers. PMID:17655769
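The three-way routing this record describes (clearly relevant, clearly irrelevant, or deferred to a human) can be sketched as a pair of thresholds on the classifier's posterior probability. The cutoff values below are illustrative assumptions, not the thresholds actually used for the IEDB curation pipeline.

```python
def triage(p_relevant, reject_below=0.1, accept_above=0.9):
    """Route an abstract by classifier posterior: auto-label when the
    classifier is confident, otherwise defer to manual curation."""
    if p_relevant >= accept_above:
        return "relevant"
    if p_relevant <= reject_below:
        return "irrelevant"
    return "manual"

# Hypothetical posterior scores for five abstracts
scores = [0.97, 0.03, 0.55, 0.92, 0.40]
decisions = [triage(p) for p in scores]
# Fraction handled automatically (the record reports 51.1% for the IEDB)
auto_fraction = sum(d != "manual" for d in decisions) / len(decisions)
```

Widening the middle band trades a smaller auto-classified fraction for higher sensitivity and specificity on the abstracts that are auto-labeled.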

  12. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan

    2017-05-01

    In this paper, a genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance to intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in extracting features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
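As a sketch of the genetic-search component common to this record and the GA-based feature selection in the head, the toy GA below evolves feature bitmasks via tournament selection, one-point crossover, and bit-flip mutation. The fitness function is a synthetic stand-in that rewards two arbitrarily chosen "informative" features; GAFDS (or GADT/GANB) would instead score candidate feature sets by classifier performance.

```python
import random

def ga_select(n_feats, fitness, pop=20, gens=40, p_mut=0.1, seed=0):
    """Toy GA for feature-subset search over bitmask individuals."""
    rng = random.Random(seed)
    popn = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop)]

    def tournament():
        a, b = rng.sample(popn, 2)          # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(gens):
        nxt = []
        while len(nxt) < pop:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_feats)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for j in range(n_feats):         # bit-flip mutation
                if rng.random() < p_mut:
                    child[j] ^= 1
            nxt.append(child)
        popn = nxt
    return max(popn, key=fitness)

# Stand-in fitness: reward the (arbitrarily chosen) informative features
# 0 and 3, and penalize every extra selected feature.  In a wrapper
# approach this would be cross-validated classifier accuracy.
informative = {0, 3}
def fitness(mask):
    chosen = {j for j, b in enumerate(mask) if b}
    return len(chosen & informative) - 0.2 * len(chosen - informative)

best = ga_select(8, fitness)
```

After a few dozen generations the best surviving mask concentrates on the informative features, mirroring how wrapper-style GA selection prunes useless inputs.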

  13. A decision support model for investment on P2P lending platform.

    PubMed

    Zeng, Xiangxiang; Liu, Li; Leung, Stephen; Du, Jiangze; Wang, Xun; Li, Tao

    2017-01-01

    Peer-to-peer (P2P) lending, as a novel economic lending model, has raised new challenges for making effective investment decisions. In a P2P lending platform, one lender can invest in N loans and a loan may be accepted by M investors, thus forming a bipartite graph. Based on the bipartite graph model, we built an iterative computation model to evaluate the unknown loans. To validate the proposed model, we performed extensive experiments on real-world data from the largest American P2P lending marketplace, Prosper. By comparing our experimental results with those obtained by Bayes and logistic regression classifiers, we show that our computation model can help borrowers select good loans and help lenders make good investment decisions. Experimental results also show that the logistic classification model is a good complement to our iterative computation model, which motivates us to integrate the two models. The experimental results of the hybrid classification model demonstrate that the logistic classification model and our iterative computation model are complementary to each other. We conclude that the hybrid model (i.e., the integration of the iterative computation model and the logistic classification model) is more efficient and stable than either model alone.

  14. Maximum a posteriori classification of multifrequency, multilook, synthetic aperture radar intensity data

    NASA Technical Reports Server (NTRS)

    Rignot, E.; Chellappa, R.

    1993-01-01

    We present a maximum a posteriori (MAP) classifier for classifying multifrequency, multilook, single polarization SAR intensity data into regions or ensembles of pixels of homogeneous and similar radar backscatter characteristics. A model for the prior joint distribution of the multifrequency SAR intensity data is combined with a Markov random field for representing the interactions between region labels to obtain an expression for the posterior distribution of the region labels given the multifrequency SAR observations. The maximization of the posterior distribution yields Bayes's optimum region labeling or classification of the SAR data or its MAP estimate. The performance of the MAP classifier is evaluated by using computer-simulated multilook SAR intensity data as a function of the parameters in the classification process. Multilook SAR intensity data are shown to yield higher classification accuracies than one-look SAR complex amplitude data. The MAP classifier is extended to the case in which the radar backscatter from the remotely sensed surface varies within the SAR image because of incidence angle effects. The results obtained illustrate the practicality of the method for combining SAR intensity observations acquired at two different frequencies and for improving classification accuracy of SAR data.
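In one dimension the MAP rule reduces to an argmax of log-likelihood plus log-prior. The sketch below uses hypothetical Gaussian intensity classes rather than the paper's multilook SAR statistics (which follow gamma-like distributions) and omits the Markov random field over labels, so it illustrates only the per-pixel decision rule.

```python
import math

def map_classify(x, classes):
    """Pick the label maximizing log P(x | c) + log P(c) under
    per-class 1-D Gaussian likelihoods."""
    def log_post(c):
        mu, sigma, prior = classes[c]
        return (-0.5 * ((x - mu) / sigma) ** 2
                - math.log(sigma * math.sqrt(2 * math.pi))
                + math.log(prior))
    return max(classes, key=log_post)

# Hypothetical "backscatter" classes: label -> (mean, std, prior)
classes = {"water": (0.2, 0.1, 0.5), "forest": (0.8, 0.2, 0.5)}
map_classify(0.25, classes)   # near the water mean -> "water"
```

With unequal priors, the same likelihoods can flip the decision near the class boundary, which is exactly why MAP labeling outperforms pure maximum likelihood when classes are imbalanced.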

  15. A decision support model for investment on P2P lending platform

    PubMed Central

    Liu, Li; Leung, Stephen; Du, Jiangze; Wang, Xun; Li, Tao

    2017-01-01

    Peer-to-peer (P2P) lending, as a novel economic lending model, has raised new challenges for making effective investment decisions. In a P2P lending platform, one lender can invest in N loans and a loan may be accepted by M investors, thus forming a bipartite graph. Based on the bipartite graph model, we built an iterative computation model to evaluate the unknown loans. To validate the proposed model, we performed extensive experiments on real-world data from the largest American P2P lending marketplace—Prosper. By comparing our experimental results with those obtained by Bayes and logistic regression classifiers, we show that our computation model can help borrowers select good loans and help lenders make good investment decisions. Experimental results also show that the logistic classification model is a good complement to our iterative computation model, which motivates us to integrate the two models. The experimental results of the hybrid classification model demonstrate that the logistic classification model and our iterative computation model are complementary to each other. We conclude that the hybrid model (i.e., the integration of the iterative computation model and the logistic classification model) is more efficient and stable than either model alone. PMID:28877234

  16. IL-7-Induced Proliferation of Human Naive CD4 T-Cells Relies on Continued Thymic Activity.

    PubMed

    Silva, Susana L; Albuquerque, Adriana S; Matoso, Paula; Charmeteau-de-Muylder, Bénédicte; Cheynier, Rémi; Ligeiro, Dário; Abecasis, Miguel; Anjos, Rui; Barata, João T; Victorino, Rui M M; Sousa, Ana E

    2017-01-01

    Naive CD4 T-cell maintenance is critical for immune competence. We investigated here the fine-tuning of homeostatic mechanisms of the naive compartment to counteract the loss of de novo CD4 T-cell generation. Adults thymectomized in early childhood during corrective cardiac surgery were grouped based on presence or absence of thymopoiesis and compared with age-matched controls. We found that the preservation of the CD31− subset was independent of the thymus and that its size is tightly controlled by peripheral mechanisms, including prolonged cell survival as attested by Bcl-2 levels. Conversely, a significant contraction of the CD31+ naive subset was observed in the absence of thymic activity. This was associated with impaired responses of purified naive CD4 T-cells to IL-7, namely, in vitro proliferation and upregulation of CD31 expression, which likely potentiated the decline in recent thymic emigrants. Additionally, we found no apparent constraint in the differentiation of naive cells into the memory compartment in individuals completely lacking thymic activity, despite upregulation of DUSP6, a phosphatase associated with an increased TCR threshold. Of note, thymectomized individuals featuring some degree of thymopoiesis were able to preserve the size and diversity of the naive CD4 compartment, further arguing against complete thymectomy in infancy. Overall, our data suggest that robust peripheral mechanisms ensure the homeostasis of the CD31− naive CD4 pool and point to the requirement of continuous thymic activity for the maintenance of IL-7-driven homeostatic proliferation of CD31+ naive CD4 T-cells, which is essential to secure T-cell diversity throughout life.

  17. A semi-automated image analysis procedure for in situ plankton imaging systems.

    PubMed

    Bi, Hongsheng; Guo, Zhenhua; Benfield, Mark C; Fan, Chunlei; Ford, Michael; Shahrestani, Suzan; Sieracki, Jeffery M

    2015-01-01

    Plankton imaging systems are capable of providing fine-scale observations that enhance our understanding of key physical and biological processes. However, processing the large volumes of data collected by imaging systems remains a major obstacle to their employment, and existing approaches are designed either for images acquired under laboratory-controlled conditions or within clear waters. In the present study, we developed a semi-automated approach to analyze plankton taxa from images acquired by the ZOOplankton VISualization (ZOOVIS) system within turbid estuarine waters, in Chesapeake Bay. Compared to images acquired under laboratory-controlled conditions or in clear waters, images from highly turbid waters are often of relatively low quality and more variable, due to the large number of objects and the nonlinear illumination within each image. We first customized a segmentation procedure to locate objects within each image and extracted them for classification. A maximally stable extremal regions algorithm was applied to segment large gelatinous zooplankton, and an adaptive threshold approach was developed to segment small organisms, such as copepods. Unlike in the existing approaches for images acquired under laboratory-controlled conditions or in clear waters, where the target objects are often the majority class, here the task is a multi-class classification problem dominated by non-target objects. We customized a two-level hierarchical classification procedure using support vector machines to classify the target objects (< 5%) and remove the non-target objects (> 95%). First, histograms of oriented gradients feature descriptors were constructed for the segmented objects. In the first step all non-target and target objects were classified into different groups: arrow-like, copepod-like, and gelatinous zooplankton. Each object was then passed to a group-specific classifier to remove most non-target objects.
After the object was classified, an expert or non-expert then manually removed the non-target objects that could not be removed by the procedure. The procedure was tested on 89,419 images collected in Chesapeake Bay, and results were consistent with visual counts with >80% accuracy for all three groups.
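The two-level scheme above (coarse grouping, then a group-specific target/non-target filter) can be sketched as simple routing. The shape descriptors, thresholds, and the two groups shown are invented placeholders for the paper's HOG-plus-SVM classifiers, and the gelatinous group is omitted for brevity.

```python
def hierarchical_classify(obj, group_of, group_filters):
    """Level 1: assign a coarse group.  Level 2: the group-specific
    filter keeps target objects and rejects non-targets (None)."""
    group = group_of(obj)
    return group if group_filters[group](obj) else None

# Hypothetical descriptors: obj = (elongation, area_in_pixels)
def group_of(obj):
    return "arrow-like" if obj[0] > 3.0 else "copepod-like"

group_filters = {
    "arrow-like":   lambda o: o[1] > 50,   # keep large elongated objects
    "copepod-like": lambda o: o[1] > 10,   # keep plausible copepod sizes
}

hierarchical_classify((4.0, 80), group_of, group_filters)  # "arrow-like"
hierarchical_classify((1.2, 5), group_of, group_filters)   # None: non-target
```

Splitting the decision this way lets each second-level filter specialize on one group's appearance, which is the design motivation the record describes.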

  18. A Semi-Automated Image Analysis Procedure for In Situ Plankton Imaging Systems

    PubMed Central

    Bi, Hongsheng; Guo, Zhenhua; Benfield, Mark C.; Fan, Chunlei; Ford, Michael; Shahrestani, Suzan; Sieracki, Jeffery M.

    2015-01-01

    Plankton imaging systems are capable of providing fine-scale observations that enhance our understanding of key physical and biological processes. However, processing the large volumes of data collected by imaging systems remains a major obstacle to their employment, and existing approaches are designed either for images acquired under laboratory-controlled conditions or within clear waters. In the present study, we developed a semi-automated approach to analyze plankton taxa from images acquired by the ZOOplankton VISualization (ZOOVIS) system within turbid estuarine waters, in Chesapeake Bay. Compared to images acquired under laboratory-controlled conditions or in clear waters, images from highly turbid waters are often of relatively low quality and more variable, due to the large number of objects and the nonlinear illumination within each image. We first customized a segmentation procedure to locate objects within each image and extracted them for classification. A maximally stable extremal regions algorithm was applied to segment large gelatinous zooplankton, and an adaptive threshold approach was developed to segment small organisms, such as copepods. Unlike in the existing approaches for images acquired under laboratory-controlled conditions or in clear waters, where the target objects are often the majority class, here the task is a multi-class classification problem dominated by non-target objects. We customized a two-level hierarchical classification procedure using support vector machines to classify the target objects (< 5%) and remove the non-target objects (> 95%). First, histograms of oriented gradients feature descriptors were constructed for the segmented objects. In the first step all non-target and target objects were classified into different groups: arrow-like, copepod-like, and gelatinous zooplankton. Each object was then passed to a group-specific classifier to remove most non-target objects.
After the object was classified, an expert or non-expert then manually removed the non-target objects that could not be removed by the procedure. The procedure was tested on 89,419 images collected in Chesapeake Bay, and results were consistent with visual counts with >80% accuracy for all three groups. PMID:26010260

  19. Synergistic use of FIA plot data and Landsat 7 ETM+ images for large area forest mapping

    Treesearch

    Chengquan Huang; Limin Yang; Collin Homer; Michael Coan; Russell Rykhus; Zheng Zhang; Bruce Wylie; Kent Hegge; Andrew Lister; Michael Hoppus; Ronald Tymcio; Larry DeBlander; William Cooke; Ronald McRoberts; Daniel Wendt; Dale Weyermann

    2002-01-01

    FIA plot data were used to assist in classifying forest land cover from Landsat imagery and relevant ancillary data in two regions of the U.S.: one around the Chesapeake Bay area and the other around Utah. The overall accuracies for the forest/nonforest classification were over 90 percent and about 80 percent, respectively, in the two regions. The accuracies for...

  20. On Algorithms for Generating Computationally Simple Piecewise Linear Classifiers

    DTIC Science & Technology

    1989-05-01

    suffers. - Waveform classification, e.g. speech recognition, seismic analysis (i.e. discrimination between earthquakes and nuclear explosions), target...assuming Gaussian distributions (B-G) d) Bayes classifier with probability densities estimated with the k-N-N method (B-kNN) e) The nearest neighbour...range of classifiers are chosen including a fast, easily computable and often used classifier (B-G), reliable and complex classifiers (B-kNN and NNR

  1. Computer Based Behavioral Biometric Authentication via Multi-Modal Fusion

    DTIC Science & Technology

    2013-03-01

    the decisions made by each individual modality. Fusion of features is the simple concatenation of feature vectors from multiple modalities to be...of features: BayesNet MDL 330; LibSVM PCA 80; J48 Wrapper Evaluator 11...Ensemble Based Decision Level Fusion. In ensemble learning multiple...The high fusion percentages validate our hypothesis that by combining features from multiple modalities, classification accuracy can be improved. As
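Feature-level fusion as described in the snippet is literally the concatenation of per-modality feature vectors before classification. A minimal sketch, with invented keystroke and mouse features standing in for the report's modalities:

```python
def fuse_features(*modalities):
    """Feature-level fusion: concatenate per-modality feature vectors
    into a single vector handed to one downstream classifier."""
    fused = []
    for vec in modalities:
        fused.extend(vec)
    return fused

keystroke = [0.12, 0.30]       # hypothetical keystroke-timing features
mouse     = [5.1, 0.8, 2.2]    # hypothetical mouse-dynamics features
fused = fuse_features(keystroke, mouse)   # 5-dimensional fused vector
```

Decision-level fusion, by contrast, trains one classifier per modality and combines their outputs (e.g., by voting), which is the ensemble approach the snippet's section heading refers to.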

  2. PERCH: A Unified Framework for Disease Gene Prioritization.

    PubMed

    Feng, Bing-Jian

    2017-03-01

    To interpret genetic variants discovered from next-generation sequencing, integration of heterogeneous information is vital for success. This article describes a framework named PERCH (Polymorphism Evaluation, Ranking, and Classification for a Heritable trait), available at http://BJFengLab.org/. It can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare-variant association test, and a converted variant call quality score. It supports data that contain various combinations of extended pedigrees, trios, and case-controls, and allows for a reduced penetrance, an elevated phenocopy rate, liability classes, and covariates. BayesDel is more accurate than PolyPhen2, SIFT, FATHMM, LRT, Mutation Taster, Mutation Assessor, PhyloP, GERP++, SiPhy, CADD, MetaLR, and MetaSVM. The overall approach is faster and more powerful than the existing quantitative method pVAAST, as shown by simulations of challenging situations in finding the missing heritability of a complex disease. This framework can also classify variants of uncertain significance by quantitatively integrating allele frequencies, deleteriousness, association, and co-segregation. PERCH is a versatile tool for gene prioritization in gene discovery research and variant classification in clinical genetic testing. © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc.

  3. Bayesian hierarchical modeling for subject-level response classification in peptide microarray immunoassays

    PubMed Central

    Imholte, Gregory; Gottardo, Raphael

    2017-01-01

    The peptide microarray immunoassay simultaneously screens sample serum against thousands of peptides, determining the presence of antibodies bound to array probes. Peptide microarrays tiling immunogenic regions of pathogens (e.g. envelope proteins of a virus) are an important high throughput tool for querying and mapping antibody binding. Because of the assay’s many steps, from probe synthesis to incubation, peptide microarray data can be noisy with extreme outliers. In addition, subjects may produce different antibody profiles in response to an identical vaccine stimulus or infection, due to variability among subjects’ immune systems. We present a robust Bayesian hierarchical model for peptide microarray experiments, pepBayes, to estimate the probability of antibody response for each subject/peptide combination. Heavy-tailed error distributions accommodate outliers and extreme responses, and tailored random effect terms automatically incorporate technical effects prevalent in the assay. We apply our model to two vaccine trial datasets to demonstrate model performance. Our approach enjoys high sensitivity and specificity when detecting vaccine induced antibody responses. A simulation study shows an adaptive thresholding classification method has appropriate false discovery rate control with high sensitivity, and receiver operating characteristics generated on vaccine trial data suggest that pepBayes clearly separates responses from non-responses. PMID:27061097

  4. Classification of sodium MRI data of cartilage using machine learning.

    PubMed

    Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

    2015-11-01

    To assess the possible utility of machine learning for classifying subjects with and without osteoarthritis using sodium magnetic resonance imaging data. Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. The mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid-suppressed data, were the best predictors, with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.

  5. PCANet: A Simple Deep Learning Baseline for Image Classification?

    PubMed

    Chan, Tsung-Han; Jia, Kui; Gao, Shenghua; Lu, Jiwen; Zeng, Zinan; Ma, Yi

    2015-12-01

    In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets for different tasks, including Labeled Faces in the Wild (LFW) for face verification; the MultiPIE, Extended Yale B, AR, Facial Recognition Technology (FERET) data sets for face recognition; and MNIST for hand-written digit recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state-of-the-art features either prefixed, highly hand-crafted, or carefully learned [by deep neural networks (DNNs)]. Even more surprisingly, the model sets new records for many classification tasks on the Extended Yale B, AR, and FERET data sets and on MNIST variations. Additional experiments on other public data sets also demonstrate the potential of PCANet to serve as a simple but highly competitive baseline for texture classification and object recognition.

  6. Using geometrical, textural, and contextual information of land parcels for classification of detailed urban land use

    USGS Publications Warehouse

    Wu, S.-S.; Qiu, X.; Usery, E.L.; Wang, L.

    2009-01-01

    Detailed urban land use data are important to government officials, researchers, and businesspeople for a variety of purposes. This article presents an approach to classifying detailed urban land use based on geometrical, textural, and contextual information of land parcels. An area of 6 by 14 km in Austin, Texas, with land parcel boundaries delineated by the Travis Central Appraisal District of Travis County, Texas, is used to test the approach. We derive fifty parcel attributes from relevant geographic information system (GIS) and remote sensing data and use them to discriminate among nine urban land uses: single family, multifamily, commercial, office, industrial, civic, open space, transportation, and undeveloped. Half of the 33,025 parcels in the study area are used as training data for land use classification and the other half are used as testing data for accuracy assessment. The best result with a decision tree classification algorithm has an overall accuracy of 96 percent and a kappa coefficient of 0.78, and two naive baseline models based on the majority rule and the spatial autocorrelation rule have overall accuracies of 89 percent and 79 percent, respectively. The algorithm is relatively good at classifying single-family, multifamily, commercial, open space, and undeveloped land uses and relatively poor at classifying office, industrial, civic, and transportation land uses. The most important attributes for land use classification are the geometrical attributes, particularly those related to building areas. Next are the contextual attributes, particularly those relevant to the spatial relationship between buildings, then the textural attributes, particularly the semivariance texture statistic from 0.61-m resolution images.
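The majority-rule baseline mentioned in the record simply predicts the most common training label for every parcel. A minimal sketch with hypothetical parcel labels (the label mix below is invented, not the Austin data):

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Always predict the most frequent training label; return that
    label and its accuracy on the test labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    accuracy = sum(y == majority for y in test_labels) / len(test_labels)
    return majority, accuracy

# Hypothetical parcel labels where single-family housing dominates
train = ["single family"] * 7 + ["commercial"] * 3
test  = ["single family"] * 8 + ["commercial"] * 2
label, acc = majority_baseline(train, test)
```

Such a baseline scores high whenever one class dominates, which is why the article's 89 percent majority-rule accuracy is a meaningful floor against which the 96 percent decision-tree result should be judged.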

  7. ADMET Evaluation in Drug Discovery. 16. Predicting hERG Blockers by Combining Multiple Pharmacophores and Machine Learning Approaches.

    PubMed

    Wang, Shuangquan; Sun, Huiyong; Liu, Hui; Li, Dan; Li, Youyong; Hou, Tingjun

    2016-08-01

    Blockade of human ether-à-go-go related gene (hERG) channel by compounds may lead to drug-induced QT prolongation, arrhythmia, and Torsades de Pointes (TdP), and therefore reliable prediction of hERG liability in the early stages of drug design is quite important to reduce the risk of cardiotoxicity-related attritions in the later development stages. In this study, pharmacophore modeling and machine learning approaches were combined to construct classification models to distinguish hERG active from inactive compounds based on a diverse data set. First, an optimal ensemble of pharmacophore hypotheses that had good capability to differentiate hERG active from inactive compounds was identified by the recursive partitioning (RP) approach. Then, the naive Bayesian classification (NBC) and support vector machine (SVM) approaches were employed to construct classification models by integrating multiple important pharmacophore hypotheses. The integrated classification models showed improved predictive capability over any single pharmacophore hypothesis, suggesting that the broad binding polyspecificity of hERG can only be well characterized by multiple pharmacophores. The best SVM model achieved the prediction accuracies of 84.7% for the training set and 82.1% for the external test set. Notably, the accuracies for the hERG blockers and nonblockers in the test set reached 83.6% and 78.2%, respectively. Analysis of significant pharmacophores helps to understand the multimechanisms of action of hERG blockers. We believe that the combination of pharmacophore modeling and SVM is a powerful strategy to develop reliable theoretical models for the prediction of potential hERG liability.

  8. Risk of Erectile Dysfunction in Transfusion-naive Thalassemia Men

    PubMed Central

    Chen, Yu-Guang; Lin, Te-Yu; Lin, Cheng-Li; Dai, Ming-Shen; Ho, Ching-Liang; Kao, Chia-Hung

    2015-01-01

    Based on the mechanism of pathophysiology, thalassemia major or transfusion-dependent thalassemia patients may have an increased risk of developing organic erectile dysfunction resulting from hypogonadism. However, there have been few studies investigating the association between erectile dysfunction and transfusion-naive thalassemia populations. We constructed a population-based cohort study to elucidate the association between transfusion-naive thalassemia populations and organic erectile dysfunction. This nationwide population-based cohort study involved analyzing data from 1998 to 2010 obtained from the Taiwanese National Health Insurance Research Database, with a follow-up period extending to the end of 2011. We identified men with transfusion-naive thalassemia and selected a comparison cohort that was frequency-matched with these men according to age and year of thalassemia diagnosis, at a ratio of 1 thalassemia man to 4 control men. We analyzed the risks of organic erectile dysfunction in transfusion-naive thalassemia men by using Cox proportional hazards regression models. In this study, 588 transfusion-naive thalassemia men and 2337 controls were included. In total, 12 patients were identified within the thalassemia group and 10 within the control group. The overall risk of developing organic erectile dysfunction was 4.56-fold higher in transfusion-naive thalassemia men than in the comparison cohort after we adjusted for age and comorbidities. Our long-term cohort study showed that transfusion-naive thalassemia men had a higher risk of developing organic erectile dysfunction, particularly those patients with comorbidities. PMID:25837766

  9. Risk of erectile dysfunction in transfusion-naive thalassemia men: a nationwide population-based retrospective cohort study.

    PubMed

    Chen, Yu-Guang; Lin, Te-Yu; Lin, Cheng-Li; Dai, Ming-Shen; Ho, Ching-Liang; Kao, Chia-Hung

    2015-04-01

    Based on its pathophysiologic mechanism, thalassemia major or transfusion-dependent thalassemia patients may have an increased risk of developing organic erectile dysfunction resulting from hypogonadism. However, few studies have investigated the association between erectile dysfunction and transfusion-naive thalassemia populations. We constructed a population-based cohort study to elucidate the association between transfusion-naive thalassemia and organic erectile dysfunction. This nationwide population-based cohort study analyzed data from 1998 to 2010 obtained from the Taiwanese National Health Insurance Research Database, with a follow-up period extending to the end of 2011. We identified men with transfusion-naive thalassemia and selected a comparison cohort frequency-matched with them by age and year of thalassemia diagnosis, at a ratio of 1 thalassemia man to 4 control men. We analyzed the risk of organic erectile dysfunction in transfusion-naive thalassemia men by using Cox proportional hazards regression models. In this study, 588 transfusion-naive thalassemia men and 2337 controls were included. In total, 12 cases were identified within the thalassemia group and 10 within the control group. The overall risk of developing organic erectile dysfunction was 4.56-fold higher in transfusion-naive thalassemia men than in the comparison cohort after we adjusted for age and comorbidities. Our long-term cohort study results showed that transfusion-naive thalassemia men had a higher risk of developing organic erectile dysfunction, particularly those with comorbidities.

  10. Integrating conventional and inverse representation for face recognition.

    PubMed

    Xu, Yong; Li, Xuelong; Yang, Jian; Lai, Zhihui; Zhang, David

    2014-10-01

    Representation-based classification methods are all constructed on the basis of the conventional representation, which first expresses the test sample as a linear combination of the training samples and then exploits the deviation between the test sample and the expression result of every class to perform classification. However, this deviation does not always reflect the difference between the test sample and each class well. In this paper, we propose a novel representation-based classification method for face recognition that integrates conventional and inverse representation-based classification. It first produces the conventional representation of the test sample, i.e., uses a linear combination of the training samples to represent the test sample. Then it obtains the inverse representation, i.e., provides an approximate representation of each training sample of a subject by exploiting the test sample and the training samples of the other subjects. Finally, the proposed method exploits the conventional and inverse representations to generate two kinds of scores of the test sample with respect to each class and combines them to recognize the face. The paper presents the theoretical foundation and rationale of the proposed method. Moreover, this paper shows for the first time that a basic property of the human face, i.e., its symmetry, can be exploited to generate new training and test samples. Because these new samples reflect possible appearances of the face, using them enables higher accuracy. The experiments show that the proposed conventional and inverse representation-based linear regression classification (CIRLRC), an improvement to linear regression classification (LRC), can obtain very high accuracy and greatly outperforms the naive LRC and other state-of-the-art conventional representation-based face recognition methods. The accuracy of CIRLRC can be 10% greater than that of LRC.
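
    The baseline LRC scheme that CIRLRC improves on can be sketched as follows: each class's training samples form the columns of a design matrix, the test sample is regressed onto them, and the class with the smallest reconstruction residual wins. This is a minimal sketch of plain LRC only (not the authors' combined CIRLRC scoring), with hypothetical data shapes.

```python
import numpy as np

def lrc_predict(train_by_class, test_sample):
    """Linear regression classification (LRC) sketch.

    train_by_class: dict mapping class label -> list of training vectors
    test_sample: feature vector of the probe face
    Each class's training samples become columns of a design matrix; the
    class whose least-squares reconstruction leaves the smallest residual
    is returned.
    """
    b = np.asarray(test_sample, dtype=float)
    best_label, best_residual = None, np.inf
    for label, samples in train_by_class.items():
        A = np.asarray(samples, dtype=float).T   # columns = training samples
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)
        residual = np.linalg.norm(b - A @ coef)
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label
```

    The inverse representation described in the abstract runs the regression the other way (approximating each training sample from the test sample and other subjects' data), and CIRLRC fuses the two resulting scores.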

  11. Speaker gender identification based on majority vote classifiers

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications, namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification system performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best-performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch and MFCCs as well as other temporal and frequency-domain descriptors. Five classification models, including decision tree, discriminant analysis, naive Bayes, support vector machine and k-nearest neighbor, were evaluated. The three best-performing classifiers among the five then contribute by majority voting over their scores. Experiments were performed on three datasets spoken in three languages, English, German and Arabic, in order to validate the language independence of the proposed scheme. Results confirm that the presented system reaches a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
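
    The committee step described above (keep the best three of five classifiers, then vote) might be sketched as below; the classifier names, validation scores, and the `top_k_vote` helper are illustrative assumptions, not the paper's code.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the committee's predictions.

    predictions: list of labels, one per committee member (an odd count
    avoids ties for binary labels such as male/female).
    """
    return Counter(predictions).most_common(1)[0][0]

def top_k_vote(classifier_scores, sample_predictions, k=3):
    """Vote among the k classifiers with the best validation accuracy.

    classifier_scores: dict name -> validation accuracy
    sample_predictions: dict name -> predicted label for one sample
    """
    ranked = sorted(classifier_scores, key=classifier_scores.get, reverse=True)
    return majority_vote([sample_predictions[name] for name in ranked[:k]])
```

    Selecting the top three by held-out accuracy before voting is what distinguishes this scheme from a plain five-way vote: a weak model cannot drag the committee down.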

  12. Evaluation of Interruption Behavior by Naive Encoders.

    ERIC Educational Resources Information Center

    Coon, Christine A.; Schwanenflugel, Paula J.

    1996-01-01

    Determines the characteristics of interactions that influence judgments of interruption behavior in naive observers. Asks subjects to decide whether an example of an interruption was an interruption and then rate it in terms of how "good" or "bad" it was. Finds that naive observers use some of the same features described in…

  13. Naive Juveniles Are More Likely to Become Breeders after Witnessing Predator Mobbing.

    PubMed

    Griesser, Michael; Suzuki, Toshitaka N

    2017-01-01

    Responding appropriately during the first predatory attack in life is often critical for survival. In many social species, naive juveniles acquire this skill from conspecifics, but its fitness consequences remain virtually unknown. Here we experimentally demonstrate how naive juvenile Siberian jays (Perisoreus infaustus) derive a long-term fitness benefit from witnessing knowledgeable adults mobbing their principal predator, the goshawk (Accipiter gentilis). Siberian jays live in family groups of two to six individuals that also can include unrelated nonbreeders. Field observations showed that Siberian jays encounter predators only rarely, and, indeed, naive juveniles do not respond to predator models when on their own but do when observing other individuals mobbing them. Predator exposure experiments demonstrated that naive juveniles had a substantially higher first-winter survival after observing knowledgeable group members mobbing a goshawk model, increasing their likelihood of acquiring a breeding position later in life. Previous research showed that naive individuals may learn from others how to respond to predators, care for offspring, or choose mates, generally assuming that social learning has long-term fitness consequences without empirical evidence. Our results demonstrate a long-term fitness benefit of vertical social learning for naive individuals in the wild, emphasizing its evolutionary importance in animals, including humans.

  14. Soils and Vegetation of the Khaipudyr Bay Coast of the Barents Sea

    NASA Astrophysics Data System (ADS)

    Shamrikova, E. V.; Deneva, S. V.; Panyukov, A. N.; Kubik, O. S.

    2018-04-01

    Soils and vegetation of the coastal zone of the Khaipudyr Bay of the Barents Sea have been examined and compared with analogous objects in the Karelian coastal zone of the White Sea. The environmental conditions of these two areas are somewhat different: the climate of the Khaipudyr Bay coast is more severe, and the seawater salinity is higher (32-33‰ in the Khaipudyr Bay and 25-26‰ in the White Sea). The soil cover patterns of both regions are highly variable. Salt-affected marsh soils (Tidalic Fluvisols) are widespread. The complicated mesotopography includes high geomorphic positions that are not affected by tidal water. Under these conditions, zonal factors of pedogenesis predominate and lead to the development of Cryic Folic Histosols and Histic Reductaquic Cryosols. On low marshes, the concentrations of soluble Ca2+, K+ + Na+, Cl-, and SO42- ions in the soils of the Khaipudyr Bay coast are two to four times higher than those in the analogous soils of the Karelian coast. Cluster analysis of a number of soil characteristics allows separation of three soil groups: soils of low marshes, soils of middle-high marshes, and soils of higher positions developing under the impact of zonal factors together with the aerial transfer and deposition of seawater drops. The corresponding plant communities are represented by coastal sedge cenoses, forb-grassy halophytic cenoses, and zonal cenoses of hypoarctic tundra. It is argued that the grouping of marsh soils in the new substantive-genetic classification system of Russian soils requires further elaboration.

  15. Multilayer perceptron, fuzzy sets, and classification

    NASA Technical Reports Server (NTRS)

    Pal, Sankar K.; Mitra, Sushmita

    1992-01-01

    A fuzzy neural network model based on the multilayer perceptron, using the back-propagation algorithm, and capable of fuzzy classification of patterns is described. The input vector consists of membership values to linguistic properties while the output vector is defined in terms of fuzzy class membership values. This allows efficient modeling of fuzzy or uncertain patterns with appropriate weights being assigned to the backpropagated errors depending upon the membership values at the corresponding outputs. During training, the learning rate is gradually decreased in discrete steps until the network converges to a minimum error solution. The effectiveness of the algorithm is demonstrated on a speech recognition problem. The results are compared with those of the conventional MLP, the Bayes classifier, and the other related models.

  16. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
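
    The by-product mentioned above, the global probability of correct classification under statistically independent decision rules, reduces to a prior-weighted product along each class's root-to-leaf path. A minimal sketch, assuming the per-node correctness probabilities (taken from the error matrices) are already known; the data structure here is an illustrative assumption, not the paper's algorithm.

```python
def global_correct_probability(class_paths, priors):
    """Probability of correct classification for a decision tree,
    assuming the per-node decision rules err independently.

    class_paths: dict class -> list of probabilities that each decision
                 on that class's root-to-leaf path is made correctly
    priors: dict class -> prior probability of the class
    """
    total = 0.0
    for c, path in class_paths.items():
        p_path = 1.0
        for p_node in path:
            p_path *= p_node   # independence: correctness multiplies
        total += priors[c] * p_path
    return total
```

    A class reached through two 90%-accurate decisions is classified correctly only 81% of the time, which is why deeper branches of a tree tend to carry lower per-class accuracy.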

  17. Revealing Fundamental Physics from the Daya Bay Neutrino Experiment Using Deep Neural Networks

    DOE PAGES

    Racah, Evan; Ko, Seyoon; Sadowski, Peter; ...

    2017-02-02

    Experiments in particle physics produce enormous quantities of data that must be analyzed and interpreted by teams of physicists. This analysis is often exploratory, where scientists are unable to enumerate the possible types of signal prior to performing the experiment. Thus, tools for summarizing, clustering, visualizing and classifying high-dimensional data are essential. In this work, we show that meaningful physical content can be revealed by transforming the raw data into a learned high-level representation using deep neural networks, with measurements taken at the Daya Bay Neutrino Experiment as a case study. We further show how convolutional deep neural networks can provide an effective classification filter with greater than 97% accuracy across different classes of physics events, significantly better than other machine learning approaches.

  18. Spatial estimation from remotely sensed data via empirical Bayes models

    NASA Technical Reports Server (NTRS)

    Hill, J. R.; Hinkley, D. V.; Kostal, H.; Morris, C. N.

    1984-01-01

    Multichannel satellite image data, available as LANDSAT imagery, are recorded as a multivariate time series (four channels, multiple passovers) in two spatial dimensions. The application of parametric empirical Bayes theory to classification of, and estimating the probability of, each crop type at each of a large number of pixels is considered. This theory involves both the probability distribution of imagery data, conditional on crop types, and the prior spatial distribution of crop types. For the latter Markov models indexed by estimable parameters are used. A broad outline of the general theory reveals several questions for further research. Some detailed results are given for the special case of two crop types when only a line transect is analyzed. Finally, the estimation of an underlying continuous process on the lattice is discussed which would be applicable to such quantities as crop yield.
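
    The parametric empirical Bayes idea behind such estimators can be illustrated with a simple normal-normal shrinkage rule: noisy per-unit estimates are pulled toward a grand mean by an amount learned from the data. This stand-in omits the paper's Markov spatial prior entirely, and the method-of-moments estimates below are an assumption of the sketch.

```python
def eb_shrink(group_means, sampling_var):
    """Parametric empirical Bayes shrinkage of noisy group estimates.

    Each observed mean y_i is modeled as y_i ~ N(theta_i, sampling_var)
    with theta_i ~ N(mu, tau2). The hyperparameters mu and tau2 are
    estimated from the data (method of moments), and each estimate is
    pulled toward mu in proportion to its noise.
    """
    n = len(group_means)
    mu = sum(group_means) / n
    total_var = sum((y - mu) ** 2 for y in group_means) / (n - 1)
    tau2 = max(total_var - sampling_var, 0.0)  # between-group variance
    shrink = sampling_var / (sampling_var + tau2) if tau2 > 0 else 1.0
    return [mu + (1.0 - shrink) * (y - mu) for y in group_means]
```

    When the sampling noise dominates (tau2 estimated at zero), every estimate collapses to the grand mean; when the groups genuinely differ, the estimates are left nearly untouched.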

  19. The ABAG biogenic emissions inventory project

    NASA Technical Reports Server (NTRS)

    Carson-Henry, C. (Editor)

    1982-01-01

    The ability to identify the role of biogenic hydrocarbon emissions in contributing to overall ozone production in the Bay Area, and to identify the significance of that role, were investigated in a joint project of the Association of Bay Area Governments (ABAG) and NASA/Ames Research Center. Ozone, which is produced when nitrogen oxides and hydrocarbons combine in the presence of sunlight, is a primary factor in air quality planning. In investigating the role of biogenic emissions, this project employed a pre-existing land cover classification to define areal extent of land cover types. Emission factors were then derived for those cover types. The land cover data and emission factors were integrated into an existing geographic information system, where they were combined to form a Biogenic Hydrocarbon Emissions Inventory. The emissions inventory information was then integrated into an existing photochemical dispersion model.

  20. Identifying the key taxonomic categories that characterize microbial community diversity using full-scale classification: a case study of microbial communities in the sediments of Hangzhou Bay.

    PubMed

    Dai, Tianjiao; Zhang, Yan; Tang, Yushi; Bai, Yaohui; Tao, Yile; Huang, Bei; Wen, Donghui

    2016-10-01

    Coastal areas are land-sea transitional zones with complex natural and anthropogenic disturbances. Microorganisms in coastal sediments adapt to such disturbances both individually and as a community. The microbial community structure changes spatially and temporally under environmental stress. In this study, we investigated the microbial community structure in the sediments of Hangzhou Bay, a seriously polluted bay in China. In order to identify the roles and contribution of all microbial taxa, we set thresholds as 0.1% for rare taxa and 1% for abundant taxa, and classified all operational taxonomic units into six exclusive categories based on their abundance. The results showed that the key taxa in differentiating the communities are abundant taxa (AT), conditionally abundant taxa (CAT), and conditionally rare or abundant taxa (CRAT). A large population in conditionally rare taxa (CRT) made this category collectively significant in differentiating the communities. Both bacteria and archaea demonstrated a distance decay pattern of community similarity in the bay, and this pattern was strengthened by rare taxa, CRT and CRAT, but weakened by AT and CAT. This implied that the low abundance taxa were more deterministically distributed, while the high abundance taxa were more ubiquitously distributed. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
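
    A possible reading of the six exclusive abundance categories is sketched below. The 0.1% and 1% cutoffs come from the abstract, but the exact boundary conditions for each category, and the names of the two categories not spelled out there (always-rare and always-moderate taxa), are assumptions of this sketch rather than the paper's definitions.

```python
RARE, ABUNDANT = 0.001, 0.01   # 0.1% and 1% relative-abundance cutoffs

def abundance_category(rel_abundances):
    """Assign an OTU to one of six exclusive categories from its
    per-sample relative abundances (definitions assumed, see lead-in)."""
    lo, hi = min(rel_abundances), max(rel_abundances)
    if lo >= ABUNDANT:
        return "AT"     # abundant in every sample
    if hi <= RARE:
        return "RT"     # rare in every sample
    if lo >= RARE and hi <= ABUNDANT:
        return "MT"     # always in the moderate band
    if hi <= ABUNDANT:
        return "CRT"    # sometimes rare, never abundant
    if lo >= RARE:
        return "CAT"    # sometimes abundant, never rare
    return "CRAT"       # spans the full range, rare to abundant
```

    Under this reading, CRAT taxa are exactly those whose abundance crosses both thresholds across samples, which matches their role in the abstract as strong differentiators between communities.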

  1. Integrating multiple fitting regression and Bayes decision for cancer diagnosis with transcriptomic data from tumor-educated blood platelets.

    PubMed

    Huang, Guangzao; Yuan, Mingshun; Chen, Moliang; Li, Lei; You, Wenjie; Li, Hanjie; Cai, James J; Ji, Guoli

    2017-10-07

    The application of machine learning in cancer diagnostics has shown great promise and is of importance in clinical settings. Here we consider applying machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) from individuals with different types of cancer. We aim to define a reliability measure for diagnostic purposes to increase the potential for facilitating personalized treatments. To this end, we present a novel classification method called MFRB (for Multiple Fitting Regression and Bayes decision), which integrates the process of multiple fitting regression (MFR) with Bayes decision theory. MFR is first used to map multidimensional features of the transcriptomic data into a one-dimensional feature. The probability density function of each class in the mapped space is then adjusted using the Gaussian probability density function. Finally, Bayes decision theory is used to build a probabilistic classifier with the estimated probability density functions. The output of MFRB can be used to determine which class a sample belongs to, as well as to assign a reliability measure for a given class. The classical support vector machine (SVM) and probabilistic SVM (PSVM) are used to evaluate the performance of the proposed method with simulated and real TEP datasets. Our results indicate that the proposed MFRB method achieves the best performance compared to SVM and PSVM, mainly due to its strong generalization ability for limited, imbalanced, and noisy data.
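
    The MFRB pipeline described above (regression to one dimension, per-class Gaussian densities, Bayes decision) can be sketched as follows. A single least-squares fit is used here as a stand-in for the paper's multiple fitting regression step, so everything in this block is an illustrative simplification, not the authors' method.

```python
import numpy as np

def fit_mfrb(X, y):
    """Fit the sketch: map features to 1-D via an affine least-squares
    regression onto the class labels, then fit a Gaussian per class in
    the mapped space (mean, std, prior)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([X, np.ones(len(X))])      # affine design matrix
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    z = A @ w                                      # 1-D mapped feature
    params = {}
    for c in np.unique(y):
        zc = z[y == c]
        params[c] = (zc.mean(), zc.std() + 1e-9, len(zc) / len(y))
    return w, params

def mfrb_posteriors(w, params, x):
    """Bayes decision: return {class: posterior} for one sample; the
    posterior doubles as the reliability measure."""
    z = np.dot(np.append(np.asarray(x, dtype=float), 1.0), w)
    like = {c: p * np.exp(-0.5 * ((z - m) / s) ** 2) / s
            for c, (m, s, p) in params.items()}
    total = sum(like.values())
    return {c: v / total for c, v in like.items()}
```

    Returning a full posterior rather than a hard label is the point of the design: a sample mapped midway between the class densities gets a posterior near 0.5, flagging an unreliable call.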

  2. Derivation of novel human ground state naive pluripotent stem cells.

    PubMed

    Gafni, Ohad; Weinberger, Leehee; Mansour, Abed AlFatah; Manor, Yair S; Chomsky, Elad; Ben-Yosef, Dalit; Kalma, Yael; Viukov, Sergey; Maza, Itay; Zviran, Asaf; Rais, Yoach; Shipony, Zohar; Mukamel, Zohar; Krupalnik, Vladislav; Zerbib, Mirie; Geula, Shay; Caspi, Inbal; Schneir, Dan; Shwartz, Tamar; Gilad, Shlomit; Amann-Zalcenstein, Daniela; Benjamin, Sima; Amit, Ido; Tanay, Amos; Massarwa, Rada; Novershtern, Noa; Hanna, Jacob H

    2013-12-12

    Mouse embryonic stem (ES) cells are isolated from the inner cell mass of blastocysts, and can be preserved in vitro in a naive inner-cell-mass-like configuration by providing exogenous stimulation with leukaemia inhibitory factor (LIF) and small molecule inhibition of ERK1/ERK2 and GSK3β signalling (termed 2i/LIF conditions). Hallmarks of naive pluripotency include driving Oct4 (also known as Pou5f1) transcription by its distal enhancer, retaining a pre-inactivation X chromosome state, and global reduction in DNA methylation and in H3K27me3 repressive chromatin mark deposition on developmental regulatory gene promoters. Upon withdrawal of 2i/LIF, naive mouse ES cells can drift towards a primed pluripotent state resembling that of the post-implantation epiblast. Although human ES cells share several molecular features with naive mouse ES cells, they also share a variety of epigenetic properties with primed murine epiblast stem cells (EpiSCs). These include predominant use of the proximal enhancer element to maintain OCT4 expression, pronounced tendency for X chromosome inactivation in most female human ES cells, increase in DNA methylation and prominent deposition of H3K27me3 and bivalent domain acquisition on lineage regulatory genes. The feasibility of establishing human ground state naive pluripotency in vitro with equivalent molecular and functional features to those characterized in mouse ES cells remains to be defined. Here we establish defined conditions that facilitate the derivation of genetically unmodified human naive pluripotent stem cells from already established primed human ES cells, from somatic cells through induced pluripotent stem (iPS) cell reprogramming or directly from blastocysts. The novel naive pluripotent cells validated herein retain molecular characteristics and functional properties that are highly similar to mouse naive ES cells, and distinct from conventional primed human pluripotent cells. 
This includes competence in the generation of cross-species chimaeric mouse embryos that underwent organogenesis following microinjection of human naive iPS cells into mouse morulas. Collectively, our findings establish new avenues for regenerative medicine, patient-specific iPS cell disease modelling and the study of early human development in vitro and in vivo.

  3. Early retirement and income loss in patients with early and advanced Parkinson's disease.

    PubMed

    Johnson, Scott; Davis, Matthew; Kaltenboeck, Anna; Birnbaum, Howard; Grubb, Elizabeth; Tarrants, Marcy; Siderowf, Andrew

    2011-11-01

    The indirect costs of Parkinson's disease (PD) may be larger than direct healthcare costs, and the largest component of indirect costs is income loss related to early retirement. No recent retrospective analysis details PD-related early retirement and income loss in the US. We used an observational, matched cohort to study wages and labour force participation over 4 years and to simulate lifetime income losses conditional on being newly diagnosed with PD (naive) or having evidence of increasing disability. Actively employed primary beneficiaries of private insurance policies aged 18-64 years with more than two PD diagnoses (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM]: 332.x) or one diagnosis and a prescription of an antiparkinsonian drug were selected from a privately insured claims database. Continuous health coverage during analysis periods was required. Naive patients were defined as having no claims history indicative of PD during the year prior to first diagnosis or prescription use. A PD with ambulatory assistance devices (PDAAD) cohort was also followed from the date of first evidence of a wheelchair or walker. Controls without PD were matched on age, sex and region. Survival analysis and Wilcoxon rank sum tests were used to compare rates of early retirement and income loss. A simulation of projected economic loss was conducted for PD cohorts diagnosed at different ages using Bureau of Labor Statistics labour force participation and income data. Naive PD patients (n = 278) and PDAAD patients (n = 28) were on average aged 53 years and had significantly higher rates of co-morbidities at baseline versus controls. Conditional on being employed, there was no statistical difference in earnings. However, the hazard of early retirement associated with PD was 2.08 (p < 0.001) for the naive cohort and 5.01 (p < 0.001) for the PDAAD cohort. 
From age 40 to 79 years, earnings losses in year 2009 values were $US569 393, $US188 590, $US35 496 and $US2451 for those diagnosed at age 45, 55, 65 and 75 years, respectively. Estimates increased by 9% to 37% when using expected 2018 labour force participation estimates. The cost of early retirement associated with patients with PD was substantial. Given that the proportion of Americans participating in the labour force in older age groups is expected to increase, PD-related early retirement costs will likely rise.

  4. A&M. TAN607. Sections for second phase expansion: engine maintenance, machine, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    A&M. TAN-607. Sections for second phase expansion: engine maintenance, machine, and welding shops; high bay assembly shop, chemical cleaning room (decontamination). Details of sliding door hoods. Approved by INEEL Classification Office for public release. Ralph M. Parsons 1299-5-ANP/GE-3-607-A 109. Date: August 1956. INEEL index code no. 034-0607-00-693-107169 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID

  5. Children and Adolescents' Understandings of Family Resemblance: A Study of Naive Inheritance Concepts

    ERIC Educational Resources Information Center

    Williams, Joanne M.

    2012-01-01

    This paper aims to provide developmental data on two connected naive inheritance concepts and to explore the coherence of children's naive biology knowledge. Two tasks examined children and adolescents' (4, 7, 10, and 14 years) conceptions of phenotypic resemblance across kin (in physical characteristics, disabilities, and personality traits). The…

  6. The Persistence of "Solid" and "Liquid" Naive Conceptions: A Reaction Time Study

    ERIC Educational Resources Information Center

    Babai, Reuven; Amsterdamer, Anat

    2008-01-01

    The study explores whether the naive concepts of "solid" and "liquid" persist into adolescence. Accuracy of responses and reaction times were measured while 41 ninth graders classified different solids (rigid, non-rigid and powders) and different liquids (runny, dense) as solid or liquid. The results show that these naive conceptions affect…

  7. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continually proposed with the development of related disciplines. The applicable scopes and performances of these algorithms differ. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers seeking to solve practical problems promptly. In this paper, seven established algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on unbalanced datasets of multi-class tasks. Simple algorithms, such as the naïve Bayes and logistic regression models, are suitable for small datasets with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on balanced small datasets of binary-class tasks. No algorithm can maintain the best performance on all datasets. The applicability of the seven data mining algorithms on datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
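
    Two of the dataset descriptors listed above, the class entropy of the task variable and the ratio of the largest to the smallest class size, are straightforward to compute from the label column alone; a small sketch:

```python
from collections import Counter
from math import log2

def characterize_labels(labels):
    """Return (class entropy in bits, largest-to-smallest class ratio)
    for a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    imbalance = max(counts.values()) / min(counts.values())
    return entropy, imbalance
```

    A balanced binary task has entropy 1.0 and ratio 1.0; a 9:1 split drops the entropy below 0.5 while the ratio jumps to 9, which is the kind of profile on which the paper finds ensemble methods such as random forest more robust.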

  8. Unintended Pregnancies Observed With Combined Use of the Levonorgestrel Contraceptive Implant and Efavirenz-based Antiretroviral Therapy: A Three-Arm Pharmacokinetic Evaluation Over 48 Weeks.

    PubMed

    Scarsi, Kimberly K; Darin, Kristin M; Nakalema, Shadia; Back, David J; Byakika-Kibwika, Pauline; Else, Laura J; Dilly Penchala, Sujan; Buzibye, Allan; Cohn, Susan E; Merry, Concepta; Lamorde, Mohammed

    2016-03-15

    Levonorgestrel subdermal implants are preferred contraceptives with an expected failure rate of <1% over 5 years. We assessed the effect of efavirenz- or nevirapine-based antiretroviral therapy (ART) coadministration on levonorgestrel pharmacokinetics. This nonrandomized, parallel group, pharmacokinetic evaluation was conducted in three groups of human immunodeficiency virus-infected Ugandan women: ART-naive (n = 17), efavirenz-based ART (n = 20), and nevirapine-based ART (n = 20). Levonorgestrel implants were inserted at baseline in all women. Blood was collected at 1, 4, 12, 24, 36, and 48 weeks. The primary endpoint was week 24 levonorgestrel concentrations, compared between the ART-naive group and each ART group by geometric mean ratio (GMR) with 90% confidence interval (CI). Secondary endpoints included week 48 levonorgestrel concentrations and unintended pregnancies. Week 24 geometric mean levonorgestrel concentrations were 528, 280, and 710 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.53; 90% CI, .50, .55 and nevirapine: ART-naive GMR, 1.35; 90% CI, 1.29, 1.43). Week 48 levonorgestrel concentrations were 580, 247, and 664 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.43; 90% CI, .42, .44 and nevirapine: ART-naive GMR, 1.14; 90% CI, 1.14, 1.16). Three pregnancies (3/20, 15%) occurred in the efavirenz group between weeks 36 and 48. No pregnancies occurred in the ART-naive or nevirapine groups. Within 1 year of combined use, levonorgestrel exposure was markedly reduced in participants who received efavirenz-based ART, accompanied by contraceptive failures. In contrast, nevirapine-based ART did not adversely affect levonorgestrel exposure or efficacy. NCT01789879. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America.
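
    The endpoint used above, a geometric mean ratio between groups, is the exponential of the difference in mean log concentrations. A minimal point-estimate sketch (the study's 90% confidence intervals come from a model not reproduced here, and the sample values in the test are made up):

```python
from math import exp, log

def geometric_mean_ratio(treated, reference):
    """Point estimate of the geometric mean ratio between two groups of
    positive measurements: exp of the difference in mean log values."""
    def mean_log(xs):
        return sum(log(x) for x in xs) / len(xs)
    return exp(mean_log(treated) - mean_log(reference))
```

    With the week-24 geometric means reported above, 280 pg/mL for efavirenz versus 528 pg/mL for the ART-naive group, the ratio 280/528 ≈ 0.53 matches the reported GMR of 0.53, i.e., roughly a halving of levonorgestrel exposure.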

  9. Unintended Pregnancies Observed With Combined Use of the Levonorgestrel Contraceptive Implant and Efavirenz-based Antiretroviral Therapy: A Three-Arm Pharmacokinetic Evaluation Over 48 Weeks

    PubMed Central

    Scarsi, Kimberly K.; Darin, Kristin M.; Nakalema, Shadia; Back, David J.; Byakika-Kibwika, Pauline; Else, Laura J.; Dilly Penchala, Sujan; Buzibye, Allan; Cohn, Susan E.; Merry, Concepta; Lamorde, Mohammed

    2016-01-01

    Background. Levonorgestrel subdermal implants are preferred contraceptives with an expected failure rate of <1% over 5 years. We assessed the effect of efavirenz- or nevirapine-based antiretroviral therapy (ART) coadministration on levonorgestrel pharmacokinetics. Methods. This nonrandomized, parallel group, pharmacokinetic evaluation was conducted in three groups of human immunodeficiency virus–infected Ugandan women: ART-naive (n = 17), efavirenz-based ART (n = 20), and nevirapine-based ART (n = 20). Levonorgestrel implants were inserted at baseline in all women. Blood was collected at 1, 4, 12, 24, 36, and 48 weeks. The primary endpoint was week 24 levonorgestrel concentrations, compared between the ART-naive group and each ART group by geometric mean ratio (GMR) with 90% confidence interval (CI). Secondary endpoints included week 48 levonorgestrel concentrations and unintended pregnancies. Results. Week 24 geometric mean levonorgestrel concentrations were 528, 280, and 710 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.53; 90% CI, .50, .55 and nevirapine: ART-naive GMR, 1.35; 90% CI, 1.29, 1.43). Week 48 levonorgestrel concentrations were 580, 247, and 664 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.43; 90% CI, .42, .44 and nevirapine: ART-naive GMR, 1.14; 90% CI, 1.14, 1.16). Three pregnancies (3/20, 15%) occurred in the efavirenz group between weeks 36 and 48. No pregnancies occurred in the ART-naive or nevirapine groups. Conclusions. Within 1 year of combined use, levonorgestrel exposure was markedly reduced in participants who received efavirenz-based ART, accompanied by contraceptive failures. In contrast, nevirapine-based ART did not adversely affect levonorgestrel exposure or efficacy. Clinical Trials Registration. NCT01789879. PMID:26646680

  10. LDA boost classification: boosting by topics

    NASA Astrophysics Data System (ADS)

    Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li

    2012-12-01

AdaBoost is an effective classification algorithm, especially for text categorization (TC) tasks. Building a committee of classifiers that vote on documents can achieve high categorization precision. However, the traditional vector space model easily leads to the curse of dimensionality and feature sparsity, which seriously degrades classification performance. This article proposes a novel boosting-based classification algorithm, LDABoost, which uses Latent Dirichlet Allocation (LDA) to model the feature space. Instead of words or phrases, LDABoost uses latent topics as features, which significantly reduces the feature dimension. An improved naive Bayes (NB) classifier is designed as the weak learner; it keeps the efficiency of the classic NB algorithm while achieving higher precision. Moreover, a two-stage iterative weighting method, called Cute Integration in this article, is proposed to improve accuracy by integrating the weak classifiers into a strong classifier in a more principled way, with mutual information used as the metric for weight allocation. The voting information and the categorization decisions made by the base classifiers are fully utilized in generating the strong classifier. Experimental results reveal that LDABoost, by categorizing in a low-dimensional space, achieves higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms; its runtime is also lower than that of various versions of AdaBoost and of TC algorithms based on support vector machines and neural networks.
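The core of the weak learner described above, naive Bayes over topic-count features rather than word features, can be sketched in a few lines. This is an illustrative multinomial NB with Laplace smoothing, not the paper's improved NB; the per-document topic counts are assumed to come from a separate LDA inference step:

```python
import math
from collections import defaultdict

def train_nb(docs, labels, n_topics):
    """Multinomial naive Bayes over topic-count features.
    docs: list of topic-count vectors, each of length n_topics."""
    class_counts = defaultdict(int)
    topic_counts = defaultdict(lambda: [0] * n_topics)
    for vec, y in zip(docs, labels):
        class_counts[y] += 1
        for t, c in enumerate(vec):
            topic_counts[y][t] += c
    model = {}
    for y in class_counts:
        total = sum(topic_counts[y]) + n_topics  # Laplace smoothing
        log_prior = math.log(class_counts[y] / len(docs))
        log_lik = [math.log((topic_counts[y][t] + 1) / total)
                   for t in range(n_topics)]
        model[y] = (log_prior, log_lik)
    return model

def predict_nb(model, vec):
    """Pick the class maximizing log-prior + topic log-likelihoods."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(c * log_lik[t] for t, c in enumerate(vec))
    return max(model, key=score)

# Hypothetical topic-count vectors for two classes over 3 LDA topics.
docs = [[5, 0, 1], [4, 1, 0], [0, 6, 1], [1, 5, 0]]
labels = ["sports", "sports", "finance", "finance"]
model = train_nb(docs, labels, n_topics=3)
print(predict_nb(model, [3, 0, 0]))  # "sports"
```

Because the feature space is only `n_topics` wide, both training and prediction stay cheap, which is the efficiency argument the abstract makes for topic features.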

  11. A model-based test for treatment effects with probabilistic classifications.

    PubMed

    Cavagnaro, Daniel R; Davis-Stober, Clintin P

    2018-05-21

    Within modern psychology, computational and statistical models play an important role in describing a wide variety of human behavior. Model selection analyses are typically used to classify individuals according to the model(s) that best describe their behavior. These classifications are inherently probabilistic, which presents challenges for performing group-level analyses, such as quantifying the effect of an experimental manipulation. We answer this challenge by presenting a method for quantifying treatment effects in terms of distributional changes in model-based (i.e., probabilistic) classifications across treatment conditions. The method uses hierarchical Bayesian mixture modeling to incorporate classification uncertainty at the individual level into the test for a treatment effect at the group level. We illustrate the method with several worked examples, including a reanalysis of the data from Kellen, Mata, and Davis-Stober (2017), and analyze its performance more generally through simulation studies. Our simulations show that the method is both more powerful and less prone to type-1 errors than Fisher's exact test when classifications are uncertain. In the special case where classifications are deterministic, we find a near-perfect power-law relationship between the Bayes factor, derived from our method, and the p value obtained from Fisher's exact test. We provide code in an online supplement that allows researchers to apply the method to their own data. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  12. Do the Naive Know Best? The Predictive Power of Naive Ratings of Couple Interactions

    ERIC Educational Resources Information Center

    Baucom, Katherine J. W.; Baucom, Brian R.; Christensen, Andrew

    2012-01-01

    We examined the utility of naive ratings of communication patterns and relationship quality in a large sample of distressed couples. Untrained raters assessed 10-min videotaped interactions from 134 distressed couples who participated in both problem-solving and social support discussions at each of 3 time points (pre-therapy, post-therapy, and…

  13. The Preference for Symmetry in Flower-Naive and Not-so-Naive Bumblebees

    ERIC Educational Resources Information Center

    Plowright, C. M. S.; Evans, S. A.; Leung, J. Chew; Collin, C. A.

    2011-01-01

    Truly flower-naive bumblebees, with no prior rewarded experience for visits on any visual patterns outside the colony, were tested for their choice of bilaterally symmetric over asymmetric patterns in a radial-arm maze. No preference for symmetry was found. Prior training with rewarded black and white disks did, however, lead to a significant…

  14. Naive Theory of Biology: The Pre-School Child's Explanation of Death

    ERIC Educational Resources Information Center

    Vlok, Milandre; de Witt, Marike W.

    2012-01-01

    This article explains the naive theory of biology that the pre-school child uses to explain the cause of death. The empirical investigation showed that the young participants do use a naive theory of biology to explain function and do make reference to "vitalistic causality" in explaining organ function. Furthermore, most of these…

  15. Detecting Diseases in Medical Prescriptions Using Data Mining Tools and Combining Techniques.

    PubMed

    Teimouri, Mehdi; Farzadfar, Farshad; Soudi Alamdari, Mahsa; Hashemi-Meshkini, Amir; Adibi Alamdari, Parisa; Rezaei-Darzi, Ehsan; Varmaghani, Mehdi; Zeynalabedini, Aysan

    2016-01-01

Data about the prevalence of communicable and non-communicable diseases, one of the most important categories of epidemiological data, is used to interpret the health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study were collected from 1412 prescriptions covering various types of diseases, of which we focused on the identification of ten. Data mining tools are used to identify the diseases for which prescriptions were written; to evaluate the performance of these methods, we compare the results with the naive method, and combining methods are then used to improve the results. Results showed that the Support Vector Machine, with an accuracy of 95.32%, performs better than the other methods. The naive method, with an accuracy of 67.71%, is 20% worse than the Nearest Neighbor method, which has the lowest accuracy among the classification algorithms. The results indicate that the data mining algorithms performed well in characterizing outpatient diseases and can help in choosing appropriate methods for classifying prescriptions at larger scales.

  16. Pregnancy complications of the antiphospholipid syndrome.

    PubMed

    Tincani, A; Balestrieri, G; Danieli, E; Faden, D; Lojacono, A; Acaia, B; Trespidi, L; Ventura, D; Meroni, P L

    2003-02-01

Starting from their first description, antiphospholipid antibodies (aPL) were associated with repeated miscarriages and fetal losses. Other complications of pregnancy, such as preterm birth with pre-eclampsia or severe placental insufficiency, were also frequently reported and are included in the current classification criteria of the antiphospholipid syndrome (APS). The titre of the antibodies, their isotype, and their antigen specificity may be important in determining the level of risk. Some of the differences in the reported results can be explained by the poor standardization achieved in aPL testing or by inconsistent classification of pregnancy complications. The pathogenesis of pregnancy failures is linked to the thrombophilic effect of aPL but also to other mechanisms, including a direct effect of the antibodies on trophoblast differentiation and invasion. The study of experimental animal models has provided sound evidence for the pathogenic role of aPL in both lupus-prone and naive mice. The definition of APS as a condition linked to high obstetric risk and the application of effective therapy have completely changed the prognosis of pregnancy in these patients. Despite the high number of complications and preterm deliveries, a successful outcome can now be achieved in the large majority of cases.

  17. Impaired processing speed and attention in first-episode drug naive schizophrenia with deficit syndrome.

    PubMed

    Chen, Ce; Jiang, Wenhui; Zhong, Na; Wu, Jin; Jiang, Haifeng; Du, Jiang; Li, Ye; Ma, Xiancang; Zhao, Min; Hashimoto, Kenji; Gao, Chengge

    2014-11-01

Although first-episode drug-naive patients with schizophrenia are known to show cognitive impairment, how the cognitive performance of those with deficit syndrome compares with that of those with non-deficit syndrome is undetermined. The aim of this study was to compare cognitive performance in first-episode drug-naive schizophrenia with deficit versus non-deficit syndrome. First-episode drug-naive patients (n=49) and medicated patients (n=108) with schizophrenia, together with age-, sex-, and education-matched healthy controls (n=57 for the first-episode group, n=128 for the medicated group), were enrolled. Patients were divided into deficit and non-deficit syndrome groups using the Schedule for Deficit Syndrome. Cognitive performance was assessed using the CogState computerized cognitive battery. All cognitive domains in first-episode drug-naive and medicated patients showed significant impairment compared with the respective control groups. Furthermore, cognitive performance in first-episode drug-naive patients was significantly worse than in medicated patients. Interestingly, processing speed and attention in first-episode drug-naive patients with deficit syndrome were both significantly worse than in equivalent patients without deficit syndrome. In contrast, no differences in cognitive performance were found between the two groups of medicated patients. In conclusion, first-episode drug-naive patients with deficit syndrome showed significantly impaired processing speed and attention compared with patients with non-deficit syndrome. These findings highlight processing speed and attention as potential targets for pharmacological and psychosocial interventions in first-episode schizophrenia with deficit syndrome, since these domains are associated with social outcomes. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Homeostasis of naive and memory CD4+ T cells: IL-2 and IL-7 differentially regulate the balance between proliferation and Fas-mediated apoptosis.

    PubMed

    Jaleco, Sara; Swainson, Louise; Dardalhon, Valérie; Burjanadze, Maryam; Kinet, Sandrina; Taylor, Naomi

    2003-07-01

    Cytokines play a crucial role in the maintenance of polyclonal naive and memory T cell populations. It has previously been shown that ex vivo, the IL-7 cytokine induces the proliferation of naive recent thymic emigrants (RTE) isolated from umbilical cord blood but not mature adult-derived naive and memory human CD4(+) T cells. We find that the combination of IL-2 and IL-7 strongly promotes the proliferation of RTE, whereas adult CD4(+) T cells remain relatively unresponsive. Immunological activity is controlled by a balance between proliferation and apoptotic cell death. However, the relative contributions of IL-2 and IL-7 in regulating these processes in the absence of MHC/peptide signals are not known. Following exposure to either IL-2 or IL-7 alone, RTE, as well as mature naive and memory CD4(+) T cells, are rendered only minimally sensitive to Fas-mediated cell death. However, in the presence of the two cytokines, Fas engagement results in a high level of caspase-dependent apoptosis in both RTE as well as naive adult CD4(+) T cells. In contrast, equivalently treated memory CD4(+) T cells are significantly less sensitive to Fas-induced cell death. The increased susceptibility of RTE and naive CD4(+) T cells to Fas-induced apoptosis correlates with a significantly higher IL-2/IL-7-induced Fas expression on these T cell subsets than on memory CD4(+) T cells. Thus, IL-2 and IL-7 regulate homeostasis by modulating the equilibrium between proliferation and apoptotic cell death in RTE and mature naive and memory T cell subsets.

  19. Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.

    PubMed

    Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane

    2016-12-01

In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data and to identify the features that correspond to a particular type of fall, allowing falls to be classified. Only features corresponding to detected falls were used in the classification phase; designing classification models on this subset of the original data minimizes training time and simplifies the models. Based on the features of detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection databases from the University of Rzeszow. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in fall situations. Comparing the classification results of the EWMA-based SVM classifier with those of three commonly used machine learning classifiers (neural network, K-nearest neighbor, and naïve Bayes) showed our model to be superior.
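The first stage of the strategy, an EWMA statistic over the accelerometric signal that flags candidate fall events for the classifier, can be sketched as follows. The smoothing weight and threshold here are illustrative defaults, not the paper's tuned values:

```python
def ewma_alarms(samples, lam=0.3, threshold=1.5):
    """Return indices where the EWMA of the signal exceeds the threshold.

    samples: deviations of acceleration magnitude from its resting baseline.
    lam: smoothing weight on the newest sample (0 < lam <= 1).
    """
    z, alarms = 0.0, []
    for i, x in enumerate(samples):
        z = lam * x + (1 - lam) * z  # exponentially weighted moving average
        if abs(z) > threshold:
            alarms.append(i)         # candidate fall event for the SVM stage
    return alarms

# A brief spike (samples 3-4) drives the statistic over the threshold.
print(ewma_alarms([0.1, 0.0, 0.2, 6.0, 5.5, 0.1, 0.0]))  # [3, 4, 5]
```

The statistic smooths out single-sample noise, so only sustained deviations (as in a fall impact followed by lying still) trigger the second, SVM-based stage.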

  20. Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth

    PubMed Central

    Just, Marcel Adam; Pan, Lisa; Cherkassky, Vladimir L.; McMakin, Dana; Cha, Christine; Nock, Matthew K.; Brent, David

    2017-01-01

    The clinical assessment of suicidal risk would be significantly complemented by a biologically-based measure that assesses alterations in the neural representations of concepts related to death and life in people who engage in suicidal ideation. This study used machine-learning algorithms (Gaussian Naïve Bayes) to identify such individuals (17 suicidal ideators vs 17 controls) with high (91%) accuracy, based on their altered fMRI neural signatures of death and life-related concepts. The most discriminating concepts were death, cruelty, trouble, carefree, good, and praise. A similar classification accurately (94%) discriminated 9 suicidal ideators who had made a suicide attempt from 8 who had not. Moreover, a major facet of the concept alterations was the evoked emotion, whose neural signature served as an alternative basis for accurate (85%) group classification. The study establishes a biological, neurocognitive basis for altered concept representations in participants with suicidal ideation, which enables highly accurate group membership classification. PMID:29367952
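The Gaussian Naive Bayes classifier used in the study models each feature (here, an fMRI-derived activation value) as an independent normal distribution per class. A self-contained sketch with illustrative toy data, not the study's actual pipeline or signatures:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate per-class feature means, variances, and log-priors."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)  # variance floor
                 for col, m in zip(zip(*rows), means)]
        model[c] = (math.log(n / len(X)), means, varis)
    return model

def predict_gnb(model, x):
    """Pick the class maximizing log-prior + Gaussian log-likelihood."""
    def loglik(c):
        lp, means, varis = model[c]
        return lp + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                        for xi, m, v in zip(x, means, varis))
    return max(model, key=loglik)

# Toy 2-feature data standing in for per-concept neural signatures.
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [3.0, 3.1], [2.9, 3.0], [3.1, 2.9]]
y = ["control"] * 3 + ["ideator"] * 3
model = fit_gnb(X, y)
print(predict_gnb(model, [1.0, 1.0]))  # "control"
```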

  1. Automatic discovery of optimal classes

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Stutz, John; Freeman, Don; Self, Matthew

    1986-01-01

A criterion, based on Bayes' theorem, is described that defines the optimal set of classes (a classification) for a given set of examples. This criterion is transformed into an equivalent minimum message length criterion with an intuitive information interpretation. The criterion does not require that the number of classes be specified in advance; this is determined by the data. Because the minimum message length criterion includes the message length required to describe the classes, there is a built-in bias against adding new classes unless they lead to a reduction in the message length required to describe the data. Unfortunately, the search space of possible classifications is too large to search exhaustively, so heuristic search methods, such as simulated annealing, are applied. Tutored learning and probabilistic prediction in particular cases are an important indirect result of optimal class discovery. Extensions to the basic class induction program include the ability to combine categorical and real-valued data, hierarchical classes, independent classifications, and deciding for each class which attributes are relevant.

  2. Belief Function Based Decision Fusion for Decentralized Target Classification in Wireless Sensor Networks

    PubMed Central

    Zhang, Wenyu; Zhang, Zhenjiang

    2015-01-01

    Decision fusion in sensor networks enables sensors to improve classification accuracy while reducing the energy consumption and bandwidth demand for data transmission. In this paper, we focus on the decentralized multi-class classification fusion problem in wireless sensor networks (WSNs) and a new simple but effective decision fusion rule based on belief function theory is proposed. Unlike existing belief function based decision fusion schemes, the proposed approach is compatible with any type of classifier because the basic belief assignments (BBAs) of each sensor are constructed on the basis of the classifier’s training output confusion matrix and real-time observations. We also derive explicit global BBA in the fusion center under Dempster’s combinational rule, making the decision making operation in the fusion center greatly simplified. Also, sending the whole BBA structure to the fusion center is avoided. Experimental results demonstrate that the proposed fusion rule has better performance in fusion accuracy compared with the naïve Bayes rule and weighted majority voting rule. PMID:26295399
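Dempster's combination rule, on which the fusion center's global BBA rests, is compact when BBAs are represented as dicts mapping focal sets to masses. A minimal two-sensor sketch over a two-class frame (the class names and masses are illustrative, not from the paper):

```python
from itertools import product

def dempster(m1, m2):
    """Combine two basic belief assignments (dict: frozenset -> mass)."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                          # non-empty intersection keeps mass
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:                              # empty intersection is conflict
            conflict += wa * wb
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

A, B = frozenset({"car"}), frozenset({"truck"})
theta = A | B                              # the whole frame of discernment
m1 = {A: 0.6, B: 0.3, theta: 0.1}          # sensor 1's BBA
m2 = {A: 0.5, B: 0.4, theta: 0.1}          # sensor 2's BBA
fused = dempster(m1, m2)
print(round(fused[A], 3))                  # 0.672: mass on "car" after fusion
```

Because each sensor's BBA is a small dict, only these structures (or, as the paper notes, even less) need to reach the fusion center rather than raw observations.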

  3. Neural Predictors of Initiating Alcohol Use During Adolescence.

    PubMed

    Squeglia, Lindsay M; Ball, Tali M; Jacobus, Joanna; Brumback, Ty; McKenna, Benjamin S; Nguyen-Louie, Tam T; Sorg, Scott F; Paulus, Martin P; Tapert, Susan F

    2017-02-01

    Underage drinking is widely recognized as a leading public health and social problem for adolescents in the United States. Being able to identify at-risk adolescents before they initiate heavy alcohol use could have important clinical and public health implications; however, few investigations have explored individual-level precursors of adolescent substance use. This prospective investigation used machine learning with demographic, neurocognitive, and neuroimaging data in substance-naive adolescents to identify predictors of alcohol use initiation by age 18. Participants (N=137) were healthy substance-naive adolescents (ages 12-14) who underwent neuropsychological testing and structural and functional magnetic resonance imaging (sMRI and fMRI), and then were followed annually. By age 18, 70 youths (51%) initiated moderate to heavy alcohol use, and 67 remained nonusers. Random forest classification models identified the most important predictors of alcohol use from a large set of demographic, neuropsychological, sMRI, and fMRI variables. Random forest models identified 34 predictors contributing to alcohol use by age 18, including several demographic and behavioral factors (being male, higher socioeconomic status, early dating, more externalizing behaviors, positive alcohol expectancies), worse executive functioning, and thinner cortices and less brain activation in diffusely distributed regions of the brain. Incorporating a mix of demographic, behavioral, neuropsychological, and neuroimaging data may be the best strategy for identifying youths at risk for initiating alcohol use during adolescence. The identified risk factors will be useful for alcohol prevention efforts and in research to address brain mechanisms that may contribute to early drinking.

  4. Low frequency of genotypic resistance in HIV-1-infected patients failing an atazanavir-containing regimen: a clinical cohort study.

    PubMed

    Dolling, David I; Dunn, David T; Sutherland, Katherine A; Pillay, Deenan; Mbisa, Jean L; Parry, Chris M; Post, Frank A; Sabin, Caroline A; Cane, Patricia A

    2013-10-01

    To determine protease mutations that develop at viral failure for protease inhibitor (PI)-naive patients on a regimen containing the PI atazanavir. Resistance tests on patients failing atazanavir, conducted as part of routine clinical care in a multicentre observational study, were randomly matched by subtype to resistance tests from PI-naive controls to account for natural polymorphisms. Mutations from the consensus B sequence across the protease region were analysed for association and defined using the IAS-USA 2011 classification list. Four hundred and five of 2528 (16%) patients failed therapy containing atazanavir as a first PI over a median (IQR) follow-up of 1.76 (0.84-3.15) years and 322 resistance tests were available for analysis. Recognized major atazanavir mutations were found in six atazanavir-experienced patients (P < 0.001), including I50L and N88S. The minor mutations most strongly associated with atazanavir experience were M36I, M46I, F53L, A71V, V82T and I85V (P < 0.05). Multiple novel mutations, I15S, L19T, K43T, L63P/V, K70Q, V77I and L89I/T/V, were also associated with atazanavir experience. Viral failure on atazanavir-containing regimens was not common and major resistance mutations were rare, suggesting that adherence may be a major contributor to viral failure. Novel mutations were described that have not been previously documented.

  5. Sequence editing by Apolipoprotein B RNA-editing catalytic component-B and epidemiological surveillance of transmitted HIV-1 drug resistance

    PubMed Central

    Gifford, Robert J.; Rhee, Soo-Yon; Eriksson, Nicolas; Liu, Tommy F.; Kiuchi, Mark; Das, Amar K.; Shafer, Robert W.

    2008-01-01

Design. Promiscuous guanine (G) to adenine (A) substitutions catalysed by apolipoprotein B RNA-editing catalytic component (APOBEC) enzymes are observed in a proportion of HIV-1 sequences in vivo and can introduce artifacts into some genetic analyses. The potential impact of undetected lethal editing on genotypic estimation of transmitted drug resistance was assessed. Methods. Classifiers of lethal, APOBEC-mediated editing were developed by analysis of lentiviral pol gene sequence variation and evaluated using control sets of HIV-1 sequences. The potential impact of sequence editing on genotypic estimation of drug resistance was assessed in sets of sequences obtained from 77 studies of 25 or more therapy-naive individuals, using mixture modelling approaches to determine the maximum likelihood classification of sequences as lethally edited as opposed to viable. Results. Analysis of 6437 protease and reverse transcriptase sequences from therapy-naive individuals using a novel classifier of lethal, APOBEC3G-mediated sequence editing, the 'polypeptide-like 3G (APOBEC3G)-mediated defectives (A3GD) index', detected lethal editing in association with spurious 'transmitted drug resistance' in nearly 3% of proviral sequences obtained from whole blood and 0.2% of samples obtained from plasma. Conclusion. Screening for lethally edited sequences in datasets containing a proportion of proviral DNA, such as those likely to be obtained for epidemiological surveillance of transmitted drug resistance in the developing world, can eliminate rare but potentially significant errors in genotypic estimation of transmitted drug resistance. PMID:18356601

  6. A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.

    PubMed

    Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho

    2014-10-01

Many studies based on microRNA (miRNA) expression profiles have shown a new aspect of cancer classification. Because one characteristic of miRNA expression data is its high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. These methods share one shortcoming so far: they consider only cases where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs tend to be ranked low by traditional feature selection methods and are usually removed. Given the limited number of miRNAs, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifiers to construct the cancer classification. Then, we used the chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs for cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy than using only the high-ranking miRNAs from traditional feature selection methods. Our results demonstrate that the m:n feature subset shows the positive effect of low-ranking miRNAs in cancer classification.
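Of the ranking criteria the abstract lists, the chi-square score is the simplest to sketch; for a binary feature against a binary class label it reduces to the usual 2x2 contingency statistic (illustrative code, not the paper's pipeline):

```python
def chi2_score(feature, labels):
    """Chi-square statistic of a binary feature against a binary class."""
    obs = [[0, 0], [0, 0]]                  # observed 2x2 contingency counts
    for f, y in zip(feature, labels):
        obs[f][y] += 1
    n = len(feature)
    score = 0.0
    for i in range(2):
        for j in range(2):
            exp = sum(obs[i]) * (obs[0][j] + obs[1][j]) / n  # expected count
            if exp:
                score += (obs[i][j] - exp) ** 2 / exp
    return score

print(chi2_score([0, 0, 1, 1], [0, 0, 1, 1]))  # 4.0: perfectly informative
print(chi2_score([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0: independent of the class
```

Features (here, discretized miRNAs) would then be ranked by this score; the paper's point is that features scoring low on any single criterion may still matter jointly.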

  7. What Fits into a Mirror: Naive Beliefs about the Field of View

    ERIC Educational Resources Information Center

    Bianchi, Ivana; Savardi, Ugo

    2012-01-01

    Research on naive physics and naive optics have shown that people hold surprising beliefs about everyday phenomena that are in contrast with what they see. In this article, we investigated what adults expect to be the field of view of a mirror from various viewpoints. The studies presented here confirm that humans have difficulty dealing with the…

  8. Telomerase Is Involved in IL-7-Mediated Differential Survival of Naive and Memory CD4+ T Cells1

    PubMed Central

    Yang, Yinhua; An, Jie; Weng, Nan-ping

    2008-01-01

    IL-7 plays an essential role in T cell maintenance and survival. The survival effect of IL-7 is thought to be mediated through regulation of Bcl2 family proteins. After a comparative analysis of IL-7-induced growth and cell death of human naive and memory CD4+ T cells, we observed that more memory CD4+ T cells underwent cell division and proceeded to apoptosis than naive cells in response to IL-7. However, IL-7-induced expressions of Bcl2 family members (Bcl2, Bcl-xL, Bax, and Bad) were similar between naive and memory cells. Instead, we found that IL-7 induced higher levels of telomerase activity in naive cells than in memory cells, and the levels of IL-7-induced telomerase activity had a significant inverse correlation with cell death in CD4+ T cells. Furthermore, we showed that reducing expression of telomerase reverse transcriptase and telomerase activity significantly increased cell death of IL-7-cultured CD4+ T cells. Together, these findings demonstrate that telomerase is involved in IL-7-mediated differential survival of naive and memory CD4+ T cells. PMID:18322183

  9. IL-21 sustains CD28 expression on IL-15-activated human naive CD8+ T cells.

    PubMed

    Alves, Nuno L; Arosa, Fernando A; van Lier, René A W

    2005-07-15

    Human naive CD8+ T cells are able to respond in an Ag-independent manner to IL-7 and IL-15. Whereas IL-7 largely maintains CD8+ T cells in a naive phenotype, IL-15 drives these cells to an effector phenotype characterized, among other features, by down-regulation of the costimulatory molecule CD28. We evaluated the influence of the CD4+ Th cell-derived common gamma-chain cytokine IL-21 on cytokine-induced naive CD8+ T cell activation. Stimulation with IL-21 did not induce division and only slightly increased IL-15-induced proliferation of naive CD8+ T cells. Strikingly, however, IL-15-induced down-modulation of CD28 was completely prevented by IL-21 at the protein and transcriptional level. Subsequent stimulation via combined TCR/CD3 and CD28 triggering led to a markedly higher production of IL-2 and IFN-gamma in IL-15/IL-21-stimulated cells compared with IL-15-stimulated T cells. Our data show that IL-21 modulates the phenotype of naive CD8+ T cells that have undergone IL-15 induced homeostatic proliferation and preserves their responsiveness to CD28 ligands.

  10. A&M. TAN607. Elevation for secondphase expansion of A&M Building. Work ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    A&M. TAN-607. Elevation for second-phase expansion of A&M Building. Work areas south of the Carpentry Shop. High-bay shop, decontamination room at south-most end. Approved by INEEL Classification Office for public release. Ralph M. Parsons 1299-5-ANP/GE-3-607-A 106. Date: August 1956. INEEL index code no. 034-0607-00-693-107166 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID

  11. Power Plant Discharge Structure, Delta Stabilization Dike, and On-Land Taconite Tailings Disposal, Reserve Mining Company, Silver Bay, Lake County, Minnesota.

    DTIC Science & Technology

    1977-03-01

preserved in 70% ethanol for future reference. Periphyton (attached algae): periphyton from the rivers are being collected and periphyton from bear... most abundant of phytoplankton include: Asterionella formosa, Tabellaria fenestrata, Melosira granulata, Dinobryon sp., Synedra acus, and Cyclotella sp... listed in table 5 below: TABLE 5. Aquatic Habitats - Mile Post 7 Site (columns: Water Body, Classification, Species of Major Importance, Major Benthic Substrates)

  12. FINAL TECHNICAL REPORT: Underwater Active Acoustic Monitoring Network For Marine And Hydrokinetic Energy Projects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stein, Peter J.; Edson, Patrick L.

    2013-12-20

This project saw the completion of the design and development of a second-generation, high-frequency (90-120 kHz) Subsurface-Threat Detection Sonar Network (SDSN). The system was deployed, operated, and tested in Cobscook Bay, Maine, near the site of the Ocean Renewable Power Company TidGen™ power unit. This effort resulted in a very successful demonstration of the SDSN detection, tracking, localization, and classification capabilities in a high-current MHK environment, as measured by results from the detection and tracking trials in Cobscook Bay. The new high-frequency node, designed to operate outside the hearing range of a subset of marine mammals, was shown to detect and track objects of marine-mammal-like target strength to ranges of approximately 500 meters. At this range the SDSN system can track objects for a significant duration - on the order of minutes - even in a tidal flow of 5-7 knots, potentially allowing time for MHK system or operator decision-making if marine mammals are present. Having demonstrated detection and tracking of synthetic targets with target strengths similar to some marine mammals, the primary hurdle to eventual automated monitoring is obtaining a dataset of actual marine mammal kinematic behavior and modifying the tracking and classification algorithms and parameters, which are currently tuned to human diver kinematics.

  13. Linear and Order Statistics Combiners for Pattern Classification

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep; Lau, Sonie (Technical Monitor)

    2001-01-01

Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the 'added' error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum, and in general the i-th order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
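The factor-of-N variance reduction claimed for averaging N unbiased, uncorrelated classifiers is easy to check empirically; a simulation sketch in which the boundary-offset errors are drawn as independent standard normals (an assumption of this illustration, not of the chapter's analysis):

```python
import random

def boundary_variance(n_classifiers, trials=20000, sigma=1.0, seed=0):
    """Empirical variance of the averaged boundary estimate of N unbiased,
    uncorrelated classifiers; theory predicts sigma**2 / N."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        ests = [rng.gauss(0.0, sigma) for _ in range(n_classifiers)]
        means.append(sum(ests) / n_classifiers)
    mu = sum(means) / trials
    return sum((m - mu) ** 2 for m in means) / trials

v1, v10 = boundary_variance(1), boundary_variance(10)
print(round(v1 / v10))  # close to 10: variance, and hence added error, ~1/N
```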

  14. Marine benthic habitat mapping of Muir Inlet, Glacier Bay National Park and Preserve, Alaska, with an evaluation of the Coastal and Marine Ecological Classification Standard III

    USGS Publications Warehouse

    Trusel, Luke D.; Cochrane, Guy R.; Etherington, Lisa L.; Powell, Ross D.; Mayer, Larry A.

    2010-01-01

    Seafloor geology and potential benthic habitats were mapped in Muir Inlet, Glacier Bay National Park and Preserve, Alaska, using multibeam sonar, ground-truth information, and geological interpretations. Muir Inlet is a recently deglaciated fjord that is under the influence of glacial and paraglacial marine processes. High glacially derived sediment and meltwater fluxes, slope instabilities, and variable bathymetry result in a highly dynamic estuarine environment and benthic ecosystem. We characterize the fjord seafloor and potential benthic habitats using the Coastal and Marine Ecological Classification Standard (CMECS) recently developed by the National Oceanic and Atmospheric Administration (NOAA) and NatureServe. Substrates within Muir Inlet are dominated by mud, derived from the high glacial debris flux. Water-column characteristics are derived from a combination of conductivity temperature depth (CTD) measurements and circulation-model results. We also present modern glaciomarine sediment accumulation data from quantitative differential bathymetry. These data show Muir Inlet is divided into two contrasting environments: a dynamic upper fjord and a relatively static lower fjord. The accompanying maps represent the first publicly available high-resolution bathymetric surveys of Muir Inlet. The results of these analyses serve as a test of the CMECS and as a baseline for continued mapping and correlations among seafloor substrate, benthic habitats, and glaciomarine processes.

  15. Using Remotely Sensed Data and Watershed and Hydrodynamic Models to Evaluate the Effects of Land Cover Land Use Change on Aquatic Ecosystems in Mobile Bay, AL

    NASA Technical Reports Server (NTRS)

    Al-Hamdan, Mohammad Z.; Estes, Maurice G., Jr.; Judd, Chaeli; Thom, Ron; Woodruff, Dana; Ellis, Jean T.; Quattrochi, Dale; Watson, Brian; Rodriquez, Hugo; Johnson, Hoyt

    2012-01-01

    Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land cover land use (LCLU) change in the two counties surrounding Mobile Bay (Mobile and Baldwin) on SAV stressors and controlling factors (temperature, salinity, and sediment) in the Mobile Bay estuary. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for LCLU scenarios in 1948, 1992, 2001, and 2030. Remotely sensed Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 LCLU scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the estuary. These results were input into the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid throughout Mobile Bay and adjacent estuaries. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This is the key product benefiting the Mobile Bay coastal environmental managers that integrates the influences of temperature, salinity, and sediment due to LCLU driven flow changes with the restoration potential of SAVs.
    Data products and results are being integrated into NOAA's EcoWatch and Gulf of Mexico Data Atlas online systems for dissemination to coastal resource managers and stakeholders.

  16. Using Remotely Sensed Data and Watershed and Hydrodynamic Models to Evaluate the Effects of Land Cover Land Use Change on Aquatic Ecosystems in Mobile Bay, AL

    NASA Astrophysics Data System (ADS)

    Al-Hamdan, M. Z.; Estes, M. G.; Judd, C.; Thom, R.; Woodruff, D.; Ellis, J. T.; Quattrochi, D.; Watson, B.; Rodriguez, H.; Johnson, H.

    2012-12-01

    Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land cover land use (LCLU) change in the two counties surrounding Mobile Bay (Mobile and Baldwin) on SAV stressors and controlling factors (temperature, salinity, and sediment) in the Mobile Bay estuary. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for LCLU scenarios in 1948, 1992, 2001, and 2030. Remotely sensed Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 LCLU scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the estuary. These results were input into the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid throughout Mobile Bay and adjacent estuaries. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This is the key product benefiting the Mobile Bay coastal environmental managers that integrates the influences of temperature, salinity, and sediment due to LCLU driven flow changes with the restoration potential of SAVs.
Data products and results are being integrated into NOAA's EcoWatch and Gulf of Mexico Data Atlas online systems for dissemination to coastal resource managers and stakeholders.

  17. Detecting the presence-absence of bluefin tuna by automated analysis of medium-range sonars on fishing vessels.

    PubMed

    Uranga, Jon; Arrizabalaga, Haritz; Boyra, Guillermo; Hernandez, Maria Carmen; Goñi, Nicolas; Arregui, Igor; Fernandes, Jose A; Yurramendi, Yosu; Santiago, Josu

    2017-01-01

    This study presents a methodology for the automated analysis of commercial medium-range sonar signals for detecting presence/absence of bluefin tuna (Thunnus thynnus) in the Bay of Biscay. The approach uses image processing techniques to analyze sonar screenshots. For each sonar image we extracted measurable regions and analyzed their characteristics. Scientific data were used to classify each region into a class ("tuna" or "no-tuna") and build a dataset to train and evaluate classification models by using supervised learning. The methodology performed well when validated with commercial sonar screenshots, and has the potential to automatically analyze high volumes of data at a low cost. This represents a first milestone towards the development of acoustic, fishery-independent indices of abundance for bluefin tuna in the Bay of Biscay. Future research lines and additional alternatives to inform stock assessments are also discussed.

  18. Skylab/EREP application to ecological, geological, and oceanographic investigations of Delaware Bay

    NASA Technical Reports Server (NTRS)

    Klemas, V.; Bartlett, D. S.; Philpot, W. D.; Rogers, R. H.; Reed, L. E.

    1978-01-01

    Skylab/EREP S190A and S190B film products were optically enhanced and visually interpreted to extract data suitable for: (1) mapping coastal land use; (2) inventorying wetlands vegetation; (3) monitoring tidal conditions; (4) observing suspended sediment patterns; (5) charting surface currents; (6) locating coastal fronts and water mass boundaries; (7) monitoring industrial and municipal waste dumps in the ocean; (8) determining the size and flow direction of river, bay and man-made discharge plumes; and (9) observing ship traffic. Film products were visually analyzed to identify and map ten land-use and vegetation categories at a scale of 1:125,000. Digital tapes from the multispectral scanner were used to prepare thematic maps of land use. Classification accuracies obtained by comparison of the derived thematic maps of land use with USGS-CARETS land-use maps in southern Delaware ranged from 44 percent to 100 percent.

  19. Detecting the presence-absence of bluefin tuna by automated analysis of medium-range sonars on fishing vessels

    PubMed Central

    Uranga, Jon; Arrizabalaga, Haritz; Boyra, Guillermo; Hernandez, Maria Carmen; Goñi, Nicolas; Arregui, Igor; Fernandes, Jose A.; Yurramendi, Yosu; Santiago, Josu

    2017-01-01

    This study presents a methodology for the automated analysis of commercial medium-range sonar signals for detecting presence/absence of bluefin tuna (Thunnus thynnus) in the Bay of Biscay. The approach uses image processing techniques to analyze sonar screenshots. For each sonar image we extracted measurable regions and analyzed their characteristics. Scientific data were used to classify each region into a class (“tuna” or “no-tuna”) and build a dataset to train and evaluate classification models by using supervised learning. The methodology performed well when validated with commercial sonar screenshots, and has the potential to automatically analyze high volumes of data at a low cost. This represents a first milestone towards the development of acoustic, fishery-independent indices of abundance for bluefin tuna in the Bay of Biscay. Future research lines and additional alternatives to inform stock assessments are also discussed. PMID:28152032

  20. A Lightweight Hierarchical Activity Recognition Framework Using Smartphone Sensors

    PubMed Central

    Han, Manhyung; Bang, Jae Hun; Nugent, Chris; McClean, Sally; Lee, Sungyoung

    2014-01-01

    Activity recognition for the purposes of recognizing a user's intentions using multimodal sensors is becoming a widely researched topic largely based on the prevalence of the smartphone. Previous studies have reported the difficulty in recognizing life-logs by only using a smartphone due to the challenges with activity modeling and real-time recognition. In addition, recognizing life-logs is difficult due to the absence of an established framework which enables the use of different sources of sensor data. In this paper, we propose a smartphone-based Hierarchical Activity Recognition Framework which extends the Naïve Bayes approach for the processing of activity modeling and real-time activity recognition. The proposed algorithm demonstrates higher accuracy than the Naïve Bayes approach and also enables the recognition of a user's activities within a mobile environment. The proposed algorithm has the ability to classify fifteen activities with an average classification accuracy of 92.96%. PMID:25184486
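As a rough illustration of the Naïve Bayes approach the framework extends, here is a minimal Gaussian naive Bayes classifier over toy accelerometer-style features. The activities, feature names, and numbers are invented for illustration and are not the paper's fifteen-activity dataset.

```python
import math
from collections import defaultdict

# Toy (mean acceleration, variance) feature pairs per activity;
# all values are made up for this sketch.
train = [
    ("walk", (1.2, 0.30)), ("walk", (1.1, 0.28)), ("walk", (1.3, 0.35)),
    ("run",  (2.5, 0.90)), ("run",  (2.7, 1.00)), ("run",  (2.4, 0.85)),
    ("sit",  (0.1, 0.010)), ("sit", (0.2, 0.020)), ("sit", (0.1, 0.015)),
]

def fit(data):
    """Estimate per-class priors and per-feature Gaussian parameters.
    Features are treated as conditionally independent: the 'naive' assumption."""
    by_class = defaultdict(list)
    for label, x in data:
        by_class[label].append(x)
    params = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6  # smoothed
            stats.append((mu, var))
        params[label] = (len(rows) / len(data), stats)
    return params

def log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(params, x):
    """Pick the class maximising log prior + sum of per-feature log likelihoods."""
    scores = {
        label: math.log(prior) + sum(log_gauss(v, mu, var)
                                     for v, (mu, var) in zip(x, stats))
        for label, (prior, stats) in params.items()
    }
    return max(scores, key=scores.get)

model = fit(train)
print(predict(model, (2.6, 0.95)))  # high mean and variance -> "run"
```

The paper's hierarchical extension would layer coarse-to-fine decisions on top of a base classifier like this one; the sketch shows only the flat Naïve Bayes step.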

  1. Formaldehyde-Induced Aggravation of Pruritus and Dermatitis Is Associated with the Elevated Expression of Th1 Cytokines in a Rat Model of Atopic Dermatitis

    PubMed Central

    Back, Seung Keun; Lee, Hyunkyoung; Lee, JaeHee; Kim, Hye young; Kim, Hee Jin; Na, Heung Sik

    2016-01-01

    Atopic dermatitis (AD) is a complex disease of heterogeneous pathogenesis, in particular, genetic predisposition, environmental triggers, and their interactions. Indoor air pollution, increasing with urbanization, plays a role as an environmental risk factor in the development of AD. However, we still lack a detailed picture of the role of air pollution in the development of the disease. Here, we examined the effect of formaldehyde (FA) exposure on the manifestation of atopic dermatitis and the underlying molecular mechanism in naive rats and in a rat model of AD produced by neonatal capsaicin treatment. The AD and naive rats were exposed to 0.8 ppm FA, 1.2 ppm FA, or fresh air (Air) for 6 weeks (2 hours/day and 5 days/week). Thus, six groups, namely the 1.2 FA-AD, 0.8 FA-AD, Air-AD, 1.2 FA-naive, 0.8 FA-naive and Air-naive groups, were established. Pruritus and dermatitis, two major symptoms of atopic dermatitis, were evaluated every week for 6 weeks. After that, samples of the blood, the skin and the thymus were collected from the 1.2 FA-AD, the Air-AD, the 1.2 FA-naive and the Air-naive groups. Serum IgE levels were quantified with ELISA, and mRNA expression levels of inflammatory cytokines from extracts of the skin and the thymus were calculated with qRT-PCR. The dermatitis and pruritus significantly worsened in the 1.2 FA-AD group, but not in the 0.8 FA-AD group, compared to the Air-AD animals, whereas FA did not induce any symptoms in naive rats. Consistently, the levels of serum IgE were significantly higher in the 1.2 FA-AD group than in the Air-AD group; however, there was no significant difference following FA exposure in naive animals. In the skin, mRNA expression levels of Th1 cytokines such as TNF-α and IL-1β were significantly higher in the 1.2 FA-AD rats compared to the Air-AD rats, whereas mRNA expression levels of Th2 cytokines (IL-4, IL-5, IL-13), IL-17A and TSLP were significantly higher in the 1.2 FA-naive group than in the Air-naive group. 
These results suggested that 1.2 ppm of FA penetrated the injured skin barrier, and exacerbated Th1 responses and serum IgE level in the AD rats so that dermatitis and pruritus were aggravated, while the elevated expression of Th2 cytokines by 1.2 ppm of FA in naive rats was probably insufficient for clinical manifestation. In conclusion, in a rat model of atopic dermatitis, exposure to 1.2 ppm of FA aggravated pruritus and skin inflammation, which was associated with the elevated expression of Th1 cytokines. PMID:28005965

  2. Seagrass Identification Using High-Resolution 532nm Bathymetric LiDAR and Hyperspectral Imagery

    NASA Astrophysics Data System (ADS)

    Pan, Z.; Prasad, S.; Starek, M. J.; Fernandez Diaz, J. C.; Glennie, C. L.; Carter, W. E.; Shrestha, R. L.; Singhania, A.; Gibeaut, J. C.

    2013-12-01

    Seagrass provides vital habitat for marine fisheries and is a key indicator species of coastal ecosystem vitality. Monitoring seagrass is therefore an important environmental initiative, but measuring details of seagrass distribution over large areas via remote sensing has proved challenging. Developments in airborne bathymetric light detection and ranging (LiDAR) provide great potential in this regard. Traditional bathymetric LiDAR systems have been limited in their ability to map within the shallow water zone (< 1 m) where seagrass is typically present due to limitations in receiver response and laser pulse length. Emergent short-pulse width bathymetric LiDAR sensors and waveform processing algorithms enable depth measurements in shallow water environments previously inaccessible. This 3D information of the benthic layer can be applied to detect seagrass and characterize its distribution. Researchers with the National Center for Airborne Laser Mapping (NCALM) at the University of Houston (UH) and the Coastal and Marine Geospatial Sciences Lab (CMGL) of the Harte Research Institute at Texas A&M University-Corpus Christi conducted a coordinated airborne and boat-based survey of the Redfish Bay State Scientific Area as part of a collaborative study to investigate the capabilities of bathymetric LiDAR and hyperspectral imaging for seagrass mapping. Redfish Bay, located along the middle Texas coast of the Gulf of Mexico, is a state scientific area designated for the purpose of protecting and studying native seagrasses. Redfish Bay is part of the broader Coastal Bend Bays estuary system recognized by the US Environmental Protection Agency (EPA) as a national estuary of significance. For this survey, UH acquired high-resolution discrete-return and full-waveform bathymetric data using their Optech Aquarius 532 nm green LiDAR. 
In a separate flight, UH collected 2 sets of hyperspectral imaging data (1.2-m pixel resolution and 72 bands, and 0.6m pixel resolution and 36 bands) with their CASI 1500 hyperspectral sensor. The ground survey was conducted by CMGL. The team used an airboat to collect in-situ radiometer measurements of sky irradiance and surface water reflectance at different locations in the bay. The team also collected water samples, GPS position, and depth. A follow-up survey was conducted to acquire ground-truth data of benthic type at over 80 locations within the bay. Two complementary approaches were developed to detect and map the seagrass cover over the study area - automated classification algorithms were validated with high spatial resolution hyperspectral imagery, and a continuous wavelet based signal processing and pulse broadening analysis of the digitized returns was performed with the full waveform of the bathymetric LiDAR. The two approaches were compared to the collected ground truth data of seagrass type, height, and location. Results of the evaluation will be presented, along with a preliminary discussion of the fusion of the LiDAR and hyperspectral imagery for improved overall classification accuracy.

  3. Exploring the Autonomous Economic World of Children: A Mixed Methods Study of Kids' Naive Economic Theories Incorporating Ethnographic and Behavioral Economics Methodologies

    ERIC Educational Resources Information Center

    Jennings, Amanda Brooke

    2017-01-01

    Children construct meaning from their economic experiences in the form of naive theories and use these theories to explain the relationships between their actions and the outcomes. Inevitably, due to their lack of economic literacy, these theories will be incomplete. Through curriculum design that acknowledges and addresses these naive theories,…

  4. Effector CD8 T cells dedifferentiate into long-lived memory cells.

    PubMed

    Youngblood, Ben; Hale, J Scott; Kissick, Haydn T; Ahn, Eunseon; Xu, Xiaojin; Wieland, Andreas; Araki, Koichi; West, Erin E; Ghoneim, Hazem E; Fan, Yiping; Dogra, Pranay; Davis, Carl W; Konieczny, Bogumila T; Antia, Rustom; Cheng, Xiaodong; Ahmed, Rafi

    2017-12-21

    Memory CD8 T cells that circulate in the blood and are present in lymphoid organs are an essential component of long-lived T cell immunity. These memory CD8 T cells remain poised to rapidly elaborate effector functions upon re-exposure to pathogens, but also have many properties in common with naive cells, including pluripotency and the ability to migrate to the lymph nodes and spleen. Thus, memory cells embody features of both naive and effector cells, fuelling a long-standing debate centred on whether memory T cells develop from effector cells or directly from naive cells. Here we show that long-lived memory CD8 T cells are derived from a subset of effector T cells through a process of dedifferentiation. To assess the developmental origin of memory CD8 T cells, we investigated changes in DNA methylation programming at naive and effector cell-associated genes in virus-specific CD8 T cells during acute lymphocytic choriomeningitis virus infection in mice. Methylation profiling of terminal effector versus memory-precursor CD8 T cell subsets showed that, rather than retaining a naive epigenetic state, the subset of cells that gives rise to memory cells acquired de novo DNA methylation programs at naive-associated genes and became demethylated at the loci of classically defined effector molecules. Conditional deletion of the de novo methyltransferase Dnmt3a at an early stage of effector differentiation resulted in reduced methylation and faster re-expression of naive-associated genes, thereby accelerating the development of memory cells. Longitudinal phenotypic and epigenetic characterization of the memory-precursor effector subset of virus-specific CD8 T cells transferred into antigen-free mice revealed that differentiation to memory cells was coupled to erasure of de novo methylation programs and re-expression of naive-associated genes. 
Thus, epigenetic repression of naive-associated genes in effector CD8 T cells can be reversed in cells that develop into long-lived memory CD8 T cells while key effector genes remain demethylated, demonstrating that memory T cells arise from a subset of fate-permissive effector T cells.

  5. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertise to interpret gear fault signatures, which ordinary users usually cannot easily achieve. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. 
    Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression trees and the naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
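The K-nearest neighbors step at the core of the method can be sketched in a few lines. The 2-D "statistical features" and five crack-level labels below are invented for illustration; the paper's actual inputs are 620-dimensional wavelet-packet features after dimensionality reduction.

```python
import math
from collections import Counter

# Hypothetical 2-D features (e.g., an RMS-like and a kurtosis-like statistic)
# for five gear crack levels 0-4; values are illustrative only.
train = [
    ((0.10, 3.0), 0), ((0.12, 3.1), 0),
    ((0.30, 3.5), 1), ((0.32, 3.6), 1),
    ((0.50, 4.2), 2), ((0.52, 4.1), 2),
    ((0.70, 5.0), 3), ((0.72, 5.1), 3),
    ((0.90, 6.0), 4), ((0.92, 6.2), 4),
]

def knn_predict(train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict(train, (0.51, 4.15)))  # nearest samples belong to crack level 2
```

In practice k (and the distance metric) is tuned, which is exactly the parameter-selection question the communication investigates.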

  6. Earthquake damage mapping by using remotely sensed data: the Haiti case study

    NASA Astrophysics Data System (ADS)

    Romaniello, Vito; Piscini, Alessandro; Bignami, Christian; Anniballe, Roberta; Stramondo, Salvatore

    2017-01-01

    This work proposes methodologies aimed at evaluating the sensitivity of optical and synthetic aperture radar (SAR) change features obtained from satellite images with respect to the damage grade due to an earthquake. The test case is the Mw 7.0 earthquake that hit Haiti on January 12, 2010, located 25 km west-south-west of the city of Port-au-Prince. The disastrous shock caused the collapse of a huge number of buildings and widespread damage. The objective is to investigate possible parameters that can affect the robustness and sensitivity of the proposed methods derived from the literature. It is worth noting that the proposed analysis concerns the estimation of derived features at object scale. For this purpose, a segmentation of the study area into several regions has been done by considering a set of polygons, over the city of Port-au-Prince, extracted from the open-source OpenStreetMap geo-database. The analysis of change detection indicators is based on ground truth information collected during a postearthquake survey and is available from a Joint Research Centre database. The resulting damage map is expressed in terms of collapse ratio, thus indicating the areas with a greater number of collapsed buildings. The available satellite dataset is composed of optical and SAR images, collected before and after the seismic event. In particular, we used two GeoEye-1 optical images (one preseismic and one postseismic) and three TerraSAR-X SAR images (two preseismic and one postseismic). Previous studies allowed us to identify some features having a good sensitivity with damage at the object scale. Regarding the optical data, we selected the normalized difference index and two quantities coming from information theory, namely the Kullback-Leibler divergence (KLD) and the mutual information (MI). In addition, for the SAR data, we picked out the intensity correlation difference and the KLD parameter. 
In order to analyze the capability of these parameters to correctly detect damaged areas, two different classifiers were used: the Naive Bayes and the support vector machine classifiers. The classification results demonstrate that the simultaneous use of several change features from Earth observations can improve the damage estimation at object scale.
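The Kullback-Leibler divergence used as a change feature can be sketched for discrete pre- and post-event intensity histograms. This is a simplified illustration; the paper's object-scale feature extraction from GeoEye-1 and TerraSAR-X imagery is not reproduced here, and the histograms below are invented.

```python
import math

def kld(p, q, eps=1e-9):
    """Discrete Kullback-Leibler divergence KL(p || q) between two histograms.
    A small epsilon avoids log(0) on empty bins; inputs need not be normalized."""
    p = [v + eps for v in p]
    q = [v + eps for v in q]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq))
               for pi, qi in zip(p, q))

pre = [120, 40, 10, 2]   # hypothetical pre-event pixel-intensity histogram
post = [30, 60, 50, 32]  # hypothetical post-event histogram for the same object
change = kld(pre, post)
print(f"KLD change feature: {change:.3f}")
```

A larger divergence between an object's pre- and post-event distributions suggests a stronger radiometric change, which the classifiers then relate to the collapse ratio.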

  7. The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images

    PubMed Central

    Mitry, Danny; Zutis, Kris; Dhillon, Baljean; Peto, Tunde; Hayat, Shabina; Khaw, Kay-Tee; Morgan, James E.; Moncur, Wendy; Trucco, Emanuele; Foster, Paul J.

    2016-01-01

    Purpose: Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool, comparing it to expert classification. Methods: We used 100 retinal fundus photograph images with predetermined disease criteria selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site so anonymous workers could perform a classification and annotation task of the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only, and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared to expert grading, and used the Dice coefficient and consensus threshold to assess annotation accuracy. Results: In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity and sensitivity of 71% (95% confidence interval [CI], 69%–74%) and 87% (95% CI, 86%–88%) was achieved for all classifications. The AUC in this study for all classifications combined was 0.93 (95% CI, 0.91–0.96). For image annotation, a maximal Dice coefficient (∼0.6) was achieved with a consensus threshold of 0.25. Conclusions: This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation was achieved in the nonmasters with compulsory training group. 
    Translational Relevance: The use of crowdsourcing as a technique for retinal image analysis may be comparable to expert graders and has the potential to deliver timely, accurate, and cost-effective image analysis. PMID:27668130
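The Dice coefficient and consensus-threshold aggregation used to score crowd annotations can be sketched by modeling each annotation as a set of marked pixel coordinates. This is an illustrative simplification of the study's pipeline, with invented worker annotations.

```python
from collections import Counter

def dice(a, b):
    """Dice coefficient between two annotations given as sets of marked pixels."""
    if not a and not b:
        return 1.0  # two empty annotations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def consensus(annotations, threshold=0.25):
    """Keep pixels marked by at least `threshold` of the workers."""
    counts = Counter(px for ann in annotations for px in ann)
    return {px for px, c in counts.items() if c / len(annotations) >= threshold}

# Hypothetical annotations from four crowd workers and one expert,
# with pixels identified by integer ids for simplicity.
workers = [{1, 2}, {2, 3, 9}, {1, 2, 3, 4}, {7}]
expert = {1, 2, 3, 4}

agreed = consensus(workers, threshold=0.5)  # pixels marked by >= 2 of 4 workers
score = dice(agreed, expert)
print(f"consensus set: {sorted(agreed)}, Dice vs expert: {score:.3f}")
```

Sweeping the threshold, as the study does, trades off spurious marks from individual workers against losing pixels that only a minority annotated.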

  8. The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images.

    PubMed

    Mitry, Danny; Zutis, Kris; Dhillon, Baljean; Peto, Tunde; Hayat, Shabina; Khaw, Kay-Tee; Morgan, James E; Moncur, Wendy; Trucco, Emanuele; Foster, Paul J

    2016-09-01

    Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool, comparing it to expert classification. We used 100 retinal fundus photograph images with predetermined disease criteria selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site so anonymous workers could perform a classification and annotation task of the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared to expert grading, and used the Dice coefficient and consensus threshold to assess annotation accuracy. In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity and sensitivity of 71% (95% confidence interval [CI], 69%-74%) and 87% (95% CI, 86%-88%) was achieved for all classifications. The AUC in this study for all classifications combined was 0.93 (95% CI, 0.91-0.96). For image annotation, a maximal Dice coefficient (∼0.6) was achieved with a consensus threshold of 0.25. This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation was achieved in the nonmasters with compulsory training group. The use of crowdsourcing as a technique for retinal image analysis may be comparable to expert graders and has the potential to deliver timely, accurate, and cost-effective image analysis.
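The sensitivity and specificity reported against expert grading reduce to simple confusion-matrix counts; a minimal sketch follows, with invented labels rather than the study's data.

```python
def sens_spec(y_true, y_pred):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    for binary labels, with expert grading taken as ground truth."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical expert labels vs crowd classifications for five images.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
sens, spec = sens_spec(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping a decision threshold over the crowd's vote fraction and plotting these two rates yields the ROC curve whose area (AUC) the study reports.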

  9. Novel naïve Bayes classification models for predicting the chemical Ames mutagenicity.

    PubMed

    Zhang, Hui; Kang, Yan-Li; Zhu, Yuan-Yuan; Zhao, Kai-Xia; Liang, Jun-Yu; Ding, Lan; Zhang, Teng-Guo; Zhang, Ji

    2017-06-01

    Prediction of drug candidates for mutagenicity is a regulatory requirement since mutagenic compounds could pose a toxic risk to humans. The aim of this investigation was to develop a novel prediction model of mutagenicity by using a naïve Bayes classifier. The established model was validated by internal 5-fold cross validation and external test sets. For comparison, a recursive partitioning classifier prediction model was also established, and various other reported prediction models of mutagenicity were collected. Among these methods, the prediction performance of the naïve Bayes classifier established here was good and stable, yielding average overall prediction accuracies of 89.1±0.4% for the internal 5-fold cross validation of the training set and 77.3±1.5% for external test set I. The concordance on external test set II (446 marketed drugs) was 90.9±0.3%. In addition, four simple molecular descriptors (e.g., Apol, No. of H donors, Num-Rings and Wiener) related to mutagenicity and five representative substructures of mutagens (e.g., aromatic nitro, hydroxyl amine, nitroso, aromatic amine and N-methyl-N-methylenemethanaminum) produced by ECFP_14 fingerprints were identified. We hope the established naïve Bayes prediction model can be applied to risk assessment processes, and that the obtained important information on mutagenic chemicals can guide the design of chemical libraries for hit and lead optimization. Copyright © 2017 Elsevier B.V. All rights reserved.
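The internal 5-fold cross validation protocol used to estimate those accuracies can be sketched as an index-splitting routine. This is a generic illustration of k-fold splitting, not the authors' exact partitioning of their chemical dataset.

```python
import random

def k_fold_splits(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation
    over n samples: each sample appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # shuffle once for random folds
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Example: 20 hypothetical compounds split into 5 folds; in the paper's setting
# a naïve Bayes model would be fit on `train` and scored on `test` each round,
# and the k accuracies averaged.
splits = list(k_fold_splits(20))
for train, test in splits:
    print(f"train={len(train)} test={len(test)}")
```

Averaging the per-fold accuracies (and their spread) is what produces figures like 89.1±0.4%.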

  10. Using Remotely Sensed Data and Watershed and Hydrodynamic Models to Evaluate the Effects of Land Cover Land Use Change on Aquatic Ecosystems in Mobile Bay, AL

    NASA Technical Reports Server (NTRS)

    Al-Hamdan, Mohammad; Estes, Maurice G., Jr.; Judd, Chaeli; Woodruff, Dana; Ellis, Jean; Quattrochi, Dale; Watson, Brian; Rodriquez, Hugo; Johnson, Hoyt

    2012-01-01

    Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land cover land use (LCLU) change in the two counties surrounding Mobile Bay (Mobile and Baldwin) on SAV stressors and controlling factors (temperature, salinity, and sediment) in the Mobile Bay estuary. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for LCLU scenarios in 1948, 1992, 2001, and 2030. Remotely sensed Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 LCLU scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the estuary. These results were input into the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid throughout Mobile Bay and adjacent estuaries. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This is the key product benefiting Mobile Bay coastal environmental managers: it integrates the influences of temperature, salinity, and sediment due to LCLU-driven flow changes with the restoration potential of SAVs.
Data products and results are being integrated into NOAA's EcoWatch and Gulf of Mexico Data Atlas online systems for dissemination to coastal resource managers and stakeholders. Objective 1: Develop and utilize land use scenarios for Mobile and Baldwin Counties, AL as input to models to predict the effects on water properties (temperature, salinity) for Mobile Bay through 2030. Objective 2: Evaluate the impact of land use change on seagrasses and SAV in Mobile Bay. Hypothesis: Urbanization will significantly increase surface flows and impact the salinity and temperature variables that affect seagrasses and SAVs.

  11. Social learning of a brood parasite by its host

    PubMed Central

    Feeney, William E.; Langmore, Naomi E.

    2013-01-01

    Arms races between brood parasites and their hosts provide model systems for studying the evolutionary repercussions of species interactions. However, how naive hosts identify brood parasites as enemies remains poorly understood, despite its ecological and evolutionary significance. Here, we investigate whether young, cuckoo-naive superb fairy-wrens, Malurus cyaneus, can learn to recognize cuckoos as a threat through social transmission of information. Naive individuals were initially unresponsive to a cuckoo specimen, but after observing conspecifics mob a cuckoo, they made more whining and mobbing alarm calls, and spent more time physically mobbing the cuckoo. This is the first direct evidence that naive hosts can learn to identify brood parasites as enemies via social learning. PMID:23760171

  12. Social learning of a brood parasite by its host.

    PubMed

    Feeney, William E; Langmore, Naomi E

    2013-08-23

    Arms races between brood parasites and their hosts provide model systems for studying the evolutionary repercussions of species interactions. However, how naive hosts identify brood parasites as enemies remains poorly understood, despite its ecological and evolutionary significance. Here, we investigate whether young, cuckoo-naive superb fairy-wrens, Malurus cyaneus, can learn to recognize cuckoos as a threat through social transmission of information. Naive individuals were initially unresponsive to a cuckoo specimen, but after observing conspecifics mob a cuckoo, they made more whining and mobbing alarm calls, and spent more time physically mobbing the cuckoo. This is the first direct evidence that naive hosts can learn to identify brood parasites as enemies via social learning.

  13. Oil Spill Detection along the Gulf of Mexico Coastline based on Airborne Imaging Spectrometer Data

    NASA Astrophysics Data System (ADS)

    Arslan, M. D.; Filippi, A. M.; Guneralp, I.

    2013-12-01

    The Deepwater Horizon oil spill in the Gulf of Mexico between April and July 2010 demonstrated the importance of synoptic oil-spill monitoring in coastal environments via remote-sensing methods. This study focuses on terrestrial oil-spill detection and thickness estimation based on hyperspectral images acquired along the coastline of the Gulf of Mexico. We use AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) imaging spectrometer data collected over Bay Jimmy and Wilkinson Bay within Barataria Bay, Louisiana, USA during September 2010. We also employ field-based observations of the degree of oil accumulation along the coastline, as well as in situ measurements from the literature. As part of our proposed spectroscopic approach, we operate on atmospherically- and geometrically-corrected hyperspectral AVIRIS data to extract image-derived endmembers via Minimum Noise Fraction transform, Pixel Purity Index-generation, and n-dimensional visualization. Extracted endmembers are then used as input to endmember-mapping algorithms to yield fractional-abundance images and crisp classification images. We also employ Multiple Endmember Spectral Mixture Analysis (MESMA) for oil detection and mapping in order to enable the number and types of endmembers to vary on a per-pixel basis, in contrast to simple Spectral Mixture Analysis (SMA). MESMA thus better allows accounting for spectral variability of oil (e.g., due to varying oil thicknesses, states of degradation, and the presence of different oil types, etc.) and other materials, including soils and salt marsh vegetation of varying types, which may or may not be affected by the oil spill. A decision-tree approach is also utilized for comparison. Classification results do indicate that MESMA provides advantageous capabilities for mapping several oil-thickness classes for affected vegetation and soils along the Gulf of Mexico coastline, relative to the conventional approaches tested.
Oil thickness-mapping results from MESMA and the decision tree demonstrate that such products can be accurately generated in complex coastal environments.

  14. A comparison of acoustic and observed sediment classifications as predictor variables for modelling biotope distributions in Galway Bay, Ireland

    NASA Astrophysics Data System (ADS)

    O'Carroll, Jack P. J.; Kennedy, Robert; Ren, Lei; Nash, Stephen; Hartnett, Michael; Brown, Colin

    2017-10-01

    The INFOMAR (Integrated Mapping For the Sustainable Development of Ireland's Marine Resource) initiative has acoustically mapped and classified a significant proportion of Ireland's Exclusive Economic Zone (EEZ), and is likely to be an important tool in Ireland's efforts to meet the criteria of the MSFD. In this study, open source and relic data were used in combination with new grab survey data to model EUNIS level 4 biotope distributions in Galway Bay, Ireland. The correct prediction rates of two artificial neural networks (ANNs) were compared to assess the effectiveness of acoustic sediment classifications versus sediments that were visually classified by an expert in the field as predictor variables. To test for autocorrelation between predictor variables the RELATE routine with Spearman rank correlation method was used. Optimal models were derived by iteratively removing predictor variables and comparing the correct prediction rates of each model. The models with the highest correct prediction rates were chosen as optimal. The optimal models each used a combination of salinity (binary; 0 = polyhaline and 1 = euhaline), proximity to reef (binary; 0 = within 50 m and 1 = outside 50 m), depth (continuous; metres) and a sediment descriptor (acoustic or observed) as predictor variables. As the status of benthic habitats is required to be assessed under the MSFD the Ecological Status (ES) of the subtidal sediments of Galway Bay was also assessed using the Infaunal Quality Index. The ANN that used observed sediment classes as predictor variables could correctly predict the distribution of biotopes 67% of the time, compared to 63% for the ANN using acoustic sediment classes. Acoustic sediment ANN predictions were affected by local sediment heterogeneity, and the lack of a mixed sediment class. The all-round poor performance of ANNs is likely to be a result of the temporally variable and sparsely distributed data within the study area.

  15. A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem.

    PubMed

    Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul

    2013-01-01

    Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.

  16. Sentimental Analysis for Airline Twitter data

    NASA Astrophysics Data System (ADS)

    Dutta Das, Deb; Sharma, Sharan; Natani, Shubham; Khare, Neelu; Singh, Brijendra

    2017-11-01

    Social media has taken the world by surprise at a swift and remarkable pace. Whatever the circumstances, whether social, political or current affairs, people throughout the world express their sentiments through these platforms, making them suitable candidates for sentiment mining. Sentiment analysis becomes highly resourceful for any organization that wants to analyse and enhance its products and services. In the airline industry it is much easier to get feedback from a rich data source such as Twitter for conducting a sentiment analysis of customers. In this paper we classify the sentiment of Twitter messages, exhibiting the results of a machine learning algorithm using R and RapidMiner. The tweets are extracted and pre-processed, then categorized as neutral, negative or positive, and the results are finally summarised as a whole. The Naive Bayes algorithm has been used to classify the sentiments of recent tweets about the different airlines.
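    As a rough illustration of the pipeline this record describes (the paper itself uses R and RapidMiner), tweet sentiment classification with naive Bayes can be sketched in a few lines of scikit-learn. The tweets and labels below are invented toy examples, not the study's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; a real study would use thousands of
# pre-processed tweets about each airline.
tweets = [
    "great flight and friendly crew",
    "loved the on-time departure",
    "terrible delay and rude staff",
    "lost my luggage again",
    "flight was on schedule",
    "worst airline experience ever",
]
labels = ["positive", "positive", "negative", "negative", "positive", "negative"]

# Bag-of-words counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)

print(model.predict(["friendly crew but terrible delay"]))
```

A "neutral" class, as in the paper, would simply be a third label in the training data.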

  17. Modeling of plug-in electric vehicle travel patterns and charging load based on trip chain generation

    NASA Astrophysics Data System (ADS)

    Wang, Dai; Gao, Junyu; Li, Pan; Wang, Bin; Zhang, Cong; Saxena, Samveg

    2017-08-01

    Modeling PEV travel and charging behavior is key to estimating the charging demand and further exploring the potential for providing grid services. This paper presents a stochastic simulation methodology to generate itineraries and charging load profiles for a population of PEVs based on real-world vehicle driving data. In order to describe the sequence of daily travel activities, we use the trip chain model, which contains the detailed information of each trip, namely start time, end time, trip distance, start location and end location. A trip chain generation method is developed based on the Naive Bayes model to generate a large number of trips which are temporally and spatially coupled. We apply the proposed methodology to investigate the multi-location charging loads in three different scenarios. Simulation results show that home charging can meet the energy demand of the majority of PEVs under average conditions. In addition, we calculate the lower bound of the charging load peak on the premise of lowest charging cost. The results are instructive for the design and construction of charging facilities to avoid excessive infrastructure.
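    To give a concrete, much-simplified flavor of Naive-Bayes-based trip chain generation (this is not the authors' implementation), one can fit a categorical naive Bayes model over discretized trip attributes and then sample successive destinations to form a chain. The location codes, hour bins, and training trips below are invented stand-ins.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Encoded locations: 0 = home, 1 = work, 2 = shop (hypothetical).
# Stand-in training trips: (start_location, departure_hour_bin) -> end_location.
X = np.array([
    [0, 0], [0, 0], [0, 1],   # morning: home -> work
    [1, 2], [1, 2],           # afternoon: work -> shop
    [1, 3], [2, 3], [2, 3],   # evening: work/shop -> home
])
y = np.array([1, 1, 1, 2, 2, 0, 0, 0])

nb = CategoricalNB()
nb.fit(X, y)

# Generate one daily trip chain by sampling successive destinations,
# so consecutive trips stay temporally and spatially coupled.
rng = np.random.default_rng(1)
chain, loc = [0], 0
for hour_bin in (0, 2, 3):
    probs = nb.predict_proba([[loc, hour_bin]])[0]
    loc = rng.choice(nb.classes_, p=probs)
    chain.append(int(loc))
print("generated chain:", chain)
```

A charging-load profile would then be built by assigning an energy demand and charger type to each dwell period between trips.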

  18. Calcium-mediated shaping of naive CD4 T-cell phenotype and function

    PubMed Central

    Guichard, Vincent; Bonilla, Nelly; Durand, Aurélie; Audemard-Verger, Alexandra; Guilbert, Thomas; Martin, Bruno

    2017-01-01

    Continuous contact with self-major histocompatibility complex ligands is essential for the survival of naive CD4 T cells. We have previously shown that the resulting tonic TCR signaling also influences their fate upon activation by increasing their ability to differentiate into induced/peripheral regulatory T cells. To decipher the molecular mechanisms governing this process, we here focus on the TCR signaling cascade and demonstrate that a rise in intracellular calcium levels is sufficient to modulate the phenotype of mouse naive CD4 T cells and to increase their sensitivity to regulatory T-cell polarization signals, both processes relying on calcineurin activation. Accordingly, in vivo calcineurin inhibition leads the most self-reactive naive CD4 T cells to adopt the phenotype of their less self-reactive cell-counterparts. Collectively, our findings demonstrate that calcium-mediated activation of the calcineurin pathway acts as a rheostat to shape both the phenotype and effector potential of naive CD4 T cells in the steady-state. PMID:29239722

  19. 'Educated' dendritic cells act as messengers from memory to naive T helper cells.

    PubMed

    Alpan, Oral; Bachelder, Eric; Isil, Eda; Arnheiter, Heinz; Matzinger, Polly

    2004-06-01

    Ingested antigens lead to the generation of effector T cells that secrete interleukin 4 (IL-4) rather than interferon-gamma (IFN-gamma) and are capable of influencing naive T cells in their immediate environment to do the same. Using chimeric mice generated by aggregation of two genotypically different embryos, we found that the conversion of a naive T cell occurs only if it can interact with the same antigen-presenting cell, although not necessarily the same antigen, as the effector T cell. Using a two-step culture system in vitro, we found that antigen-presenting dendritic cells can act as 'temporal bridges' to relay information from orally immunized memory CD4 T cells to naive CD4 T cells. The orally immunized T cells use IL-4 and IL-10 (but not CD40 ligand) to 'educate' dendritic cells, which in turn induce naive T cells to produce the same cytokines as those produced by the orally immunized memory T cells.

  20. Naive B cells generate regulatory T cells in the presence of a mature immunologic synapse.

    PubMed

    Reichardt, Peter; Dornbach, Bastian; Rong, Song; Beissert, Stefan; Gueler, Faikah; Loser, Karin; Gunzer, Matthias

    2007-09-01

    Naive B cells are ineffective antigen-presenting cells and are considered unable to activate naive T cells. However, antigen-specific contact of these cells leads to stable cell pairs that remain associated over hours in vivo. The physiologic role of such pairs has not been evaluated. We show here that antigen-specific conjugates between naive B cells and naive T cells display a mature immunologic synapse in the contact zone that is absent in T-cell-dendritic-cell (DC) pairs. B cells induce substantial proliferation but, contrary to DCs, no loss of L-selectin in T cells. Surprisingly, while DC-triggered T cells develop into normal effector cells, B-cell stimulation over 72 hours induces regulatory T cells inhibiting priming of fresh T cells in a contact-dependent manner in vitro. In vivo, the regulatory T cells home to lymph nodes where they potently suppress immune responses such as in cutaneous hypersensitivity and ectopic allogeneic heart transplant rejection. Our finding might help to explain old observations on tolerance induction by B cells, identify the mature immunologic synapse as a central functional module of this process, and suggest the use of naive B-cell-primed regulatory T cells, "bTregs," as a useful approach for therapeutic intervention in adverse adaptive immune responses.

  1. Dynamic phenotypic restructuring of the CD4 and CD8 T-cell subsets with age in healthy humans: a compartmental model analysis.

    PubMed

    Jackola, D R; Hallgren, H M

    1998-11-16

    In healthy humans, phenotypic restructuring occurs with age within the CD3+ T-lymphocyte complement. This is characterized by a non-linear decrease of the percentage of 'naive' (CD45RA+) cells and a corresponding non-linear increase of the percentage of 'memory' (CD45R0+) cells among both the CD4+ and CD8+ T-cell subsets. We devised a simple compartmental model to study the age-dependent kinetics of phenotypic restructuring. We also derived differential equations whose parameters determined yearly gains minus losses of the percentage and absolute numbers of circulating naive cells, yearly gains minus losses of the percentage and absolute numbers of circulating memory cells, and the yearly rate of conversion of naive to memory cells. Solutions of these evaluative differential equations demonstrate the following: (1) the memory cell complement 'resides' within its compartment for a longer time than the naive cell complement within its compartment for both CD4 and CD8 cells; (2) the average, annual 'turnover rate' is the same for CD4 and CD8 naive cells. In contrast, the average, annual 'turnover rate' for memory CD8 cells is 1.5 times that of memory CD4 cells; (3) the average, annual conversion rate of CD4 naive cells to memory cells is twice that of the CD8 conversion rate; (4) a transition in dynamic restructuring occurs during the third decade of life that is due to these differences in turnover and conversion rates, between and from naive to memory cells.

  2. Overview of existing algorithms for emotion classification. Uncertainties in evaluations of accuracies.

    NASA Astrophysics Data System (ADS)

    Avetisyan, H.; Bruna, O.; Holub, J.

    2016-11-01

    Numerous techniques and algorithms are dedicated to extracting emotions from input data. Our investigation found that emotion-detection approaches can be classified into three types: keyword-based/lexical-based, learning-based, and hybrid. The most commonly used techniques, such as the keyword-spotting method, Support Vector Machines, the Naïve Bayes classifier, Hidden Markov Models and hybrid algorithms, have achieved impressive results in this sphere and can reach determination accuracies of more than 90%.
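    The keyword-spotting approach mentioned above is the simplest of the three families: count lexicon hits per emotion and pick the best-scoring one. A minimal sketch follows; the tiny lexicon is an illustrative toy, not a published emotion dictionary.

```python
# Toy emotion lexicon (hypothetical; real systems use curated dictionaries).
LEXICON = {
    "joy": {"happy", "glad", "delighted", "love"},
    "anger": {"angry", "furious", "hate", "annoyed"},
    "sadness": {"sad", "unhappy", "miserable", "cry"},
}

def detect_emotion(text: str) -> str:
    """Score each emotion by lexicon hits; fall back to 'neutral'."""
    words = set(text.lower().split())
    scores = {emo: len(words & kws) for emo, kws in LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"

print(detect_emotion("I am so happy and glad today"))  # -> joy
```

Learning-based methods (SVM, naive Bayes, HMM) replace the fixed lexicon with statistics estimated from labeled training text, which is where the >90% accuracies cited above are typically achieved.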

  3. Learning To Recognize Visual Concepts: Development and Implementation of a Method for Texture Concept Acquisition Through Inductive Learning

    DTIC Science & Technology

    1993-01-01

    Maria and My Parents, Helena and Andrzej IV ACKNOWLEDGMENTS I would like to first of all thank my advisor, Dr. Ryszard Michalski, who introduced...represent the current state of the art in machine learning methodology. The most popular method, the minimization of Bayes risk [Duda and Hart, 1973], is a... Pattern Recognition, Vol. 23, no. 3-4, pp. 291-309, 1990. Duda, O. and P. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 1973

  4. Variability in energy density of forage fishes from the Bay of Biscay (north-east Atlantic Ocean): reliability of functional grouping based on prey quality.

    PubMed

    Spitz, J; Jouma'a, J

    2013-06-01

    Energy densities of 670 fishes belonging to nine species were measured to evaluate intraspecific variability. Functional groups based on energy density appeared to be sufficiently robust to individual variability to provide a classification of forage fish quality applicable in a variety of ecological fields including ecosystem modelling. © 2013 The Authors. Journal of Fish Biology © 2013 The Fisheries Society of the British Isles.

  5. Simulation of scenario earthquake influenced field by using GIS

    USGS Publications Warehouse

    Zuo, H.-Q.; Xie, L.-L.; Borcherdt, R.D.

    1999-01-01

    The method for estimating the site effect on ground motion specified by Borcherdt (1994a, 1994b) is briefly introduced in the paper. This method, together with detailed geological and site classification data for the San Francisco Bay area of California, United States, is applied to simulate the influenced field of a scenario earthquake using GIS technology, and software for the simulation has been developed. The paper is a partial result of a cooperative research project between the China Seismological Bureau and the US Geological Survey.

  6. Mechanics of Composite Materials with Different Moduli in Tension and Compression

    DTIC Science & Technology

    1978-07-01

    100% and 400% for carbon-carbon. The principal objective ... TABLE 2.3 BUCKLING OF PAYLOAD BAY DOOR PANELS WITH VARIOUS LIGHTNING STRIKE PROTECTION CONCEPTS: BUCKLING LOAD, N, lb/in. ...

  7. Induction of cross-priming of naive CD8+ T lymphocytes by recombinant bacillus Calmette-Guerin that secretes heat shock protein 70-major membrane protein-II fusion protein.

    PubMed

    Mukai, Tetsu; Maeda, Yumi; Tamura, Toshiki; Matsuoka, Masanori; Tsukamoto, Yumiko; Makino, Masahiko

    2009-11-15

    Because Mycobacterium bovis bacillus Calmette-Guérin (BCG) unconvincingly activates human naive CD8(+) T cells, a rBCG (BCG-70M) that secretes a fusion protein comprising BCG-derived heat shock protein (HSP)70 and Mycobacterium leprae-derived major membrane protein (MMP)-II, one of the immunodominant Ags of M. leprae, was newly constructed to potentiate the ability of activating naive CD8(+) T cells through dendritic cells (DC). BCG-70M secreted HSP70-MMP-II fusion protein in vitro, which stimulated DC to produce IL-12p70 through TLR2. BCG-70M-infected DC activated not only memory and naive CD8(+) T cells, but also CD4(+) T cells of both types to produce IFN-gamma. The activation of these naive T cells by BCG-70M was dependent on the MHC and CD86 molecules on BCG-70M-infected DC, and was significantly inhibited by pretreatment of DC with chloroquine. Both brefeldin A and lactacystin significantly inhibited the activation of naive CD8(+) T cells by BCG-70M through DC. Thus, the CD8(+) T cell activation may be induced by cross-presentation of Ags through a TAP- and proteosome-dependent cytosolic pathway. When naive CD8(+) T cells were stimulated by BCG-70M-infected DC in the presence of naive CD4(+) T cells, CD62L(low)CD8(+) T cells and perforin-producing CD8(+) T cells were efficiently produced. MMP-II-reactive CD4(+) and CD8(+) memory T cells were efficiently produced in C57BL/6 mice by infection with BCG-70M. These results indicate that BCG-70M activated DC, CD4(+) T cells, and CD8(+) T cells, and the combination of HSP70 and MMP-II may be useful for inducing better T cell activation.

  8. Feature selection for the classification of traced neurons.

    PubMed

    López-Cabrera, José D; Lorenzo-Ginori, Juan V

    2018-06-01

    The great availability of computational tools to calculate the properties of traced neurons leads to the existence of many descriptors which allow the automated classification of neurons from these reconstructions. This situation determines the necessity to eliminate irrelevant features as well as making a selection of the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts into 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most used computational tools in neuroinformatics to quantify traced neurons. We review some current feature selection techniques, such as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of the filter, embedded, wrapper and ensemble types were compared and the subsets returned were tested in classification tasks for different classification algorithms. L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets, which provides evidence of their importance in the classification of these neurons. Copyright © 2018 Elsevier B.V. All rights reserved.
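    The filter-type selection this record reviews (score features, keep the best, then evaluate with a supervised classifier) can be sketched as below. The data are synthetic stand-ins for the L-measure morphometrics, generated with the same sample size as the study but otherwise unrelated to it.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for the morphometric dataset: 318 cells x 40 features,
# only a few of which are informative (mirroring the paper's setting).
X, y = make_classification(n_samples=318, n_features=40, n_informative=5,
                           n_redundant=5, random_state=0)

# Filter method: keep the 10 features with highest mutual information.
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
X_sel = selector.transform(X)

# Evaluate one of the classifiers the study mentions on both feature sets.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
full = cross_val_score(clf, X, y, cv=5).mean()
selected = cross_val_score(clf, X_sel, y, cv=5).mean()
print("all features: %.3f  selected: %.3f" % (full, selected))
```

Wrapper and ensemble methods differ mainly in how the candidate subsets are generated; the evaluation step stays the same.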

  9. Brain Decoding-Classification of Hand Written Digits from fMRI Data Employing Bayesian Networks

    PubMed Central

    Yargholi, Elahe'; Hossein-Zadeh, Gholam-Ali

    2016-01-01

    We are frequently exposed to hand written digits 0–9 in today's modern life. Success in decoding-classification of hand written digits helps us understand the corresponding brain mechanisms and processes and assists seriously in designing more efficient brain–computer interfaces. However, all digits belong to the same semantic category and similarity in appearance of hand written digits makes this decoding-classification a challenging problem. In the present study, for the first time, an augmented naïve Bayes classifier is used for classification of functional Magnetic Resonance Imaging (fMRI) measurements to decode the hand written digits, taking advantage of brain connectivity information in decoding-classification. fMRI was recorded from three healthy participants, with an age range of 25–30. Results in different brain lobes (frontal, occipital, parietal, and temporal) show that utilizing connectivity information significantly improves decoding-classification, and the capabilities of different brain lobes in decoding-classification of hand written digits were compared to each other. In addition, in each lobe the most contributing areas and brain connectivities were determined, and connectivities with short distances between their endpoints were recognized to be more efficient. Moreover, a data-driven method was applied to investigate the similarity of brain areas in responding to stimuli, and this revealed both similarly active areas and active mechanisms during this experiment. An interesting finding was that during the experiment of watching hand written digits there were several active networks (visual, working memory, motor, and language processing), but the most relevant one to the task was the language processing network, according to the voxel selection. PMID:27468261

  10. Does expert knowledge improve automatic probabilistic classification of gait joint motion patterns in children with cerebral palsy?

    PubMed Central

    Papageorgiou, Eirini; Nieuwenhuys, Angela; Desloovere, Kaat

    2017-01-01

    Background This study aimed to improve the automatic probabilistic classification of joint motion gait patterns in children with cerebral palsy by using the expert knowledge available via a recently developed Delphi-consensus study. To this end, this study applied both Naïve Bayes and Logistic Regression classification with varying degrees of usage of the expert knowledge (expert-defined and discretized features). A database of 356 patients and 1719 gait trials was used to validate the classification performance of eleven joint motions. Hypotheses Two main hypotheses stated that: (1) Joint motion patterns in children with CP, obtained through a Delphi-consensus study, can be automatically classified following a probabilistic approach, with an accuracy similar to clinical expert classification, and (2) The inclusion of clinical expert knowledge in the selection of relevant gait features and the discretization of continuous features increases the performance of automatic probabilistic joint motion classification. Findings This study provided objective evidence supporting the first hypothesis. Automatic probabilistic gait classification using the expert knowledge available from the Delphi-consensus study resulted in accuracy (91%) similar to that obtained with two expert raters (90%), and higher accuracy than that obtained with non-expert raters (78%). Regarding the second hypothesis, this study demonstrated that the use of more advanced machine learning techniques such as automatic feature selection and discretization instead of expert-defined and discretized features can result in slightly higher joint motion classification performance. However, the increase in performance is limited and does not outweigh the additional computational cost and the higher risk of loss of clinical interpretability, which threatens the clinical acceptance and applicability. PMID:28570616

  11. Novel Intersection Type Recognition for Autonomous Vehicles Using a Multi-Layer Laser Scanner.

    PubMed

    An, Jhonghyun; Choi, Baehoon; Sim, Kwee-Bo; Kim, Euntai

    2016-07-20

    There are several types of intersections such as merge-roads, diverge-roads, plus-shape intersections and two types of T-shape junctions in urban roads. When an autonomous vehicle encounters new intersections, it is crucial to recognize the types of intersections for safe navigation. In this paper, a novel intersection type recognition method is proposed for an autonomous vehicle using a multi-layer laser scanner. The proposed method consists of two steps: (1) static local coordinate occupancy grid map (SLOGM) building and (2) intersection classification. In the first step, the SLOGM is built relative to the local coordinate using the dynamic binary Bayes filter. In the second step, the SLOGM is used as an attribute for the classification. The proposed method is applied to a real-world environment and its validity is demonstrated through experimentation.

  12. Novel Intersection Type Recognition for Autonomous Vehicles Using a Multi-Layer Laser Scanner

    PubMed Central

    An, Jhonghyun; Choi, Baehoon; Sim, Kwee-Bo; Kim, Euntai

    2016-01-01

    There are several types of intersections such as merge-roads, diverge-roads, plus-shape intersections and two types of T-shape junctions in urban roads. When an autonomous vehicle encounters new intersections, it is crucial to recognize the types of intersections for safe navigation. In this paper, a novel intersection type recognition method is proposed for an autonomous vehicle using a multi-layer laser scanner. The proposed method consists of two steps: (1) static local coordinate occupancy grid map (SLOGM) building and (2) intersection classification. In the first step, the SLOGM is built relative to the local coordinate using the dynamic binary Bayes filter. In the second step, the SLOGM is used as an attribute for the classification. The proposed method is applied to a real-world environment and its validity is demonstrated through experimentation. PMID:27447640

  13. Comparative analysis of drug resistance mutations in the human immunodeficiency virus reverse transcriptase gene in patients who are non-responsive, responsive and naive to antiretroviral therapy.

    PubMed

    Misbah, Mohammad; Roy, Gaurav; Shahid, Mudassar; Nag, Nalin; Kumar, Suresh; Husain, Mohammad

    2016-05-01

    Drug resistance mutations in the Pol gene of human immunodeficiency virus 1 (HIV-1) are one of the critical factors associated with antiretroviral therapy (ART) failure in HIV-1 patients. The issue of resistance to reverse transcriptase inhibitors (RTIs) in HIV infection has not been adequately addressed in the Indian subcontinent. We compared HIV-1 reverse transcriptase (RT) gene sequences to identify mutations present in HIV-1 patients who were ART non-responders, ART responders and drug naive. Genotypic drug resistance testing was performed by sequencing a 655-bp region of the RT gene from 102 HIV-1 patients, consisting of 30 ART-non-responding, 35 ART-responding and 37 drug-naive patients. The Stanford HIV Resistance Database (HIVDBv 6.2), IAS-USA mutation list, ANRS_09/2012 algorithm, and Rega v8.02 algorithm were used to interpret the pattern of drug resistance. The majority of the sequences (96%) belonged to subtype C, and a few of them (3.9%) to subtype A1. The frequency of drug resistance mutations observed in ART-non-responding, ART-responding and drug-naive patients was 40.1%, 10.7% and 20.58%, respectively. It was observed that in non-responders, multiple mutations were present in the same patient, while in responders, a single mutation was found. Some of the drug-naive patients had more than one mutation. Thymidine analogue mutations (TAMs), however, were found in non-responders and naive patients but not in responders. Although drug resistance mutations were widely distributed among ART non-responders, the presence of resistance mutations in the viruses of drug-naive patients poses a serious concern in the absence of a genotyping resistance test.

  14. Nicotinic Acid Adenine Dinucleotide Phosphate Plays a Critical Role in Naive and Effector Murine T Cells but Not Natural Regulatory T Cells.

    PubMed

    Ali, Ramadan A; Camick, Christina; Wiles, Katherine; Walseth, Timothy F; Slama, James T; Bhattacharya, Sumit; Giovannucci, David R; Wall, Katherine A

    2016-02-26

    Nicotinic acid adenine dinucleotide phosphate (NAADP), the most potent Ca(2+) mobilizing second messenger discovered to date, has been implicated in Ca(2+) signaling in some lymphomas and T cell clones. In contrast, the role of NAADP in Ca(2+) signaling or the identity of the Ca(2+) stores targeted by NAADP in conventional naive T cells is less clear. In the current study, we demonstrate the importance of NAADP in the generation of Ca(2+) signals in murine naive T cells. Combining live-cell imaging methods and a pharmacological approach using the NAADP antagonist Ned-19, we addressed the involvement of NAADP in the generation of Ca(2+) signals evoked by TCR stimulation and the role of this signal in downstream physiological end points such as proliferation, cytokine production, and other responses to stimulation. We demonstrated that acidic compartments in addition to the endoplasmic reticulum were the Ca(2+) stores that were sensitive to NAADP in naive T cells. NAADP was shown to evoke functionally relevant Ca(2+) signals in both naive CD4 and naive CD8 T cells. Furthermore, we examined the role of this signal in the activation, proliferation, and secretion of effector cytokines by Th1, Th2, Th17, and CD8 effector T cells. Overall, NAADP exhibited a similar profile in mediating Ca(2+) release in effector T cells as in their counterpart naive T cells and seemed to be equally important for the function of these different subsets of effector T cells. This profile was not observed for natural T regulatory cells. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  15. Nicotinic Acid Adenine Dinucleotide Phosphate Plays a Critical Role in Naive and Effector Murine T Cells but Not Natural Regulatory T Cells*

    PubMed Central

    Ali, Ramadan A.; Camick, Christina; Wiles, Katherine; Walseth, Timothy F.; Slama, James T.; Bhattacharya, Sumit; Giovannucci, David R.; Wall, Katherine A.

    2016-01-01

    Nicotinic acid adenine dinucleotide phosphate (NAADP), the most potent Ca2+ mobilizing second messenger discovered to date, has been implicated in Ca2+ signaling in some lymphomas and T cell clones. In contrast, the role of NAADP in Ca2+ signaling or the identity of the Ca2+ stores targeted by NAADP in conventional naive T cells is less clear. In the current study, we demonstrate the importance of NAADP in the generation of Ca2+ signals in murine naive T cells. Combining live-cell imaging methods and a pharmacological approach using the NAADP antagonist Ned-19, we addressed the involvement of NAADP in the generation of Ca2+ signals evoked by TCR stimulation and the role of this signal in downstream physiological end points such as proliferation, cytokine production, and other responses to stimulation. We demonstrated that acidic compartments in addition to the endoplasmic reticulum were the Ca2+ stores that were sensitive to NAADP in naive T cells. NAADP was shown to evoke functionally relevant Ca2+ signals in both naive CD4 and naive CD8 T cells. Furthermore, we examined the role of this signal in the activation, proliferation, and secretion of effector cytokines by Th1, Th2, Th17, and CD8 effector T cells. Overall, NAADP exhibited a similar profile in mediating Ca2+ release in effector T cells as in their counterpart naive T cells and seemed to be equally important for the function of these different subsets of effector T cells. This profile was not observed for natural T regulatory cells. PMID:26728458

  16. A Combined Omics Approach to Generate the Surface Atlas of Human Naive CD4+ T Cells during Early T-Cell Receptor Activation*

    PubMed Central

    Graessel, Anke; Hauck, Stefanie M.; von Toerne, Christine; Kloppmann, Edda; Goldberg, Tatyana; Koppensteiner, Herwig; Schindler, Michael; Knapp, Bettina; Krause, Linda; Dietz, Katharina; Schmidt-Weber, Carsten B.; Suttner, Kathrin

    2015-01-01

    Naive CD4+ T cells are the common precursors of multiple effector and memory T-cell subsets and possess a high plasticity in terms of differentiation potential. This stem-cell-like character is important for cell therapies aiming at regeneration of specific immunity. Cell surface proteins are crucial for recognition and response to signals mediated by other cells or environmental changes. Knowledge of cell surface proteins of human naive CD4+ T cells and their changes during the early phase of T-cell activation is urgently needed for a guided differentiation of naive T cells and may support the selection of pluripotent cells for cell therapy. Periodate oxidation and aniline-catalyzed oxime ligation technology was applied with subsequent quantitative liquid chromatography-tandem MS to generate a data set describing the surface proteome of primary human naive CD4+ T cells and to monitor dynamic changes during the early phase of activation. This led to the identification of 173 N-glycosylated surface proteins. To independently confirm the proteomic data set and to analyze the cell surface by an alternative technique a systematic phenotypic expression analysis of surface antigens via flow cytometry was performed. This screening expanded the previous data set, resulting in 229 surface proteins, which were expressed on naive unstimulated and activated CD4+ T cells. Furthermore, we generated a surface expression atlas based on transcriptome data, experimental annotation, and predicted subcellular localization, and correlated the proteomics result with this transcriptional data set. This extensive surface atlas provides an overall naive CD4+ T cell surface resource and will enable future studies aiming at a deeper understanding of mechanisms of T-cell biology allowing the identification of novel immune targets usable for the development of therapeutic treatments. PMID:25991687

  17. Highly efficient gene transfer in naive human T cells with a murine leukemia virus-based vector.

    PubMed

    Dardalhon, V; Jaleco, S; Rebouissou, C; Ferrand, C; Skander, N; Swainson, L; Tiberghien, P; Spits, H; Noraz, N; Taylor, N

    2000-08-01

    Retroviral vectors based on the Moloney murine leukemia virus (MuLV) have become the primary tool for gene delivery into hematopoietic cells, but clinical trials have been hampered by low transduction efficiencies. Recently, we and others have shown that gene transfer of MuLV-based vectors into T cells can be significantly augmented using a fibronectin-facilitated protocol. Nevertheless, the relative abilities of naive (CD45RA(+)) and memory (CD45RO(+)) lymphocyte subsets to be transduced has not been assessed. Although naive T cells demonstrate a restricted cytokine profile following antigen stimulation and a decreased susceptibility to infection with human immunodeficiency virus, it was not clear whether they could be efficiently infected with a MuLV vector. This study describes conditions that permitted gene transfer of an enhanced green fluorescent protein-expressing retroviral vector in more than 50% of naive umbilical cord (UC) blood and peripheral blood (PB) T cells following CD3/CD28 ligation. Moreover, treatment of naive T cells with interleukin-7 resulted in the maintenance of a CD45RA phenotype and gene transfer levels approached 20%. Finally, it was determined that parameters for optimal transduction of CD45RA(+) T cells isolated from PB and UC blood differed: transduction of the UC cells was significantly increased by the presence of autologous mononuclear cells (24.5% versus 56.5%). Because naive T cells harbor a receptor repertoire that allows them to respond to novel antigens, the development of protocols targeting their transduction is crucial for gene therapy applications. This approach will also allow the functions of exogenous genes to be evaluated in primary nontransformed naive T cells.

  18. The Still Bay and Howiesons Poort at Sibudu and Blombos: Understanding Middle Stone Age Technologies.

    PubMed

    Soriano, Sylvain; Villa, Paola; Delagnes, Anne; Degano, Ilaria; Pollarolo, Luca; Lucejko, Jeannette J; Henshilwood, Christopher; Wadley, Lyn

    2015-01-01

    The classification of archaeological assemblages in the Middle Stone Age of South Africa in terms of diversity and temporal continuity has significant implications with respect to recent cultural evolutionary models which propose either gradual accumulation or discontinuous, episodic processes for the emergence and diffusion of cultural traits. We present the results of a systematic technological and typological analysis of the Still Bay assemblages from Sibudu and Blombos. A similar approach is used in the analysis of the Howiesons Poort (HP) assemblages from Sibudu seen in comparison with broadly contemporaneous assemblages from Rose Cottage and Klasies River Cave 1A. Using our own and published data from other sites we report on the diversity between stone artifact assemblages and discuss to what extent they can be grouped into homogeneous lithic sets. The gradual evolution of debitage techniques within the Howiesons Poort sequence with a progressive abandonment of the HP technological style argues against the saltational model for its disappearance while the technological differences between the Sibudu and Blombos Still Bay artifacts considerably weaken an interpretation of similarities between the assemblages and their grouping into the same cultural unit. Limited sampling of a fragmented record may explain why simple models of cultural evolution do not seem to apply to a complex reality.

  19. Land Use Patterns and Fecal Contamination of Coastal Waters in Western Puerto Rico

    NASA Technical Reports Server (NTRS)

    Norat, Jose

    1994-01-01

    The Department of Environmental Health of the Graduate School of Public Health of the Medical Sciences Campus, University of Puerto Rico (UPR-RCM) conducted this research project on how different patterns of land use affect the microbiological quality of rivers flowing into Mayaguez Bay in Western Puerto Rico. Coastal shellfish growing areas, stream and ocean bathing beaches, and pristine marine sites in the Bay are affected by the discharge of the three study rivers. Satellite imagery was used to study watershed land uses which serve as point and nonpoint sources of pathogens affecting stream and coastal water users. The study rivers drain watersheds of different size and type of human activity (including different human waste treatment and disposal facilities). Land use and land cover in the study watersheds were interpreted, classified and mapped using remotely sensed images from NASA's Landsat Thematic Mapper (TM). This study found there is a significant relationship between watershed land cover and microbiological water quality of rivers flowing into Mayaguez Bay in Western Puerto Rico. Land covers in the Guanajibo, Anasco, and Yaguez watersheds were classified into forested areas, pastures, agricultural zones and urban areas so as to determine relative contributions to fecal water contamination. The land cover classification was made processing TM images with IDRISI and ERDAS software.

  20. The Still Bay and Howiesons Poort at Sibudu and Blombos: Understanding Middle Stone Age Technologies

    PubMed Central

    Soriano, Sylvain; Villa, Paola; Delagnes, Anne; Degano, Ilaria; Pollarolo, Luca; Lucejko, Jeannette J.; Henshilwood, Christopher; Wadley, Lyn

    2015-01-01

    The classification of archaeological assemblages in the Middle Stone Age of South Africa in terms of diversity and temporal continuity has significant implications with respect to recent cultural evolutionary models which propose either gradual accumulation or discontinuous, episodic processes for the emergence and diffusion of cultural traits. We present the results of a systematic technological and typological analysis of the Still Bay assemblages from Sibudu and Blombos. A similar approach is used in the analysis of the Howiesons Poort (HP) assemblages from Sibudu seen in comparison with broadly contemporaneous assemblages from Rose Cottage and Klasies River Cave 1A. Using our own and published data from other sites we report on the diversity between stone artifact assemblages and discuss to what extent they can be grouped into homogeneous lithic sets. The gradual evolution of debitage techniques within the Howiesons Poort sequence with a progressive abandonment of the HP technological style argues against the saltational model for its disappearance while the technological differences between the Sibudu and Blombos Still Bay artifacts considerably weaken an interpretation of similarities between the assemblages and their grouping into the same cultural unit. Limited sampling of a fragmented record may explain why simple models of cultural evolution do not seem to apply to a complex reality. PMID:26161665

  1. Gas bubbles in marine mud-How small are they?

    NASA Astrophysics Data System (ADS)

    Reed, Allen H.; Briggs, Kevin B.

    2003-10-01

    Free gas in marine mud poses a challenging problem in the realm of ocean acoustics as it readily attenuates (i.e., scatters or absorbs) energy, such that objects lying below the gassy sediment are acoustically masked. Gas-laden sediments were located in 10- to 120-m water depth adjacent to the South Pass of the Mississippi River in East Bay using a 12-kHz transducer and the Acoustic Sediment Classification System. Several cores were collected in this region for physical property measurements. Some of the cores were x-rayed on medical and industrial computed tomography (CT) scanners. Volumetric CT images were used to locate gas bubbles and determine their shapes and sizes to within the limits of the CT resolution. Free gas in the East Bay sediments was relegated to worm tubes as well as isolated pockets, as was the case in Eckernförde Bay sediments [Abegg and Anderson, Mar. Geol. 137, 137-147 (1997)]. The primary significance of the present work is that gas bubbles, imaged with NRL's HD-500 micro-CT system, have been determined to exist in the tens-of-μm size range, significantly smaller than the smallest bubbles previously resolved with medical CT (~440 μm). [Work supported by ONR and NRL.]

  2. Low frequency of genotypic resistance in HIV-1-infected patients failing an atazanavir-containing regimen: a clinical cohort study

    PubMed Central

    Dolling, David I.; Dunn, David T.; Sutherland, Katherine A.; Pillay, Deenan; Mbisa, Jean L.; Parry, Chris M.; Post, Frank A.; Sabin, Caroline A.; Cane, Patricia A.; Aitken, Celia; Asboe, David; Webster, Daniel; Cane, Patricia; Castro, Hannah; Dunn, David; Dolling, David; Chadwick, David; Churchill, Duncan; Clark, Duncan; Collins, Simon; Delpech, Valerie; Geretti, Anna Maria; Goldberg, David; Hale, Antony; Hué, Stéphane; Kaye, Steve; Kellam, Paul; Lazarus, Linda; Leigh-Brown, Andrew; Mackie, Nicola; Orkin, Chloe; Rice, Philip; Pillay, Deenan; Phillips, Andrew; Sabin, Caroline; Smit, Erasmus; Templeton, Kate; Tilston, Peter; Tong, William; Williams, Ian; Zhang, Hongyi; Zuckerman, Mark; Greatorex, Jane; Wildfire, Adrian; O'Shea, Siobhan; Mullen, Jane; Mbisa, Tamyo; Cox, Alison; Tandy, Richard; Hale, Tony; Fawcett, Tracy; Hopkins, Mark; Ashton, Lynn; Booth, Claire; Garcia-Diaz, Ana; Shepherd, Jill; Schmid, Matthias L.; Payne, Brendan; Hay, Phillip; Rice, Phillip; Paynter, Mary; Bibby, David; Kirk, Stuart; MacLean, Alasdair; Gunson, Rory; Coughlin, Kate; Fearnhill, Esther; Fradette, Lorraine; Porter, Kholoud; Ainsworth, Jonathan; Anderson, Jane; Babiker, Abdel; Fisher, Martin; Gazzard, Brian; Gilson, Richard; Gompels, Mark; Hill, Teresa; Johnson, Margaret; Kegg, Stephen; Leen, Clifford; Nelson, Mark; Palfreeman, Adrian; Post, Frank; Sachikonye, Memory; Schwenk, Achim; Walsh, John; Huntington, Susie; Jose, Sophie; Thornton, Alicia; Glabay, Adam; Orkin, C.; Garrett, N.; Lynch, J.; Hand, J.; de Souza, C.; Fisher, M.; Perry, N.; Tilbury, S.; Gazzard, B.; Nelson, M.; Waxman, M.; Asboe, D.; Mandalia, S.; Delpech, V.; Anderson, J.; Munshi, S.; Korat, H.; Welch, J.; Poulton, M.; MacDonald, C.; Gleisner, Z.; Campbell, L.; Gilson, R.; Brima, N.; Williams, I.; Schwenk, A.; Ainsworth, J.; Wood, C.; Miller, S.; Johnson, M.; Youle, M.; Lampe, F.; Smith, C.; Grabowska, H.; Chaloner, C.; Puradiredja, D.; Walsh, J.; Weber, J.; Ramzan, F.; Mackie, N.; Winston, A.; Leen, C.; Wilson, A.; Allan, S.; Palfreeman, A.; Moore, A.; Wakeman, K.

    2013-01-01

    Objectives To determine protease mutations that develop at viral failure for protease inhibitor (PI)-naive patients on a regimen containing the PI atazanavir. Methods Resistance tests on patients failing atazanavir, conducted as part of routine clinical care in a multicentre observational study, were randomly matched by subtype to resistance tests from PI-naive controls to account for natural polymorphisms. Mutations from the consensus B sequence across the protease region were analysed for association and defined using the IAS-USA 2011 classification list. Results Four hundred and five of 2528 (16%) patients failed therapy containing atazanavir as a first PI over a median (IQR) follow-up of 1.76 (0.84–3.15) years and 322 resistance tests were available for analysis. Recognized major atazanavir mutations were found in six atazanavir-experienced patients (P < 0.001), including I50L and N88S. The minor mutations most strongly associated with atazanavir experience were M36I, M46I, F53L, A71V, V82T and I85V (P < 0.05). Multiple novel mutations, I15S, L19T, K43T, L63P/V, K70Q, V77I and L89I/T/V, were also associated with atazanavir experience. Conclusions Viral failure on atazanavir-containing regimens was not common and major resistance mutations were rare, suggesting that adherence may be a major contributor to viral failure. Novel mutations were described that have not been previously documented. PMID:23711895
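    The mutation-association analysis above compares mutation frequencies between atazanavir-experienced patients and matched PI-naive controls. A self-contained sketch of one standard way to test such a 2×2 association, a two-sided Fisher's exact test (the counts below are hypothetical, and the paper's exact statistical procedure is not specified in this abstract):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x):  # hypergeometric P(top-left cell = x)
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    return sum(table_prob(x) for x in range(lo, hi + 1)
               if table_prob(x) <= p_obs + 1e-12)

# Hypothetical counts: a mutation seen in 12/322 experienced vs 2/322 controls
p = fisher_exact_two_sided(12, 310, 2, 320)
print(p)
```

On Fisher's classic "lady tasting tea" table [[3, 1], [1, 3]] this returns 34/70 ≈ 0.486, matching the textbook value.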

  3. Impact of the Data Collection on Adverse Events of Anti-HIV Drugs cohort study on abacavir prescription among treatment-naive, HIV-infected patients in Canada.

    PubMed

    Antoniou, Tony; Gillis, Jennifer; Loutfy, Mona R; Cooper, Curtis; Hogg, Robert S; Klein, Marina B; Machouf, Nima; Montaner, Julio S G; Rourke, Sean B; Tsoukas, Chris; Raboud, Janet M

    2014-01-01

    To evaluate the trends in abacavir (ABC) prescription among antiretroviral (ARV) medication-naive individuals following the presentation of the Data Collection on Adverse Events of Anti-HIV Drugs (DAD) cohort study. We conducted a retrospective cohort study of ARV medication-naive individuals in the Canadian Observational Cohort (CANOC). Between January 1, 2000, and February 28, 2010, a total of 7280 ARV medication-naive patients were included in CANOC. We observed a significant change in the proportion of new ABC prescriptions immediately following the release of DAD (-11%; 95% confidence interval [CI]: -20% to -2.4%) and in the months following the presentation of these data (-0.66% per month; 95% CI: -1.2% to -0.073%). A post-DAD presentation decrease in the odds of being prescribed ABC versus tenofovir (TDF) was observed (adjusted odds ratio, 0.72 per year, 95% CI: 0.54-0.97). Presentation of the DAD was associated with a significant decrease in ABC use among ARV medication-naive, HIV-positive patients initiating therapy.
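    The DAD analysis above reports both an immediate level change (-11%) and a monthly slope change (-0.66% per month) in ABC prescribing, the two key coefficients of a standard interrupted time-series (segmented regression) model. A sketch on synthetic noise-free data (the model form is the generic ITS parameterization, not necessarily the authors' exact specification, and the numbers below are constructed to mimic, not reproduce, the reported effects):

```python
import numpy as np

def segmented_regression(y, t, t_event):
    """Fit y = b0 + b1*t + b2*post + b3*(time since event), the usual
    interrupted time-series model: b2 is the immediate level change at
    the event, b3 the change in slope afterwards."""
    t = np.asarray(t, dtype=float)
    post = (t >= t_event).astype(float)
    since = np.where(post > 0, t - t_event, 0.0)
    X = np.column_stack([np.ones_like(t), t, post, since])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return beta  # b0, b1, b2, b3

# Synthetic monthly prescription proportions with an 11-point drop at
# month 24 and a -0.66%/month post-event slope change
t = np.arange(48)
y = 40.0 + 0.1 * t - 11.0 * (t >= 24) - 0.66 * np.clip(t - 24, 0, None)
b0, b1, b2, b3 = segmented_regression(y, t, 24)
print(round(b2, 2), round(b3, 2))   # level change, slope change
```

Because the synthetic series is noise-free, least squares recovers the level change (-11.0) and slope change (-0.66) exactly.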

  4. Age-Related Decline in Primary CD8+ T Cell Responses Is Associated with the Development of Senescence in Virtual Memory CD8+ T Cells.

    PubMed

    Quinn, Kylie M; Fox, Annette; Harland, Kim L; Russ, Brendan E; Li, Jasmine; Nguyen, Thi H O; Loh, Liyen; Olshanksy, Moshe; Naeem, Haroon; Tsyganov, Kirill; Wiede, Florian; Webster, Rosela; Blyth, Chantelle; Sng, Xavier Y X; Tiganis, Tony; Powell, David; Doherty, Peter C; Turner, Stephen J; Kedzierska, Katherine; La Gruta, Nicole L

    2018-06-19

    Age-associated decreases in primary CD8+ T cell responses occur, in part, due to direct effects on naive CD8+ T cells to reduce intrinsic functionality, but the precise nature of this defect remains undefined. Aging also causes accumulation of antigen-naive but semi-differentiated "virtual memory" (TVM) cells, but their contribution to age-related functional decline is unclear. Here, we show that TVM cells are poorly proliferative in aged mice and humans, despite being highly proliferative in young individuals, while conventional naive T cells (TN cells) retain proliferative capacity in both aged mice and humans. Adoptive transfer experiments in mice illustrated that naive CD8 T cells can acquire a proliferative defect imposed by the aged environment but age-related proliferative dysfunction could not be rescued by a young environment. Molecular analyses demonstrate that aged TVM cells exhibit a profile consistent with senescence, marking an observation of senescence in an antigenically naive T cell population. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  5. Experiment Design for Nonparametric Models Based On Minimizing Bayes Risk: Application to Voriconazole¹

    PubMed Central

    Bayard, David S.; Neely, Michael

    2016-01-01

    An experimental design approach is presented for individualized therapy in the special case where the prior information is specified by a nonparametric (NP) population model. Here, a nonparametric model refers to a discrete probability model characterized by a finite set of support points and their associated weights. An important question arises as to how to best design experiments for this type of model. Many experimental design methods are based on Fisher Information or other approaches originally developed for parametric models. While such approaches have been used with some success across various applications, it is interesting to note that they largely fail to address the fundamentally discrete nature of the nonparametric model. Specifically, the problem of identifying an individual from a nonparametric prior is more naturally treated as a problem of classification, i.e., to find a support point that best matches the patient’s behavior. This paper studies the discrete nature of the NP experiment design problem from a classification point of view. Several new insights are provided including the use of Bayes Risk as an information measure, and new alternative methods for experiment design. One particular method, denoted as MMopt (Multiple-Model Optimal), will be examined in detail and shown to require minimal computation while having distinct advantages compared to existing approaches. Several simulated examples, including a case study involving oral voriconazole in children, are given to demonstrate the usefulness of MMopt in pharmacokinetics applications. PMID:27909942
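    The Bayes-Risk criterion above treats identification of an individual as classification among the NP model's discrete support points. A minimal Monte Carlo sketch under an illustrative exponential-decay response model (the support points, weights, noise level and model are all hypothetical, and this brute-force estimate is not the paper's MMopt computation):

```python
import numpy as np

def bayes_risk(times, thetas, weights, sigma, n_mc=2000, seed=0):
    """Monte Carlo estimate of the Bayes risk (probability that the MAP
    classifier picks the wrong support point) for a candidate design.
    Illustrative model: y(t) = A * exp(-k * t) + Gaussian noise, theta = (A, k)."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times, float)
    preds = np.array([A * np.exp(-k * t) for A, k in thetas])  # (K, T)
    risk = 0.0
    for k_true, w in enumerate(weights):
        y = preds[k_true] + sigma * rng.standard_normal((n_mc, len(t)))
        # log posterior over support points (up to a shared constant)
        ll = -0.5 * ((y[:, None, :] - preds[None]) ** 2).sum(-1) / sigma**2
        ll += np.log(weights)
        risk += w * np.mean(ll.argmax(1) != k_true)
    return risk

thetas = [(10.0, 0.1), (10.0, 0.3), (6.0, 0.1)]   # hypothetical support points
weights = np.array([0.5, 0.3, 0.2])
for design in ([1.0], [8.0], [1.0, 8.0]):
    print(design, round(bayes_risk(design, thetas, weights, sigma=0.5), 3))
```

Minimizing this risk over candidate sample-time designs selects the design that best discriminates the support points, which is the classification view of experiment design the paper advocates.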

  6. Experiment design for nonparametric models based on minimizing Bayes Risk: application to voriconazole¹.

    PubMed

    Bayard, David S; Neely, Michael

    2017-04-01

    An experimental design approach is presented for individualized therapy in the special case where the prior information is specified by a nonparametric (NP) population model. Here, a NP model refers to a discrete probability model characterized by a finite set of support points and their associated weights. An important question arises as to how to best design experiments for this type of model. Many experimental design methods are based on Fisher information or other approaches originally developed for parametric models. While such approaches have been used with some success across various applications, it is interesting to note that they largely fail to address the fundamentally discrete nature of the NP model. Specifically, the problem of identifying an individual from a NP prior is more naturally treated as a problem of classification, i.e., to find a support point that best matches the patient's behavior. This paper studies the discrete nature of the NP experiment design problem from a classification point of view. Several new insights are provided including the use of Bayes Risk as an information measure, and new alternative methods for experiment design. One particular method, denoted as MMopt (multiple-model optimal), will be examined in detail and shown to require minimal computation while having distinct advantages compared to existing approaches. Several simulated examples, including a case study involving oral voriconazole in children, are given to demonstrate the usefulness of MMopt in pharmacokinetics applications.

  7. Voxel-based Gaussian naïve Bayes classification of ischemic stroke lesions in individual T1-weighted MRI scans.

    PubMed

    Griffis, Joseph C; Allendorfer, Jane B; Szaflarski, Jerzy P

    2016-01-15

    Manual lesion delineation by an expert is the standard for lesion identification in MRI scans, but it is time-consuming and can introduce subjective bias. Alternative methods often require multi-modal MRI data, user interaction, scans from a control population, and/or arbitrary statistical thresholding. We present an approach for automatically identifying stroke lesions in individual T1-weighted MRI scans using naïve Bayes classification. Probabilistic tissue segmentation and image algebra were used to create feature maps encoding information about missing and abnormal tissue. Leave-one-case-out training and cross-validation was used to obtain out-of-sample predictions for each of 30 cases with left hemisphere stroke lesions. Our method correctly predicted lesion locations for 30/30 un-trained cases. Post-processing with smoothing (8mm FWHM) and cluster-extent thresholding (100 voxels) was found to improve performance. Quantitative evaluations of post-processed out-of-sample predictions on 30 cases revealed high spatial overlap (mean Dice similarity coefficient=0.66) and volume agreement (mean percent volume difference=28.91; Pearson's r=0.97) with manual lesion delineations. Our automated approach agrees with manual tracing. It provides an alternative to automated methods that require multi-modal MRI data, additional control scans, or user interaction to achieve optimal performance. Our fully trained classifier has applications in neuroimaging and clinical contexts. Copyright © 2015 Elsevier B.V. All rights reserved.
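    The spatial-overlap and volume metrics reported above are easy to make concrete. A sketch of the Dice similarity coefficient and percent volume difference on toy binary masks (the 10×10 masks are illustrative, not from the study):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, bool)
    b = np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def percent_volume_difference(manual, auto):
    """Unsigned volume difference as a percentage of the manual volume."""
    m = np.asarray(manual, bool)
    a = np.asarray(auto, bool)
    return 100.0 * abs(int(a.sum()) - int(m.sum())) / m.sum()

manual = np.zeros((10, 10), bool); manual[2:6, 2:6] = True  # 16 "voxels"
auto   = np.zeros((10, 10), bool); auto[3:7, 2:6]  = True   # same size, shifted
print(dice(manual, auto), percent_volume_difference(manual, auto))  # → 0.75 0.0
```

Note that the two metrics are complementary: the shifted mask here has a perfect volume match (0% difference) yet only 0.75 Dice overlap, which is why the study reports both.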

  8. Prediction of carbonate rock type from NMR responses using data mining techniques

    NASA Astrophysics Data System (ADS)

    Gonçalves, Eduardo Corrêa; da Silva, Pablo Nascimento; Silveira, Carla Semiramis; Carneiro, Giovanna; Domingues, Ana Beatriz; Moss, Adam; Pritchard, Tim; Plastino, Alexandre; Azeredo, Rodrigo Bagueira de Vasconcellos

    2017-05-01

    Recent studies have indicated that the accurate identification of carbonate rock types in a reservoir can be employed as a preliminary step to enhance the effectiveness of petrophysical property modeling. Furthermore, rock typing activity has been shown to be of key importance in several steps of formation evaluation, such as the study of sedimentary series, reservoir zonation and well-to-well correlation. In this paper, a methodology based exclusively on the analysis of 1H-NMR (Nuclear Magnetic Resonance) relaxation responses - using data mining algorithms - is evaluated to perform the automatic classification of carbonate samples according to their rock type. We analyze the effectiveness of six different classification algorithms (k-NN, Naïve Bayes, C4.5, Random Forest, SMO and Multilayer Perceptron) and two data preprocessing strategies (discretization and feature selection). The dataset used in this evaluation is formed by 78 1H-NMR T2 distributions of fully brine-saturated rock samples from six different rock type classes. The experiments reveal that the combination of preprocessing strategies with classification algorithms is able to achieve a prediction accuracy of 97.4%.
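    Among the six classifiers evaluated, naive Bayes is simple enough to sketch directly: per class, fit independent Gaussians to each feature and classify by maximum posterior. A self-contained illustration on synthetic two-class data standing in for the T2-distribution features (data, dimensions and class sizes are hypothetical):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: per-class feature means/variances
    plus class priors; predict by maximum posterior."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.array([X[y == c].mean(0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(0) + 1e-9 for c in self.classes_])
        self.logprior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # log p(x | c) under independent Gaussians, summed over features
        ll = -0.5 * (np.log(2 * np.pi * self.var_[None]) +
                     (X[:, None, :] - self.mu_[None]) ** 2 / self.var_[None]).sum(-1)
        return self.classes_[(ll + self.logprior_).argmax(1)]

# Synthetic stand-in for the rock-type features: two well-separated classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (38, 5))])
y = np.array([0] * 40 + [1] * 38)
acc = np.mean(GaussianNB().fit(X, y).predict(X) == y)
print(acc)
```

The same fit/predict interface generalizes to the six-class rock-type problem; in practice one would wrap this in cross-validation and the discretization/feature-selection preprocessing the paper evaluates.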

  9. A semi-automated method for bone age assessment using cervical vertebral maturation.

    PubMed

    Baptista, Roberto S; Quaglio, Camila L; Mourad, Laila M E H; Hummel, Anderson D; Caetano, Cesar Augusto C; Ortolani, Cristina Lúcia F; Pisa, Ivan T

    2012-07-01

    To propose a semi-automated method for pattern classification to predict individuals' stage of growth based on morphologic characteristics that are described in the modified cervical vertebral maturation (CVM) method of Baccetti et al. A total of 188 lateral cephalograms were collected, digitized, evaluated manually, and grouped into cervical stages by two expert examiners. Landmarks were located on each image and measured. Three pattern classifiers based on the Naïve Bayes algorithm were built and assessed using a software program. The classifier with the greatest accuracy according to the weighted kappa test was considered best. The classifier showed a weighted kappa coefficient of 0.861 ± 0.020. If an adjacent estimated pre-stage or post-stage value was taken to be acceptable, the classifier would show a weighted kappa coefficient of 0.992 ± 0.019. Results from this study show that the proposed semi-automated pattern classification method can help orthodontists identify the stage of CVM. However, additional studies are needed before this semi-automated classification method for CVM assessment can be implemented in clinical practice.
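    The weighted kappa statistic used to rank the classifiers penalizes disagreements by their distance between ordinal stages, so an off-by-one stage costs far less than a large error. A sketch of Cohen's weighted kappa (the example stage vectors are hypothetical, and whether the study used linear or quadratic weights is not stated in this abstract):

```python
import numpy as np

def weighted_kappa(rater_a, rater_b, n_classes, weight="quadratic"):
    """Cohen's weighted kappa for ordinal labels (e.g. CVM stages).
    weight="quadratic" uses w_ij = (i-j)^2 / (n-1)^2;
    weight="linear" uses |i-j| / (n-1)."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    O = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):          # observed agreement matrix
        O[i, j] += 1
    O /= O.sum()
    E = np.outer(O.sum(1), O.sum(0))  # expected under independence
    i, j = np.indices((n_classes, n_classes))
    if weight == "quadratic":
        W = (i - j) ** 2 / (n_classes - 1) ** 2
    else:
        W = np.abs(i - j) / (n_classes - 1)
    return 1.0 - (W * O).sum() / (W * E).sum()

stages_manual    = [0, 1, 2, 3, 4, 5, 2, 3, 1, 4]   # hypothetical expert stages
stages_predicted = [0, 1, 2, 3, 4, 5, 3, 3, 1, 3]   # hypothetical classifier output
print(round(weighted_kappa(stages_manual, stages_predicted, 6), 3))
```

Accepting an adjacent stage as correct, as the paper does for its 0.992 figure, amounts to zeroing the |i-j| = 1 weights before computing kappa.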

  10. Classification of ground glass opacity lesion characteristic based on texture feature using lung CT image

    NASA Astrophysics Data System (ADS)

    Sebatubun, M. M.; Haryawan, C.; Windarta, B.

    2018-03-01

    Lung cancer causes a higher mortality rate worldwide than any other cancer. This can be minimised if symptoms and cancer cells are detected early. One of the techniques used to detect lung cancer is computed tomography (CT) scanning. CT scan images were used in this study to identify one of the lesion characteristics, named ground glass opacity (GGO), which is used to determine the level of malignancy of a lesion. There were three phases in identifying GGO: image cropping, feature extraction using grey level co-occurrence matrices (GLCM) and classification using a Naïve Bayes classifier. In order to improve the classification results, the most significant features were sought via feature selection using gain ratio evaluation. Based on the results obtained, the most significant features could be identified using the feature selection method applied in this research. The accuracy rate increased from 83.33% to 91.67%, the sensitivity from 82.35% to 94.11% and the specificity from 84.21% to 89.47%.
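    The GLCM features used in the second phase are computed from a co-occurrence matrix of grey-level pairs at a fixed pixel offset. A minimal NumPy sketch of a symmetric, normalised GLCM with two of the classic Haralick features (the 4×4 image and 4 grey levels are illustrative; the study's offsets and feature set are not given in this abstract):

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Symmetric, normalised grey-level co-occurrence matrix for a
    single (dx, dy) pixel offset."""
    img = np.asarray(image)
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            P[img[y, x], img[y + dy, x + dx]] += 1
    P = P + P.T               # count both directions -> symmetric
    return P / P.sum()

def contrast(P):
    i, j = np.indices(P.shape)
    return (P * (i - j) ** 2).sum()

def homogeneity(P):
    i, j = np.indices(P.shape)
    return (P / (1.0 + (i - j) ** 2)).sum()

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img, dx=1, dy=0, levels=4)
print(round(contrast(P), 3), round(homogeneity(P), 3))
```

Feature vectors like (contrast, homogeneity, energy, correlation, ...) over several offsets are what the gain-ratio step then ranks before Naïve Bayes classification.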

  11. The NWRA Classification Infrastructure: description and extension to the Discriminant Analysis Flare Forecasting System (DAFFS)

    NASA Astrophysics Data System (ADS)

    Leka, K. D.; Barnes, Graham; Wagner, Eric

    2018-04-01

    A classification infrastructure built upon Discriminant Analysis (DA) has been developed at NorthWest Research Associates for examining the statistical differences between samples of two known populations. Originally developed to examine the physical differences between flare-quiet and flare-imminent solar active regions, the infrastructure is described herein in some detail, including: parametrization of large datasets, schemes for handling "null" and "bad" data in multi-parameter analysis, application of non-parametric multi-dimensional DA, an extension through Bayes' theorem to probabilistic classification, and methods invoked for evaluating classifier success. The classifier infrastructure is applicable to a wide range of scientific questions in solar physics. We demonstrate its application to the question of distinguishing flare-imminent from flare-quiet solar active regions, updating results from the original publications that were based on different data and much smaller sample sizes. Finally, as a demonstration of "Research to Operations" efforts in the space-weather forecasting context, we present the Discriminant Analysis Flare Forecasting System (DAFFS), a near-real-time, operationally running solar flare forecasting tool that was developed from the research-directed infrastructure.
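The Bayes'-theorem step that turns class-conditional densities into a probabilistic forecast can be illustrated with two 1-D Gaussian populations (a sketch only; the NWRA infrastructure uses non-parametric, multi-dimensional density estimates, and all parameter values here are hypothetical):

```python
import numpy as np

def flare_probability(x, mu0, var0, mu1, var1, prior1):
    """Posterior P(flare | x) via Bayes' theorem from two Gaussian
    class-conditional densities: p(x|flare) P(flare) / p(x)."""
    def gauss(x, mu, var):
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    p1 = gauss(x, mu1, var1) * prior1          # flare-imminent population
    p0 = gauss(x, mu0, var0) * (1.0 - prior1)  # flare-quiet population
    return p1 / (p0 + p1)
```

With equal priors and equal variances, a measurement midway between the two population means yields a 50% forecast, as expected.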

  12. Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.

    PubMed

    Caixinha, Miguel; Santos, Mário; Santos, Jaime

    2016-04-01

    To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the highest performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.

  13. Machine learning-based diagnosis of melanoma using macro images.

    PubMed

    Gautam, Diwakar; Ahmed, Mushtaq; Meena, Yogesh Kumar; Ul Haq, Ahtesham

    2018-05-01

    Cancer poses a grave threat to human society. Melanoma, a skin cancer, originates in the skin layers and penetrates deep into the subcutaneous layers. There is extensive research on melanoma diagnosis using dermatoscopic images captured through a dermatoscope. Since designing a diagnostic model for general handheld imaging systems is an emerging trend, this article proposes a computer-aided decision support system for macro images captured by a general-purpose camera. General imaging conditions are adversely affected by nonuniform illumination, which in turn affects the extraction of relevant information. To mitigate this, we process an image to define a smooth illumination surface using a multistage illumination compensation approach, and the infected region is extracted using the proposed multimode segmentation method. The lesion information is numerated as a feature set comprising geometry, photometry, border series, and texture measures. The redundancy in the feature set is reduced using information theory methods, and a classification boundary is modeled to distinguish benign and malignant samples using support vector machine, random forest, neural network, and fast discriminative mixed-membership-based naive Bayesian classifiers. Moreover, the experimental outcome is supported by hypothesis testing and boxplot representation of classification losses. The simulation results prove the significance of the proposed model, which shows improved performance compared with competing arts. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Immunoglobulin M antibody response to measles virus following primary and secondary vaccination and natural virus infection.

    PubMed

    Erdman, D D; Heath, J L; Watson, J C; Markowitz, L E; Bellini, W J

    1993-09-01

    The use of IgM antibody detection for the classification of the primary and secondary measles antibody response in persons following primary and secondary vaccination and natural measles virus infection was examined. Of 32 nonimmune children receiving primary measles vaccination, 31 (97%) developed IgM antibodies, consistent with a primary antibody response. Of 21 previously vaccinated children with low levels of preexisting IgG antibodies who responded to revaccination, none developed detectable IgM antibodies, whereas 33 of 35 (94%) with no detectable preexisting IgG antibodies developed an IgM response. Of a sample of 57 measles cases with a prior history of vaccination, 55 (96%) had detectable IgM antibodies. Of these, 30 (55%) were classified as having a primary antibody response and 25 (45%) a secondary antibody response based on differences in their ratios of IgM to IgG antibodies. Differences in the severity of clinical symptoms between these 2 groups were consistent with this classification scheme. These findings suggest that 1) an IgM response follows primary measles vaccination in the immunologically naive, 2) an IgM response is absent on revaccination of those previously immunized, and 3) an IgM response may follow clinical measles virus infection independent of prior immunization status.

  15. Hyperspectral Mapping of the Invasive Species Pepperweed and the Development of a Habitat Suitability Model

    NASA Technical Reports Server (NTRS)

    Nguyen, Andrew; Gole, Alexander; Randall, Jarom; Dlott, Glade; Zhang, Sylvia; Alfaro, Brian; Schmidt, Cindy; Skiles, J. W.

    2011-01-01

    Mapping and predicting the spatial distribution of invasive plant species is central to habitat management but difficult to implement at landscape and regional scales. Remote sensing techniques can reduce the impact field campaigns have on these ecologically sensitive areas and can provide a regional and multi-temporal view of invasive species spread. Invasive perennial pepperweed (Lepidium latifolium) is now widespread in fragmented estuaries of the South San Francisco Bay, and is shown to degrade native vegetation in estuaries and adjacent habitats, thereby reducing forage and shelter for wildlife. The purpose of this study is to map the present distribution of pepperweed in estuarine areas of the South San Francisco Bay Salt Pond Restoration Project (Alviso, CA), and create a habitat suitability model to predict future spread. Pepperweed reflectance data were collected in-situ with a GER 1500 spectroradiometer along with 88 corresponding pepperweed presence and absence points used for building the statistical models. The spectral angle mapper (SAM) classification algorithm was used to distinguish the reflectance spectrum of pepperweed and map its distribution using an image from EO-1 Hyperion. To map pepperweed, we performed a supervised classification on an ASTER image with a resulting classification accuracy of 71.8%. We generated a weighted overlay analysis model within a geographic information system (GIS) framework to predict areas in the study site most susceptible to pepperweed colonization. Variables for the model included propensity for disturbance, status of pond restoration, proximity to water channels, and terrain curvature. A Generalized Additive Model (GAM) was also used to generate a probability map and investigate how much each variable contributed to predicting pepperweed spread.
Results from the GAM revealed distance to channels, distance to ponds and curvature were statistically significant (p < 0.01) in determining the locations of suitable pepperweed habitats.
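The spectral angle mapper used in studies like this reduces to the angle between a pixel spectrum and a reference spectrum, with small angles indicating similar materials; a minimal sketch (the band values in the test are hypothetical, not Hyperion data):

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Spectral angle mapper (SAM): arccos of the cosine similarity
    between a pixel spectrum and a reference (endmember) spectrum.
    Insensitive to overall brightness, since scaling a spectrum
    leaves the angle unchanged."""
    cos = pixel @ reference / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

A pixel is assigned to the endmember (here, the field-measured pepperweed spectrum) with the smallest angle, typically subject to a maximum-angle threshold.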

  16. A Geostatistical Toolset for Reconstructing Louisiana's Coastal Stratigraphy using Subsurface Boring and Cone Penetrometer Test Data

    NASA Astrophysics Data System (ADS)

    Li, A.; Tsai, F. T. C.; Jafari, N.; Chen, Q. J.; Bentley, S. J.

    2017-12-01

    A vast area of river deltaic wetlands stretches across the southern Louisiana coast. The wetlands are suffering from a high rate of land loss, which increasingly threatens coastal communities and energy infrastructure. A regional stratigraphic framework of the delta plain is now imperative to answer scientific questions (such as how the delta plain grows and decays) and to provide information to coastal protection and restoration projects (such as marsh creation and the construction of levees and floodwalls). Over the years, subsurface investigations in Louisiana have been conducted by state and federal agencies (Louisiana Department of Natural Resources, United States Geological Survey, United States Army Corps of Engineers, etc.), research institutes (Louisiana Geological Survey, LSU Coastal Studies Institute, etc.), engineering firms, and oil-gas companies. This has resulted in the availability of various types of data, including geological, geotechnical, and geophysical data. However, it is challenging to integrate the different types of data and construct three-dimensional stratigraphy models at regional scale. In this study, a set of geostatistical methods was used to tackle this problem. An ordinary kriging method was used to regionalize continuous data, such as grain size, water content, liquid limit, plasticity index, and cone penetrometer tests (CPTs). Indicator kriging and multiple indicator kriging methods were used to regionalize categorized data, such as soil classification. A compositional kriging method was used to regionalize compositional data, such as soil composition (fractions of sand, silt, and clay).
Stratigraphy models were constructed for three cases in the coastal zone: (1) Inner Harbor Navigation Canal (IHNC) area: soil classification and soil behavior type (SBT) stratigraphies were constructed using ordinary kriging; (2) Middle Barataria Bay area: a soil classification stratigraphy was constructed using multiple indicator kriging; (3) Lower Barataria Bay and Lower Breton Sound areas: a soil texture stratigraphy was constructed using soil compositional data and compositional kriging. Cross sections were extracted from the three-dimensional stratigraphy models to reveal spatial distributions of different stratigraphic features.
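The ordinary-kriging step for continuous data amounts to solving a small linear system with a Lagrange multiplier that forces the weights to sum to one (illustrative NumPy code; the spherical variogram and its sill/range parameters are assumptions, not values fitted to the Louisiana data):

```python
import numpy as np

def ordinary_kriging(coords, values, target, sill=1.0, rng=100.0):
    """Ordinary kriging of scattered data at one target location using a
    spherical variogram. Returns the estimate and the kriging weights."""
    def gamma(h):
        # Spherical variogram, flat beyond the range.
        h = np.minimum(h / rng, 1.0)
        return sill * (1.5 * h - 0.5 * h ** 3)

    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system: [Γ 1; 1ᵀ 0] [w; λ] = [γ(target); 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    w = np.linalg.solve(A, b)[:n]
    return float(w @ values), w
```

Indicator kriging follows the same system, applied to 0/1 indicators of each soil class instead of a continuous property such as water content.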

  17. Probability shapes perceptual precision: A study in orientation estimation.

    PubMed

    Jabar, Syaheed B; Anderson, Britt

    2015-12-01

    Probability is known to affect perceptual estimations, but an understanding of mechanisms is lacking. Moving beyond binary classification tasks, we had naive participants report the orientation of briefly viewed gratings where we systematically manipulated contingent probability. Participants rapidly developed faster and more precise estimations for high-probability tilts. The shapes of their error distributions, as indexed by a kurtosis measure, also showed a distortion from Gaussian. This kurtosis metric was robust, capturing probability effects that were graded, contextual, and varying as a function of stimulus orientation. Our data can be understood as a probability-induced reduction in the variability or "shape" of estimation errors, as would be expected if probability affects the perceptual representations. As probability manipulations are an implicit component of many endogenous cuing paradigms, changes at the perceptual level could account for changes in performance that might have traditionally been ascribed to "attention." (c) 2015 APA, all rights reserved.

  18. Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

    PubMed

    Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

    2017-09-01

    To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single-classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k-nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition accuracy; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
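The weighted-majority-vote fusion rule used by the custom ensemble can be sketched directly (illustrative code; in practice the weights would come from each classifier's validation performance, and the labels here are hypothetical activity classes):

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Fuse single-classifier decisions: each classifier casts a vote for
    its predicted label, weighted by its reliability; the label with the
    largest total weight wins."""
    votes = defaultdict(float)
    for label, w in zip(predictions, weights):
        votes[label] += w
    return max(votes, key=votes.get)
```

For example, two moderately reliable classifiers voting "walk" can outvote one more reliable classifier voting "run", but a sufficiently heavy single vote still wins.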

  19. Spatial modeling and classification of corneal shape.

    PubMed

    Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan

    2007-03-01

    One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between the diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. They show that classification accuracy is similar for both data transformations, but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.

  20. Pattern recognition for passive polarimetric data using nonparametric classifiers

    NASA Astrophysics Data System (ADS)

    Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.

    2005-08-01

    Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of a given number of classes, and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class-conditional probability density functions (likelihoods) and prior probabilities. A probabilistic neural network (PNN), a nonparametric method that can compute Bayes-optimal boundaries, and a k-nearest neighbor (KNN) classifier are used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
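A PNN is essentially a Parzen-window density estimator per class: each class likelihood is an average of Gaussian kernels centred on that class's training samples, and the prediction maximises the posterior. A minimal sketch (the smoothing width `sigma` and equal class priors are assumptions):

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma=0.5):
    """Probabilistic neural network (Parzen-window) classifier: estimate
    each class-conditional density at x with Gaussian kernels and pick
    the class with the largest estimate (equal priors assumed)."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        # Gaussian kernel response of every pattern unit of class c.
        k = np.exp(-np.sum((Xc - x) ** 2, axis=1) / (2 * sigma ** 2))
        scores.append(k.mean())  # summation unit: average class likelihood
    return classes[int(np.argmax(scores))]
```

Unequal priors would simply multiply each class score before the argmax, recovering the Bayes rule described in the abstract.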

  1. Predicted seafloor facies of Central Santa Monica Bay, California

    USGS Publications Warehouse

    Dartnell, Peter; Gardner, James V.

    2004-01-01

    Summary -- Mapping surficial seafloor facies (sand, silt, muddy sand, rock, etc.) should be the first step in marine geological studies and is crucial when modeling sediment processes, pollution transport, deciphering tectonics, and defining benthic habitats. This report outlines an empirical technique that predicts the distribution of seafloor facies for a large area offshore Los Angeles, CA using high-resolution bathymetry and co-registered, calibrated backscatter from multibeam echosounders (MBES) correlated to ground-truth sediment samples. The technique uses a series of procedures that involve supervised classification and a hierarchical decision tree classification that are now available in advanced image-analysis software packages. Derivative variance images of both bathymetry and acoustic backscatter are calculated from the MBES data and then used in a hierarchical decision-tree framework to classify the MBES data into areas of rock, gravelly muddy sand, muddy sand, and mud. A quantitative accuracy assessment on the classification results is performed using ground-truth sediment samples. The predicted facies map is also ground-truthed using seafloor photographs and high-resolution sub-bottom seismic-reflection profiles. This Open-File Report contains the predicted seafloor facies map as a georeferenced TIFF image along with the multibeam bathymetry and acoustic backscatter data used in the study as well as an explanation of the empirical classification process.

  2. Detection of flow limitation in obstructive sleep apnea with an artificial neural network.

    PubMed

    Norman, Robert G; Rapoport, David M; Ayappa, Indu

    2007-09-01

    During sleep, the development of a plateau on the inspiratory airflow/time contour provides a non-invasive indicator of airway collapsibility. Humans recognize this abnormal contour easily, and this study replicates this with an artificial neural network (ANN) using a normalized shape. Five 10 min segments were selected from each of 18 sleep records (respiratory airflow measured with a nasal cannula) with varying degrees of sleep disordered breathing. Each breath was visually scored for shape, and breaths split randomly into a training and test set. Equally spaced, peak amplitude normalized flow values (representing breath shape) formed the only input to a back propagation ANN. Following training, breath-by-breath agreement of the ANN with the manual classification was tabulated for the training and test sets separately. Agreement of the ANN was 89% in the training set and 70.6% in the test set. When the categories of 'probably normal' and 'normal', and 'probably flow limited' and 'flow limited' were combined, the agreement increased to 92.7% and 89.4% respectively, similar to the intra- and inter-rater agreements obtained by a visual classification of these breaths. On a naive dataset, the agreement of the ANN to visual classification was 57.7% overall and 82.4% when the categories were collapsed. A neural network based only on the shape of inspiratory airflow succeeded in classifying breaths as to the presence/absence of flow limitation. This approach could be used to provide a standardized, reproducible and automated means of detecting elevated upper airway resistance.

  3. a Fully Automated Pipeline for Classification Tasks with AN Application to Remote Sensing

    NASA Astrophysics Data System (ADS)

    Suzuki, K.; Claesen, M.; Takeda, H.; De Moor, B.

    2016-06-01

    Deep learning is nowadays intensely in the spotlight owing to its victories at major competitions, which has undeservedly pushed 'shallow' machine learning methods, the relatively simple and handy algorithms commonly used by industrial engineers, into the background despite their advantages, such as the small amount of time and data required for training. Taking a practical point of view, we utilize shallow learning algorithms to construct a learning pipeline such that operators can use machine learning without any special knowledge, an expensive computation environment, or a large amount of labelled data. The proposed pipeline automates the whole classification process, namely feature selection, feature weighting, and the selection of the most suitable classifier with optimized hyperparameters. The configuration employs particle swarm optimisation, a well-known metaheuristic algorithm, for its generally fast and fine optimization; this enables us not only to optimize (hyper)parameters but also to determine the appropriate features and classifier for the problem, choices which have conventionally been made a priori from domain knowledge or handled with naïve algorithms such as grid search. Through experiments with the MNIST and CIFAR-10 datasets, common computer vision datasets for character recognition and object recognition respectively, our automated learning approach provides high performance considering its simple setting (i.e. no dataset-specific specialization), small amount of training data, and practical learning time. Moreover, compared to deep learning, the performance stays robust with almost no modification even on a remote sensing object recognition problem, which in turn indicates a high possibility that our approach contributes to general classification problems.

  4. Persistent low thymic activity and non-cardiac mortality in children with chromosome 22q11·2 microdeletion and partial DiGeorge syndrome

    PubMed Central

    Eberle, P; Berger, C; Junge, S; Dougoud, S; Büchel, E Valsangiacomo; Riegel, M; Schinzel, A; Seger, R; Güngör, T

    2009-01-01

    A subgroup of patients with 22q11·2 microdeletion and partial DiGeorge syndrome (pDGS) appears to be susceptible to non-cardiac mortality (NCM) despite sufficient overall CD4+ T cells. To detect these patients, 20 newborns with 22q11·2 microdeletion and congenital heart disease were followed prospectively for 6 years. Besides detailed clinical assessment, longitudinal monitoring of naive CD4+ and cytotoxic CD3+CD8+ T cells (CTL) was performed. To monitor thymic activity, we analysed naive platelet endothelial cell adhesion molecule-1 (CD31+) expressing CD45RA+RO−CD4+ cells containing high numbers of T cell receptor excision circle (TREC)-bearing lymphocytes and compared them with normal values of healthy children (n = 75). Comparing two age periods, low overall CD4+ and naive CD4+ T cell numbers were observed in 65%/75%, respectively, of patients in period A (< 1 year) declining to 22%/50%, respectively, of patients in period B (> 1/< 7 years). The percentage of patients with low CTLs (< P10) remained robust until school age (period A: 60%; period B: 50%). Low numbers of CTLs were associated with abnormally low naive CD45RA+RO−CD4+ T cells. A high-risk (HR) group (n = 11) and a standard-risk (SR) (n = 9) group were identified. HR patients were characterized by low numbers of both naive CD4+ and CTLs and were prone to lethal infectious and lymphoproliferative complications (NCM: four of 11; cardiac mortality: one of 11) while SR patients were not (NCM: none of nine; cardiac mortality: two of nine). Naive CD31+CD45RA+RO−CD4+, naive CD45RA+RO−CD4+ T cells as well as TRECs/106 mononuclear cells were abnormally low in HR and normal in SR patients. Longitudinal monitoring of naive CD4+ and cytotoxic T cells may help to discriminate pDGS patients at increased risk for NCM. PMID:19040613

  5. Benthic habitat classification in Lignumvitae Key Basin, Florida Bay, using the U.S. Geological Survey Along-Track Reef Imaging System (ATRIS)

    USGS Publications Warehouse

    Reich, C.D.; Zawada, D.G.; Thompson, P.R.; Reynolds, C.E.; Spear, A.H.; Umberger, D.K.; Poore, R.Z.

    2011-01-01

    The Comprehensive Everglades Restoration Plan (CERP), funded in partnership by the U.S. Army Corps of Engineers, South Florida Water Management District, and other Federal, local, and Tribal members, has in its mandate a guideline to protect and restore freshwater flows to coastal environments to pre-1940s conditions (CERP, 1999). Historic salinity data are sparse for Florida Bay, so it is difficult for water managers to decide what the correct quantity, quality, timing, and distribution of freshwater are to maintain a healthy and productive estuarine ecosystem. Proxy records of sea-surface temperature (SST) and salinity have proven useful in south Florida. Trace-element chemistry on foraminifera and molluscan shells preserved in shallow-water sediments has provided some information on historical salinity and temperature variability in coastal settings, but little information is available for areas within the main part of Florida Bay (Brewster-Wingard and others, 1996). Geochemistry of coral skeletons can be used to develop subannually resolved proxy records for SST and salinity. Previous studies suggest corals, specifically Solenastrea bournoni, present in the lower section of Florida Bay near Lignumvitae Key, may be suitable for developing records of SST and salinity for the past century, but the distribution and species composition of the bay coral community have not been well documented (Hudson and others, 1989; Swart and others, 1999). Oddly, S. bournoni thrives in the study area because it can grow on a sandy substratum and can tolerate highly turbid water. Solenastrea bournoni coral heads in this area should be ideally located to provide a record (~100-150 years) of past temperature and salinity variations in Florida Bay. The goal of this study was to utilize the U.S. Geological Survey's (USGS) Along-Track Reef Imaging System (ATRIS) capability to further our understanding of the abundance, distribution, and size of corals in the Lignumvitae Key Basin.
The study area was subdivided into four areas whereby corals and other benthic habitats were classified based on ATRIS imagery.

  6. Geospatial Method for Computing Supplemental Multi-Decadal U.S. Coastal Land-Use and Land-Cover Classification Products, Using Landsat Data and C-CAP Products

    NASA Technical Reports Server (NTRS)

    Spruce, J. P.; Smoot, James; Ellis, Jean; Hilbert, Kent; Swann, Roberta

    2012-01-01

    This paper discusses the development and implementation of a geospatial data processing method and multi-decadal Landsat time series for computing general coastal U.S. land-use and land-cover (LULC) classifications and change products consisting of seven classes (water, barren, upland herbaceous, non-woody wetland, woody upland, woody wetland, and urban). Use of this approach extends the observational period of the NOAA-generated Coastal Change and Analysis Program (C-CAP) products by almost two decades, assuming the availability of one cloud free Landsat scene from any season for each targeted year. The Mobile Bay region in Alabama was used as a study area to develop, demonstrate, and validate the method that was applied to derive LULC products for nine dates at approximate five year intervals across a 34-year time span, using single dates of data for each classification in which forests were either leaf-on, leaf-off, or mixed senescent conditions. Classifications were computed and refined using decision rules in conjunction with unsupervised classification of Landsat data and C-CAP value-added products. Each classification's overall accuracy was assessed by comparing stratified random locations to available reference data, including higher spatial resolution satellite and aerial imagery, field survey data, and raw Landsat RGBs. Overall classification accuracies ranged from 83 to 91% with overall Kappa statistics ranging from 0.78 to 0.89. The accuracies are comparable to those from similar, generalized LULC products derived from C-CAP data. The Landsat MSS-based LULC product accuracies are similar to those from Landsat TM or ETM+ data. Accurate classifications were computed for all nine dates, yielding effective results regardless of season. This classification method yielded products that were used to compute LULC change products via additive GIS overlay techniques.

  7. A novel artificial immune clonal selection classification and rule mining with swarm learning model

    NASA Astrophysics Data System (ADS)

    Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.

    2013-06-01

    Metaheuristic optimisation algorithms have become a popular choice for solving complex problems. By integrating the Artificial Immune clonal selection algorithm (CSA) and the particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel computation merit of Clonal Selection and the speed and self-organisation merits of Particle Swarm by sharing information between the clonal selection population and the particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm required less training time and fewer memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a multiobjective optimisation problem balancing predictive accuracy and comprehensibility. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared the classification accuracy of our proposed algorithm CS2 with five commonly used CSAs, namely: AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA using eight benchmark datasets. We also compared the classification accuracy of CS2 with five other methods, namely: Naïve Bayes, SVM, MLP, CART, and RBF. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation of CSA and PSO can develop the merits of each, compensate for their respective defects, and improve both search quality and speed.

  8. Classification of drug molecules considering their IC50 values using mixed-integer linear programming based hyper-boxes method.

    PubMed

    Armutlu, Pelin; Ozdemir, Muhittin E; Uney-Yuksektepe, Fadime; Kavakli, I Halil; Turkay, Metin

    2008-10-03

    A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem and aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with the mixed-integer programming based hyper-boxes classification method, where drug molecules are classified as low active or high active according to their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. We first apply our approach by analyzing the activities of widely known inhibitor datasets, including acetylcholinesterase (ACHE), benzodiazepine receptor (BZR), dihydrofolate reductase (DHFR), and cyclooxygenase-2 (COX-2) inhibitors with known IC50 values. The results at this stage showed that our approach consistently gives better classification accuracies compared to 63 other reported classification methods, such as SVM and Naïve Bayes, and we were able to predict the experimentally determined IC50 values with a worst-case accuracy of 96%. To further test the applicability of this approach, we first created a dataset of cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance the drug discovery process, but also save time and resources.
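
    At prediction time, hyper-box classification reduces to simple bound checks in descriptor space. A toy sketch under assumed box bounds (in the paper the bounds come from the mixed-integer optimisation, which is not reproduced here):

```python
def in_box(point, box):
    """True if every coordinate lies within the box's [lo, hi] bounds."""
    lo, hi = box
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))

def classify(point, boxes):
    """Label a point by the first enclosing hyper-box; if none encloses it,
    fall back to the box with the nearest centre."""
    for label, box in boxes:
        if in_box(point, box):
            return label
    def dist(entry):
        lo, hi = entry[1]
        return sum((x - (l + h) / 2) ** 2 for x, l, h in zip(point, lo, hi))
    return min(boxes, key=dist)[0]

# hypothetical 2-D descriptor boxes for "low" and "high" activity classes
boxes = [("low", ([0.0, 0.0], [1.0, 1.0])),
         ("high", ([2.0, 2.0], [3.0, 3.0]))]
```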

  9. Two separate defects affecting true naive or virtual memory T cell precursors combine to reduce naive T cell responses with aging.

    PubMed

    Renkema, Kristin R; Li, Gang; Wu, Angela; Smithey, Megan J; Nikolich-Žugich, Janko

    2014-01-01

    Naive T cell responses are eroded with aging. We and others have recently shown that unimmunized old mice lose ≥ 70% of Ag-specific CD8 T cell precursors and that many of the remaining precursors acquire a virtual (central) memory (VM; CD44(hi)CD62L(hi)) phenotype. In this study, we demonstrate that unimmunized TCR transgenic (TCRTg) mice also undergo massive VM conversion with age, exhibiting rapid effector function upon both TCR and cytokine triggering. Age-related VM conversion in TCRTg mice directly depended on replacement of the original TCRTg specificity by endogenous TCRα rearrangements, indicating that TCR signals must be critical in VM conversion. Importantly, we found that VM conversion had adverse functional effects in both old wild-type and old TCRTg mice; that is, old VM, but not old true naive, T cells exhibited blunted TCR-mediated, but not IL-15-mediated, proliferation. This selective proliferative senescence correlated with increased apoptosis in old VM cells in response to peptide, but decreased apoptosis in response to homeostatic cytokines IL-7 and IL-15. Our results identify TCR as the key factor in differential maintenance and function of Ag-specific precursors in unimmunized mice with aging, and they demonstrate that two separate age-related defects--drastic reduction in true naive T cell precursors and impaired proliferative capacity of their VM cousins--combine to reduce naive T cell responses with aging.

  10. Interleukin-7 induces HIV replication in primary naive T cells through a nuclear factor of activated T cell (NFAT)-dependent pathway

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Managlia, Elizabeth Z.; Landay, Alan; Al-Harthi, Lena

    2006-07-05

    Interleukin (IL)-7 plays several roles critical to T cell maturation, survival, and homeostasis. Because of these functions, IL-7 is under investigation as an immune modulator for therapeutic use in lymphopenic clinical conditions, including HIV. We reported that naive T cells, typically not permissive to HIV, can be productively infected when pre-treated with IL-7. We evaluated the mechanism by which IL-7 mediates this effect. IL-7 potently up-regulated the transcription factor NFAT, but had no effect on NF-κB. Blocking NFAT activity using a number of reagents, such as cyclosporin A, FK-506, or the NFAT-specific inhibitor known as VIVIT peptide, markedly reduced IL-7-mediated induction of HIV replication in naive T cells. Additional neutralization of cytokines present in IL-7-treated cultures and/or those that have NFAT-binding sequences within their promoters indicated that IL-10, IL-4, and most significantly IFN-γ all contribute to IL-7 induction of HIV productive replication in naive T cells. These data clarify the mechanism by which IL-7 can overcome the block to HIV productive infection in naive T cells, despite their quiescent cell status. These findings are relevant to the treatment of HIV disease and understanding HIV pathogenesis in the naive CD4+ T cell compartment, especially in light of the vigorous pursuit of IL-7 as an in vivo immune modulator.

  11. Statistical Approaches to Type Determination of the Ejector Marks on Cartridge Cases.

    PubMed

    Warren, Eric M; Sheets, H David

    2018-03-01

    While type determination on bullets has been performed for over a century, type determination on cartridge cases is often overlooked. Presented here is an example of type determination of ejector marks on cartridge cases from Glock and Smith & Wesson Sigma series pistols using Naïve Bayes and Random Forest classification methods. The shapes of ejector marks were captured from images of test-fired cartridge cases and subjected to multivariate analysis. Naïve Bayes and Random Forest methods were used to assign the ejector shapes to the correct class of firearm with success rates as high as 98%. This method is easily implemented with equipment already available in crime laboratories and can serve as an investigative lead in the form of a list of firearms that could have fired the evidence. Paired with the FBI's General Rifling Characteristics (GRC) database, this could be an invaluable resource for firearm evidence at crime scenes. © 2017 American Academy of Forensic Sciences.
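
    For intuition, the Gaussian Naïve Bayes step used in studies like this can be sketched in a few lines; the following is a generic implementation over made-up 1-D "shape scores", not the authors' shape-descriptor pipeline:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Per-class prior, mean, and variance estimates for Gaussian naive
    Bayes (small variance floor added for numerical safety)."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for label, rows in groups.items():
        means = [sum(col) / len(col) for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model

def predict_gnb(model, x):
    """Pick the class with the highest log-posterior under the model."""
    def log_post(stats):
        log_prior, means, variances = stats
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_post(model[label]))

# made-up 1-D shape scores for two hypothetical firearm classes
model = fit_gnb([[0.0], [0.1], [5.0], [5.1]], ["a", "a", "b", "b"])
```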

  12. Otolith shape analysis for stock discrimination of two Collichthys genus croakers (Pisces: Sciaenidae) from the northern Chinese coast

    NASA Astrophysics Data System (ADS)

    Zhao, Bo; Liu, Jinhu; Song, Junjie; Cao, Liang; Dou, Shuozeng

    2017-08-01

    The otolith morphology of two croaker species (Collichthys lucidus and Collichthys niveatus) from three areas (Liaodong Bay, LD; Huanghe (Yellow) River estuary, HRE; Jiaozhou Bay, JZ) along the northern Chinese coast was investigated for species identification and stock discrimination. The otolith contour shape, described by elliptic Fourier coefficients (EFCs), was analysed using principal components analysis (PCA) and stepwise canonical discriminant analysis (CDA) to identify species and stocks. The two species were well differentiated, with an overall classification success rate of 97.8%, and variations in the otolith shapes were significant enough to discriminate among the three geographical samples of C. lucidus (67.7%) or C. niveatus (65.2%). Relatively high mis-assignment occurred between the geographically adjacent LD and HRE samples, which implies that individual mixing may exist between the two samples. This study yielded information complementary to that derived from genetic studies and provided information for assessing the stock structure of C. lucidus and C. niveatus in the Bohai Sea and the Yellow Sea.

  13. Bayesian model reduction and empirical Bayes for group (DCM) studies

    PubMed Central

    Friston, Karl J.; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E.; van Wijk, Bernadette C.M.; Ziegler, Gabriel; Zeidman, Peter

    2016-01-01

    This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level – e.g., dynamic causal models – and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. PMID:26569570

  14. Skylab/EREP application to ecological, geological, and oceanographic investigations of Delaware Bay

    NASA Technical Reports Server (NTRS)

    Klemas, V. (Principal Investigator); Bartlett, D. S.; Philpot, W. D.; Rogers, R. H.; Reed, L. E.

    1976-01-01

    The author has identified the following significant results. Skylab/EREP S190A and S190B film products were optically enhanced and visually interpreted to extract data suitable for mapping coastal land use; inventorying wetlands vegetation; monitoring tidal conditions; observing suspended sediment patterns; charting surface currents; locating coastal fronts and water mass boundaries; monitoring industrial and municipal waste dumps in the ocean; and determining the size and flow direction of river, bay, and man-made discharge plumes. Film products were visually analyzed to identify and map ten land use and vegetation categories at a scale of 1:125,000. Thematic maps were compared with CARETS land use maps, resulting in classification accuracies of 50 to 98%. Digital tapes from S192 were used to prepare thematic land use maps. The resolutions of the S190A, S190B, and S192 systems were 20-40m, 10-20m, and 70-100m, respectively.

  15. Multiclass Bayes error estimation by a feature space sampling technique

    NASA Technical Reports Server (NTRS)

    Mobasseri, B. G.; Mcgillem, C. D.

    1979-01-01

    A general Gaussian M-class, N-feature classification problem is defined. An algorithm is developed that requires the class statistics as its only input and computes the minimum probability of error through use of a combined analytical and numerical integration over a sequence of simplifying transformations of the feature space. The results are compared with those obtained by conventional techniques applied to a previously reported 2-class, 4-feature discrimination problem and to 4-class, 4-feature multispectral scanner Landsat data classified by training and testing on the available data.
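
    The quantity being computed here, the minimum attainable error of the Bayes classifier, can also be approximated by simple sampling. The sketch below is a Monte Carlo stand-in for the paper's analytical/numerical integration, restricted to two equally likely 1-D Gaussian classes:

```python
import random

def bayes_error_mc(mu0, mu1, sigma=1.0, n=100_000, seed=0):
    """Monte Carlo estimate of the minimum (Bayes) probability of error
    for two equally likely 1-D Gaussian classes with shared sigma."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        label = rng.random() < 0.5            # draw the true class
        x = rng.gauss(mu1 if label else mu0, sigma)
        # equal priors: the Bayes rule picks the nearer class mean
        pred = abs(x - mu1) < abs(x - mu0)
        errors += pred != label
    return errors / n

# true Bayes error for means 0 and 2 with unit sigma is Phi(-1), about 0.159
err = bayes_error_mc(0.0, 2.0)
```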

  16. Non-negative matrix factorization in texture feature for classification of dementia with MRI data

    NASA Astrophysics Data System (ADS)

    Sarwinda, D.; Bustamam, A.; Ardaneswari, G.

    2017-07-01

    This paper investigates the application of non-negative matrix factorization as a feature selection method to select features from the gray level co-occurrence matrix. The proposed approach is used to classify dementia using MRI data. In this study, texture analysis using the gray level co-occurrence matrix is performed for feature extraction. In the feature extraction process on MRI data, we obtained seven features from the gray level co-occurrence matrix. Non-negative matrix factorization selected the three most influential of the features produced by feature extraction. A Naïve Bayes classifier is adapted to classify dementia, i.e. Alzheimer's disease, mild cognitive impairment (MCI), and normal control. The experimental results show that non-negative matrix factorization as a feature selection method is able to achieve an accuracy of 96.4% for classification of Alzheimer's disease versus normal control. The proposed method is also compared with other feature selection methods, i.e. principal component analysis (PCA).
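
    The seven GLCM features used as classifier input are not listed in the abstract; for illustration, three classic Haralick-style texture features can be computed from a co-occurrence matrix like this:

```python
def glcm_features(glcm):
    """Contrast, energy, and homogeneity from a grey-level co-occurrence
    matrix of raw counts (normalised to probabilities internally)."""
    n = len(glcm)
    total = sum(sum(row) for row in glcm)
    p = [[v / total for v in row] for row in glcm]
    contrast = sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))
    energy = sum(p[i][j] ** 2 for i in range(n) for j in range(n))
    homogeneity = sum(p[i][j] / (1 + abs(i - j))
                      for i in range(n) for j in range(n))
    return contrast, energy, homogeneity

glcm = [[2, 1], [1, 2]]   # toy 2-grey-level co-occurrence counts
contrast, energy, homogeneity = glcm_features(glcm)
```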

  17. Hierarchical Rhetorical Sentence Categorization for Scientific Papers

    NASA Astrophysics Data System (ADS)

    Rachman, G. H.; Khodra, M. L.; Widyantoro, D. H.

    2018-03-01

    Important information in scientific papers can be conveyed by rhetorical sentences structured into certain categories. To extract this information, text categorization must be conducted. Prior work on this task has employed word frequency, semantic word similarity, hierarchical classification, and other techniques. This paper therefore presents rhetorical sentence categorization for scientific papers, employing TF-IDF and Word2Vec to capture word frequency and semantic word similarity, combined with hierarchical classification. Every experiment is tested with two classifiers, namely Naïve Bayes and linear SVM. This paper shows that the hierarchical classifier is better than the flat classifier when employing either TF-IDF or Word2Vec, although the improvement is only about 2%, from 27.82% with the flat classifier to 29.61% with the hierarchical classifier. It also shows that a different learning model can be built for each child category by the hierarchical classifier.
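
    The TF-IDF half of the feature representation can be sketched in a few lines (the Word2Vec half needs a trained embedding model and is omitted; the tokens below are illustrative):

```python
import math

def tf_idf(docs):
    """Toy TF-IDF weighting over tokenised documents: term frequency
    times (unsmoothed) inverse document frequency."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        weights.append({
            term: (doc.count(term) / len(doc)) * math.log(n / df[term])
            for term in doc
        })
    return weights

docs = [["naive", "bayes", "rule"], ["naive", "svm", "margin"]]
weights = tf_idf(docs)   # "naive" occurs everywhere, so its weight is 0
```

    Production vectorisers usually smooth the IDF term to avoid exact zeros; the unsmoothed form is kept here for brevity.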

  18. Deep learning of support vector machines with class probability output networks.

    PubMed

    Kim, Sangwook; Yu, Zhibin; Kil, Rhee Man; Lee, Minho

    2015-04-01

    Deep learning methods endeavor to learn features automatically at multiple levels and allow systems to learn complex functions mapping from the input space to the output space for the given data. The ability to learn powerful features automatically is increasingly important as the volume of data and range of applications of machine learning methods continue to grow. This paper proposes a new deep architecture that uses support vector machines (SVMs) with class probability output networks (CPONs) to provide better generalization power for pattern classification problems. As a result, deep features are extracted without additional feature engineering steps, using multiple layers of the SVM classifiers with CPONs. The proposed structure closely approaches the ideal Bayes classifier as the number of layers increases. Using a simulation of classification problems, the effectiveness of the proposed method is demonstrated. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. [Image Feature Extraction and Discriminant Analysis of Xinjiang Uygur Medicine Based on Color Histogram].

    PubMed

    Hamit, Murat; Yun, Weikang; Yan, Chuanbo; Kutluk, Abdugheni; Fang, Yang; Alip, Elzat

    2015-06-01

    Image feature extraction is an important part of image processing and an important field of research and application of image processing technology. Uygur medicine is a branch of traditional Chinese medicine that is receiving increasing research attention, but large amounts of Uygur medicine data have not been fully utilized. In this study, we extracted image color histogram features of herbal and zooid medicines of Xinjiang Uygur. First, we performed preprocessing, including image color enhancement, size normalization, and color space transformation. Then we extracted color histogram features and analyzed them with statistical methods. Finally, we evaluated the classification ability of the features by Bayes discriminant analysis. Experimental results showed that high accuracy for Uygur medicine image classification was obtained by using the color histogram feature. This study may aid content-based medical image retrieval for Xinjiang Uygur medicine.
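
    A colour histogram feature of the kind described reduces each RGB pixel to a quantised bin index. A minimal sketch with 4 levels per channel, giving a 64-bin feature vector (the bin count is an illustrative choice, not the paper's):

```python
def color_histogram(pixels, bins=4):
    """Normalised RGB histogram: each 8-bit channel is quantised to `bins`
    levels, producing a bins**3-dimensional feature vector summing to 1."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins \
              + (b * bins // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

# two extreme pixels: pure black (bin 0) and pure white (last bin)
feature = color_histogram([(0, 0, 0), (255, 255, 255)])
```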

  20. The Identification of Land Utilization in Coastal Reclamation Areas in Tianjin Using High Resolution Remote Sensing Images

    NASA Astrophysics Data System (ADS)

    Meng, Y.; Cao, Y.; Tian, H.; Han, Z.

    2018-04-01

    In recent decades, land reclamation activities have developed rapidly in Chinese coastal regions, especially in Bohai Bay. Land reclamation areas can effectively alleviate the contradiction between land resource shortages and human needs, but idle lands left unused after the government approves the usage of sea areas also deserve attention. Owing to the particular features of land coverage identification over large regions, traditional monitoring approaches cannot fully meet the need for effective and rapid land use classification. In this paper, Gaofen-1 remotely sensed satellite imagery together with sea area usage ownership data were used to identify land use classes and locate idle land resources. The results show that most of the land use types and idle land resources can be identified precisely.
