Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression , higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but
Diagnosis of combined faults in Rotary Machinery by Non-Naive Bayesian approach
NASA Astrophysics Data System (ADS)
Asr, Mahsa Yazdanian; Ettefagh, Mir Mohammad; Hassannejad, Reza; Razavi, Seyed Naser
2017-02-01
When combined faults happen in different parts of the rotating machines, their features are profoundly dependent. Experts are completely familiar with individuals faults characteristics and enough data are available from single faults but the problem arises, when the faults combined and the separation of characteristics becomes complex. Therefore, the experts cannot declare exact information about the symptoms of combined fault and its quality. In this paper to overcome this drawback, a novel method is proposed. The core idea of the method is about declaring combined fault without using combined fault features as training data set and just individual fault features are applied in training step. For this purpose, after data acquisition and resampling the obtained vibration signals, Empirical Mode Decomposition (EMD) is utilized to decompose multi component signals to Intrinsic Mode Functions (IMFs). With the use of correlation coefficient, proper IMFs for feature extraction are selected. In feature extraction step, Shannon energy entropy of IMFs was extracted as well as statistical features. It is obvious that most of extracted features are strongly dependent. To consider this matter, Non-Naive Bayesian Classifier (NNBC) is appointed, which release the fundamental assumption of Naive Bayesian, i.e., the independence among features. To demonstrate the superiority of NNBC, other counterpart methods, include Normal Naive Bayesian classifier, Kernel Naive Bayesian classifier and Back Propagation Neural Networks were applied and the classification results are compared. An experimental vibration signals, collected from automobile gearbox, were used to verify the effectiveness of the proposed method. During the classification process, only the features, related individually to healthy state, bearing failure and gear failures, were assigned for training the classifier. But, combined fault features (combined gear and bearing failures) were examined as test data. The achieved probabilities for the test data show that the combined fault can be identified with high success rate.
Classifying emotion in Twitter using Bayesian network
NASA Astrophysics Data System (ADS)
Surya Asriadie, Muhammad; Syahrul Mubarok, Mohamad; Adiwijaya
2018-03-01
Language is used to express not only facts, but also emotions. Emotions are noticeable from behavior up to the social media statuses written by a person. Analysis of emotions in a text is done in a variety of media such as Twitter. This paper studies classification of emotions on twitter using Bayesian network because of its ability to model uncertainty and relationships between features. The result is two models based on Bayesian network which are Full Bayesian Network (FBN) and Bayesian Network with Mood Indicator (BNM). FBN is a massive Bayesian network where each word is treated as a node. The study shows the method used to train FBN is not very effective to create the best model and performs worse compared to Naive Bayes. F1-score for FBN is 53.71%, while for Naive Bayes is 54.07%. BNM is proposed as an alternative method which is based on the improvement of Multinomial Naive Bayes and has much lower computational complexity compared to FBN. Even though it’s not better compared to FBN, the resulting model successfully improves the performance of Multinomial Naive Bayes. F1-Score for Multinomial Naive Bayes model is 51.49%, while for BNM is 52.14%.
NASA Technical Reports Server (NTRS)
Garay, Michael J.; Mazzoni, Dominic; Davies, Roger; Wagstaff, Kiri
2004-01-01
Support Vector Machines (SVMs) are a type of supervised learning algorith,, other examples of which are Artificial Neural Networks (ANNs), Decision Trees, and Naive Bayesian Classifiers. Supervised learning algorithms are used to classify objects labled by a 'supervisor' - typically a human 'expert.'.
A native Bayesian classifier based routing protocol for VANETS
NASA Astrophysics Data System (ADS)
Bao, Zhenshan; Zhou, Keqin; Zhang, Wenbo; Gong, Xiaolei
2016-12-01
Geographic routing protocols are one of the most hot research areas in VANET (Vehicular Ad-hoc Network). However, there are few routing protocols can take both the transmission efficient and the usage of ratio into account. As we have noticed, different messages in VANET may ask different quality of service. So we raised a Native Bayesian Classifier based routing protocol (Naive Bayesian Classifier-Greedy, NBC-Greedy), which can classify and transmit different messages by its emergency degree. As a result, we can balance the transmission efficient and the usage of ratio with this protocol. Based on Matlab simulation, we can draw a conclusion that NBC-Greedy is more efficient and stable than LR-Greedy and GPSR.
Bayesian cloud detection for MERIS, AATSR, and their combination
NASA Astrophysics Data System (ADS)
Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.
2014-11-01
A broad range of different of Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud masks were designed to be numerically efficient and suited for the processing of large amounts of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and on setting up new classification schemes based on manually classified scenes.
Bayesian cloud detection for MERIS, AATSR, and their combination
NASA Astrophysics Data System (ADS)
Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.
2015-04-01
A broad range of different of Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud detection schemes were designed to be numerically efficient and suited for the processing of large numbers of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient numbers of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and on setting up new classification schemes based on manually classified scenes.
Textual and visual content-based anti-phishing: a Bayesian approach.
Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin
2011-10-01
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE
Monti, S.; Cooper, G. F.
1998-01-01
We present a new Bayesian classifier for computer-aided diagnosis. The new classifier builds upon the naive-Bayes classifier, and models the dependencies among patient findings in an attempt to improve its performance, both in terms of classification accuracy and in terms of calibration of the estimated probabilities. This work finds motivation in the argument that highly calibrated probabilities are necessary for the clinician to be able to rely on the model's recommendations. Experimental results are presented, supporting the conclusion that modeling the dependencies among findings improves calibration. PMID:9929288
A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.
Marucci-Wellman, H; Lehto, M; Corns, H
2011-12-01
Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach where two Bayesian models (Fuzzy and Naïve) were used to either classify a narrative or select it for manual review. Injury narratives were extracted from claims filed with a worker's compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and prediction set (n=3,000). Expert coders assigned two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event codes to each narrative. Fuzzy and Naïve Bayesian models were developed using manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy assigned cases for manual review if the Fuzzy and Naïve models disagreed on the classification. The second strategy selected additional cases for manual review from the Agree dataset using prediction strength to reach a level of 50% computer coding and 50% manual coding. When agreement alone was used as the filtering strategy, the majority were coded by the computer (n=1,928, 64%) leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90 and positive predictive value (PPV) was >0.90 for 11 of 18 2-digit event categories. Implementing the 2nd strategy improved results with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories. A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.
Bayesian network modelling of upper gastrointestinal bleeding
NASA Astrophysics Data System (ADS)
Aisha, Nazziwa; Shohaimi, Shamarina; Adam, Mohd Bakri
2013-09-01
Bayesian networks are graphical probabilistic models that represent causal and other relationships between domain variables. In the context of medical decision making, these models have been explored to help in medical diagnosis and prognosis. In this paper, we discuss the Bayesian network formalism in building medical support systems and we learn a tree augmented naive Bayes Network (TAN) from gastrointestinal bleeding data. The accuracy of the TAN in classifying the source of gastrointestinal bleeding into upper or lower source is obtained. The TAN achieves a high classification accuracy of 86% and an area under curve of 92%. A sensitivity analysis of the model shows relatively high levels of entropy reduction for color of the stool, history of gastrointestinal bleeding, consistency and the ratio of blood urea nitrogen to creatinine. The TAN facilitates the identification of the source of GIB and requires further validation.
Porras-Alfaro, Andrea; Liu, Kuan-Liang; Kuske, Cheryl R; Xie, Gary
2014-02-01
We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and the 5' section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but the ITS1 and ITS2 showed higher accuracy when smaller fragment sizes of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar accuracy classification for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets.
Liu, Kuan-Liang; Kuske, Cheryl R.
2014-01-01
We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and the 5′ section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but the ITS1 and ITS2 showed higher accuracy when smaller fragment sizes of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar accuracy classification for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets. PMID:24242255
NASA Astrophysics Data System (ADS)
Yu, Xin; Wen, Zongyong; Zhu, Zhaorong; Xia, Qiang; Shun, Lan
2016-06-01
Image classification will still be a long way in the future, although it has gone almost half a century. In fact, researchers have gained many fruits in the image classification domain, but there is still a long distance between theory and practice. However, some new methods in the artificial intelligence domain will be absorbed into the image classification domain and draw on the strength of each to offset the weakness of the other, which will open up a new prospect. Usually, networks play the role of a high-level language, as is seen in Artificial Intelligence and statistics, because networks are used to build complex model from simple components. These years, Bayesian Networks, one of probabilistic networks, are a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of High-resolution remote sensing images and put up a new method to construct the network topology structure in terms of training accuracy based on the training samples. Since 2013, China government has started the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper tries to apply Bayesian network to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we choose some remote sensing images in Beijing. Experimental results demonstrate TAN outperform than Naive Bayesian Classifier (NBC) and Maximum Likelihood Classification Method (MLC) in the overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve the work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas
To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.
Machine learning approach to automatic exudate detection in retinal images from diabetic patients
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin
2010-01-01
Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.
Robust through-the-wall radar image classification using a target-model alignment procedure.
Smith, Graeme E; Mobasseri, Bijan G
2012-02-01
A through-the-wall radar image (TWRI) bears little resemblance to the equivalent optical image, making it difficult to interpret. To maximize the intelligence that may be obtained, it is desirable to automate the classification of targets in the image to support human operators. This paper presents a technique for classifying stationary targets based on the high-range resolution profile (HRRP) extracted from 3-D TWRIs. The dependence of the image on the target location is discussed using a system point spread function (PSF) approach. It is shown that the position dependence will cause a classifier to fail, unless the image to be classified is aligned to a classifier-training location. A target image alignment technique based on deconvolution of the image with the system PSF is proposed. Comparison of the aligned target images with measured images shows the alignment process introducing normalized mean squared error (NMSE) ≤ 9%. The HRRP extracted from aligned target images are classified using a naive Bayesian classifier supported by principal component analysis. The classifier is tested using a real TWRI of canonical targets behind a concrete wall and shown to obtain correct classification rates ≥ 97%. © 2011 IEEE
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies
2010-01-01
Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.
David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A
2010-02-08
All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
Browne, Fiona; Wang, Haiying; Zheng, Huiru; Azuaje, Francisco
2010-03-01
This study applied a knowledge-driven data integration framework for the inference of protein-protein interactions (PPI). Evidence from diverse genomic features is integrated using a knowledge-driven Bayesian network (KD-BN). Receiver operating characteristic (ROC) curves may not be the optimal assessment method to evaluate a classifier's performance in PPI prediction as the majority of the area under the curve (AUC) may not represent biologically meaningful results. It may be of benefit to interpret the AUC of a partial ROC curve whereby biologically interesting results are represented. Therefore, the novel application of the assessment method referred to as the partial ROC has been employed in this study to assess predictive performance of PPI predictions along with calculating the True positive/false positive rate and true positive/positive rate. By incorporating domain knowledge into the construction of the KD-BN, we demonstrate improvement in predictive performance compared with previous studies based upon the Naive Bayesian approach. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections
Jrad, Nisrine; Grall-Maës, Edith; Beauseroy, Pierre
2009-01-01
Supervised learning of microarray data is receiving much attention in recent years. Multiclass cancer diagnosis, based on selected gene profiles, are used as adjunct of clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense or confound a result. To avoid this misleading, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure a higher reliability while reducing time and expense costs. Moreover, this classifier takes into account asymmetric penalties dependant on each class and on each wrong or partially correct decision. It is based on ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. The state of art multiclass algorithms can be considered as a particular case of the proposed algorithm where the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out in the Bayesian and the class selective rejection frameworks. Five genes selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those computed by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers. PMID:19584932
Overlapped Partitioning for Ensemble Classifiers of P300-Based Brain-Computer Interfaces
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance. PMID:24695550
Overlapped partitioning for ensemble classifiers of P300-based brain-computer interfaces.
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.
Understanding of the naive Bayes classifier in spam filtering
NASA Astrophysics Data System (ADS)
Wei, Qijia
2018-05-01
Along with the development of the Internet, the information stream is experiencing an unprecedented burst. The methods of information transmission become more and more important and people receiving effective information is a hot topic in the both research and industry field. As one of the most common methods of information communication, email has its own advantages. However, spams always flood the inbox and automatic filtering is needed. This paper is going to discuss this issue from the perspective of Naive Bayes Classifier, which is one of the applications of Bayes Theorem. Concepts and process of Naive Bayes Classifier will be introduced, followed by two examples. Discussion with Machine Learning is made in the last section. Naive Bayes Classifier has been proved to be surprisingly effective, with the limitation of the interdependence among attributes which are usually email words or phrases.
SUVI Thematic Maps: A new tool for space weather forecasting
NASA Astrophysics Data System (ADS)
Hughes, J. M.; Seaton, D. B.; Darnel, J.
2017-12-01
The new Solar Ultraviolet Imager (SUVI) instruments aboard NOAA's GOES-R series satellites collect continuous, high-quality imagery of the Sun in six wavelengths. SUVI imagers produce at least one image every 10 seconds, or 8,640 images per day, considerably more data than observers can digest in real time. Over the projected 20-year lifetime of the four GOES-R series spacecraft, SUVI will provide critical imagery for space weather forecasters and produce an extensive but unwieldy archive. In order to condense the database into a dynamic and searchable form we have developed solar thematic maps, maps of the Sun with key features, such as coronal holes, flares, bright regions, quiet corona, and filaments, identified. Thematic maps will be used in NOAA's Space Weather Prediction Center to improve forecaster response time to solar events and generate several derivative products. Likewise, scientists use thematic maps to find observations of interest more easily. Using an expert-trained, naive Bayesian classifier to label each pixel, we create thematic maps in real-time. We created software to collect expert classifications of solar features based on SUVI images. Using this software, we compiled a database of expert classifications, from which we could characterize the distribution of pixels associated with each theme. Given new images, the classifier assigns each pixel the most appropriate label according to the trained distribution. Here we describe the software to collect expert training and the successes and limitations of the classifier. The algorithm excellently identifies coronal holes but fails to consistently detect filaments and prominences. We compare the Bayesian classifier to an artificial neural network, one of our attempts to overcome the aforementioned limitations. These results are very promising and encourage future research into an ensemble classification approach.
Chinese Sentence Classification Based on Convolutional Neural Network
NASA Astrophysics Data System (ADS)
Gu, Chengwei; Wu, Ming; Zhang, Chuang
2017-10-01
Sentence classification is one of the significant issues in Natural Language Processing (NLP). Feature extraction is often regarded as the key point for natural language processing. Traditional ways based on machine learning can not take high level features into consideration, such as Naive Bayesian Model. The neural network for sentence classification can make use of contextual information to achieve greater results in sentence classification tasks. In this paper, we focus on classifying Chinese sentences. And the most important is that we post a novel architecture of Convolutional Neural Network (CNN) to apply on Chinese sentence classification. In particular, most of the previous methods often use softmax classifier for prediction, we embed a linear support vector machine to substitute softmax in the deep neural network model, minimizing a margin-based loss to get a better result. And we use tanh as an activation function, instead of ReLU. The CNN model improve the result of Chinese sentence classification tasks. Experimental results on the Chinese news title database validate the effectiveness of our model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pon, R K; Cardenas, A F; Buttler, D J
The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features can not be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of document is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifiermore » with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.« less
NASA Astrophysics Data System (ADS)
Candra Permana, Fahmi; Rosmansyah, Yusep; Setiawan Abdullah, Atje
2017-10-01
Students activity on social media can provide implicit knowledge and new perspectives for an educational system. Sentiment analysis is a part of text mining that can help to analyze and classify the opinion data. This research uses text mining and naive Bayes method as opinion classifier, to be used as an alternative methods in the process of evaluating studentss satisfaction for educational institution. Based on test results, this system can determine the opinion classification in Bahasa Indonesia using naive Bayes as opinion classifier with accuracy level of 84% correct, and the comparison between the existing system and the proposed system to evaluate students satisfaction in learning process, there is only a difference of 16.49%.
Fuzzy Naive Bayesian model for medical diagnostic decision support.
Wagholikar, Kavishwar B; Vijayraghavan, Sundararajan; Deshpande, Ashok W
2009-01-01
This work relates to the development of computational algorithms to provide decision support to physicians. The authors propose a Fuzzy Naive Bayesian (FNB) model for medical diagnosis, which extends the Fuzzy Bayesian approach proposed by Okuda. A physician's interview based method is described to define a orthogonal fuzzy symptom information system, required to apply the model. For the purpose of elaboration and elicitation of characteristics, the algorithm is applied to a simple simulated dataset, and compared with conventional Naive Bayes (NB) approach. As a preliminary evaluation of FNB in real world scenario, the comparison is repeated on a real fuzzy dataset of 81 patients diagnosed with infectious diseases. The case study on simulated dataset elucidates that FNB can be optimal over NB for diagnosing patients with imprecise-fuzzy information, on account of the following characteristics - 1) it can model the information that, values of some attributes are semantically closer than values of other attributes, and 2) it offers a mechanism to temper exaggerations in patient information. Although the algorithm requires precise training data, its utility for fuzzy training data is argued for. This is supported by the case study on infectious disease dataset, which indicates optimality of FNB over NB for the infectious disease domain. Further case studies on large datasets are required to establish utility of FNB.
Evaluation of Bayesian approaches to identify DDT source contributions to soils in Southeast China.
Zeng, Faming; Yang, Dan; Xing, Xinli; Qi, Shihua
2017-06-01
Dicofol application may be an important source to elevate the dichlorodiphenyltrichloroethane (DDT) residues to soils in Fujian, Southeast China, after the technical DDT was banned, which left DDT residues from the historical application. The DDT residues varied geographically, corresponding to the varied potential sources of DDT. In this study, a novel approach based on the Bayesian method (BM) was developed to identify the source contributions of DDT to soils, composed with both historical DDT and dicofol. The Naive Bayesian classifier was used basing on the subset of the samples, which were determined by chemical analysis independent of the Bayesian approach. The results show that BM (95%) was higher than that using the ratio of o, p'-/p, p'-DDT (84%) to identify DDT source contributions. High detection rate (97%) of dicofol (p, p'-OH-DDT) was observed in the subset, showing dicofol application influenced the DDX levels in soils in Fujian. However, the contribution from historical technical DDT source was greater than that from dicofol in Fujian, indicating historical technical DDT was still an important pollution source to soils. In addition, both the DDX (DDT isomers and derivatives) level and dicofol contribution in non-agricultural soils were higher than other agricultural land uses, especially in hilly regions, the potential cause may be the atmospheric transport of dicofol type DDT, after spraying during daytime, or regional difference on production and application. Copyright © 2017 Elsevier Ltd. All rights reserved.
Tighe, Patrick J; Lucas, Stephen D; Edwards, David A; Boezaart, André P; Aytug, Haldun; Bihorac, Azra
2012-10-01
The purpose of this project was to determine whether machine-learning classifiers could predict which patients would require a preoperative acute pain service (APS) consultation. Retrospective cohort. University teaching hospital. The records of 9,860 surgical patients posted between January 1 and June 30, 2010 were reviewed. Request for APS consultation. A cohort of machine-learning classifiers was compared according to its ability or inability to classify surgical cases as requiring a request for a preoperative APS consultation. Classifiers were then optimized utilizing ensemble techniques. Computational efficiency was measured with the central processing unit processing times required for model training. Classifiers were tested using the full feature set, as well as the reduced feature set that was optimized using a merit-based dimensional reduction strategy. Machine-learning classifiers correctly predicted preoperative requests for APS consultations in 92.3% (95% confidence intervals [CI], 91.8-92.8) of all surgical cases. Bayesian methods yielded the highest area under the receiver operating curve (0.87, 95% CI 0.84-0.89) and lowest training times (0.0018 seconds, 95% CI, 0.0017-0.0019 for the NaiveBayesUpdateable algorithm). An ensemble of high-performing machine-learning classifiers did not yield a higher area under the receiver operating curve than its component classifiers. Dimensional reduction decreased the computational requirements for multiple classifiers, but did not adversely affect classification performance. Using historical data, machine-learning classifiers can predict which surgical cases should prompt a preoperative request for an APS consultation. Dimensional reduction improved computational efficiency and preserved predictive performance. Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Luo, Shuwen; Chen, Changshui; Mao, Hua; Jin, Shaoqin
2013-06-01
The feasibility of early detection of gastric cancer using near-infrared (NIR) Raman spectroscopy (RS) by distinguishing premalignant lesions (adenomatous polyp, n=27) and cancer tissues (adenocarcinoma, n=33) from normal gastric tissues (n=45) is evaluated. Significant differences in Raman spectra are observed among the normal, adenomatous polyp, and adenocarcinoma gastric tissues at 936, 1003, 1032, 1174, 1208, 1323, 1335, 1450, and 1655 cm-1. Diverse statistical methods are employed to develop effective diagnostic algorithms for classifying the Raman spectra of different types of ex vivo gastric tissues, including principal component analysis (PCA), linear discriminant analysis (LDA), and naive Bayesian classifier (NBC) techniques. Compared with PCA-LDA algorithms, PCA-NBC techniques together with leave-one-out, cross-validation method provide better discriminative results of normal, adenomatous polyp, and adenocarcinoma gastric tissues, resulting in superior sensitivities of 96.3%, 96.9%, and 96.9%, and specificities of 93%, 100%, and 95.2%, respectively. Therefore, NIR RS associated with multivariate statistical algorithms has the potential for early diagnosis of gastric premalignant lesions and cancer tissues in molecular level.
The Persistence of "Solid" and "Liquid" Naive Conceptions: A Reaction Time Study
ERIC Educational Resources Information Center
Babai, Reuven; Amsterdamer, Anat
2008-01-01
The study explores whether the naive concepts of "solid" and "liquid" persist in adolescence. Accuracy of responses and reaction times where measured while 41 ninth graders classified different solids (rigid, non-rigid and powders) and different liquids (runny, dense) into solid or liquid. The results show that these naive conceptions affect…
Impact of censoring on learning Bayesian networks in survival modelling.
Stajduhar, Ivan; Dalbelo-Basić, Bojana; Bogunović, Nikola
2009-11-01
Bayesian networks are commonly used for presenting uncertainty and covariate interactions in an easily interpretable way. Because of their efficient inference and ability to represent causal relationships, they are an excellent choice for medical decision support systems in diagnosis, treatment, and prognosis. Although good procedures for learning Bayesian networks from data have been defined, their performance in learning from censored survival data has not been widely studied. In this paper, we explore how to use these procedures to learn about possible interactions between prognostic factors and their influence on the variate of interest. We study how censoring affects the probability of learning correct Bayesian network structures. Additionally, we analyse the potential usefulness of the learnt models for predicting the time-independent probability of an event of interest. We analysed the influence of censoring with a simulation on synthetic data sampled from randomly generated Bayesian networks. We used two well-known methods for learning Bayesian networks from data: a constraint-based method and a score-based method. We compared the performance of each method under different levels of censoring to those of the naive Bayes classifier and the proportional hazards model. We did additional experiments on several datasets from real-world medical domains. The machine-learning methods treated censored cases in the data as event-free. We report and compare results for several commonly used model evaluation metrics. On average, the proportional hazards method outperformed other methods in most censoring setups. As part of the simulation study, we also analysed structural similarities of the learnt networks. Heavy censoring, as opposed to no censoring, produces up to a 5% surplus and up to 10% missing total arcs. It also produces up to 50% missing arcs that should originally be connected to the variate of interest. Presented methods for learning Bayesian networks from data can be used to learn from censored survival data in the presence of light censoring (up to 20%) by treating censored cases as event-free. Given intermediate or heavy censoring, the learnt models become tuned to the majority class and would thus require a different approach.
NASA Astrophysics Data System (ADS)
Alehosseini, Ali; A. Hejazi, Maryam; Mokhtari, Ghassem; B. Gharehpetian, Gevork; Mohammadi, Mohammad
2015-06-01
In this paper, the Bayesian classifier is used to detect and classify the radial deformation and axial displacement of transformer windings. The proposed method is tested on a model of transformer for different volumes of radial deformation and axial displacement. In this method, ultra-wideband (UWB) signal is sent to the simplified model of the transformer winding. The received signal from the winding model is recorded and used for training and testing of Bayesian classifier in different axial displacement and radial deformation states of the winding. It is shown that the proposed method has a good accuracy to detect and classify the axial displacement and radial deformation of the winding.
Sentiment analysis system for movie review in Bahasa Indonesia using naive bayes classifier method
NASA Astrophysics Data System (ADS)
Nurdiansyah, Yanuar; Bukhori, Saiful; Hidayat, Rahmad
2018-04-01
There are many ways of implementing the use of sentiments often found in documents; one of which is the sentiments found on the product or service reviews. It is so important to be able to process and extract textual data from the documents. Therefore, we propose a system that is able to classify sentiments from review documents into two classes: positive sentiment and negative sentiment. We use Naive Bayes Classifier method in this document classification system that we build. We choose Movienthusiast, a movie reviews in Bahasa Indonesia website as the source of our review documents. From there, we were able to collect 1201 movie reviews: 783 positive reviews and 418 negative reviews that we use as the dataset for this machine learning classifier. The classifying accuracy yields an average of 88.37% from five times of accuracy measuring attempts using aforementioned dataset.
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
Benndorf, Matthias; Kotter, Elmar; Langer, Mathias; Herda, Christoph; Wu, Yirong; Burnside, Elizabeth S
2015-06-01
To develop and validate a decision support tool for mammographic mass lesions based on a standardized descriptor terminology (BI-RADS lexicon) to reduce variability of practice. We used separate training data (1,276 lesions, 138 malignant) and validation data (1,177 lesions, 175 malignant). We created naïve Bayes (NB) classifiers from the training data with tenfold cross-validation. Our "inclusive model" comprised BI-RADS categories, BI-RADS descriptors, and age as predictive variables; our "descriptor model" comprised BI-RADS descriptors and age. The resulting NB classifiers were applied to the validation data. We evaluated and compared classifier performance with ROC-analysis. In the training data, the inclusive model yields an AUC of 0.959; the descriptor model yields an AUC of 0.910 (P < 0.001). The inclusive model is superior to the clinical performance (BI-RADS categories alone, P < 0.001); the descriptor model performs similarly. When applied to the validation data, the inclusive model yields an AUC of 0.935; the descriptor model yields an AUC of 0.876 (P < 0.001). Again, the inclusive model is superior to the clinical performance (P < 0.001); the descriptor model performs similarly. We consider our classifier a step towards a more uniform interpretation of combinations of BI-RADS descriptors. We provide our classifier at www.ebm-radiology.com/nbmm/index.html . • We provide a decision support tool for mammographic masses at www.ebm-radiology.com/nbmm/index.html . • Our tool may reduce variability of practice in BI-RADS category assignment. • A formal analysis of BI-RADS descriptors may enhance radiologists' diagnostic performance.
A NAIVE BAYES SOURCE CLASSIFIER FOR X-RAY SOURCES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Broos, Patrick S.; Getman, Konstantin V.; Townsley, Leisa K.
2011-05-01
The Chandra Carina Complex Project (CCCP) provides a sensitive X-ray survey of a nearby starburst region over >1 deg{sup 2} in extent. Thousands of faint X-ray sources are found, many concentrated into rich young stellar clusters. However, significant contamination from unrelated Galactic and extragalactic sources is present in the X-ray catalog. We describe the use of a naive Bayes classifier to assign membership probabilities to individual sources, based on source location, X-ray properties, and visual/infrared properties. For the particular membership decision rule adopted, 75% of CCCP sources are classified as members, 11% are classified as contaminants, and 14% remain unclassified.more » The resulting sample of stars likely to be Carina members is used in several other studies, which appear in this special issue devoted to the CCCP.« less
Uses and misuses of Bayes' rule and Bayesian classifiers in cybersecurity
NASA Astrophysics Data System (ADS)
Bard, Gregory V.
2017-12-01
This paper will discuss the applications of Bayes' Rule and Bayesian Classifiers in Cybersecurity. While the most elementary form of Bayes' rule occurs in undergraduate coursework, there are more complicated forms as well. As an extended example, Bayesian spam filtering is explored, and is in many ways the most triumphant accomplishment of Bayesian reasoning in computer science, as nearly everyone with an email address has a spam folder. Bayesian Classifiers have also been responsible significant cybersecurity research results; yet, because they are not part of the standard curriculum, few in the mathematics or information-technology communities have seen the exact definitions, requirements, and proofs that comprise the subject. Moreover, numerous errors have been made by researchers (described in this paper), due to some mathematical misunderstandings dealing with conditional independence, or other badly chosen assumptions. Finally, to provide instructors and researchers with real-world examples, 25 published cybersecurity papers that use Bayesian reasoning are given, with 2-4 sentence summaries of the focus and contributions of each paper.
Millennial Filipino Student Engagement Analyzer Using Facial Feature Classification
NASA Astrophysics Data System (ADS)
Manseras, R.; Eugenio, F.; Palaoag, T.
2018-03-01
Millennials has been a word of mouth of everybody and a target market of various companies nowadays. In the Philippines, they comprise one third of the total population and most of them are still in school. Having a good education system is important for this generation to prepare them for better careers. And a good education system means having quality instruction as one of the input component indicators. In a classroom environment, teachers use facial features to measure the affect state of the class. Emerging technologies like Affective Computing is one of today’s trends to improve quality instruction delivery. This, together with computer vision, can be used in analyzing affect states of the students and improve quality instruction delivery. This paper proposed a system of classifying student engagement using facial features. Identifying affect state, specifically Millennial Filipino student engagement, is one of the main priorities of every educator and this directed the authors to develop a tool to assess engagement percentage. Multiple face detection framework using Face API was employed to detect as many student faces as possible to gauge current engagement percentage of the whole class. The binary classifier model using Support Vector Machine (SVM) was primarily set in the conceptual framework of this study. To achieve the most accuracy performance of this model, a comparison of SVM to two of the most widely used binary classifiers were tested. Results show that SVM bested RandomForest and Naive Bayesian algorithms in most of the experiments from the different test datasets.
AlzhCPI: A knowledge base for predicting chemical-protein interactions towards Alzheimer's disease.
Fang, Jiansong; Wang, Ling; Li, Yecheng; Lian, Wenwen; Pang, Xiaocong; Wang, Hong; Yuan, Dongsheng; Wang, Qi; Liu, Ai-Lin; Du, Guan-Hua
2017-01-01
Alzheimer's disease (AD) is a complicated progressive neurodegeneration disorder. To confront AD, scientists are searching for multi-target-directed ligands (MTDLs) to delay disease progression. The in silico prediction of chemical-protein interactions (CPI) can accelerate target identification and drug discovery. Previously, we developed 100 binary classifiers to predict the CPI for 25 key targets against AD using the multi-target quantitative structure-activity relationship (mt-QSAR) method. In this investigation, we aimed to apply the mt-QSAR method to enlarge the model library to predict CPI towards AD. Another 104 binary classifiers were further constructed to predict the CPI for 26 preclinical AD targets based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms. The internal 5-fold cross-validation and external test set validation were applied to evaluate the performance of the training sets and test set, respectively. The area under the receiver operating characteristic curve (ROC) for the test sets ranged from 0.629 to 1.0, with an average of 0.903. In addition, we developed a web server named AlzhCPI to integrate the comprehensive information of approximately 204 binary classifiers, which has potential applications in network pharmacology and drug repositioning. AlzhCPI is available online at http://rcidm.org/AlzhCPI/index.html. To illustrate the applicability of AlzhCPI, the developed system was employed for the systems pharmacology-based investigation of shichangpu against AD to enhance the understanding of the mechanisms of action of shichangpu from a holistic perspective.
Classification of postural profiles among mouth-breathing children by learning vector quantization.
Mancini, F; Sousa, F S; Hummel, A D; Falcão, A E J; Yi, L C; Ortolani, C F; Sigulem, D; Pisa, I T
2011-01-01
Mouth breathing is a chronic syndrome that may bring about postural changes. Finding characteristic patterns of changes occurring in the complex musculoskeletal system of mouth-breathing children has been a challenge. Learning vector quantization (LVQ) is an artificial neural network model that can be applied for this purpose. The aim of the present study was to apply LVQ to determine the characteristic postural profiles shown by mouth-breathing children, in order to further understand abnormal posture among mouth breathers. Postural training data on 52 children (30 mouth breathers and 22 nose breathers) and postural validation data on 32 children (22 mouth breathers and 10 nose breathers) were used. The performance of LVQ and other classification models was compared in relation to self-organizing maps, back-propagation applied to multilayer perceptrons, Bayesian networks, naive Bayes, J48 decision trees, k, and k-nearest-neighbor classifiers. Classifier accuracy was assessed by means of leave-one-out cross-validation, area under ROC curve (AUC), and inter-rater agreement (Kappa statistics). By using the LVQ model, five postural profiles for mouth-breathing children could be determined. LVQ showed satisfactory results for mouth-breathing and nose-breathing classification: sensitivity and specificity rates of 0.90 and 0.95, respectively, when using the training dataset, and 0.95 and 0.90, respectively, when using the validation dataset. The five postural profiles for mouth-breathing children suggested by LVQ were incorporated into application software for classifying the severity of mouth breathers' abnormal posture.
Fang, Jiansong; Yang, Ranyao; Gao, Li; Zhou, Dan; Yang, Shengqian; Liu, Ai-Lin; Du, Guan-hua
2013-11-25
Butyrylcholinesterase (BuChE, EC 3.1.1.8) is an important pharmacological target for Alzheimer's disease (AD) treatment. However, the currently available BuChE inhibitor screening assays are expensive, labor-intensive, and compound-dependent. It is necessary to develop robust in silico methods to predict the activities of BuChE inhibitors for the lead identification. In this investigation, support vector machine (SVM) models and naive Bayesian models were built to discriminate BuChE inhibitors (BuChEIs) from the noninhibitors. Each molecule was initially represented in 1870 structural descriptors (1235 from ADRIANA.Code, 334 from MOE, and 301 from Discovery studio). Correlation analysis and stepwise variable selection method were applied to figure out activity-related descriptors for prediction models. Additionally, structural fingerprint descriptors were added to improve the predictive ability of models, which were measured by cross-validation, a test set validation with 1001 compounds and an external test set validation with 317 diverse chemicals. The best two models gave Matthews correlation coefficient of 0.9551 and 0.9550 for the test set and 0.9132 and 0.9221 for the external test set. To demonstrate the practical applicability of the models in virtual screening, we screened an in-house data set with 3601 compounds, and 30 compounds were selected for further bioactivity assay. The assay results showed that 10 out of 30 compounds exerted significant BuChE inhibitory activities with IC50 values ranging from 0.32 to 22.22 μM, at which three new scaffolds as BuChE inhibitors were identified for the first time. To our best knowledge, this is the first report on BuChE inhibitors using machine learning approaches. The models generated from SVM and naive Bayesian approaches successfully predicted BuChE inhibitors. The study proved the feasibility of a new method for predicting bioactivities of ligands and discovering novel lead compounds.
Quantum Bayesian networks with application to games displaying Parrondo's paradox
NASA Astrophysics Data System (ADS)
Pejic, Michael
Bayesian networks and their accompanying graphical models are widely used for prediction and analysis across many disciplines. We will reformulate these in terms of linear maps. This reformulation will suggest a natural extension, which we will show is equivalent to standard textbook quantum mechanics. Therefore, this extension will be termed quantum. However, the term quantum should not be taken to imply this extension is necessarily only of utility in situations traditionally thought of as in the domain of quantum mechanics. In principle, it may be employed in any modelling situation, say forecasting the weather or the stock market---it is up to experiment to determine if this extension is useful in practice. Even restricting to the domain of quantum mechanics, with this new formulation the advantages of Bayesian networks can be maintained for models incorporating quantum and mixed classical-quantum behavior. The use of these will be illustrated by various basic examples. Parrondo's paradox refers to the situation where two, multi-round games with a fixed winning criteria, both with probability greater than one-half for one player to win, are combined. Using a possibly biased coin to determine the rule to employ for each round, paradoxically, the previously losing player now wins the combined game with probabilitygreater than one-half. Using the extended Bayesian networks, we will formulate and analyze classical observed, classical hidden, and quantum versions of a game that displays this paradox, finding bounds for the discrepancy from naive expectations for the occurrence of the paradox. A quantum paradox inspired by Parrondo's paradox will also be analyzed. We will prove a bound for the discrepancy from naive expectations for this paradox as well. Games involving quantum walks that achieve this bound will be presented.
NASA Astrophysics Data System (ADS)
Kozoderov, V. V.; Kondranin, T. V.; Dmitriev, E. V.
2017-12-01
The basic model for the recognition of natural and anthropogenic objects using their spectral and textural features is described in the problem of hyperspectral air-borne and space-borne imagery processing. The model is based on improvements of the Bayesian classifier that is a computational procedure of statistical decision making in machine-learning methods of pattern recognition. The principal component method is implemented to decompose the hyperspectral measurements on the basis of empirical orthogonal functions. Application examples are shown of various modifications of the Bayesian classifier and Support Vector Machine method. Examples are provided of comparing these classifiers and a metrical classifier that operates on finding the minimal Euclidean distance between different points and sets in the multidimensional feature space. A comparison is also carried out with the " K-weighted neighbors" method that is close to the nonparametric Bayesian classifier.
NASA Astrophysics Data System (ADS)
Porter, Sophia; Strolger, Louis-Gregory; Lagerstrom, Jill; Weissman, Sarah
2016-01-01
The Space Telescope Science Institute annually receives more than one thousand formal proposals for Hubble Space Telescope time, exceeding the available time with the observatory by a factor of over four. With JWST, the proposal pressure will only increase, straining our ability to provide rigorous peer review of each proposal's scientific merit. Significant hurdles in this process include the proper categorization of proposals, to ensure Time Allocation Committees (TACs) have the required and desired expertise to fairly and appropriately judge each proposal, and the selection of reviewers themselves, to establish diverse and well-qualified TACs. The Panel Auto-Categorizer and Manager (PACMan; a naive Bayesian classifier) was developed to automatically sort new proposals into their appropriate science categories and, similarly, to appoint panel reviewers with the best qualifications to serve on the corresponding TACs. We will provide an overview of PACMan and present the results of its testing on five previous cycles of proposals. PACMan will be implemented in upcoming cycles to support and eventually replace the process for constructing the time allocation reviews.
Palmprint identification using FRIT
NASA Astrophysics Data System (ADS)
Kisku, D. R.; Rattani, A.; Gupta, P.; Hwang, C. J.; Sing, J. K.
2011-06-01
This paper proposes a palmprint identification system using Finite Ridgelet Transform (FRIT) and Bayesian classifier. FRIT is applied on the ROI (region of interest), which is extracted from palmprint image, to extract a set of distinctive features from palmprint image. These features are used to classify with the help of Bayesian classifier. The proposed system has been tested on CASIA and IIT Kanpur palmprint databases. The experimental results reveal better performance compared to all well known systems.
Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L
2016-11-01
Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories was 83% by naive Bayes, 79% by decision tree, and 82% by artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug
2013-05-15
Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 Multiplication-Sign 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs-normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessedmore » using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI data obtained from both scanners, the classification accuracies with the SVM and Bayesian classifiers were 92% and 77%, respectively. The selected features resulting from the classification process differed by scanner, with more features included for the classification of the integrated HRCT data than for the classification of the HRCT data from each scanner. For the integrated data, consisting of HRCT images of both scanners, the classification accuracy based on the SVM was statistically similar to the accuracy of the data obtained from each scanner. However, the classification accuracy of the integrated data using the Bayesian classifier was significantly lower than the classification accuracy of the ROI data of each scanner. Conclusions: The use of an integrated dataset along with a SVM classifier rather than a Bayesian classifier has benefits in terms of the classification accuracy of HRCT images acquired with more than one scanner. This finding is of relevance in studies involving large number of images, as is the case in a multicenter trial with different scanners.« less
A SVM-based method for sentiment analysis in Persian language
NASA Astrophysics Data System (ADS)
Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana
2013-03-01
Persian language is the official language of Iran, Tajikistan and Afghanistan. Local online users often represent their opinions and experiences on the web with written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews make it difficult to give an unbiased evaluation to a product. In this paper, standard machine learning techniques SVM and naive Bayes are incorporated into the domain of online Persian Movie reviews to automatically classify user reviews as positive or negative and performance of these two classifiers is compared with each other in this language. The effects of feature presentations on classification performance are discussed. We find that accuracy is influenced by interaction between the classification models and the feature options. The SVM classifier achieves as well as or better accuracy than naive Bayes in Persian movie. Unigrams are proved better features than bigrams and trigrams in capturing Persian sentiment orientation.
Chapman, Brian E.; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W.
2011-01-01
In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes’ classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes’ classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes’ classifier using bigrams. PMID:21459155
NASA Astrophysics Data System (ADS)
Anees, Asim; Aryal, Jagannath; O'Reilly, Małgorzata M.; Gale, Timothy J.; Wardlaw, Tim
2016-12-01
A robust non-parametric framework, based on multiple Radial Basic Function (RBF) kernels, is proposed in this study, for detecting land/forest cover changes using Landsat 7 ETM+ images. One of the widely used frameworks is to find change vectors (difference image) and use a supervised classifier to differentiate between change and no-change. The Bayesian Classifiers e.g. Maximum Likelihood Classifier (MLC), Naive Bayes (NB), are widely used probabilistic classifiers which assume parametric models, e.g. Gaussian function, for the class conditional distributions. However, their performance can be limited if the data set deviates from the assumed model. The proposed framework exploits the useful properties of Least Squares Probabilistic Classifier (LSPC) formulation i.e. non-parametric and probabilistic nature, to model class posterior probabilities of the difference image using a linear combination of a large number of Gaussian kernels. To this end, a simple technique, based on 10-fold cross-validation is also proposed for tuning model parameters automatically instead of selecting a (possibly) suboptimal combination from pre-specified lists of values. The proposed framework has been tested and compared with Support Vector Machine (SVM) and NB for detection of defoliation, caused by leaf beetles (Paropsisterna spp.) in Eucalyptus nitens and Eucalyptus globulus plantations of two test areas, in Tasmania, Australia, using raw bands and band combination indices of Landsat 7 ETM+. It was observed that due to multi-kernel non-parametric formulation and probabilistic nature, the LSPC outperforms parametric NB with Gaussian assumption in change detection framework, with Overall Accuracy (OA) ranging from 93.6% (κ = 0.87) to 97.4% (κ = 0.94) against 85.3% (κ = 0.69) to 93.4% (κ = 0.85), and is more robust to changing data distributions. Its performance was comparable to SVM, with added advantages of being probabilistic and capable of handling multi-class problems naturally with its original formulation.
Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J
2018-05-17
Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
Cannon, Edward O; Amini, Ata; Bender, Andreas; Sternberg, Michael J E; Muggleton, Stephen H; Glen, Robert C; Mitchell, John B O
2007-05-01
We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W
2011-10-01
In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes' classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes' classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes' classifier using bigrams. Copyright © 2011 Elsevier Inc. All rights reserved.
Naive scoring of human sleep based on a hidden Markov model of the electroencephalogram.
Yaghouby, Farid; Modur, Pradeep; Sunderam, Sridhar
2014-01-01
Clinical sleep scoring involves tedious visual review of overnight polysomnograms by a human expert. Many attempts have been made to automate the process by training computer algorithms such as support vector machines and hidden Markov models (HMMs) to replicate human scoring. Such supervised classifiers are typically trained on scored data and then validated on scored out-of-sample data. Here we describe a methodology based on HMMs for scoring an overnight sleep recording without the benefit of a trained initial model. The number of states in the data is not known a priori and is optimized using a Bayes information criterion. When tested on a 22-subject database, this unsupervised classifier agreed well with human scores (mean of Cohen's kappa > 0.7). The HMM also outperformed other unsupervised classifiers (Gaussian mixture models, k-means, and linkage trees), that are capable of naive classification but do not model dynamics, by a significant margin (p < 0.05).
Liu, Zhihong; Zheng, Minghao; Yan, Xin; Gu, Qiong; Gasteiger, Johann; Tijhuis, Johan; Maas, Peter; Li, Jiabo; Xu, Jun
2014-09-01
Predicting compound chemical stability is important because unstable compounds can lead to either false positive or to false negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predicted stability by applying rule-embedded naïve Bayesian learning, based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning from the data set, each ACF is assigned with an expected stable probability (p(s)) and an unstable probability (p(uns)). 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to drive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated, with Bayes' theorem, based upon the p(s) and p(uns) values of the compound ACFs. We were able to achieve performance with an AUC value of 84% and a tenfold cross validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the prediction of the stabilities of organic compounds.
Combination of dynamic Bayesian network classifiers for the recognition of degraded characters
NASA Astrophysics Data System (ADS)
Likforman-Sulem, Laurence; Sigelle, Marc
2009-01-01
We investigate in this paper the combination of DBN (Dynamic Bayesian Network) classifiers, either independent or coupled, for the recognition of degraded characters. The independent classifiers are a vertical HMM and a horizontal HMM whose observable outputs are the image columns and the image rows respectively. The coupled classifiers, presented in a previous study, associate the vertical and horizontal observation streams into single DBNs. The scores of the independent and coupled classifiers are then combined linearly at the decision level. We compare the different classifiers -independent, coupled or linearly combined- on two tasks: the recognition of artificially degraded handwritten digits and the recognition of real degraded old printed characters. Our results show that coupled DBNs perform better on degraded characters than the linear combination of independent HMM scores. Our results also show that the best classifier is obtained by linearly combining the scores of the best coupled DBN and the best independent HMM.
Learning a Health Knowledge Graph from Electronic Medical Records.
Rotmensch, Maya; Halpern, Yoni; Tlimat, Abdulhakim; Horng, Steven; Sontag, David
2017-07-20
Demand for clinical decision support systems in medicine and self-diagnostic symptom checkers has substantially increased in recent years. Existing platforms rely on knowledge bases manually compiled through a labor-intensive process or automatically derived using simple pairwise statistics. This study explored an automated process to learn high quality knowledge bases linking diseases and symptoms directly from electronic medical records. Medical concepts were extracted from 273,174 de-identified patient records and maximum likelihood estimation of three probabilistic models was used to automatically construct knowledge graphs: logistic regression, naive Bayes classifier and a Bayesian network using noisy OR gates. A graph of disease-symptom relationships was elicited from the learned parameters and the constructed knowledge graphs were evaluated and validated, with permission, against Google's manually-constructed knowledge graph and against expert physician opinions. Our study shows that direct and automated construction of high quality health knowledge graphs from medical records using rudimentary concept extraction is feasible. The noisy OR model produces a high quality knowledge graph reaching precision of 0.85 for a recall of 0.6 in the clinical evaluation. Noisy OR significantly outperforms all tested models across evaluation frameworks (p < 0.01).
Discriminative power of visual attributes in dermatology.
Giotis, Ioannis; Visser, Margaretha; Jonkman, Marcel; Petkov, Nicolai
2013-02-01
Visual characteristics such as color and shape of skin lesions play an important role in the diagnostic process. In this contribution, we quantify the discriminative power of such attributes using an information theoretical approach. We estimate the probability of occurrence of each attribute as a function of the skin diseases. We use the distribution of this probability across the studied diseases and its entropy to define the discriminative power of the attribute. The discriminative power has a maximum value for attributes that occur (or do not occur) for only one disease and a minimum value for those which are equally likely to be observed among all diseases. Verrucous surface, red and brown colors, and the presence of more than 10 lesions are among the most informative attributes. A ranking of attributes is also carried out and used together with a naive Bayesian classifier, yielding results that confirm the soundness of the proposed method. proposed measure is proven to be a reliable way of assessing the discriminative power of dermatological attributes, and it also helps generate a condensed dermatological lexicon. Therefore, it can be of added value to the manual or computer-aided diagnostic process. © 2012 John Wiley & Sons A/S.
Discriminative Bayesian Dictionary Learning for Classification.
Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal
2016-12-01
We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of Beta Process. It also computes sets of Bernoulli distributions that associate class labels to the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition; and object and scene-category classification using five public datasets and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.
ERIC Educational Resources Information Center
Pankau, Brian L.
2009-01-01
This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings and trained and tested against textual corpora of 2300 Gang-Of-Four (GOF) design pattern documents. Analysis of the experiment's…
Gustafsson, Mats G; Wallman, Mikael; Wickenberg Bolin, Ulrika; Göransson, Hanna; Fryknäs, M; Andersson, Claes R; Isaksson, Anders
2010-06-01
Successful use of classifiers that learn to make decisions from a set of patient examples require robust methods for performance estimation. Recently many promising approaches for determination of an upper bound for the error rate of a single classifier have been reported but the Bayesian credibility interval (CI) obtained from a conventional holdout test still delivers one of the tightest bounds. The conventional Bayesian CI becomes unacceptably large in real world applications where the test set sizes are less than a few hundred. The source of this problem is that fact that the CI is determined exclusively by the result on the test examples. In other words, there is no information at all provided by the uniform prior density distribution employed which reflects complete lack of prior knowledge about the unknown error rate. Therefore, the aim of the study reported here was to study a maximum entropy (ME) based approach to improved prior knowledge and Bayesian CIs, demonstrating its relevance for biomedical research and clinical practice. It is demonstrated how a refined non-uniform prior density distribution can be obtained by means of the ME principle using empirical results from a few designs and tests using non-overlapping sets of examples. Experimental results show that ME based priors improve the CIs when employed to four quite different simulated and two real world data sets. An empirically derived ME prior seems promising for improving the Bayesian CI for the unknown error rate of a designed classifier. Copyright 2010 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Schembri, Adam; Jones, Caroline; Burnham, Denis
2005-01-01
Recent research into signed languages indicates that signs may share some properties with gesture, especially in the use of space in classifier constructions. A prediction of this proposal is that there will be similarities in the representation of motion events by sign-naive gesturers and by native signers of unrelated signed languages. This…
Using Bayesian neural networks to classify forest scenes
NASA Astrophysics Data System (ADS)
Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni
1998-10-01
We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. Classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from a training data, and the integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
Development of a Bayesian Classifier for Breast Cancer Risk Stratification: A Feasibility Study
2010-03-29
IV 0 2 10 23 BIRADS V 0 1 2 1 No mammogram 116 94 55 45 Breast biopsy category .4076 Benign, no atypia 19 12 27 34 Premalignant 1 0 2 4 Infiltrating... breast EIS result∗ Estimated outcome, % Known evidence Biopsy category EIS Gail Benign, no Infiltrating cancer Case frequency, % result cutoff‘ atypia or...Development of a Bayesian Classifier for Breast Cancer Risk Stratification: A Feasibility Study Alexander Stojadinovic, MD,a,b Christina Eberhardt,a
Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes
Liu, Kuan-Liang; Porras-Alfaro, Andrea; Eichorst, Stephanie A.
2012-01-01
Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp). PMID:22194300
A Machine Learning Concept for DTN Routing
NASA Technical Reports Server (NTRS)
Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos
2017-01-01
This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.
Ngcapu, Sinaye; Theys, Kristof; Libin, Pieter; Marconi, Vincent C; Sunpath, Henry; Ndung'u, Thumbi; Gordon, Michelle L
2017-11-08
The South African national treatment programme includes nucleoside reverse transcriptase inhibitors (NRTIs) in both first and second line highly active antiretroviral therapy regimens. Mutations in the RNase H domain have been associated with resistance to NRTIs but primarily in HIV-1 subtype B studies. Here, we investigated the prevalence and association of RNase H mutations with NRTI resistance in sequences from HIV-1 subtype C infected individuals. RNase H sequences from 112 NRTI treated but virologically failing individuals and 28 antiretroviral therapy (ART)-naive individuals were generated and analysed. In addition, sequences from 359 subtype C ART-naive sequences were downloaded from Los Alamos database to give a total of 387 sequences from ART-naive individuals for the analysis. Fisher's exact test was used to identify mutations and Bayesian network learning was applied to identify novel NRTI resistance mutation pathways in RNase H domain. The mutations A435L, S468A, T470S, L484I, A508S, Q509L, L517I, Q524E and E529D were more prevalent in sequences from treatment-experienced compared to antiretroviral treatment naive individuals, however, only the E529D mutation remained significant after correction for multiple comparison. Our findings suggest a potential interaction between E529D and NRTI-treatment; however, site-directed mutagenesis is needed to understand the impact of this RNase H mutation.
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
NASA Astrophysics Data System (ADS)
Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana
2015-03-01
Data mining (DM) is the process of discovery knowledge from large databases. Applications of data mining in Blood Transfusion Organizations could be useful for improving the performance of blood donation service. The aim of this research is the prediction of healthiness of blood donors in Blood Transfusion Organization (BTO). For this goal, three famous algorithms such as Decision Tree C4.5, Naïve Bayesian classifier, and Support Vector Machine have been chosen and applied to a real database made of 11006 donors. Seven fields such as sex, age, job, education, marital status, type of donor, results of blood tests (doctors' comments and lab results about healthy or unhealthy blood donors) have been selected as input to these algorithms. The results of the three algorithms have been compared and an error cost analysis has been performed. According to this research and the obtained results, the best algorithm with low error cost and high accuracy is SVM. This research helps BTO to realize a model from blood donors in each area in order to predict the healthy blood or unhealthy blood of donors. This research could be useful if used in parallel with laboratory tests to better separate unhealthy blood.
Machine learning-based diagnosis of melanoma using macro images.
Gautam, Diwakar; Ahmed, Mushtaq; Meena, Yogesh Kumar; Ul Haq, Ahtesham
2018-05-01
Cancer bears a poisoning threat to human society. Melanoma, the skin cancer, originates from skin layers and penetrates deep into subcutaneous layers. There exists an extensive research in melanoma diagnosis using dermatoscopic images captured through a dermatoscope. While designing a diagnostic model for general handheld imaging systems is an emerging trend, this article proposes a computer-aided decision support system for macro images captured by a general-purpose camera. General imaging conditions are adversely affected by nonuniform illumination, which further affects the extraction of relevant information. To mitigate it, we process an image to define a smooth illumination surface using the multistage illumination compensation approach, and the infected region is extracted using the proposed multimode segmentation method. The lesion information is numerated as a feature set comprising geometry, photometry, border series, and texture measures. The redundancy in feature set is reduced using information theory methods, and a classification boundary is modeled to distinguish benign and malignant samples using support vector machine, random forest, neural network, and fast discriminative mixed-membership-based naive Bayesian classifiers. Moreover, the experimental outcome is supported by hypothesis testing and boxplot representation for classification losses. The simulation results prove the significance of the proposed model that shows an improved performance as compared with competing arts. Copyright © 2017 John Wiley & Sons, Ltd.
Liu, Kai-Chun; Chan, Chia-Tai
2017-01-01
The proportion of the aging population is rapidly increasing around the world, which will cause stress on society and healthcare systems. In recent years, advances in technology have created new opportunities for automatic activities of daily living (ADL) monitoring to improve the quality of life and provide adequate medical service for the elderly. Such automatic ADL monitoring requires reliable ADL information on a fine-grained level, especially for the status of interaction between body gestures and the environment in the real-world. In this work, we propose a significant change spotting mechanism for periodic human motion segmentation during cleaning task performance. A novel approach is proposed based on the search for a significant change of gestures, which can manage critical technical issues in activity recognition, such as continuous data segmentation, individual variance, and category ambiguity. Three typical machine learning classification algorithms are utilized for the identification of the significant change candidate, including a Support Vector Machine (SVM), k-Nearest Neighbors (kNN), and Naive Bayesian (NB) algorithm. Overall, the proposed approach achieves 96.41% in the F1-score by using the SVM classifier. The results show that the proposed approach can fulfill the requirement of fine-grained human motion segmentation for automatic ADL monitoring. PMID:28106853
Single molecule force spectroscopy at high data acquisition: A Bayesian nonparametric analysis
NASA Astrophysics Data System (ADS)
Sgouralis, Ioannis; Whitmore, Miles; Lapidus, Lisa; Comstock, Matthew J.; Pressé, Steve
2018-03-01
Bayesian nonparametrics (BNPs) are poised to have a deep impact in the analysis of single molecule data as they provide posterior probabilities over entire models consistent with the supplied data, not just model parameters of one preferred model. Thus they provide an elegant and rigorous solution to the difficult problem encountered when selecting an appropriate candidate model. Nevertheless, BNPs' flexibility to learn models and their associated parameters from experimental data is a double-edged sword. Most importantly, BNPs are prone to increasing the complexity of the estimated models due to artifactual features present in time traces. Thus, because of experimental challenges unique to single molecule methods, naive application of available BNP tools is not possible. Here we consider traces with time correlations and, as a specific example, we deal with force spectroscopy traces collected at high acquisition rates. While high acquisition rates are required in order to capture dwells in short-lived molecular states, in this setup, a slow response of the optical trap instrumentation (i.e., trapped beads, ambient fluid, and tethering handles) distorts the molecular signals introducing time correlations into the data that may be misinterpreted as true states by naive BNPs. Our adaptation of BNP tools explicitly takes into consideration these response dynamics, in addition to drift and noise, and makes unsupervised time series analysis of correlated single molecule force spectroscopy measurements possible, even at acquisition rates similar to or below the trap's response times.
Bayesian Correlation Analysis for Sequence Count Data
Lau, Nelson; Perkins, Theodore J.
2016-01-01
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449
Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.
2016-01-01
Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection only 2 – 9 variables were required for classification and varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classifying cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171
Non-proliferative diabetic retinopathy symptoms detection and classification using neural network.
Al-Jarrah, Mohammad A; Shatnawi, Hadeel
2017-08-01
Diabetic retinopathy (DR) causes blindness in the working age for people with diabetes in most countries. The increasing number of people with diabetes worldwide suggests that DR will continue to be major contributors to vision loss. Early detection of retinopathy progress in individuals with diabetes is critical for preventing visual loss. Non-proliferative DR (NPDR) is an early stage of DR. Moreover, NPDR can be classified into mild, moderate and severe. This paper proposes a novel morphology-based algorithm for detecting retinal lesions and classifying each case. First, the proposed algorithm detects the three DR lesions, namely haemorrhages, microaneurysms and exudates. Second, we defined and extracted a set of features from detected lesions. The set of selected feature emulates what physicians looked for in classifying NPDR case. Finally, we designed an artificial neural network (ANN) classifier with three layers to classify NPDR to normal, mild, moderate and severe. Bayesian regularisation and resilient backpropagation algorithms are used to train ANN. The accuracy for the proposed classifiers based on Bayesian regularisation and resilient backpropagation algorithms are 96.6 and 89.9, respectively. The obtained results are compared with results of the recent published classifier. Our proposed classifier outperforms the best in terms of sensitivity and specificity.
Elegido, Ana; Graell, Montserrat; Andrés, Patricia; Gheorghe, Alina; Marcos, Ascensión; Nova, Esther
2017-03-01
Anorexia nervosa (AN) is an atypical form of malnutrition with peculiar changes in the immune system. We hypothesized that different lymphocyte subsets are differentially affected by malnutrition in AN, and thus, our aim was to investigate the influence of body mass loss on the variability of lymphocyte subsets in AN patients. A group of 66 adolescent female patients, aged 12-17 years, referred for their first episode of either AN or feeding or eating disorders not elsewhere classified were studied upon admission (46 AN-restricting subtype, 11 AN-binge/purging subtype, and 9 feeding or eating disorders not elsewhere classified). Ninety healthy adolescents served as controls. White blood cells and lymphocyte subsets were analyzed by flow cytometry. Relationships with the body mass index (BMI) z score were assessed in linear models adjusted by diagnostic subtype and age. Leukocyte numbers were lower in AN patients than in controls, and relative lymphocytosis was observed in AN-restricting subtype. Lower CD8 + , NK, and memory CD8 + counts were found in eating disorder patients compared with controls. No differences were found for CD4 + counts or naive and memory CD4 + subsets between the groups. Negative associations between lymphocyte percentage and the BMI z score, as well as between the B cell counts, naive CD4 + percentage and counts, and the BMI z score, were found. In conclusion, increased naive CD4 + and B lymphocyte subsets associated with body mass loss drive the relative lymphocytosis observed in AN patients, which reflects an adaptive mechanism to preserve the adaptive immune response. Copyright © 2017 Elsevier Inc. All rights reserved.
Morales, Dinora Araceli; Bengoetxea, Endika; Larrañaga, Pedro; García, Miguel; Franco, Yosu; Fresnada, Mónica; Merino, Marisa
2008-05-01
In vitro fertilization (IVF) is a medically assisted reproduction technique that enables infertile couples to achieve successful pregnancy. Given the uncertainty of the treatment, we propose an intelligent decision support system based on supervised classification by Bayesian classifiers to aid to the selection of the most promising embryos that will form the batch to be transferred to the woman's uterus. The aim of the supervised classification system is to improve overall success rate of each IVF treatment in which a batch of embryos is transferred each time, where the success is achieved when implantation (i.e. pregnancy) is obtained. Due to ethical reasons, different legislative restrictions apply in every country on this technique. In Spain, legislation allows a maximum of three embryos to form each transfer batch. As a result, clinicians prefer to select the embryos by non-invasive embryo examination based on simple methods and observation focused on morphology and dynamics of embryo development after fertilization. This paper proposes the application of Bayesian classifiers to this embryo selection problem in order to provide a decision support system that allows a more accurate selection than with the actual procedures which fully rely on the expertise and experience of embryologists. For this, we propose to take into consideration a reduced subset of feature variables related to embryo morphology and clinical data of patients, and from this data to induce Bayesian classification models. Results obtained applying a filter technique to choose the subset of variables, and the performance of Bayesian classifiers using them, are presented.
Nef, Tobias; Urwyler, Prabitha; Büchler, Marcel; Tarnanas, Ioannis; Stucki, Reto; Cazzoli, Dario; Müri, René; Mosimann, Urs
2012-01-01
Smart homes for the aging population have recently started attracting the attention of the research community. The “health state” of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario. PMID:26007727
Nef, Tobias; Urwyler, Prabitha; Büchler, Marcel; Tarnanas, Ioannis; Stucki, Reto; Cazzoli, Dario; Müri, René; Mosimann, Urs
2015-05-21
Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.
Learning Negotiation Policies Using IB3 and Bayesian Networks
NASA Astrophysics Data System (ADS)
Nalepa, Gislaine M.; Ávila, Bráulio C.; Enembreck, Fabrício; Scalabrin, Edson E.
This paper presents an intelligent offer policy in a negotiation environment, in which each agent involved learns the preferences of its opponent in order to improve its own performance. Each agent must also be able to detect drifts in the opponent's preferences so as to quickly adjust itself to their new offer policy. For this purpose, two simple learning techniques were first evaluated: (i) based on instances (IB3) and (ii) based on Bayesian Networks. Additionally, as its known that in theory group learning produces better results than individual/single learning, the efficiency of IB3 and Bayesian classifier groups were also analyzed. Finally, each decision model was evaluated in moments of concept drift, being the drift gradual, moderate or abrupt. Results showed that both groups of classifiers were able to effectively detect drifts in the opponent's preferences.
Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah
2013-01-01
Microaneurysms detection is an important task in computer aided diagnosis of diabetic retinopathy. Microaneurysms are the first clinical sign of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early microaneurysm detection can help reduce the incidence of blindness. Automatic detection of microaneurysms is still an open problem due to their tiny sizes, low contrast and also similarity with blood vessels. It is particularly very difficult to detect fine microaneurysms, especially from non-dilated pupils and that is the goal of this paper. Simple yet effective methods are used. They are coarse segmentation using mathematic morphology and fine segmentation using naive Bayes classifier. A total of 18 microaneurysms features are proposed in this paper and they are extracted for naive Bayes classifier. The detected microaneurysms are validated by comparing at pixel level with ophthalmologists' hand-drawn ground-truth. The sensitivity, specificity, precision and accuracy are 85.68, 99.99, 83.34 and 99.99%, respectively. Copyright © 2013 Elsevier Ltd. All rights reserved.
Jia, Cangzhi; Zuo, Yun; Zou, Quan; Hancock, John
2018-02-06
Protein O-GlcNAcylation (O-GlcNAc) is an important post-translational modification of serine (S)/threonine (T) residues that involves multiple molecular and cellular processes. Recent studies have suggested that abnormal O-G1cNAcylation causes many diseases, such as cancer and various neurodegenerative diseases. With the available protein O-G1cNAcylation sites experimentally verified, it is highly desired to develop automated methods to rapidly and effectively identify O-G1cNAcylation sites. Although some computational methods have been proposed, their performance has been unsatisfactory, particularly in terms of prediction sensitivity. In this study, we developed an ensemble model O-GlcNAcPRED-II to identify potential O-G1cNAcylation sites. A K-means principal component analysis oversampling technique (KPCA) and fuzzy undersampling method (FUS) were first proposed and incorporated to reduce the proportion of the original positive and negative training samples. Then, rotation forest, a type of classifier-integrated system, was adopted to divide the eight types of feature space into several subsets using four sub-classifiers: random forest, k-nearest neighbour, naive Bayesian and support vector machine. We observed that O-GlcNAcPRED-II achieved a sensitivity of 81.05%, specificity of 95.91%, accuracy of 91.43% and Matthew's correlation coefficient of 0.7928 for five-fold cross-validation run 10 times. Additionally, the results obtained by O-GlcNAcPRED-II on two independent datasets also indicated that the proposed predictor outperformed five published prediction tools. http://121.42.167.206/OGlcPred/. cangzhijia@dlmu.edu.cn or zouquan@nclab.net. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Classification of Active Microwave and Passive Optical Data Based on Bayesian Theory and Mrf
NASA Astrophysics Data System (ADS)
Yu, F.; Li, H. T.; Han, Y. S.; Gu, H. Y.
2012-08-01
A classifier based on Bayesian theory and Markov random field (MRF) is presented to classify the active microwave and passive optical remote sensing data, which have demonstrated their respective advantages in inversion of surface soil moisture content. In the method, the VV, VH polarization of ASAR and all the 7 TM bands are taken as the input of the classifier to get the class labels of each pixel of the images. And the model is validated for the necessities of integration of TM and ASAR, it shows that, the total precision of classification in this paper is 89.4%. Comparing with the classification with single TM, the accuracy increase 11.5%, illustrating that synthesis of active and passive optical remote sensing data is efficient and potential in classification.
ERIC Educational Resources Information Center
Abayomi, Kobi; Pizarro, Gonzalo
2013-01-01
We offer a straightforward framework for measurement of progress, across many dimensions, using cross-national social indices, which we classify as linear combinations of multivariate country level data onto a univariate score. We suggest a Bayesian approach which yields probabilistic (confidence type) intervals for the point estimates of country…
Know your data: understanding implicit usage versus explicit action in video content classification
NASA Astrophysics Data System (ADS)
Yew, Jude; Shamma, David A.
2011-02-01
In this paper, we present a method for video category classification using only social metadata from websites like YouTube. In place of content analysis, we utilize communicative and social contexts surrounding videos as a means to determine a categorical genre, e.g. Comedy, Music. We hypothesize that video clips belonging to different genre categories would have distinct signatures and patterns that are reflected in their collected metadata. In particular, we define and describe social metadata as usage or action to aid in classification. We trained a Naive Bayes classifier to predict categories from a sample of 1,740 YouTube videos representing the top five genre categories. Using just a small number of the available metadata features, we compare the classifications produced by our Naive Bayes classifier with those provided by the uploader of that particular video. Compared to random predictions with the YouTube data (21% accurate), our classifier attained a mediocre 33% accuracy in predicting video genres. However, we found that the accuracy of our classifier significantly improves by nominal factoring of the explicit data features. By factoring the ratings of the videos in the dataset, the classifier was able to accurately predict the genres of 75% of the videos. We argue that the patterns of social activity found in the metadata are not just meaningful in their own right, but are indicative of the meaning of the shared video content. The results presented by this project represents a first step in investigating the potential meaning and significance of social metadata and its relation to the media experience.
On the structure of Bayesian network for Indonesian text document paraphrase identification
NASA Astrophysics Data System (ADS)
Prayogo, Ario Harry; Syahrul Mubarok, Mohamad; Adiwijaya
2018-03-01
Paraphrase identification is an important process within natural language processing. The idea is to automatically recognize phrases that have different forms but contain same meanings. For examples if we input query “causing fire hazard”, then the computer has to recognize this query that this query has same meaning as “the cause of fire hazard. Paraphrasing is an activity that reveals the meaning of an expression, writing, or speech using different words or forms, especially to achieve greater clarity. In this research we will focus on classifying two Indonesian sentences whether it is a paraphrase to each other or not. There are four steps in this research, first is preprocessing, second is feature extraction, third is classifier building, and the last is performance evaluation. Preprocessing consists of tokenization, non-alphanumerical removal, and stemming. After preprocessing we will conduct feature extraction in order to build new features from given dataset. There are two kinds of features in the research, syntactic features and semantic features. Syntactic features consist of normalized levenshtein distance feature, term-frequency based cosine similarity feature, and LCS (Longest Common Subsequence) feature. Semantic features consist of Wu and Palmer feature and Shortest Path Feature. We use Bayesian Networks as the method of training the classifier. Parameter estimation that we use is called MAP (Maximum A Posteriori). For structure learning of Bayesian Networks DAG (Directed Acyclic Graph), we use BDeu (Bayesian Dirichlet equivalent uniform) scoring function and for finding DAG with the best BDeu score, we use K2 algorithm. In evaluation step we perform cross-validation. The average result that we get from testing the classifier as follows: Precision 75.2%, Recall 76.5%, F1-Measure 75.8% and Accuracy 75.6%.
Application of a Hidden Bayes Naive Multiclass Classifier in Network Intrusion Detection
ERIC Educational Resources Information Center
Koc, Levent
2013-01-01
With increasing Internet connectivity and traffic volume, recent intrusion incidents have reemphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Techniques such as pattern recognition and the data mining of network events are often used by intrusion detection systems to classify…
Persistence of the Intuitive Conception of Living Things in Adolescence
ERIC Educational Resources Information Center
Babai, Reuven; Sekal, Rachel; Stavy, Ruth
2010-01-01
This study investigated whether intuitive, naive conceptions of "living things" based on objects' mobility (movement = alive) persist into adolescence and affect 10th graders' accuracy of responses and reaction times during object classification. Most of the 58 students classified the test objects correctly as living/nonliving, yet they…
Thermal bioaerosol cloud tracking with Bayesian classification
NASA Astrophysics Data System (ADS)
Smith, Christian W.; Dupuis, Julia R.; Schundler, Elizabeth C.; Marinelli, William J.
2017-05-01
The development of a wide area, bioaerosol early warning capability employing existing uncooled thermal imaging systems used for persistent perimeter surveillance is discussed. The capability exploits thermal imagers with other available data streams including meteorological data and employs a recursive Bayesian classifier to detect, track, and classify observed thermal objects with attributes consistent with a bioaerosol plume. Target detection is achieved based on similarity to a phenomenological model which predicts the scene-dependent thermal signature of bioaerosol plumes. Change detection in thermal sensor data is combined with local meteorological data to locate targets with the appropriate thermal characteristics. Target motion is tracked utilizing a Kalman filter and nearly constant velocity motion model for cloud state estimation. Track management is performed using a logic-based upkeep system, and data association is accomplished using a combinatorial optimization technique. Bioaerosol threat classification is determined using a recursive Bayesian classifier to quantify the threat probability of each tracked object. The classifier can accept additional inputs from visible imagers, acoustic sensors, and point biological sensors to improve classification confidence. This capability was successfully demonstrated for bioaerosol simulant releases during field testing at Dugway Proving Grounds. Standoff detection at a range of 700m was achieved for as little as 500g of anthrax simulant. Developmental test results will be reviewed for a range of simulant releases, and future development and transition plans for the bioaerosol early warning platform will be discussed.
Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K
2012-04-01
The objective of this paper is to provide an improved technique, which can assist oncopathologists in correct screening of oral precancerous conditions specially oral submucous fibrosis (OSF) with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibres segmentation, its textural feature extraction and selection, screening perfomance enhancement under Gaussian transformation and finally classification. In this study, collagen fibres are segmented on R,G,B color channels using back-probagation neural network from 60 normal and 59 OSF histological images followed by histogram specification for reducing the stain intensity variation. Henceforth, textural features of collgen area are extracted using fractal approaches viz., differential box counting and brownian motion curve . Feature selection is done using Kullback-Leibler (KL) divergence criterion and the screening performance is evaluated based on various statistical tests to conform Gaussian nature. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers viz., Bayesian classification and support vector machines (SVM) to classify normal and OSF. It is observed that SVM with linear kernel function provides better classification accuracy (91.64%) as compared to Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves Bayesian classifier's performance from 80.69% to 90.75%. Results are here studied and discussed.
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity, but development of predictive MoA classification models in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity MoA using a recently pu...
Predicting Student Success: A Naïve Bayesian Application to Community College Data
ERIC Educational Resources Information Center
Ornelas, Fermin; Ordonez, Carlos
2017-01-01
This research focuses on developing and implementing a continuous Naïve Bayesian classifier for GEAR courses at Rio Salado Community College. Previous implementation efforts of a discrete version did not predict as well, 70%, and had deployment issues. This predictive model has higher prediction, over 90%, accuracy for both at-risk and successful…
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity but MoA classification in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity mode of action using a recently published dataset contain...
Zollanvari, Amin; Dougherty, Edward R
2014-06-01
The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.
Elementary School Students' Understandings of Technology Concepts.
ERIC Educational Resources Information Center
Davis, Robert S.; Ginns, Ian S.; McRobbie, Campbell J.
2002-01-01
Students in grades 2 (n=27), 4 (n=37), and 6 (n=28) were asked questions about artifacts or pictures. Their explanations, which revealed their understanding of such technological concepts as material properties and stability, were classified as naive, artifact related, or not artifact related. Explanations tended to cluster in a classification at…
Bayesian truthing and experimental validation in homeland security and defense
NASA Astrophysics Data System (ADS)
Jannson, Tomasz; Forrester, Thomas; Wang, Wenjian; Kostrzewski, Andrew; Pradhan, Ranjit
2014-05-01
In this paper we discuss relations between Bayesian Truthing (experimental validation), Bayesian statistics, and Binary Sensing in the context of selected Homeland Security and Intelligence, Surveillance, Reconnaissance (ISR) optical and nonoptical application scenarios. The basic Figure of Merit (FoM) is Positive Predictive Value (PPV), as well as false positives and false negatives. By using these simple binary statistics, we can analyze, classify, and evaluate a broad variety of events including: ISR; natural disasters; QC; and terrorism-related, GIS-related, law enforcement-related, and other C3I events.
Relevance popularity: A term event model based feature selection scheme for text classification.
Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao
2017-01-01
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature to produce accurate classifications. In this paper, we propose a new feature selection scheme based on a term event Multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text datasets (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results obtained from using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperformed the representative feature selection methods.
Automated Classification of Pathology Reports.
Oleynik, Michel; Finger, Marcelo; Patrão, Diogo F C
2015-01-01
This work develops an automated classifier of pathology reports which infers the topography and the morphology classes of a tumor using codes from the International Classification of Diseases for Oncology (ICD-O). Data from 94,980 patients of the A.C. Camargo Cancer Center was used for training and validation of Naive Bayes classifiers, evaluated by the F1-score. Measures greater than 74% in the topographic group and 61% in the morphologic group are reported. Our work provides a successful baseline for future research for the classification of medical documents written in Portuguese and in other domains.
Nowakowska, Marzena
2017-04-01
The development of the Bayesian logistic regression model classifying the road accident severity is discussed. The already exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with the original idea of a Boot prior proposal, are investigated when no expert opinion has been available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained logistic Bayesian models are assessed on the basis of a deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. The verification of the model accuracy has been based on sensitivity, specificity and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have a better classification quality than the ones obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method of moments, and maximum likelihood estimation prior models. It is important to note that one should be careful when interpreting the parameters since different priors can lead to different models. Copyright © 2017 Elsevier Ltd. All rights reserved.
Modeling Statistical Insensitivity: Sources of Suboptimal Behavior
ERIC Educational Resources Information Center
Gagliardi, Annie; Feldman, Naomi H.; Lidz, Jeffrey
2017-01-01
Children acquiring languages with noun classes (grammatical gender) have ample statistical information available that characterizes the distribution of nouns into these classes, but their use of this information to classify novel nouns differs from the predictions made by an optimal Bayesian classifier. We use rational analysis to investigate the…
Persistence of the Intuitive Conception of Living Things in Adolescence
NASA Astrophysics Data System (ADS)
Babai, Reuven; Sekal, Rachel; Stavy, Ruth
2010-02-01
This study investigated whether intuitive, naive conceptions of "living things" based on objects' mobility (movement = alive) persist into adolescence and affect 10th graders' accuracy of responses and reaction times during object classification. Most of the 58 students classified the test objects correctly as living/nonliving, yet they demonstrated significantly longer reaction times for classifying plants compared to animals and for classifying dynamic objects compared to static inanimate objects. Findings indicated that, despite prior learning in biology, the intuitive conception of living things persists up to age 15-16 years, affecting related reasoning processes. Consideration of these findings may help educators in their decisions about the nature of examples they use in their classrooms.
Bayesian Redshift Classification of Emission-line Galaxies with Photometric Equivalent Widths
NASA Astrophysics Data System (ADS)
Leung, Andrew S.; Acquaviva, Viviana; Gawiser, Eric; Ciardullo, Robin; Komatsu, Eiichiro; Malz, A. I.; Zeimann, Gregory R.; Bridge, Joanna S.; Drory, Niv; Feldmeier, John J.; Finkelstein, Steven L.; Gebhardt, Karl; Gronwall, Caryl; Hagen, Alex; Hill, Gary J.; Schneider, Donald P.
2017-07-01
We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyα-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW {W}{Lyα }) greater than 20 Å. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and EW distributions for the galaxy populations, and returns the probability that an object in question is an LAE given the characteristics observed. This approach will be directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ˜106 emission-line galaxies into LAEs and low-redshift [{{O}} {{II}}] emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional {W}{Lyα } > 20 Å cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method’s ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information. Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as upcoming large spectroscopic surveys including Euclid and WFIRST.
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
ERIC Educational Resources Information Center
Beretvas, S. Natasha; Murphy, Daniel L.
2013-01-01
The authors assessed correct model identification rates of Akaike's information criterion (AIC), corrected criterion (AICC), consistent AIC (CAIC), Hannon and Quinn's information criterion (HQIC), and Bayesian information criterion (BIC) for selecting among cross-classified random effects models. Performance of default values for the 5…
A comparison of machine learning and Bayesian modelling for molecular serotyping.
Newton, Richard; Wernisch, Lorenz
2017-08-11
Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only few samples available, a model driven approach was the only option. In the meanwhile, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes; due to the large number of possible combinations. Most of the available training data comprises samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single serotype arrays. With the enhanced training set the machine learning algorithms out perform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best performing implementation was a combination of the results of the Bayesian Model and the Gradient Boosting Machine. As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological insights, which we illustrate with an example.
Content Abstract Classification Using Naive Bayes
NASA Astrophysics Data System (ADS)
Latif, Syukriyanto; Suwardoyo, Untung; Aldrin Wihelmus Sanadi, Edwin
2018-03-01
This study aims to classify abstract content based on the use of the highest number of words in an abstract content of the English language journals. This research uses a system of text mining technology that extracts text data to search information from a set of documents. Abstract content of 120 data downloaded at www.computer.org. Data grouping consists of three categories: DM (Data Mining), ITS (Intelligent Transport System) and MM (Multimedia). Systems built using naive bayes algorithms to classify abstract journals and feature selection processes using term weighting to give weight to each word. Dimensional reduction techniques to reduce the dimensions of word counts rarely appear in each document based on dimensional reduction test parameters of 10% -90% of 5.344 words. The performance of the classification system is tested by using the Confusion Matrix based on comparative test data and test data. The results showed that the best classification results were obtained during the 75% training data test and 25% test data from the total data. Accuracy rates for categories of DM, ITS and MM were 100%, 100%, 86%. respectively with dimension reduction parameters of 30% and the value of learning rate between 0.1-0.5.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Liu, Ximeng; Lu, Rongxing; Ma, Jianfeng; Chen, Le; Qin, Baodong
2016-03-01
Clinical decision support system, which uses advanced data mining techniques to help clinician make proper decisions, has received considerable attention recently. The advantages of clinical decision support system include not only improving diagnosis accuracy but also reducing diagnosis time. Specifically, with large amounts of clinical data generated everyday, naïve Bayesian classification can be utilized to excavate valuable information to improve a clinical decision support system. Although the clinical decision support system is quite promising, the flourish of the system still faces many challenges including information security and privacy concerns. In this paper, we propose a new privacy-preserving patient-centric clinical decision support system, which helps clinician complementary to diagnose the risk of patients' disease in a privacy-preserving way. In the proposed system, the past patients' historical data are stored in cloud and can be used to train the naïve Bayesian classifier without leaking any individual patient medical data, and then the trained classifier can be applied to compute the disease risk for new coming patients and also allow these patients to retrieve the top- k disease names according to their own preferences. Specifically, to protect the privacy of past patients' historical data, a new cryptographic tool called additive homomorphic proxy aggregation scheme is designed. Moreover, to leverage the leakage of naïve Bayesian classifier, we introduce a privacy-preserving top- k disease names retrieval protocol in our system. Detailed privacy analysis ensures that patient's information is private and will not be leaked out during the disease diagnosis phase. In addition, performance evaluation via extensive simulations also demonstrates that our system can efficiently calculate patient's disease risk with high accuracy in a privacy-preserving way.
Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios
Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang
2014-01-01
Objectives Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistical better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
Ye, Qing; Pan, Hao; Liu, Changhua
2015-01-01
This research proposes a novel framework of final drive simultaneous failure diagnosis containing feature extraction, training paired diagnostic models, generating decision threshold, and recognizing simultaneous failure modes. In feature extraction module, adopt wavelet package transform and fuzzy entropy to reduce noise interference and extract representative features of failure mode. Use single failure sample to construct probability classifiers based on paired sparse Bayesian extreme learning machine which is trained only by single failure modes and have high generalization and sparsity of sparse Bayesian learning approach. To generate optimal decision threshold which can convert probability output obtained from classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes and Grid search method which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machine and probability neural networks, experiment results based on F 1-measure value verify that the diagnostic accuracy and efficiency of the proposed framework which are crucial for simultaneous failure diagnosis are superior to the existing approach. PMID:25722717
Variational dynamic background model for keyword spotting in handwritten documents
NASA Astrophysics Data System (ADS)
Kumar, Gaurav; Wshah, Safwan; Govindaraju, Venu
2013-12-01
We propose a bayesian framework for keyword spotting in handwritten documents. This work is an extension to our previous work where we proposed dynamic background model, DBM for keyword spotting that takes into account the local character level scores and global word level scores to learn a logistic regression classifier to separate keywords from non-keywords. In this work, we add a bayesian layer on top of the DBM called the variational dynamic background model, VDBM. The logistic regression classifier uses the sigmoid function to separate keywords from non-keywords. The sigmoid function being neither convex nor concave, exact inference of VDBM becomes intractable. An expectation maximization step is proposed to do approximate inference. The advantage of VDBM over the DBM is multi-fold. Firstly, being bayesian, it prevents over-fitting of data. Secondly, it provides better modeling of data and an improved prediction of unseen data. VDBM is evaluated on the IAM dataset and the results prove that it outperforms our prior work and other state of the art line based word spotting system.
Zollanvari, Amin; Dougherty, Edward R
2016-12-01
In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.
Akhtar, Naveed; Mian, Ajmal
2017-10-03
We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.
Creating Diverse Ensemble Classifiers to Reduce Supervision
2005-12-01
artificial examples. Quite often training with noise improves network generalization (Bishop, 1995; Raviv & Intrator, 1996). Adding noise to training...full training set, as seen by comparing to the to- tal dataset sizes. Hence, improving on the data utilization of DECORATE is a fairly difficult task...prohibitively expensive, except (perhaps) with an incremen- tal learner such as Naive Bayes. Our AFA framework is significantly more efficient because
Automated high resolution mapping of coffee in Rwanda using an expert Bayesian network
NASA Astrophysics Data System (ADS)
Mukashema, A.; Veldkamp, A.; Vrieling, A.
2014-12-01
African highland agro-ecosystems are dominated by small-scale agricultural fields that often contain a mix of annual and perennial crops. This makes such systems difficult to map by remote sensing. We developed an expert Bayesian network model to extract the small-scale coffee fields of Rwanda from very high resolution data. The model was subsequently applied to aerial orthophotos covering more than 99% of Rwanda and on one QuickBird image for the remaining part. The method consists of a stepwise adjustment of pixel probabilities, which incorporates expert knowledge on size of coffee trees and fields, and on their location. The initial naive Bayesian network, which is a spectral-based classification, yielded a coffee map with an overall accuracy of around 50%. This confirms that standard spectral variables alone cannot accurately identify coffee fields from high resolution images. The combination of spectral and ancillary data (DEM and a forest map) allowed mapping of coffee fields and associated uncertainties with an overall accuracy of 87%. Aggregated to district units, the mapped coffee areas demonstrated a high correlation with the coffee areas reported in the detailed national coffee census of 2009 (R2 = 0.92). Unlike the census data our map provides high spatial resolution of coffee area patterns of Rwanda. The proposed method has potential for mapping other perennial small scale cropping systems in the East African Highlands and elsewhere.
Deep Learning Neural Networks and Bayesian Neural Networks in Data Analysis
NASA Astrophysics Data System (ADS)
Chernoded, Andrey; Dudko, Lev; Myagkov, Igor; Volkov, Petr
2017-10-01
Most of the modern analyses in high energy physics use signal-versus-background classification techniques of machine learning methods and neural networks in particular. Deep learning neural network is the most promising modern technique to separate signal and background and now days can be widely and successfully implemented as a part of physical analysis. In this article we compare Deep learning and Bayesian neural networks application as a classifiers in an instance of top quark analysis.
Impaired P600 in neuroleptic naive patients with first-episode schizophrenia.
Papageorgiou, C; Kontaxakis, V P; Havaki-Kontaxaki, B J; Stamouli, S; Vasios, C; Asvestas, P; Matsopoulos, G K; Kontopantelis, E; Rabavilas, A; Uzunoglu, N; Christodoulou, G N
2001-09-17
Deficits of working memory (WM) are recognized as an important pathological feature in schizophrenia. Since the P600 component of event related potentials has been hypothesized that represents aspects of second-pass parsing processes of information processing, and is related to WM, the present study focuses on P600 elicited during a WM test in drug-naive first-episode schizophrenics (FES) compared to healthy controls. We examined 16 drug-naive first-episode schizophrenic patients and 23 healthy controls matched for age and sex. Compared with controls schizophrenic patients showed reduced P600 amplitude on left temporoparietal region and increased P600 amplitude on left occipital region. With regard to the latency, the patients exhibited significantly prolongation on right temporoparietal region. The obtained pattern of differences classified correctly 89.20% of patients. Memory performance of patients was also significantly impaired relative to controls. Our results suggest that second-pass parsing process of information processing, as indexed by P600, elicited during a WM test, is impaired in FES. Moreover, these findings lend support to the view that the auditory WM in schizophrenia involves or affects a circuitry including temporoparietal and occipital brain areas.
Panwar, Bharat; Raghava, Gajendra P S
2015-04-01
The RNA-protein interactions play a diverse role in the cells, thus identification of RNA-protein interface is essential for the biologist to understand their function. In the past, several methods have been developed for predicting RNA interacting residues in proteins, but limited efforts have been made for the identification of protein-interacting nucleotides in RNAs. In order to discriminate protein-interacting and non-interacting nucleotides, we used various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest, SMO and SVM(light)) for prediction model development using various features and achieved highest 83.92% sensitivity, 84.82 specificity, 84.62% accuracy and 0.62 Matthew's correlation coefficient by SVM(light) based models. We observed that certain tri-nucleotides like ACA, ACC, AGA, CAC, CCA, GAG, UGA, and UUU preferred in protein-interaction. All the models have been developed using a non-redundant dataset and are evaluated using five-fold cross validation technique. A web-server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/). Copyright © 2015 Elsevier Inc. All rights reserved.
Statistics-based email communication security behavior recognition
NASA Astrophysics Data System (ADS)
Yi, Junkai; Su, Yueyang; Zhao, Xianghui
2017-08-01
With the development of information technology, e-mail has become a popular communication medium. It has great significant to determine the relationship between the two sides of the communication. Firstly, this paper analysed and processed the content and attachment of e-mail using the skill of steganalysis and malware analysis. And it also conducts the following feature extracting and behaviour model establishing which based on Naive Bayesian theory. Then a behaviour analysis method was employed to calculate and evaluate the communication security. Finally, some experiments about the accuracy of the behavioural relationship of communication identifying has been carried out. The result shows that this method has a great effects and correctness as eighty-four percent.
Learning in the context of distribution drift
2017-05-09
published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et. al., 2016)1. We have shown that the previous qualitative...learner Low-bias learner Aggregated classifier Figure 7: Architecture for learning fr m streaming data in th co text of variable or unknown...Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD
User-customized brain computer interfaces using Bayesian optimization
NASA Astrophysics Data System (ADS)
Bashashati, Hossein; Ward, Rabab K.; Bashashati, Ali
2016-04-01
Objective. The brain characteristics of different people are not the same. Brain computer interfaces (BCIs) should thus be customized for each individual person. In motor-imagery based synchronous BCIs, a number of parameters (referred to as hyper-parameters) including the EEG frequency bands, the channels and the time intervals from which the features are extracted should be pre-determined based on each subject’s brain characteristics. Approach. To determine the hyper-parameter values, previous work has relied on manual or semi-automatic methods that are not applicable to high-dimensional search spaces. In this paper, we propose a fully automatic, scalable and computationally inexpensive algorithm that uses Bayesian optimization to tune these hyper-parameters. We then build different classifiers trained on the sets of hyper-parameter values proposed by the Bayesian optimization. A final classifier aggregates the results of the different classifiers. Main Results. We have applied our method to 21 subjects from three BCI competition datasets. We have conducted rigorous statistical tests, and have shown the positive impact of hyper-parameter optimization in improving the accuracy of BCIs. Furthermore, We have compared our results to those reported in the literature. Significance. Unlike the best reported results in the literature, which are based on more sophisticated feature extraction and classification methods, and rely on prestudies to determine the hyper-parameter values, our method has the advantage of being fully automated, uses less sophisticated feature extraction and classification methods, and yields similar or superior results compared to the best performing designs in the literature.
Pixel-based skin segmentation in psoriasis images.
George, Y; Aldeen, M; Garnavi, R
2016-08-01
In this paper, we present a detailed comparison study of skin segmentation methods for psoriasis images. Different techniques are modified and then applied to a set of psoriasis images acquired from the Royal Melbourne Hospital, Melbourne, Australia, with aim of finding the best technique suited for application to psoriasis images. We investigate the effect of different colour transformations on skin detection performance. In this respect, explicit skin thresholding is evaluated with three different decision boundaries (CbCr, HS and rgHSV). Histogram-based Bayesian classifier is applied to extract skin probability maps (SPMs) for different colour channels. This is then followed by using different approaches to find a binary skin map (SM) image from the SPMs. The approaches used include binary decision tree (DT) and Otsu's thresholding. Finally, a set of morphological operations are implemented to refine the resulted SM image. The paper provides detailed analysis and comparison of the performance of the Bayesian classifier in five different colour spaces (YCbCr, HSV, RGB, XYZ and CIELab). The results show that histogram-based Bayesian classifier is more effective than explicit thresholding, when applied to psoriasis images. It is also found that decision boundary CbCr outperforms HS and rgHSV. Another finding is that the SPMs of Cb, Cr, H and B-CIELab colour bands yield the best SMs for psoriasis images. In this study, we used a set of 100 psoriasis images for training and testing the presented methods. True Positive (TP) and True Negative (TN) are used as statistical evaluation measures.
Functional Interaction Network Construction and Analysis for Disease Discovery.
Wu, Guanming; Haw, Robin
2017-01-01
Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistic analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60 % of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
A Neuro-Fuzzy Approach in the Classification of Students' Academic Performance
2013-01-01
Classifying the student academic performance with high accuracy facilitates admission decisions and enhances educational services at educational institutions. The purpose of this paper is to present a neuro-fuzzy approach for classifying students into different groups. The neuro-fuzzy classifier used previous exam results and other related factors as input variables and labeled students based on their expected academic performance. The results showed that the proposed approach achieved a high accuracy. The results were also compared with those obtained from other well-known classification approaches, including support vector machine, Naive Bayes, neural network, and decision tree approaches. The comparative analysis indicated that the neuro-fuzzy approach performed better than the others. It is expected that this work may be used to support student admission procedures and to strengthen the services of educational institutions. PMID:24302928
A neuro-fuzzy approach in the classification of students' academic performance.
Do, Quang Hung; Chen, Jeng-Fung
2013-01-01
Classifying the student academic performance with high accuracy facilitates admission decisions and enhances educational services at educational institutions. The purpose of this paper is to present a neuro-fuzzy approach for classifying students into different groups. The neuro-fuzzy classifier used previous exam results and other related factors as input variables and labeled students based on their expected academic performance. The results showed that the proposed approach achieved a high accuracy. The results were also compared with those obtained from other well-known classification approaches, including support vector machine, Naive Bayes, neural network, and decision tree approaches. The comparative analysis indicated that the neuro-fuzzy approach performed better than the others. It is expected that this work may be used to support student admission procedures and to strengthen the services of educational institutions.
Esfahani, Mohammad Shahrokh; Dougherty, Edward R
2015-01-01
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete, nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Being the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.
Application of Dynamic naïve Bayesian classifier to comprehensive drought assessment
NASA Astrophysics Data System (ADS)
Park, D. H.; Lee, J. Y.; Lee, J. H.; KIm, T. W.
2017-12-01
Drought monitoring has already been extensively studied due to the widespread impacts and complex causes of drought. The most important component of drought monitoring is to estimate the characteristics and extent of drought by quantitatively measuring the characteristics of drought. Drought assessment considering different aspects of the complicated drought condition and uncertainty of drought index is great significance in accurate drought monitoring. This study used the dynamic Naïve Bayesian Classifier (DNBC) which is an extension of the Hidden Markov Model (HMM), to model and classify drought by using various drought indices for integrated drought assessment. To provide a stable model for combined use of multiple drought indices, this study employed the DNBC to perform multi-index drought assessment by aggregating the effect of different type of drought and considering the inherent uncertainty. Drought classification was performed by the DNBC using several drought indices: Standardized Precipitation Index (SPI), Streamflow Drought Index (SDI), and Normalized Vegetation Supply Water Index (NVSWI)) that reflect meteorological, hydrological, and agricultural drought characteristics. Overall results showed that in comparison unidirectional (SPI, SDI, and NVSWI) or multivariate (Composite Drought Index, CDI) drought assessment, the proposed DNBC was able to synthetically classify of drought considering uncertainty. Model provided method for comprehensive drought assessment with combined use of different drought indices.
A bayesian approach to classification criteria for spectacled eiders
Taylor, B.L.; Wade, P.R.; Stehn, R.A.; Cochrane, J.F.
1996-01-01
To facilitate decisions to classify species according to risk of extinction, we used Bayesian methods to analyze trend data for the Spectacled Eider, an arctic sea duck. Trend data from three independent surveys of the Yukon-Kuskokwim Delta were analyzed individually and in combination to yield posterior distributions for population growth rates. We used classification criteria developed by the recovery team for Spectacled Eiders that seek to equalize errors of under- or overprotecting the species. We conducted both a Bayesian decision analysis and a frequentist (classical statistical inference) decision analysis. Bayesian decision analyses are computationally easier, yield basically the same results, and yield results that are easier to explain to nonscientists. With the exception of the aerial survey analysis of the 10 most recent years, both Bayesian and frequentist methods indicated that an endangered classification is warranted. The discrepancy between surveys warrants further research. Although the trend data are abundance indices, we used a preliminary estimate of absolute abundance to demonstrate how to calculate extinction distributions using the joint probability distributions for population growth rate and variance in growth rate generated by the Bayesian analysis. Recent apparent increases in abundance highlight the need for models that apply to declining and then recovering species.
BANYAN_Sigma: Bayesian classifier for members of young stellar associations
NASA Astrophysics Data System (ADS)
Gagné, Jonathan; Mamajek, Eric E.; Malo, Lison; Riedel, Adric; Rodriguez, David; Lafrenière, David; Faherty, Jacqueline K.; Roy-Loubier, Olivier; Pueyo, Laurent; Robin, Annie C.; Doyon, René
2018-01-01
BANYAN_Sigma calculates the membership probability that a given astrophysical object belongs to one of the currently known 27 young associations within 150 pc of the Sun, using Bayesian inference. This tool uses the sky position and proper motion measurements of an object, with optional radial velocity (RV) and distance (D) measurements, to derive a Bayesian membership probability. By default, the priors are adjusted such that a probability threshold of 90% will recover 50%, 68%, 82% or 90% of true association members depending on what observables are input (only sky position and proper motion, with RV, with D, with both RV and D, respectively). The algorithm is implemented in a Python package, in IDL, and is also implemented as an interactive web page.
A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.
Gao, Xiang; Lin, Huaiying; Dong, Qunfeng
2017-01-01
Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
Nicandro, Cruz-Ramírez; Efrén, Mezura-Montes; María Yaneli, Ameca-Alducin; Enrique, Martín-Del-Campo-Mena; Héctor Gabriel, Acosta-Mesa; Nancy, Pérez-Castro; Alejandro, Guerra-Hernández; Guillermo de Jesús, Hoyos-Rivera; Rocío Erandi, Barrientos-Martínez
2013-01-01
Breast cancer is one of the leading causes of death among women worldwide. There are a number of techniques used for diagnosing this disease: mammography, ultrasound, and biopsy, among others. Each of these has well-known advantages and disadvantages. A relatively new method, based on the temperature a tumor may produce, has recently been explored: thermography. In this paper, we will evaluate the diagnostic power of thermography in breast cancer using Bayesian network classifiers. We will show how the information provided by the thermal image can be used in order to characterize patients suspected of having cancer. Our main contribution is the proposal of a score, based on the aforementioned information, that could help distinguish sick patients from healthy ones. Our main results suggest the potential of this technique in such a goal but also show its main limitations that have to be overcome to consider it as an effective diagnosis complementary tool.
Bayesian network classifiers for categorizing cortical GABAergic interneurons.
Mihaljević, Bojan; Benavides-Piccione, Ruth; Bielza, Concha; DeFelipe, Javier; Larrañaga, Pedro
2015-04-01
An accepted classification of GABAergic interneurons of the cerebral cortex is a major goal in neuroscience. A recently proposed taxonomy based on patterns of axonal arborization promises to be a pragmatic method for achieving this goal. It involves characterizing interneurons according to five axonal arborization features, called F1-F5, and classifying them into a set of predefined types, most of which are established in the literature. Unfortunately, there is little consensus among expert neuroscientists regarding the morphological definitions of some of the proposed types. While supervised classifiers were able to categorize the interneurons in accordance with experts' assignments, their accuracy was limited because they were trained with disputed labels. Thus, here we automatically classify interneuron subsets with different label reliability thresholds (i.e., such that every cell's label is backed by at least a certain (threshold) number of experts). We quantify the cells with parameters of axonal and dendritic morphologies and, in order to predict the type, also with axonal features F1-F4 provided by the experts. Using Bayesian network classifiers, we accurately characterize and classify the interneurons and identify useful predictor variables. In particular, we discriminate among reliable examples of common basket, horse-tail, large basket, and Martinotti cells with up to 89.52% accuracy, and single out the number of branches at 180 μm from the soma, the convex hull 2D area, and the axonal features F1-F4 as especially useful predictors for distinguishing among these types. These results open up new possibilities for an objective and pragmatic classification of interneurons.
Wang, Shuihua; Yang, Ming; Du, Sidan; Yang, Jiquan; Liu, Bin; Gorriz, Juan M.; Ramírez, Javier; Yuan, Ti-Fei; Zhang, Yudong
2016-01-01
Highlights We develop computer-aided diagnosis system for unilateral hearing loss detection in structural magnetic resonance imaging.Wavelet entropy is introduced to extract image global features from brain images. Directed acyclic graph is employed to endow support vector machine an ability to handle multi-class problems.The developed computer-aided diagnosis system achieves an overall accuracy of 95.1% for this three-class problem of differentiating left-sided and right-sided hearing loss from healthy controls. Aim: Sensorineural hearing loss (SNHL) is correlated to many neurodegenerative disease. Now more and more computer vision based methods are using to detect it in an automatic way. Materials: We have in total 49 subjects, scanned by 3.0T MRI (Siemens Medical Solutions, Erlangen, Germany). The subjects contain 14 patients with right-sided hearing loss (RHL), 15 patients with left-sided hearing loss (LHL), and 20 healthy controls (HC). Method: We treat this as a three-class classification problem: RHL, LHL, and HC. Wavelet entropy (WE) was selected from the magnetic resonance images of each subjects, and then submitted to a directed acyclic graph support vector machine (DAG-SVM). Results: The 10 repetition results of 10-fold cross validation shows 3-level decomposition will yield an overall accuracy of 95.10% for this three-class classification problem, higher than feedforward neural network, decision tree, and naive Bayesian classifier. Conclusions: This computer-aided diagnosis system is promising. We hope this study can attract more computer vision method for detecting hearing loss. PMID:27807415
Wang, Shuihua; Yang, Ming; Du, Sidan; Yang, Jiquan; Liu, Bin; Gorriz, Juan M; Ramírez, Javier; Yuan, Ti-Fei; Zhang, Yudong
2016-01-01
Highlights We develop computer-aided diagnosis system for unilateral hearing loss detection in structural magnetic resonance imaging.Wavelet entropy is introduced to extract image global features from brain images. Directed acyclic graph is employed to endow support vector machine an ability to handle multi-class problems.The developed computer-aided diagnosis system achieves an overall accuracy of 95.1% for this three-class problem of differentiating left-sided and right-sided hearing loss from healthy controls. Aim: Sensorineural hearing loss (SNHL) is correlated to many neurodegenerative disease. Now more and more computer vision based methods are using to detect it in an automatic way. Materials: We have in total 49 subjects, scanned by 3.0T MRI (Siemens Medical Solutions, Erlangen, Germany). The subjects contain 14 patients with right-sided hearing loss (RHL), 15 patients with left-sided hearing loss (LHL), and 20 healthy controls (HC). Method: We treat this as a three-class classification problem: RHL, LHL, and HC. Wavelet entropy (WE) was selected from the magnetic resonance images of each subjects, and then submitted to a directed acyclic graph support vector machine (DAG-SVM). Results: The 10 repetition results of 10-fold cross validation shows 3-level decomposition will yield an overall accuracy of 95.10% for this three-class classification problem, higher than feedforward neural network, decision tree, and naive Bayesian classifier. Conclusions: This computer-aided diagnosis system is promising. We hope this study can attract more computer vision method for detecting hearing loss.
NASA Astrophysics Data System (ADS)
Hoffmann, Achim; Mahidadia, Ashesh
The purpose of this chapter is to present fundamental ideas and techniques of machine learning suitable for the field of this book, i.e., for automated scientific discovery. The chapter focuses on those symbolic machine learning methods, which produce results that are suitable to be interpreted and understood by humans. This is particularly important in the context of automated scientific discovery as the scientific theories to be produced by machines are usually meant to be interpreted by humans. This chapter contains some of the most influential ideas and concepts in machine learning research to give the reader a basic insight into the field. After the introduction in Sect. 1, general ideas of how learning problems can be framed are given in Sect. 2. The section provides useful perspectives to better understand what learning algorithms actually do. Section 3 presents the Version space model which is an early learning algorithm as well as a conceptual framework, that provides important insight into the general mechanisms behind most learning algorithms. In section 4, a family of learning algorithms, the AQ family for learning classification rules is presented. The AQ family belongs to the early approaches in machine learning. The next, Sect. 5 presents the basic principles of decision tree learners. Decision tree learners belong to the most influential class of inductive learning algorithms today. Finally, a more recent group of learning systems are presented in Sect. 6, which learn relational concepts within the framework of logic programming. This is a particularly interesting group of learning systems since the framework allows also to incorporate background knowledge which may assist in generalisation. Section 7 discusses Association Rules - a technique that comes from the related field of Data mining. Section 8 presents the basic idea of the Naive Bayesian Classifier. While this is a very popular learning technique, the learning result is not well suited for human comprehension as it is essentially a large collection of probability values. In Sect. 9, we present a generic method for improving accuracy of a given learner by generatingmultiple classifiers using variations of the training data. While this works well in most cases, the resulting classifiers have significantly increased complexity and, hence, tend to destroy the human readability of the learning result that a single learner may produce. Section 10 contains a summary, mentions briefly other techniques not discussed in this chapter and presents outlook on the potential of machine learning in the future.
Vogt, Martin; Bajorath, Jürgen
2008-01-01
Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.
Recognition of pornographic web pages by classifying texts and images.
Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve
2007-06-01
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
Adaptive statistical pattern classifiers for remotely sensed data
NASA Technical Reports Server (NTRS)
Gonzalez, R. C.; Pace, M. O.; Raulston, H. S.
1975-01-01
A technique for the adaptive estimation of nonstationary statistics necessary for Bayesian classification is developed. The basic approach to the adaptive estimation procedure consists of two steps: (1) an optimal stochastic approximation of the parameters of interest and (2) a projection of the parameters in time or position. A divergence criterion is developed to monitor algorithm performance. Comparative results of adaptive and nonadaptive classifier tests are presented for simulated four dimensional spectral scan data.
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File
Sankari, E Siva; Manimegalai, D
2017-12-21
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest are analysed. Among the various decision tree classifiers Random forest performs well in less time with good accuracy of 96.35%. Another inference is RUS boost decision tree classifier is able to classify one or two samples in the class with very less samples while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive for the classes with fewer samples. Also the performance of decision tree classifiers is compared with SVM (Support Vector Machine) and Naive Bayes classifier. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bulashevska, Alla; Eils, Roland
2006-06-14
The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.
Salient object detection method based on multiple semantic features
NASA Astrophysics Data System (ADS)
Wang, Chunyang; Yu, Chunyan; Song, Meiping; Wang, Yulei
2018-04-01
The existing salient object detection model can only detect the approximate location of salient object, or highlight the background, to resolve the above problem, a salient object detection method was proposed based on image semantic features. First of all, three novel salient features were presented in this paper, including object edge density feature (EF), object semantic feature based on the convex hull (CF) and object lightness contrast feature (LF). Secondly, the multiple salient features were trained with random detection windows. Thirdly, Naive Bayesian model was used for combine these features for salient detection. The results on public datasets showed that our method performed well, the location of salient object can be fixed and the salient object can be accurately detected and marked by the specific window.
Borchani, Hanen; Bielza, Concha; Toro, Carlos; Larrañaga, Pedro
2013-03-01
Our aim is to use multi-dimensional Bayesian network classifiers in order to predict the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors given an input set of respective resistance mutations that an HIV patient carries. Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models especially designed to solve multi-dimensional classification problems, where each input instance in the data set has to be assigned simultaneously to multiple output class variables that are not necessarily binary. In this paper, we introduce a new method, named MB-MBC, for learning MBCs from data by determining the Markov blanket around each class variable using the HITON algorithm. Our method is applied to both reverse transcriptase and protease data sets obtained from the Stanford HIV-1 database. Regarding the prediction of antiretroviral combination therapies, the experimental study shows promising results in terms of classification accuracy compared with state-of-the-art MBC learning algorithms. For reverse transcriptase inhibitors, we get 71% and 11% in mean and global accuracy, respectively; while for protease inhibitors, we get more than 84% and 31% in mean and global accuracy, respectively. In addition, the analysis of MBC graphical structures lets us gain insight into both known and novel interactions between reverse transcriptase and protease inhibitors and their respective resistance mutations. MB-MBC algorithm is a valuable tool to analyze the HIV-1 reverse transcriptase and protease inhibitors prediction problem and to discover interactions within and between these two classes of inhibitors. Copyright © 2012 Elsevier B.V. All rights reserved.
The maximum entropy method of moments and Bayesian probability theory
NASA Astrophysics Data System (ADS)
Bretthorst, G. Larry
2013-08-01
The problem of density estimation occurs in many disciplines. For example, in MRI it is often necessary to classify the types of tissues in an image. To perform this classification one must first identify the characteristics of the tissues to be classified. These characteristics might be the intensity of a T1 weighted image and in MRI many other types of characteristic weightings (classifiers) may be generated. In a given tissue type there is no single intensity that characterizes the tissue, rather there is a distribution of intensities. Often this distributions can be characterized by a Gaussian, but just as often it is much more complicated. Either way, estimating the distribution of intensities is an inference problem. In the case of a Gaussian distribution, one must estimate the mean and standard deviation. However, in the Non-Gaussian case the shape of the density function itself must be inferred. Three common techniques for estimating density functions are binned histograms [1, 2], kernel density estimation [3, 4], and the maximum entropy method of moments [5, 6]. In the introduction, the maximum entropy method of moments will be reviewed. Some of its problems and conditions under which it fails will be discussed. Then in later sections, the functional form of the maximum entropy method of moments probability distribution will be incorporated into Bayesian probability theory. It will be shown that Bayesian probability theory solves all of the problems with the maximum entropy method of moments. One gets posterior probabilities for the Lagrange multipliers, and, finally, one can put error bars on the resulting estimated density function.
NASA Astrophysics Data System (ADS)
Lee, Youngjoo; Kim, Namkug; Seo, Joon Beom; Lee, JuneGoo; Kang, Suk Ho
2007-03-01
In this paper, we proposed novel shape features to improve classification performance of differentiating obstructive lung diseases, based on HRCT (High Resolution Computerized Tomography) images. The images were selected from HRCT images, obtained from 82 subjects. For each image, two experienced radiologists selected rectangular ROIs with various sizes (16x16, 32x32, and 64x64 pixels), representing each disease or normal lung parenchyma. Besides thirteen textural features, we employed additional seven shape features; cluster shape features, and Top-hat transform features. To evaluate the contribution of shape features for differentiation of obstructive lung diseases, several experiments were conducted with two different types of classifiers and various ROI sizes. For automated classification, the Bayesian classifier and support vector machine (SVM) were implemented. To assess the performance and cross-validation of the system, 5-folding method was used. In comparison to employing only textural features, adding shape features yields significant enhancement of overall sensitivity(5.9, 5.4, 4.4% in the Bayesian and 9.0, 7.3, 5.3% in the SVM), in the order of ROI size 16x16, 32x32, 64x64 pixels, respectively (t-test, p<0.01). Moreover, this enhancement was largely due to the improvement on class-specific sensitivity of mild centrilobular emphysema and bronchiolitis obliterans which are most hard to differentiate for radiologists. According to these experimental results, adding shape features to conventional texture features is much useful to improve classification performance of obstructive lung diseases in both Bayesian and SVM classifiers.
Farr, W. M.; Mandel, I.; Stevens, D.
2015-01-01
Selection among alternative theoretical models given an observed dataset is an important challenge in many areas of physics and astronomy. Reversible-jump Markov chain Monte Carlo (RJMCMC) is an extremely powerful technique for performing Bayesian model selection, but it suffers from a fundamental difficulty and it requires jumps between model parameter spaces, but cannot efficiently explore both parameter spaces at once. Thus, a naive jump between parameter spaces is unlikely to be accepted in the Markov chain Monte Carlo (MCMC) algorithm and convergence is correspondingly slow. Here, we demonstrate an interpolation technique that uses samples from single-model MCMCs to propose intermodel jumps from an approximation to the single-model posterior of the target parameter space. The interpolation technique, based on a kD-tree data structure, is adaptive and efficient in modest dimensionality. We show that our technique leads to improved convergence over naive jumps in an RJMCMC, and compare it to other proposals in the literature to improve the convergence of RJMCMCs. We also demonstrate the use of the same interpolation technique as a way to construct efficient ‘global’ proposal distributions for single-model MCMCs without prior knowledge of the structure of the posterior distribution, and discuss improvements that permit the method to be used in higher dimensional spaces efficiently. PMID:26543580
Peltokoski, Jaana; Vehviläinen-Julkunen, Katri; Pitkäaho, Taina; Mikkonen, Santtu; Miettinen, Merja
2015-10-01
To examine the relationship of a comprehensive health care orientation process with a hospital's attractiveness. Little is known about indicators of the employee orientation process that most likely explain a hospital organisation's attractiveness. Empirical data collected from registered nurses (n = 145) and physicians (n = 37) working in two specialised hospital districts. A Naive Bayes Classification was applied to examine the comprehensive orientation process indicators that predict hospital's attractiveness. The model was composed of five orientation process indicators: the contribution of the orientation process to nurses' and physicians' intention to stay; the defined responsibilities of the orientation process; interaction between newcomer and colleagues; responsibilities that are adapted for tasks; and newcomers' baseline knowledge assessment that should be done before the orientation phase. The Naive Bayes Classification was used to explore employee orientation process and related indicators. The model constructed provides insight that can be used in designing and implementing the orientation process to promote the hospital organisation's attractiveness. Managers should focus on developing fluently organised orientation practices based on the indicators that predict the hospital's attractiveness. For the purpose of personalised orientation, employees' baseline knowledge and competence level should be assessed before the orientation phase. © 2014 John Wiley & Sons Ltd.
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
2015-08-01
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
Bayesian Variable Selection for Hierarchical Gene-Environment and Gene-Gene Interactions
Liu, Changlu; Ma, Jianzhong; Amos, Christopher I.
2014-01-01
We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions and gene by environment interactions in the same model. Our approach incorporates the natural hierarchical structure between the main effects and interaction effects into a mixture model, such that our methods tend to remove the irrelevant interaction effects more effectively, resulting in more robust and parsimonious models. We consider both strong and weak hierarchical models. For a strong hierarchical model, both of the main effects between interacting factors must be present for the interactions to be considered in the model development, while for a weak hierarchical model, only one of the two main effects is required to be present for the interaction to be evaluated. Our simulation results show that the proposed strong and weak hierarchical mixture models work well in controlling false positive rates and provide a powerful approach for identifying the predisposing effects and interactions in gene-environment interaction studies, in comparison with the naive model that does not impose this hierarchical constraint in most of the scenarios simulated. We illustrated our approach using data for lung cancer and cutaneous melanoma. PMID:25154630
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Towards this blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that perform steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed out of data points from two different clusters. Both advantages speed up convergence which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extend to the very general domain of statistical inference.
Fleischmann, Roy; Tongbram, Vanita; van Vollenhoven, Ronald; Tang, Derek H; Chung, James; Collier, David; Urs, Shilpa; Ndirangu, Kerigo; Wells, George; Pope, Janet
2017-01-01
Clinical trials have not consistently demonstrated differences between tumour necrosis factor inhibitor (TNFi) plus methotrexate and triple therapy (methotrexate plus hydroxychloroquine plus sulfasalazine) in rheumatoid arthritis (RA). The study objective was to estimate the efficacy, radiographic benefits, safety and patient-reported outcomes of TNFi-methotrexate versus triple therapy in patients with RA. A systematic review and network meta-analysis (NMA) of randomised controlled trials of TNFi-methotrexate or triple therapy as one of the treatment arms in patients with an inadequate response to or who were naive to methotrexate was conducted. American College of Rheumatology 70% response criteria (ACR70) at 6 months was the prespecified primary endpoint to evaluate depth of response. Data from direct and indirect comparisons between TNFi-methotrexate and triple therapy were pooled and quantitatively analysed using fixed-effects and random-effects Bayesian models. We analysed 33 studies in patients with inadequate response to methotrexate and 19 in patients naive to methotrexate. In inadequate responders, triple therapy was associated with lower odds of achieving ACR70 at 6 months compared with TNFi-methotrexate (OR 0.35, 95% credible interval (CrI) 0.19 to 0.64). Most secondary endpoints tended to favour TNFi-methotrexate in terms of OR direction; however, no clear increased likelihood of achieving these endpoints was observed for either therapy. The odds of infection were lower with triple therapy than with TNFi-methotrexate (OR 0.08, 95% CrI 0.00 to 0.57). There were no differences observed between the two regimens in patients naive to methotrexate. In this NMA, triple therapy was associated with 65% lower odds of achieving ACR70 at 6 months compared with TNFi-methotrexate in patients with inadequate response to methotrexate. Although secondary endpoints numerically favoured TNFi-methotrexate, no clear differences were observed. The odds of infection were greater with TNFi-methotrexate. No differences were observed for patients naive to methotrexate. These results may help inform care of patients who fail methotrexate first-line therapy.
Fleischmann, Roy; Tongbram, Vanita; van Vollenhoven, Ronald; Tang, Derek H; Chung, James; Collier, David; Urs, Shilpa; Ndirangu, Kerigo; Wells, George; Pope, Janet
2017-01-01
Objective Clinical trials have not consistently demonstrated differences between tumour necrosis factor inhibitor (TNFi) plus methotrexate and triple therapy (methotrexate plus hydroxychloroquine plus sulfasalazine) in rheumatoid arthritis (RA). The study objective was to estimate the efficacy, radiographic benefits, safety and patient-reported outcomes of TNFi–methotrexate versus triple therapy in patients with RA. Methods A systematic review and network meta-analysis (NMA) of randomised controlled trials of TNFi–methotrexate or triple therapy as one of the treatment arms in patients with an inadequate response to or who were naive to methotrexate was conducted. American College of Rheumatology 70% response criteria (ACR70) at 6 months was the prespecified primary endpoint to evaluate depth of response. Data from direct and indirect comparisons between TNFi–methotrexate and triple therapy were pooled and quantitatively analysed using fixed-effects and random-effects Bayesian models. Results We analysed 33 studies in patients with inadequate response to methotrexate and 19 in patients naive to methotrexate. In inadequate responders, triple therapy was associated with lower odds of achieving ACR70 at 6 months compared with TNFi–methotrexate (OR 0.35, 95% credible interval (CrI) 0.19 to 0.64). Most secondary endpoints tended to favour TNFi–methotrexate in terms of OR direction; however, no clear increased likelihood of achieving these endpoints was observed for either therapy. The odds of infection were lower with triple therapy than with TNFi−methotrexate (OR 0.08, 95% CrI 0.00 to 0.57). There were no differences observed between the two regimens in patients naive to methotrexate. Conclusions In this NMA, triple therapy was associated with 65% lower odds of achieving ACR70 at 6 months compared with TNFi–methotrexate in patients with inadequate response to methotrexate. Although secondary endpoints numerically favoured TNFi–methotrexate, no clear differences were observed. The odds of infection were greater with TNFi–methotrexate. No differences were observed for patients naive to methotrexate. These results may help inform care of patients who fail methotrexate first-line therapy. PMID:28123782
Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos
2017-04-13
Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition - factors that influence of pain perceptions. We select 52 adult subjects for the study divided into three groups: control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance with diffusion tensor to see white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gather data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce features of the first dataset and classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform a classification of migraine group. Moreover we implement a committee method to improve the classification accuracy based on feature selection algorithms. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, 93 to 94% in boosting. The features that were determined to be most useful for classification included are related with the pain, analgesics and left uncinate brain (connected with the pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.
Quantum ensembles of quantum classifiers.
Schuld, Maria; Petruccione, Francesco
2018-02-09
Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
Classification of mislabelled microarrays using robust sparse logistic regression.
Bootkrajang, Jakramate; Kabán, Ata
2013-04-01
Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, it is effective at identifying marker genes and simultaneously it detects mislabelled arrays to high accuracy. The code is available from http://cs.bham.ac.uk/∼jxb008. Supplementary data are available at Bioinformatics online.
Chaillon, Antoine; Nakazawa, Masato; Wertheim, Joel O; Little, Susan J; Smith, Davey M; Mehta, Sanjay R; Gianella, Sara
2017-11-01
During primary HIV infection, the presence of minority drug resistance mutations (DRM) may be a consequence of sexual transmission, de novo mutations, or technical errors in identification. Baseline blood samples were collected from 24 HIV-infected antiretroviral-naive, genetically and epidemiologically linked source and recipient partners shortly after the recipient's estimated date of infection. An additional 32 longitudinal samples were available from 11 recipients. Deep sequencing of HIV reverse transcriptase (RT) was performed (Roche/454), and the sequences were screened for nucleoside and nonnucleoside RT inhibitor DRM. The likelihood of sexual transmission and persistence of DRM was assessed using Bayesian-based statistical modeling. While the majority of DRM (>20%) were consistently transmitted from source to recipient, the probability of detecting a minority DRM in the recipient was not increased when the same minority DRM was detected in the source (Bayes factor [BF] = 6.37). Longitudinal analyses revealed an exponential decay of DRM (BF = 0.05) while genetic diversity increased. Our analysis revealed no substantial evidence for sexual transmission of minority DRM (BF = 0.02). The presence of minority DRM during early infection, followed by a rapid decay, is consistent with the "mutation-selection balance" hypothesis, in which deleterious mutations are more efficiently purged later during HIV infection when the larger effective population size allows more efficient selection. Future studies using more recent sequencing technologies that are less prone to single-base errors should confirm these results by applying a similar Bayesian framework in other clinical settings. IMPORTANCE The advent of sensitive sequencing platforms has led to an increased identification of minority drug resistance mutations (DRM), including among antiretroviral therapy-naive HIV-infected individuals. While transmission of DRM may impact future therapy options for newly infected individuals, the clinical significance of the detection of minority DRM remains controversial. In the present study, we applied deep-sequencing techniques within a Bayesian hierarchical framework to a cohort of 24 transmission pairs to investigate whether minority DRM detected shortly after transmission were the consequence of (i) sexual transmission from the source, (ii) de novo emergence shortly after infection followed by viral selection and evolution, or (iii) technical errors/limitations of deep-sequencing methods. We found no clear evidence to support the sexual transmission of minority resistant variants, and our results suggested that minor resistant variants may emerge de novo shortly after transmission, when the small effective population size limits efficient purge by natural selection. Copyright © 2017 American Society for Microbiology.
Bouktif, Salah; Hanna, Eileen Marie; Zaki, Nazar; Abu Khousa, Eman
2014-01-01
Prediction and classification techniques have been well studied by machine learning researchers and developed for several real-word problems. However, the level of acceptance and success of prediction models are still below expectation due to some difficulties such as the low performance of prediction models when they are applied in different environments. Such a problem has been addressed by many researchers, mainly from the machine learning community. A second problem, principally raised by model users in different communities, such as managers, economists, engineers, biologists, and medical practitioners, etc., is the prediction models' interpretability. The latter is the ability of a model to explain its predictions and exhibit the causality relationships between the inputs and the outputs. In the case of classification, a successful way to alleviate the low performance is to use ensemble classiers. It is an intuitive strategy to activate collaboration between different classifiers towards a better performance than individual classier. Unfortunately, ensemble classifiers method do not take into account the interpretability of the final classification outcome. It even worsens the original interpretability of the individual classifiers. In this paper we propose a novel implementation of classifiers combination approach that does not only promote the overall performance but also preserves the interpretability of the resulting model. We propose a solution based on Ant Colony Optimization and tailored for the case of Bayesian classifiers. We validate our proposed solution with case studies from medical domain namely, heart disease and Cardiotography-based predictions, problems where interpretability is critical to make appropriate clinical decisions. The datasets, Prediction Models and software tool together with supplementary materials are available at http://faculty.uaeu.ac.ae/salahb/ACO4BC.htm.
Probabilistic classifiers with high-dimensional data
Kim, Kyung In; Simon, Richard
2011-01-01
For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite of their importance in medical decision making. In this paper, we introduce 2 criteria for assessment of probabilistic classifiers: well-calibratedness and refinement and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946
Mihaljević, Bojan; Bielza, Concha; Benavides-Piccione, Ruth; DeFelipe, Javier; Larrañaga, Pedro
2014-01-01
Interneuron classification is an important and long-debated topic in neuroscience. A recent study provided a data set of digitally reconstructed interneurons classified by 42 leading neuroscientists according to a pragmatic classification scheme composed of five categorical variables, namely, of the interneuron type and four features of axonal morphology. From this data set we now learned a model which can classify interneurons, on the basis of their axonal morphometric parameters, into these five descriptive variables simultaneously. Because of differences in opinion among the neuroscientists, especially regarding neuronal type, for many interneurons we lacked a unique, agreed-upon classification, which we could use to guide model learning. Instead, we guided model learning with a probability distribution over the neuronal type and the axonal features, obtained, for each interneuron, from the neuroscientists' classification choices. We conveniently encoded such probability distributions with Bayesian networks, calling them label Bayesian networks (LBNs), and developed a method to predict them. This method predicts an LBN by forming a probabilistic consensus among the LBNs of the interneurons most similar to the one being classified. We used 18 axonal morphometric parameters as predictor variables, 13 of which we introduce in this paper as quantitative counterparts to the categorical axonal features. We were able to accurately predict interneuronal LBNs. Furthermore, when extracting crisp (i.e., non-probabilistic) predictions from the predicted LBNs, our method outperformed related work on interneuron classification. Our results indicate that our method is adequate for multi-dimensional classification of interneurons with probabilistic labels. Moreover, the introduced morphometric parameters are good predictors of interneuron type and the four features of axonal morphology and thus may serve as objective counterparts to the subjective, categorical axonal features.
Mahoney, Christine M; Kelly, Ryan T; Alexander, Liz; Newburn, Matt; Bader, Sydney; Ewing, Robert G; Fahey, Albert J; Atkinson, David A; Beagley, Nathaniel
2016-04-05
Time-of-flight-secondary ion mass spectrometry (TOF-SIMS) and laser ablation-inductively coupled plasma mass spectrometry (LA-ICPMS) were used for characterization and identification of unique signatures from a series of 18 Composition C-4 plastic explosives. The samples were obtained from various commercial and military sources around the country. Positive and negative ion TOF-SIMS data were acquired directly from the C-4 residue on Si surfaces, where the positive ion mass spectra obtained were consistent with the major composition of organic additives, and the negative ion mass spectra were more consistent with explosive content in the C-4 samples. Each series of mass spectra was subjected to partial least squares-discriminant analysis (PLS-DA), a multivariate statistical analysis approach which serves to first find the areas of maximum variance within different classes of C-4 and subsequently to classify unknown samples based on correlations between the unknown data set and the original data set (often referred to as a training data set). This method was able to successfully classify test samples of C-4, though with a limited degree of certainty. The classification accuracy of the method was further improved by integrating the positive and negative ion data using a Bayesian approach. The TOF-SIMS data was combined with a second analytical method, LA-ICPMS, which was used to analyze elemental signatures in the C-4. The integrated data were able to classify test samples with a high degree of certainty. Results indicate that this Bayesian integrated approach constitutes a robust classification method that should be employable even in dirty samples collected in the field.
Bayesian networks in neuroscience: a survey.
Bielza, Concha; Larrañaga, Pedro
2014-01-01
Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind-morphological, electrophysiological, -omics and neuroimaging-, thereby broadening the scope-molecular, cellular, structural, functional, cognitive and medical- of the brain aspects to be studied.
Bayesian networks in neuroscience: a survey
Bielza, Concha; Larrañaga, Pedro
2014-01-01
Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind–morphological, electrophysiological, -omics and neuroimaging–, thereby broadening the scope–molecular, cellular, structural, functional, cognitive and medical– of the brain aspects to be studied. PMID:25360109
NASA Technical Reports Server (NTRS)
Bryant, N. A.; George, A. J., Jr.; Hegdahl, R.
1977-01-01
A systematic verification of Landsat data classifications of the Portland, Oregon metropolitan area has been undertaken on the basis of census tract data. The degree of systematic misclassification due to the Bayesian classifier used to process the Landsat data was noted for the various suburban, industrialized and central business districts of the metropolitan area. The Landsat determinations of residential land use were employed to estimate the number of automobile trips generated in the region and to model air pollution hazards.
Estimation from incomplete multinomial data. Ph.D. Thesis - Harvard Univ.
NASA Technical Reports Server (NTRS)
Credeur, K. R.
1978-01-01
The vector of multinomial cell probabilities was estimated from incomplete data, incomplete in that it contains partially classified observations. Each such partially classified observation was observed to fall in one of two or more selected categories but was not classified further into a single category. The data were assumed to be incomplete at random. The estimation criterion was minimization of risk for quadratic loss. The estimators were the classical maximum likelihood estimate, the Bayesian posterior mode, and the posterior mean. An approximation was developed for the posterior mean. The Dirichlet, the conjugate prior for the multinomial distribution, was assumed for the prior distribution.
NASA Astrophysics Data System (ADS)
Rizzo, D. M.; Fytilis, N.; Stevens, L.
2012-12-01
Environmental managers are increasingly required to monitor and forecast long-term effects and vulnerability of biophysical systems to human-generated stresses. Ideally, a study involving both physical and biological assessments conducted concurrently (in space and time) could provide a better understanding of the mechanisms and complex relationships. However, costs and resources associated with monitoring the complex linkages between the physical, geomorphic and habitat conditions and the biological integrity of stream reaches are prohibitive. Researchers have used classification techniques to place individual streams and rivers into a broader spatial context (hydrologic or health condition). Such efforts require environmental managers to gather multiple forms of information - quantitative, qualitative and subjective. We research and develop a novel classification tool that combines self-organizing maps with a Naïve Bayesian classifier to direct resources to stream reaches most in need. The Vermont Agency of Natural Resources has developed and adopted protocols for physical stream geomorphic and habitat assessments throughout the state of Vermont. Separate from these assessments, the Vermont Department of Environmental Conservation monitors the biological communities and the water quality in streams. Our initial hypothesis is that the geomorphic reach assessments and water quality data may be leveraged to reduce error and uncertainty associated with predictions of biological integrity and stream health. We test our hypothesis using over 2500 Vermont stream reaches (~1371 stream miles) assessed by the two agencies. In the development of this work, we combine a Naïve Bayesian classifier with a modified Kohonen Self-Organizing Map (SOM). The SOM is an unsupervised artificial neural network that autonomously analyzes inherent dataset properties using input data only. It is typically used to cluster data into similar categories when a priori classes do not exist. The incorporation of a Bayesian classifier allows one to explicitly incorporate existing knowledge and expert opinion into the data analysis. Since classification plays a leading role in the future development of data-enabled science and engineering, such a computational tool is applicable to a variety of proactive adaptive watershed management applications.
Chai, Rifai; Naik, Ganesh R; Nguyen, Tuan Nghia; Ling, Sai Ho; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T
2017-05-01
This paper presents a two-class electroencephal-ography-based classification for classifying of driver fatigue (fatigue state versus alert state) from 43 healthy participants. The system uses independent component by entropy rate bound minimization analysis (ERBM-ICA) for the source separation, autoregressive (AR) modeling for the features extraction, and Bayesian neural network for the classification algorithm. The classification results demonstrate a sensitivity of 89.7%, a specificity of 86.8%, and an accuracy of 88.2%. The combination of ERBM-ICA (source separator), AR (feature extractor), and Bayesian neural network (classifier) provides the best outcome with a p-value < 0.05 with the highest value of area under the receiver operating curve (AUC-ROC = 0.93) against other methods such as power spectral density as feature extractor (AUC-ROC = 0.81). The results of this study suggest the method could be utilized effectively for a countermeasure device for driver fatigue identification and other adverse event applications.
Nicandro, Cruz-Ramírez; Efrén, Mezura-Montes; María Yaneli, Ameca-Alducin; Enrique, Martín-Del-Campo-Mena; Héctor Gabriel, Acosta-Mesa; Nancy, Pérez-Castro; Alejandro, Guerra-Hernández; Guillermo de Jesús, Hoyos-Rivera; Rocío Erandi, Barrientos-Martínez
2013-01-01
Breast cancer is one of the leading causes of death among women worldwide. There are a number of techniques used for diagnosing this disease: mammography, ultrasound, and biopsy, among others. Each of these has well-known advantages and disadvantages. A relatively new method, based on the temperature a tumor may produce, has recently been explored: thermography. In this paper, we will evaluate the diagnostic power of thermography in breast cancer using Bayesian network classifiers. We will show how the information provided by the thermal image can be used in order to characterize patients suspected of having cancer. Our main contribution is the proposal of a score, based on the aforementioned information, that could help distinguish sick patients from healthy ones. Our main results suggest the potential of this technique in such a goal but also show its main limitations that have to be overcome to consider it as an effective diagnosis complementary tool. PMID:23762182
Classifying environmentally significant urban land uses with satellite imagery.
Park, Mi-Hyun; Stenstrom, Michael K
2008-01-01
We investigated Bayesian networks to classify urban land use from satellite imagery. Landsat Enhanced Thematic Mapper Plus (ETM(+)) images were used for the classification in two study areas: (1) Marina del Rey and its vicinity in the Santa Monica Bay Watershed, CA and (2) drainage basins adjacent to the Sweetwater Reservoir in San Diego, CA. Bayesian networks provided 80-95% classification accuracy for urban land use using four different classification systems. The classifications were robust with small training data sets with normal and reduced radiometric resolution. The networks needed only 5% of the total data (i.e., 1500 pixels) for sample size and only 5- or 6-bit information for accurate classification. The network explicitly showed the relationship among variables from its structure and was also capable of utilizing information from non-spectral data. The classification can be used to provide timely and inexpensive land use information over large areas for environmental purposes such as estimating stormwater pollutant loads.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stinnett, Jacob; Sullivan, Clair J.; Xiong, Hao
Low-resolution isotope identifiers are widely deployed for nuclear security purposes, but these detectors currently demonstrate problems in making correct identifications in many typical usage scenarios. While there are many hardware alternatives and improvements that can be made, performance on existing low resolution isotope identifiers should be able to be improved by developing new identification algorithms. We have developed a wavelet-based peak extraction algorithm and an implementation of a Bayesian classifier for automated peak-based identification. The peak extraction algorithm has been extended to compute uncertainties in the peak area calculations. To build empirical joint probability distributions of the peak areas andmore » uncertainties, a large set of spectra were simulated in MCNP6 and processed with the wavelet-based feature extraction algorithm. Kernel density estimation was then used to create a new component of the likelihood function in the Bayesian classifier. Furthermore, identification performance is demonstrated on a variety of real low-resolution spectra, including Category I quantities of special nuclear material.« less
Comparisons and Selections of Features and Classifiers for Short Text Classification
NASA Astrophysics Data System (ADS)
Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi
2017-10-01
Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.
Rajendran, Senthilnathan; Jothi, Arunachalam
2018-05-16
The Three-dimensional structure of a protein depends on the interaction between their amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions are because of the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity lesser than 40% were selected for ten different subfolds from three different mainfolds (according to CATH classification) and were used for this analysis. We used the normalized values of the 49 physio-chemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed a fold specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes-NB, Support Vector Machines-SVM and Bayesian Generalized Linear Model-BGLM) which could discriminate mainfold based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences coming under all beta category with 81.43% accuracy and all alpha, alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Borchani, Hanen; Bielza, Concha; Martı Nez-Martı N, Pablo; Larrañaga, Pedro
2012-12-01
Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson's patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson's disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables. Copyright © 2012 Elsevier Inc. All rights reserved.
Gifford, Robert J.; Rhee, Soo-Yon; Eriksson, Nicolas; Liu, Tommy F.; Kiuchi, Mark; Das, Amar K.; Shafer, Robert W.
2008-01-01
Design Promiscuous guanine (G) to adenine (A) substitutions catalysed by apolipoprotein B RNA-editing catalytic component (APOBEC) enzymes are observed in a proportion of HIV-1 sequences in vivo and can introduce artifacts into some genetic analyses. The potential impact of undetected lethal editing on genotypic estimation of transmitted drug resistance was assessed. Methods Classifiers of lethal, APOBEC-mediated editing were developed by analysis of lentiviral pol gene sequence variation and evaluated using control sets of HIV-1 sequences. The potential impact of sequence editing on genotypic estimation of drug resistance was assessed in sets of sequences obtained from 77 studies of 25 or more therapy-naive individuals, using mixture modelling approaches to determine the maximum likelihood classification of sequences as lethally edited as opposed to viable. Results Analysis of 6437 protease and reverse transcriptase sequences from therapy-naive individuals using a novel classifier of lethal, APOBEC3G-mediated sequence editing, the polypeptide-like 3G (APOBEC3G)-mediated defectives (A3GD) index’, detected lethal editing in association with spurious ‘transmitted drug resistance’ in nearly 3% of proviral sequences obtained from whole blood and 0.2% of samples obtained from plasma. Conclusion Screening for lethally edited sequences in datasets containing a proportion of proviral DNA, such as those likely to be obtained for epidemiological surveillance of transmitted drug resistance in the developing world, can eliminate rare but potentially significant errors in genotypic estimation of transmitted drug resistance. PMID:18356601
Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.
2010-01-01
The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284
Applying Data Mining Techniques to Improve Breast Cancer Diagnosis.
Diz, Joana; Marreiros, Goreti; Freitas, Alberto
2016-09-01
In the field of breast cancer research, and more than ever, new computer aided diagnosis based systems have been developed aiming to reduce diagnostic tests false-positives. Within this work, we present a data mining based approach which might support oncologists in the process of breast cancer classification and diagnosis. The present study aims to compare two breast cancer datasets and find the best methods in predicting benign/malignant lesions, breast density classification, and even for finding identification (mass / microcalcification distinction). To carry out these tasks, two matrices of texture features extraction were implemented using Matlab, and classified using data mining algorithms, on WEKA. Results revealed good percentages of accuracy for each class: 89.3 to 64.7 % - benign/malignant; 75.8 to 78.3 % - dense/fatty tissue; 71.0 to 83.1 % - finding identification. Among the different tests classifiers, Naive Bayes was the best to identify masses texture, and Random Forests was the first or second best classifier for the majority of tested groups.
Sentimental analysis of Amazon reviews using naïve bayes on laptop products with MongoDB and R
NASA Astrophysics Data System (ADS)
Kamal Hassan, Mohan; Prasanth Shakthi, Sana; Sasikala, R.
2017-11-01
Start In Today’s era the e-commerce is developing rapidly these years, buying products on-line has become more and more fashionable owing to its variety of options, low cost value (high discounts) and quick supply systems, so abundant folks intend to do online shopping. In the meantime the standard and delivery of merchandise is uneven, fake branded products are delivered. We use product users review comments about product and review about retailers from Amazon as data set and classify review text by subjectivity/objectivity and negative/positive attitude of buyer. Such reviews are helpful to some extent, promising both the shoppers and products makers. This paper presents an empirical study of efficacy of classifying product review by tagging the keyword. In the present study, we tend to analyse the fundamentals of determining, positive and negative approach towards the product. Thus we hereby propose completely different approaches by removing the unstructured data and then classifying comments employing Naive Bayes algorithm.
Bayesian spatio-temporal modeling of particulate matter concentrations in Peninsular Malaysia
NASA Astrophysics Data System (ADS)
Manga, Edna; Awang, Norhashidah
2016-06-01
This article presents an application of a Bayesian spatio-temporal Gaussian process (GP) model on particulate matter concentrations from Peninsular Malaysia. We analyze daily PM10 concentration levels from 35 monitoring sites in June and July 2011. The spatiotemporal model set in a Bayesian hierarchical framework allows for inclusion of informative covariates, meteorological variables and spatiotemporal interactions. Posterior density estimates of the model parameters are obtained by Markov chain Monte Carlo methods. Preliminary data analysis indicate information on PM10 levels at sites classified as industrial locations could explain part of the space time variations. We include the site-type indicator in our modeling efforts. Results of the parameter estimates for the fitted GP model show significant spatio-temporal structure and positive effect of the location-type explanatory variable. We also compute some validation criteria for the out of sample sites that show the adequacy of the model for predicting PM10 at unmonitored sites.
Bayes classifiers for imbalanced traffic accidents datasets.
Mujalli, Randa Oqab; López, Griselda; Garach, Laura
2016-03-01
Traffic accidents data sets are usually imbalanced, where the number of instances classified under the killed or severe injuries class (minority) is much lower than those classified under the slight injuries class (majority). This, however, supposes a challenging problem for classification algorithms and may cause obtaining a model that well cover the slight injuries instances whereas the killed or severe injuries instances are misclassified frequently. Based on traffic accidents data collected on urban and suburban roads in Jordan for three years (2009-2011); three different data balancing techniques were used: under-sampling which removes some instances of the majority class, oversampling which creates new instances of the minority class and a mix technique that combines both. In addition, different Bayes classifiers were compared for the different imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Average One-Dependence Estimators, and Bayesian networks in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved classifying a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed causality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework
2014-01-01
Motivation Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. Results We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set. PMID:24646119
Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework.
Simha, Ramanuja; Shatkay, Hagit
2014-03-19
Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set.
A Pairwise Naïve Bayes Approach to Bayesian Classification.
Asafu-Adjei, Josephine K; Betensky, Rebecca A
2015-10-01
Despite the relatively high accuracy of the naïve Bayes (NB) classifier, there may be several instances where it is not optimal, i.e. does not have the same classification performance as the Bayes classifier utilizing the joint distribution of the examined attributes. However, the Bayes classifier can be computationally intractable due to its required knowledge of the joint distribution. Therefore, we introduce a "pairwise naïve" Bayes (PNB) classifier that incorporates all pairwise relationships among the examined attributes, but does not require specification of the joint distribution. In this paper, we first describe the necessary and sufficient conditions under which the PNB classifier is optimal. We then discuss sufficient conditions for which the PNB classifier, and not NB, is optimal for normal attributes. Through simulation and actual studies, we evaluate the performance of our proposed classifier relative to the Bayes and NB classifiers, along with the HNB, AODE, LBR and TAN classifiers, using normal density and empirical estimation methods. Our applications show that the PNB classifier using normal density estimation yields the highest accuracy for data sets containing continuous attributes. We conclude that it offers a useful compromise between the Bayes and NB classifiers.
Pumpe, Daniel; Greiner, Maksim; Müller, Ewald; Enßlin, Torsten A
2016-07-01
Stochastic differential equations describe well many physical, biological, and sociological systems, despite the simplification often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ time lines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime.
Wang, Chao; Gao, Qiong; Wang, Xian; Yu, Mei
2015-01-01
Land use land cover (LULC) changes frequently in ecotones due to the large climate and soil gradients, and complex landscape composition and configuration. Accurate mapping of LULC changes in ecotones is of great importance for assessment of ecosystem functions/services and policy-decision support. Decadal or sub-decadal mapping of LULC provides scenarios for modeling biogeochemical processes and their feedbacks to climate, and evaluating effectiveness of land-use policies, e.g. forest conversion. However, it remains a great challenge to produce reliable LULC maps in moderate resolution and to evaluate their uncertainties over large areas with complex landscapes. In this study we developed a robust LULC classification system using multiple classifiers based on MODIS (Moderate Resolution Imaging Spectroradiometer) data and posterior data fusion. Not only does the system create LULC maps with high statistical accuracy, but also it provides pixel-level uncertainties that are essential for subsequent analyses and applications. We applied the classification system to the Agro-pasture transition band in northern China (APTBNC) to detect the decadal changes in LULC during 2003-2013 and evaluated the effectiveness of the implementation of major Key Forestry Programs (KFPs). In our study, the random forest (RF), support vector machine (SVM), and weighted k-nearest neighbors (WKNN) classifiers outperformed the artificial neural networks (ANN) and naive Bayes (NB) in terms of high classification accuracy and low sensitivity to training sample size. The Bayesian-average data fusion based on the results of RF, SVM, and WKNN achieved the 87.5% Kappa statistics, higher than any individual classifiers and the majority-vote integration. The pixel-level uncertainty map agreed with the traditional accuracy assessment. However, it conveys spatial variation of uncertainty. Specifically, it pinpoints the southwestern area of APTBNC has higher uncertainty than other part of the region, and the open shrubland is likely to be misclassified to the bare ground in some locations. Forests, closed shrublands, and grasslands in APTBNC expanded by 23%, 50%, and 9%, respectively, during 2003-2013. The expansion of these land cover types is compensated with the shrinkages in croplands (20%), bare ground (15%), and open shrublands (30%). The significant decline in agricultural lands is primarily attributed to the KFPs implemented in the end of last century and the nationwide urbanization in recent decade. The increased coverage of grass and woody plants would largely reduce soil erosion, improve mitigation of climate change, and enhance carbon sequestration in this region.
Predicting healthcare associated infections using patients' experiences
NASA Astrophysics Data System (ADS)
Pratt, Michael A.; Chu, Henry
2016-05-01
Healthcare associated infections (HAI) are a major threat to patient safety and are costly to health systems. Our goal is to predict the HAI performance of a hospital using the patients' experience responses as input. We use four classifiers, viz. random forest, naive Bayes, artificial feedforward neural networks, and the support vector machine, to perform the prediction of six types of HAI. The six types include blood stream, urinary tract, surgical site, and intestinal infections. Experiments show that the random forest and support vector machine perform well across the six types of HAI.
Application of texture analysis method for mammogram density classification
NASA Astrophysics Data System (ADS)
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
NASA Astrophysics Data System (ADS)
Sakuma, Jun; Wright, Rebecca N.
Privacy-preserving classification is the task of learning or training a classifier on the union of privately distributed datasets without sharing the datasets. The emphasis of existing studies in privacy-preserving classification has primarily been put on the design of privacy-preserving versions of particular data mining algorithms, However, in classification problems, preprocessing and postprocessing— such as model selection or attribute selection—play a prominent role in achieving higher classification accuracy. In this paper, we show generalization error of classifiers in privacy-preserving classification can be securely evaluated without sharing prediction results. Our main technical contribution is a new generalized Hamming distance protocol that is universally applicable to preprocessing and postprocessing of various privacy-preserving classification problems, such as model selection in support vector machine and attribute selection in naive Bayes classification.
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.
Using Computational Text Classification for Qualitative Research and Evaluation in Extension
ERIC Educational Resources Information Center
Smith, Justin G.; Tissing, Reid
2018-01-01
This article introduces a process for computational text classification that can be used in a variety of qualitative research and evaluation settings. The process leverages supervised machine learning based on an implementation of a multinomial Bayesian classifier. Applied to a community of inquiry framework, the algorithm was used to identify…
Stinnett, Jacob; Sullivan, Clair J.; Xiong, Hao
2017-03-02
Low-resolution isotope identifiers are widely deployed for nuclear security purposes, but these detectors currently demonstrate problems in making correct identifications in many typical usage scenarios. While there are many hardware alternatives and improvements that can be made, performance on existing low resolution isotope identifiers should be able to be improved by developing new identification algorithms. We have developed a wavelet-based peak extraction algorithm and an implementation of a Bayesian classifier for automated peak-based identification. The peak extraction algorithm has been extended to compute uncertainties in the peak area calculations. To build empirical joint probability distributions of the peak areas andmore » uncertainties, a large set of spectra were simulated in MCNP6 and processed with the wavelet-based feature extraction algorithm. Kernel density estimation was then used to create a new component of the likelihood function in the Bayesian classifier. Furthermore, identification performance is demonstrated on a variety of real low-resolution spectra, including Category I quantities of special nuclear material.« less
Stephens, David; Diesing, Markus
2014-01-01
Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well, highlighting the need for some means of feature selection.
Bayesian approach to MSD-based analysis of particle motion in live cells.
Monnier, Nilah; Guo, Syuan-Ming; Mori, Masashi; He, Jun; Lénárt, Péter; Bathe, Mark
2012-08-08
Quantitative tracking of particle motion using live-cell imaging is a powerful approach to understanding the mechanism of transport of biological molecules, organelles, and cells. However, inferring complex stochastic motion models from single-particle trajectories in an objective manner is nontrivial due to noise from sampling limitations and biological heterogeneity. Here, we present a systematic Bayesian approach to multiple-hypothesis testing of a general set of competing motion models based on particle mean-square displacements that automatically classifies particle motion, properly accounting for sampling limitations and correlated noise while appropriately penalizing model complexity according to Occam's Razor to avoid over-fitting. We test the procedure rigorously using simulated trajectories for which the underlying physical process is known, demonstrating that it chooses the simplest physical model that explains the observed data. Further, we show that computed model probabilities provide a reliability test for the downstream biological interpretation of associated parameter values. We subsequently illustrate the broad utility of the approach by applying it to disparate biological systems including experimental particle trajectories from chromosomes, kinetochores, and membrane receptors undergoing a variety of complex motions. This automated and objective Bayesian framework easily scales to large numbers of particle trajectories, making it ideal for classifying the complex motion of large numbers of single molecules and cells from high-throughput screens, as well as single-cell-, tissue-, and organism-level studies. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Adaptive sequential Bayesian classification using Page's test
NASA Astrophysics Data System (ADS)
Lynch, Robert S., Jr.; Willett, Peter K.
2002-03-01
In this paper, the previously introduced Mean-Field Bayesian Data Reduction Algorithm is extended for adaptive sequential hypothesis testing utilizing Page's test. In general, Page's test is well understood as a method of detecting a permanent change in distribution associated with a sequence of observations. However, the relationship between detecting a change in distribution utilizing Page's test with that of classification and feature fusion is not well understood. Thus, the contribution of this work is based on developing a method of classifying an unlabeled vector of fused features (i.e., detect a change to an active statistical state) as quickly as possible given an acceptable mean time between false alerts. In this case, the developed classification test can be thought of as equivalent to performing a sequential probability ratio test repeatedly until a class is decided, with the lower log-threshold of each test being set to zero and the upper log-threshold being determined by the expected distance between false alerts. It is of interest to estimate the delay (or, related stopping time) to a classification decision (the number of time samples it takes to classify the target), and the mean time between false alerts, as a function of feature selection and fusion by the Mean-Field Bayesian Data Reduction Algorithm. Results are demonstrated by plotting the delay to declaring the target class versus the mean time between false alerts, and are shown using both different numbers of simulated training data and different numbers of relevant features for each class.
Content-based image retrieval for interstitial lung diseases using classification confidence
NASA Astrophysics Data System (ADS)
Dash, Jatindra Kumar; Mukhopadhyay, Sudipta; Prabhakar, Nidhi; Garg, Mandeep; Khandelwal, Niranjan
2013-02-01
Content Based Image Retrieval (CBIR) system could exploit the wealth of High-Resolution Computed Tomography (HRCT) data stored in the archive by finding similar images to assist radiologists for self learning and differential diagnosis of Interstitial Lung Diseases (ILDs). HRCT findings of ILDs are classified into several categories (e.g. consolidation, emphysema, ground glass, nodular etc.) based on their texture like appearances. Therefore, analysis of ILDs is considered as a texture analysis problem. Many approaches have been proposed for CBIR of lung images using texture as primitive visual content. This paper presents a new approach to CBIR for ILDs. The proposed approach makes use of a trained neural network (NN) to find the output class label of query image. The degree of confidence of the NN classifier is analyzed using Naive Bayes classifier that dynamically takes a decision on the size of the search space to be used for retrieval. The proposed approach is compared with three simple distance based and one classifier based texture retrieval approaches. Experimental results show that the proposed technique achieved highest average percentage precision of 92.60% with lowest standard deviation of 20.82%.
LeVan, P; Urrestarazu, E; Gotman, J
2006-04-01
To devise an automated system to remove artifacts from ictal scalp EEG, using independent component analysis (ICA). A Bayesian classifier was used to determine the probability that 2s epochs of seizure segments decomposed by ICA represented EEG activity, as opposed to artifact. The classifier was trained using numerous statistical, spectral, and spatial features. The system's performance was then assessed using separate validation data. The classifier identified epochs representing EEG activity in the validation dataset with a sensitivity of 82.4% and a specificity of 83.3%. An ICA component was considered to represent EEG activity if the sum of the probabilities that its epochs represented EEG exceeded a threshold predetermined using the training data. Otherwise, the component represented artifact. Using this threshold on the validation set, the identification of EEG components was performed with a sensitivity of 87.6% and a specificity of 70.2%. Most misclassified components were a mixture of EEG and artifactual activity. The automated system successfully rejected a good proportion of artifactual components extracted by ICA, while preserving almost all EEG components. The misclassification rate was comparable to the variability observed in human classification. Current ICA methods of artifact removal require a tedious visual classification of the components. The proposed system automates this process and removes simultaneously multiple types of artifacts.
Bulashevska, Alla; Stein, Martin; Jackson, David; Eils, Roland
2009-12-01
Accurate computational methods that can help to predict biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to assume the function of proteins is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses a primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we have constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as training set to learn a Bayesian classifier, which should distinguish members of each class. The domain sequences of both classes are modelled with Markov chains. In a Jack-knife test, our classification procedure achieved the predictive accuracies of 77.2% and 66.7% for binding and non-binding classes respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide very useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.
Bayesian Decision Tree for the Classification of the Mode of Motion in Single-Molecule Trajectories
Türkcan, Silvan; Masson, Jean-Baptiste
2013-01-01
Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posteriori probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium Perfingens -toxin (CPT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments. PMID:24376584
Kotaki, Tomohiro; Khairunisa, Siti Qamariyah; Sukartiningrum, Septhia Dwi; Witaningrum, Adiana Mutamsari; Rusli, Musofa; Diansyah, M Noor; Arfijanto, M Vitanata; Rahayu, Retno Pudji; Nasronudin; Kameoka, Masanori
2014-05-01
Although human immunodeficiency virus type 1 (HIV-1) infection causes serious health problems in Indonesia, information in regard to drug resistance is limited. We performed a genotypic study on HIV-1 integrase derived from drug-naive individuals in Surabaya, Indonesia. Sequencing analysis revealed that no primary mutations associated with drug resistance to integrase inhibitors were detected; however, secondary mutations, V72I, L74I/M, V165I, V201I, I203M, and S230N, were detected in more than 5% of samples. In addition, V201I was conserved among all samples. Most integrase genes were classified into CRF01_AE genes. Interestingly, 40% of the CRF01_AE genes had an unusual insertion in the C-terminus of integrase. These mutations and insertions were considered natural polymorphisms since these mutations coincided with previous reports, and integrase inhibitors have not been used in Indonesia. Our results indicated that further studies may be required to assess the impact of these mutations on integrase inhibitors prior to their introduction into Indonesia.
Rannio, Tuomas; Asikainen, Juha; Kokko, Arto; Hannonen, Pekka; Sokka, Tuulikki
2016-04-01
We analyzed remission rates at 3 and 12 months in patients with rheumatoid arthritis (RA) who were naive for disease-modifying antirheumatic drugs (DMARD) and who were treated in a Finnish rheumatology clinic from 2008 to 2011. We compared remission rates and drug treatments between patients with RA and patients with undifferentiated arthritis (UA). Data from all DMARD-naive RA and UA patients from the healthcare district were collected using software that includes demographic and clinical characteristics, disease activity, medications, and patient-reported outcomes. Our rheumatology clinic applies the treat-to-target principle, electronic monitoring of patients, and multidisciplinary care. Out of 409 patients, 406 had data for classification by the 2010 RA criteria of the American College of Rheumatology/European League Against Rheumatism. A total of 68% were female, and mean age (SD) was 58 (16) years. Respectively, 56%, 60%, and 68% were positive for anticyclic citrullinated peptide antibodies (anti-CCP), rheumatoid factor (RF), and RF/anti-CCP, and 19% had erosive disease. The median (interquartile range) duration of symptoms was 6 (4-12) months. A total of 310 were classified as RA and 96 as UA. The patients with UA were younger, had better functional status and lower disease activity, and were more often seronegative than the patients with RA. The 28-joint Disease Activity Score (3 variables) remission rates of RA and UA patients at 3 months were 67% and 58% (p = 0.13), and at 12 months, 71% and 79%, respectively (p = 0.16). Sustained remission was observed in 57%/56% of RA/UA patients. Patients with RA used more conventional synthetic DMARD combinations than did patients with UA. None used biological DMARD at 3 months, and only 2.7%/1.1% of the patients (RA/UA) used them at 12 months (p = 0.36). Remarkably high remission rates are achievable in real-world DMARD-naive patients with RA or UA.
2010-01-01
Background Methods for the calculation and application of quantitative electromyographic (EMG) statistics for the characterization of EMG data detected from forearm muscles of individuals with and without pain associated with repetitive strain injury are presented. Methods A classification procedure using a multi-stage application of Bayesian inference is presented that characterizes a set of motor unit potentials acquired using needle electromyography. The utility of this technique in characterizing EMG data obtained from both normal individuals and those presenting with symptoms of "non-specific arm pain" is explored and validated. The efficacy of the Bayesian technique is compared with simple voting methods. Results The aggregate Bayesian classifier presented is found to perform with accuracy equivalent to that of majority voting on the test data, with an overall accuracy greater than 0.85. Theoretical foundations of the technique are discussed, and are related to the observations found. Conclusions Aggregation of motor unit potential conditional probability distributions estimated using quantitative electromyographic analysis, may be successfully used to perform electrodiagnostic characterization of "non-specific arm pain." It is expected that these techniques will also be able to be applied to other types of electrodiagnostic data. PMID:20156353
Computer-aided diagnosis with potential application to rapid detection of disease outbreaks.
Burr, Tom; Koster, Frederick; Picard, Rick; Forslund, Dave; Wokoun, Doug; Joyce, Ed; Brillman, Judith; Froman, Phil; Lee, Jack
2007-04-15
Our objectives are to quickly interpret symptoms of emergency patients to identify likely syndromes and to improve population-wide disease outbreak detection. We constructed a database of 248 syndromes, each syndrome having an estimated probability of producing any of 85 symptoms, with some two-way, three-way, and five-way probabilities reflecting correlations among symptoms. Using these multi-way probabilities in conjunction with an iterative proportional fitting algorithm allows estimation of full conditional probabilities. Combining these conditional probabilities with misdiagnosis error rates and incidence rates via Bayes theorem, the probability of each syndrome is estimated. We tested a prototype of computer-aided differential diagnosis (CADDY) on simulated data and on more than 100 real cases, including West Nile Virus, Q fever, SARS, anthrax, plague, tularaemia and toxic shock cases. We conclude that: (1) it is important to determine whether the unrecorded positive status of a symptom means that the status is negative or that the status is unknown; (2) inclusion of misdiagnosis error rates produces more realistic results; (3) the naive Bayes classifier, which assumes all symptoms behave independently, is slightly outperformed by CADDY, which includes available multi-symptom information on correlations; as more information regarding symptom correlations becomes available, the advantage of CADDY over the naive Bayes classifier should increase; (4) overlooking low-probability, high-consequence events is less likely if the standard output summary is augmented with a list of rare syndromes that are consistent with observed symptoms, and (5) accumulating patient-level probabilities across a larger population can aid in biosurveillance for disease outbreaks. c 2007 John Wiley & Sons, Ltd.
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
Koch, Tobias; Schultze, Martin; Jeon, Minjeong; Nussbeck, Fridtjof W; Praetorius, Anna-Katharina; Eid, Michael
2016-01-01
Multirater (multimethod, multisource) studies are increasingly applied in psychology. Eid and colleagues (2008) proposed a multilevel confirmatory factor model for multitrait-multimethod (MTMM) data combining structurally different and multiple independent interchangeable methods (raters). In many studies, however, different interchangeable raters (e.g., peers, subordinates) are asked to rate different targets (students, supervisors), leading to violations of the independence assumption and to cross-classified data structures. In the present work, we extend the ML-CFA-MTMM model by Eid and colleagues (2008) to cross-classified multirater designs. The new C4 model (Cross-Classified CTC[M-1] Combination of Methods) accounts for nonindependent interchangeable raters and enables researchers to explicitly model the interaction between targets and raters as a latent variable. Using a real data application, it is shown how credibility intervals of model parameters and different variance components can be obtained using Bayesian estimation techniques.
Human Activity Recognition by Combining a Small Number of Classifiers.
Nazabal, Alfredo; Garcia-Moreno, Pablo; Artes-Rodriguez, Antonio; Ghahramani, Zoubin
2016-09-01
We consider the problem of daily human activity recognition (HAR) using multiple wireless inertial sensors, and specifically, HAR systems with a very low number of sensors, each one providing an estimation of the performed activities. We propose new Bayesian models to combine the output of the sensors. The models are based on a soft outputs combination of individual classifiers to deal with the small number of sensors. We also incorporate the dynamic nature of human activities as a first-order homogeneous Markov chain. We develop both inductive and transductive inference methods for each model to be employed in supervised and semisupervised situations, respectively. Using different real HAR databases, we compare our classifiers combination models against a single classifier that employs all the signals from the sensors. Our models exhibit consistently a reduction of the error rate and an increase of robustness against sensor failures. Our models also outperform other classifiers combination models that do not consider soft outputs and an Markovian structure of the human activities.
Boehm, Udo; Steingroever, Helen; Wagenmakers, Eric-Jan
2018-06-01
An important tool in the advancement of cognitive science are quantitative models that represent different cognitive variables in terms of model parameters. To evaluate such models, their parameters are typically tested for relationships with behavioral and physiological variables that are thought to reflect specific cognitive processes. However, many models do not come equipped with the statistical framework needed to relate model parameters to covariates. Instead, researchers often revert to classifying participants into groups depending on their values on the covariates, and subsequently comparing the estimated model parameters between these groups. Here we develop a comprehensive solution to the covariate problem in the form of a Bayesian regression framework. Our framework can be easily added to existing cognitive models and allows researchers to quantify the evidential support for relationships between covariates and model parameters using Bayes factors. Moreover, we present a simulation study that demonstrates the superiority of the Bayesian regression framework to the conventional classification-based approach.
USDA-ARS?s Scientific Manuscript database
An algorithm using a Bayesian classifier was developed to automatically detect olive fruit fly infestations in x-ray images of olives. The data set consisted of 249 olives with various degrees of infestation and 161 non-infested olives. Each olive was x-rayed on film and digital images were acquired...
Zou, W; Ouyang, H
2016-02-01
We propose a multiple estimation adjustment (MEA) method to correct effect overestimation due to selection bias from a hypothesis-generating study (HGS) in pharmacogenetics. MEA uses a hierarchical Bayesian approach to model individual effect estimates from maximal likelihood estimation (MLE) in a region jointly and shrinks them toward the regional effect. Unlike many methods that model a fixed selection scheme, MEA capitalizes on local multiplicity independent of selection. We compared mean square errors (MSEs) in simulated HGSs from naive MLE, MEA and a conditional likelihood adjustment (CLA) method that model threshold selection bias. We observed that MEA effectively reduced MSE from MLE on null effects with or without selection, and had a clear advantage over CLA on extreme MLE estimates from null effects under lenient threshold selection in small samples, which are common among 'top' associations from a pharmacogenetics HGS.
Improving imbalanced scientific text classification using sampling strategies and dictionaries.
Borrajo, L; Romero, R; Iglesias, E L; Redondo Marey, C M
2011-09-15
Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. One of the systems affected are those related to the recovery and classification of scientific documentation. Sampling strategies such as Oversampling and Subsampling are popular in tackling the problem of class imbalance. In this work, we study their effects on three types of classifiers (Knn, SVM and Naive-Bayes) when they are applied to search on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the mentioned classifiers and sampling strategies. Best results were obtained with NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
Identifying typical physical activity on smartphone with varying positions and orientations.
Miao, Fen; He, Yi; Liu, Jinlei; Li, Ye; Ayoola, Idowu
2015-04-13
Traditional activity recognition solutions are not widely applicable due to a high cost and inconvenience to use with numerous sensors. This paper aims to automatically recognize physical activity with the help of the built-in sensors of the widespread smartphone without any limitation of firm attachment to the human body. By introducing a method to judge whether the phone is in a pocket, we investigated the data collected from six positions of seven subjects, chose five signals that are insensitive to orientation for activity classification. Decision trees (J48), Naive Bayes and Sequential minimal optimization (SMO) were employed to recognize five activities: static, walking, running, walking upstairs and walking downstairs. The experimental results based on 8,097 activity data demonstrated that the J48 classifier produced the best performance with an average recognition accuracy of 89.6% during the three classifiers, and thus would serve as the optimal online classifier. The utilization of the built-in sensors of the smartphone to recognize typical physical activities without any limitation of firm attachment is feasible.
Classification of Indonesian quote on Twitter using Naïve Bayes
NASA Astrophysics Data System (ADS)
Rachmadany, A.; Pranoto, Y. M.; Gunawan; Multazam, M. T.; Nandiyanto, A. B. D.; Abdullah, A. G.; Widiaty, I.
2018-01-01
Quote is sentences made in the hope that someone can become strong personalities, individuals who always improve themselves to move forward and achieve success. Social media is a place for people to express his heart to the world that sometimes the expression of the heart is quotes. Here, the purpose of this study was to classify Indonesian quote on Twitter using Naïve Bayes. This experiment uses text classification from Twitter data written by Twitter users which are quote then classification again grouped into 6 categories (Love, Life, Motivation, Education, Religion, Others). The language used is Indonesian. The method used is Naive Bayes. The results of this experiment are a web application collection of Indonesian quote that have been classified. This classification gives the user ease in finding quote based on class or keyword. For example, when a user wants to find a 'motivation' quote, this classification would be very useful.
Espino-Hernandez, Gabriela; Gustafson, Paul; Burstyn, Igor
2011-05-14
In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from a naive analysis. Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids' measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of method's performance in this particular application with different measurement error variability was performed. The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software. For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed. In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model.
Acharya, U Rajendra; Sree, S Vinitha; Chattopadhyay, Subhagata; Yu, Wenwei; Ang, Peng Chuan Alvin
2011-06-01
Epilepsy is a common neurological disorder that is characterized by the recurrence of seizures. Electroencephalogram (EEG) signals are widely used to diagnose seizures. Because of the non-linear and dynamic nature of the EEG signals, it is difficult to effectively decipher the subtle changes in these signals by visual inspection and by using linear techniques. Therefore, non-linear methods are being researched to analyze the EEG signals. In this work, we use the recorded EEG signals in Recurrence Plots (RP), and extract Recurrence Quantification Analysis (RQA) parameters from the RP in order to classify the EEG signals into normal, ictal, and interictal classes. Recurrence Plot (RP) is a graph that shows all the times at which a state of the dynamical system recurs. Studies have reported significantly different RQA parameters for the three classes. However, more studies are needed to develop classifiers that use these promising features and present good classification accuracy in differentiating the three types of EEG segments. Therefore, in this work, we have used ten RQA parameters to quantify the important features in the EEG signals.These features were fed to seven different classifiers: Support vector machine (SVM), Gaussian Mixture Model (GMM), Fuzzy Sugeno Classifier, K-Nearest Neighbor (KNN), Naive Bayes Classifier (NBC), Decision Tree (DT), and Radial Basis Probabilistic Neural Network (RBPNN). Our results show that the SVM classifier was able to identify the EEG class with an average efficiency of 95.6%, sensitivity and specificity of 98.9% and 97.8%, respectively.
Using Neural Networks to Classify Digitized Images of Galaxies
NASA Astrophysics Data System (ADS)
Goderya, S. N.; McGuire, P. C.
2000-12-01
Automated classification of Galaxies into Hubble types is of paramount importance to study the large scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we will summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network and Support Vector Machine Classifier. The performance of any machine classifier is dependant on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, and removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Mathews coefficients for the galaxy classification community. Mathews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.
A Bayesian network model for predicting aquatic toxicity mode ...
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity, but development of predictive MoA classification models in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity MoA using a recently published dataset containing over one thousand chemicals with MoA assignments for aquatic animal toxicity. Two dimensional theoretical chemical descriptors were generated for each chemical using the Toxicity Estimation Software Tool. The model was developed through augmented Markov blanket discovery from the dataset of 1098 chemicals with the MoA broad classifications as a target node. From cross validation, the overall precision for the model was 80.2%. The best precision was for the AChEI MoA (93.5%) where 257 chemicals out of 275 were correctly classified. Model precision was poorest for the reactivity MoA (48.5%) where 48 out of 99 reactive chemicals were correctly classified. Narcosis represented the largest class within the MoA dataset and had a precision and reliability of 80.0%, reflecting the global precision across all of the MoAs. False negatives for narcosis most often fell into electron transport inhibition, neurotoxicity or reactivity MoAs. False negatives for all other MoAs were most often narcosis. A probabilistic sensitivity analysis was undertaken for each MoA to examine the sensitivity to individual and multiple descriptor findings. The results show that the Markov blank
A Bayesian network model for predicting aquatic toxicity mode ...
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity but MoA classification in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity mode of action using a recently published dataset containing over one thousand chemicals with MoA assignments for aquatic animal toxicity. Two dimensional theoretical chemical descriptors were generated for each chemical using the Toxicity Estimation Software Tool. The model was developed through augmented Markov blanket discovery from the data set with the MoA broad classifications as a target node. From cross validation, the overall precision for the model was 80.2% with a R2 of 0.959. The best precision was for the AChEI MoA (93.5%) where 257 chemicals out of 275 were correctly classified. Model precision was poorest for the reactivity MoA (48.5%) where 48 out of 99 reactive chemicals were correctly classified. Narcosis represented the largest class within the MoA dataset and had a precision and reliability of 80.0%, reflecting the global precision across all of the MoAs. False negatives for narcosis most often fell into electron transport inhibition, neurotoxicity or reactivity MoAs. False negatives for all other MoAs were most often narcosis. A probabilistic sensitivity analysis was undertaken for each MoA to examine the sensitivity to individual and multiple descriptor findings. The results show that the Markov blanket of a structurally
ERIC Educational Resources Information Center
Vos, Hans J.
As part of a project formulating optimal rules for decision making in computer assisted instructional systems in which the computer is used as a decision support tool, an approach that simultaneously optimizes classification of students into two treatments, each followed by a mastery decision, is presented using the framework of Bayesian decision…
Using Bayesian Learning to Classify College Algebra Students by Understanding in Real-Time
ERIC Educational Resources Information Center
Cousino, Andrew
2013-01-01
The goal of this work is to provide instructors with detailed information about their classes at each assignment during the term. The information is both on an individual level and at the aggregate level. We used the large number of grades, which are available online these days, along with data-mining techniques to build our models. This enabled…
Madigan, Sheri; Goldberg, Susan; Moran, Greg; Pederson, David R
2004-09-01
Previous research has succeeded in distinguishing among drawings made by children with histories of organized attachment relationships (secure, avoidant, and resistant); however, drawings of children with histories of disorganized attachment have yet to be systematically investigated. The purpose of this study was to determine whether naïve observers would respond differentially to family drawings of 7-year-olds who were classified in infancy as disorganized vs. organized. Seventy-three undergraduate students from one university and 78 from a second viewed 50 family drawings of 7-year-olds (25 by children with organized infant attachment and 25 by children with disorganized infant attachment). Participants were asked to (1) circle the emotion that best described their reaction to the drawings and (2) rate the drawings on 6 bipolar scales. Drawings from children classified as disorganized in infancy evoked positive emotion labels less often and negative emotion labels more often than those children classified as organized. Furthermore, drawings from children classified as disorganized in infancy received higher ratings on scales for disorganization, carelessness, family chaos, bizarreness, uneasiness, and dysfunction. These data indicate that naive observers are relatively successful in distinguishing selected features of drawings by children with histories of disorganized vs. organized attachment.
Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant
2015-01-01
Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
Sironi, Emanuele; Pinchi, Vilma; Pradella, Francesco; Focardi, Martina; Bozza, Silvia; Taroni, Franco
2018-04-01
Not only does the Bayesian approach offer a rational and logical environment for evidence evaluation in a forensic framework, but it also allows scientists to coherently deal with uncertainty related to a collection of multiple items of evidence, due to its flexible nature. Such flexibility might come at the expense of elevated computational complexity, which can be handled by using specific probabilistic graphical tools, namely Bayesian networks. In the current work, such probabilistic tools are used for evaluating dental evidence related to the development of third molars. A set of relevant properties characterizing the graphical models are discussed and Bayesian networks are implemented to deal with the inferential process laying beyond the estimation procedure, as well as to provide age estimates. Such properties include operationality, flexibility, coherence, transparence and sensitivity. A data sample composed of Italian subjects was employed for the analysis; results were in agreement with previous studies in terms of point estimate and age classification. The influence of the prior probability elicitation in terms of Bayesian estimate and classifies was also analyzed. Findings also supported the opportunity to take into consideration multiple teeth in the evaluative procedure, since it can be shown this results in an increased robustness towards the prior probability elicitation process, as well as in more favorable outcomes from a forensic perspective. Copyright © 2018 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
2012-01-01
Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944
Adrion, Christine; Mansmann, Ulrich
2012-09-10
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
NASA Astrophysics Data System (ADS)
Boschetto, Davide; Di Claudio, Gianluca; Mirzaei, Hadis; Leong, Rupert; Grisan, Enrico
2016-03-01
Celiac disease (CD) is an immune-mediated enteropathy triggered by exposure to gluten and similar proteins, affecting genetically susceptible persons, increasing their risk of different complications. Small bowels mucosa damage due to CD involves various degrees of endoscopically relevant lesions, which are not easily recognized: their overall sensitivity and positive predictive values are poor even when zoom-endoscopy is used. Confocal Laser Endomicroscopy (CLE) allows skilled and trained experts to qualitative evaluate mucosa alteration such as a decrease in goblet cells density, presence of villous atrophy or crypt hypertrophy. We present a method for automatically classifying CLE images into three different classes: normal regions, villous atrophy and crypt hypertrophy. This classification is performed after a features selection process, in which four features are extracted from each image, through the application of homomorphic filtering and border identification through Canny and Sobel operators. Three different classifiers have been tested on a dataset of 67 different images labeled by experts in three classes (normal, VA and CH): linear approach, Naive-Bayes quadratic approach and a standard quadratic analysis, all validated with a ten-fold cross validation. Linear classification achieves 82.09% accuracy (class accuracies: 90.32% for normal villi, 82.35% for VA and 68.42% for CH, sensitivity: 0.68, specificity 1.00), Naive Bayes analysis returns 83.58% accuracy (90.32% for normal villi, 70.59% for VA and 84.21% for CH, sensitivity: 0.84 specificity: 0.92), while the quadratic analysis achieves a final accuracy of 94.03% (96.77% accuracy for normal villi, 94.12% for VA and 89.47% for CH, sensitivity: 0.89, specificity: 0.98).
Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse.
Garrard, Peter; Rentoumi, Vassiliki; Gesierich, Benno; Miller, Bruce; Gorno-Tempini, Maria Luisa
2014-06-01
Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can 'learn' from data: for instance a ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low frequency content words, generic terms and components of metanarrative statements. For the right versus left task the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA) may shed further light on this little understood distinction. Copyright © 2013 Elsevier Ltd. All rights reserved.
Kesselring, Anouk M; Wit, Ferdinand W; Sabin, Caroline A; Lundgren, Jens D; Gill, M John; Gatell, Jose M; Rauch, Andri; Montaner, Julio S; de Wolf, Frank; Reiss, Peter; Mocroft, Amanda
2009-08-24
This collaboration of seven observational clinical cohorts investigated risk factors for treatment-limiting toxicities in both antiretroviral-naive and experienced patients starting nevirapine-based combination antiretroviral therapy (NVPc). Patients starting NVPc after 1 January 1998 were included. CD4 cell count at starting NVPc was classified as high (>400/microl/>250/microl for men/women, respectively) or low. Cox models were used to investigate risk factors for discontinuations due to hypersensitivity reactions (HSR, n = 6547) and discontinuation of NVPc due to treatment-limiting toxicities and/or patient/physician choice (TOXPC, n = 10,186). Patients were classified according to prior antiretroviral treatment experience and CD4 cell count/viral load at start NVPc. Models were stratified by cohort and adjusted for age, sex, nadir CD4 cell count, calendar year of starting NVPc and mode of transmission. Median time from starting NVPc to TOXPC and HSR were 162 days [interquartile range (IQR) 31-737] and 30 days (IQR 17-60), respectively. In adjusted Cox analyses, compared to naive patients with a low CD4 cell count, treatment-experienced patients with high CD4 cell count and viral load more than 400 had a significantly increased risk for HSR [hazard ratio 1.45, confidence interval (CI) 1.03-2.03] and TOXPC within 18 weeks (hazard ratio 1.34, CI 1.08-1.67). In contrast, treatment-experienced patients with high CD4 cell count and viral load less than 400 had no increased risk for HSR 1.10 (0.82-1.46) or TOXPC within 18 weeks (hazard ratio 0.94, CI 0.78-1.13). Our results suggest it may be relatively well tolerated to initiate NVPc in antiretroviral-experienced patients with high CD4 cell counts provided there is no detectable viremia.
Non-Metric Similarity Measures
2015-03-26
Sunil Aryal and Kai Ming Ting. (2015) A generic ensemble approach to estimate multi-dimensional likelihood in Bayesian classifier learning...Computational Intelligence. http://onlinelibrary.wiley.com/doi/10.1111/coin.12063/abstract 5.2 List of peer-reviewed conference publications [3] Sunil Aryal...International Conference on Data Mining. 707-711. [4] Sunil Aryal, Kai Ming Ting, Jonathan R. Wells and Takashi Washio. (2014) Improv- ing iForest with
Automated classifiers for early detection and diagnosis of retinopathy in diabetic eyes.
Somfai, Gábor Márk; Tátrai, Erika; Laurik, Lenke; Varga, Boglárka; Ölvedy, Veronika; Jiang, Hong; Wang, Jianhua; Smiddy, William E; Somogyi, Anikó; DeBuc, Delia Cabrera
2014-04-12
Artificial neural networks (ANNs) have been used to classify eye diseases, such as diabetic retinopathy (DR) and glaucoma. DR is the leading cause of blindness in working-age adults in the developed world. The implementation of DR diagnostic routines could be feasibly improved by the integration of structural and optical property test measurements of the retinal structure that provide important and complementary information for reaching a diagnosis. In this study, we evaluate the capability of several structural and optical features (thickness, total reflectance and fractal dimension) of various intraretinal layers extracted from optical coherence tomography images to train a Bayesian ANN to discriminate between healthy and diabetic eyes with and with no mild retinopathy. When exploring the probability as to whether the subject's eye was healthy (diagnostic condition, Test 1), we found that the structural and optical property features of the outer plexiform layer (OPL) and the complex formed by the ganglion cell and inner plexiform layers (GCL + IPL) provided the highest probability (positive predictive value (PPV) of 91% and 89%, respectively) for the proportion of patients with positive test results (healthy condition) who were correctly diagnosed (Test 1). The true negative, TP and PPV values remained stable despite the different sizes of training data sets (Test 2). The sensitivity, specificity and PPV were greater or close to 0.70 for the retinal nerve fiber layer's features, photoreceptor outer segments and retinal pigment epithelium when 23 diabetic eyes with mild retinopathy were mixed with 38 diabetic eyes with no retinopathy (Test 3). A Bayesian ANN trained on structural and optical features from optical coherence tomography data can successfully discriminate between healthy and diabetic eyes with and with no retinopathy. The fractal dimension of the OPL and the GCL + IPL complex predicted by the Bayesian radial basis function network provides better diagnostic utility to classify diabetic eyes with mild retinopathy. Moreover, the thickness and fractal dimension parameters of the retinal nerve fiber layer, photoreceptor outer segments and retinal pigment epithelium show promise for the diagnostic classification between diabetic eyes with and with no mild retinopathy.
Yin, Weiwei; Garimalla, Swetha; Moreno, Alberto; Galinski, Mary R; Styczynski, Mark P
2015-08-28
There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with a goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically have significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist. Here, we test the idea that by leveraging the accuracy and efficiency of classifiers, we can construct high-quality networks that capture important interactions between variables in datasets with few samples. We start from a previously-developed tree-like Bayesian classifier and generalize its network learning approach to allow for arbitrary depth and complexity of tree-like networks. Using four diverse sample networks, we demonstrate that this approach performs consistently better at low sample sizes than the Sparse Candidate Algorithm, a representative approach for comparison because it is known to generate Bayesian networks with high positive predictive value. We develop and demonstrate a resampling-based approach to enable the identification of a viable root for the learned tree-like network, important for cases where the root of a network is not known a priori. We also develop and demonstrate an integrated resampling-based approach to the reduction of variable space for the learning of the network. Finally, we demonstrate the utility of this approach via the analysis of a transcriptional dataset of a malaria challenge in a non-human primate model system, Macaca mulatta, suggesting the potential to capture indicators of the earliest stages of cellular differentiation during leukopoiesis. We demonstrate that by starting from effective and efficient approaches for creating classifiers, we can identify interesting tree-like network structures with significant ability to capture the relationships in the training data. This approach represents a promising strategy for inferring networks with high positive predictive value under the constraint of small numbers of samples, meeting a need that will only continue to grow as more high-throughput studies are applied to complex model systems.
Predicting Mycobacterium tuberculosis Complex Clades Using Knowledge-Based Bayesian Networks
Bennett, Kristin P.
2014-01-01
We develop a novel approach for incorporating expert rules into Bayesian networks for classification of Mycobacterium tuberculosis complex (MTBC) clades. The proposed knowledge-based Bayesian network (KBBN) treats sets of expert rules as prior distributions on the classes. Unlike prior knowledge-based support vector machine approaches which require rules expressed as polyhedral sets, KBBN directly incorporates the rules without any modification. KBBN uses data to refine rule-based classifiers when the rule set is incomplete or ambiguous. We develop a predictive KBBN model for 69 MTBC clades found in the SITVIT international collection. We validate the approach using two testbeds that model knowledge of the MTBC obtained from two different experts and large DNA fingerprint databases to predict MTBC genetic clades and sublineages. These models represent strains of MTBC using high-throughput biomarkers called spacer oligonucleotide types (spoligotypes), since these are routinely gathered from MTBC isolates of tuberculosis (TB) patients. Results show that incorporating rules into problems can drastically increase classification accuracy if data alone are insufficient. The SITVIT KBBN is publicly available for use on the World Wide Web. PMID:24864238
Yourganov, Grigori; Schmah, Tanya; Churchill, Nathan W; Berman, Marc G; Grady, Cheryl L; Strother, Stephen C
2014-08-01
The field of fMRI data analysis is rapidly growing in sophistication, particularly in the domain of multivariate pattern classification. However, the interaction between the properties of the analytical model and the parameters of the BOLD signal (e.g. signal magnitude, temporal variance and functional connectivity) is still an open problem. We addressed this problem by evaluating a set of pattern classification algorithms on simulated and experimental block-design fMRI data. The set of classifiers consisted of linear and quadratic discriminants, linear support vector machine, and linear and nonlinear Gaussian naive Bayes classifiers. For linear discriminant, we used two methods of regularization: principal component analysis, and ridge regularization. The classifiers were used (1) to classify the volumes according to the behavioral task that was performed by the subject, and (2) to construct spatial maps that indicated the relative contribution of each voxel to classification. Our evaluation metrics were: (1) accuracy of out-of-sample classification and (2) reproducibility of spatial maps. In simulated data sets, we performed an additional evaluation of spatial maps with ROC analysis. We varied the magnitude, temporal variance and connectivity of simulated fMRI signal and identified the optimal classifier for each simulated environment. Overall, the best performers were linear and quadratic discriminants (operating on principal components of the data matrix) and, in some rare situations, a nonlinear Gaussian naïve Bayes classifier. The results from the simulated data were supported by within-subject analysis of experimental fMRI data, collected in a study of aging. This is the first study that systematically characterizes interactions between analysis model and signal parameters (such as magnitude, variance and correlation) on the performance of pattern classifiers for fMRI. Copyright © 2014 Elsevier Inc. All rights reserved.
CompGC: Efficient Offline/Online Semi-Honest Two-Party Computation
2017-02-03
κ ∈ N : Pr [ ExptprivA,S(κ) = 1 ] ≤ 1 2 + µ(κ) 4.1. Component-Based Secure Two-Party Compu- tation We now briefly describe how to use component-based...number of classes and “F” is the number of features. Specs. Naive CompGC Bost et al. [BPTG15] Data Set N D Time Time* Comm. Time Time* Comm. Time Comm...Rounds Nursery 4 4 40 0.3 40 0.01 2085 21.6 15 ECG 6 4 40 0.4 40 0.1 8816 29.1 22 (c) Decision tree classifier. “ N ” is the number of internal nodes in
GPU-accelerated adjoint algorithmic differentiation
NASA Astrophysics Data System (ADS)
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2016-03-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the ;tape;. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
GPU-Accelerated Adjoint Algorithmic Differentiation.
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2016-03-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
van den Brom, Rob R H; van der Geest, Kornelis S M; Brouwer, Elisabeth; Hospers, Geke A P; Boots, Annemieke M H
2018-06-01
The biological behavior of melanoma is unfavorable in the elderly when compared to young subjects. We hypothesized that differences in T-cell responses might underlie the distinct behavior of melanoma in young and old melanoma patients. Therefore, we investigated the circulating T-cell compartment of 34 patients with metastatic melanoma and 42 controls, which were classified as either young or old. Absolute numbers of CD4+ T cells were decreased in young and old melanoma patients when compared to the age-matched control groups. Percentages of naive and memory CD4+ T cells were not different when comparing old melanoma patients to age-matched controls. Percentages of memory CD4+ T cells tended to be increased in young melanoma patients compared to young controls. Proportions of naive CD4+ T cells were lower in young patients than in age-matched controls, and actually comparable to those in old patients and controls. This was accompanied with increased percentages of memory CD4+ T cells expressing HLA-DR, Ki-67, and PD-1 in young melanoma patients in comparison to the age-matched controls, but not in old patients. Proportions of CD45RA-FOXP3 high memory regulatory T cells were increased in young and old melanoma patients when compared to their age-matched controls, whereas those of CD45RA+FOXP3 low naive regulatory T cells were similar. We observed no clear modulation of the circulating CD8+ T-cell repertoire in melanoma patients. In conclusion, we show that CD4+ T cells of young melanoma patients show signs of activation, whereas these signs are less clear in CD4+ T cells of old patients.
GPU-Accelerated Adjoint Algorithmic Differentiation
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2015-01-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the “tape”. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography. PMID:26941443
Learning accurate very fast decision trees from uncertain data streams
NASA Astrophysics Data System (ADS)
Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo
2015-12-01
Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence
Ogunsakin, Ropo Ebenezer; Siaka, Lougue
2017-01-01
Background: There has been no previous study to classify malignant breast tumor in details based on Markov Chain Monte Carlo (MCMC) convergence in Western, Nigeria. This study therefore aims to profile patients living with benign and malignant breast tumor in two different hospitals among women of Western Nigeria, with a focus on prognostic factors and MCMC convergence. Materials and Methods: A hospital-based record was used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its usage to estimation of parameters of the logistic regression via Markov Chain Monte Carlo (MCMC) algorithm. The result of the Bayesian approach is compared with the classical statistics. Results: The mean age of the respondents was 42.2 ±16.6 years with 52% of the women aged between 35-49 years. The results of both techniques suggest that age and women with at least high school education have a significantly higher risk of being diagnosed with malignant breast tumors than benign breast tumors. The results also indicate a reduction of standard errors is associated with the coefficients obtained from the Bayesian approach. In addition, simulation result reveal that women with at least high school are 1.3 times more at risk of having malignant breast lesion in western Nigeria compared to benign breast lesion. Conclusion: We concluded that more efforts are required towards creating awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The application of Bayesian produces precise estimates for modeling malignant breast cancer. PMID:29072396
NASA Astrophysics Data System (ADS)
Verma, Sneha K.; Chun, Sophia; Liu, Brent J.
2014-03-01
Pain is a common complication after spinal cord injury with prevalence estimates ranging 77% to 81%, which highly affects a patient's lifestyle and well-being. In the current clinical setting paper-based forms are used to classify pain correctly, however, the accuracy of diagnoses and optimal management of pain largely depend on the expert reviewer, which in many cases is not possible because of very few experts in this field. The need for a clinical decision support system that can be used by expert and non-expert clinicians has been cited in literature, but such a system has not been developed. We have designed and developed a stand-alone tool for correctly classifying pain type in spinal cord injury (SCI) patients, using Bayesian decision theory. Various machine learning simulation methods are used to verify the algorithm using a pilot study data set, which consists of 48 patients data set. The data set consists of the paper-based forms, collected at Long Beach VA clinic with pain classification done by expert in the field. Using the WEKA as the machine learning tool we have tested on the 48 patient dataset that the hypothesis that attributes collected on the forms and the pain location marked by patients have very significant impact on the pain type classification. This tool will be integrated with an imaging informatics system to support a clinical study that will test the effectiveness of using Proton Beam radiotherapy for treating spinal cord injury (SCI) related neuropathic pain as an alternative to invasive surgical lesioning.
New insights into the classification and nomenclature of cortical GABAergic interneurons.
DeFelipe, Javier; López-Cruz, Pedro L; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R; Huang, Josh; Jones, Edward G; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A; Marín, Oscar; Markram, Henry; McBain, Chris J; Meyer, Hanno S; Monyer, Hannah; Nelson, Sacha B; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L R; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M; Sherwood, Chet C; Staiger, Jochen F; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A
2013-03-01
A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts' assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus.
New insights into the classification and nomenclature of cortical GABAergic interneurons
DeFelipe, Javier; López-Cruz, Pedro L.; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F.; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R.; Huang, Josh; Jones, Edward G.; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A.; Marín, Oscar; Markram, Henry; McBain, Chris J.; Meyer, Hanno S.; Monyer, Hannah; Nelson, Sacha B.; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L. R.; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M.; Sherwood, Chet C.; Staiger, Jochen F.; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A.
2013-01-01
A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts’ assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus. PMID:23385869
Learning time series for intelligent monitoring
NASA Technical Reports Server (NTRS)
Manganaris, Stefanos; Fisher, Doug
1994-01-01
We address the problem of classifying time series according to their morphological features in the time domain. In a supervised machine-learning framework, we induce a classification procedure from a set of preclassified examples. For each class, we infer a model that captures its morphological features using Bayesian model induction and the minimum message length approach to assign priors. In the performance task, we classify a time series in one of the learned classes when there is enough evidence to support that decision. Time series with sufficiently novel features, belonging to classes not present in the training set, are recognized as such. We report results from experiments in a monitoring domain of interest to NASA.
Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria
2003-01-01
OBJECTIVE: To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. METHODS: Models were developed with data from 732 individuals aged > or =15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. FINDINGS: A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. CONCLUSION: Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic. PMID:12973640
Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria
2003-01-01
To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. Models were developed with data from 732 individuals aged > or =15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic.
Health Problems Discovery from Motion-Capture Data of Elderly
NASA Astrophysics Data System (ADS)
Pogorelc, B.; Gams, M.
Rapid aging of the population of the developed countries could exceed the society's capacity for taking care for them. In order to help solving this problem, we propose a system for automatic discovery of health problems from motion-capture data of gait of elderly. The gait of the user is captured with the motion capture system, which consists of tags attached to the body and sensors situated in the apartment. Position of the tags is acquired by the sensors and the resulting time series of position coordinates are analyzed with machine learning algorithms in order to identify the specific health problem. We propose novel features for training a machine learning classifier that classifies the user's gait into: i) normal, ii) with hemiplegia, iii) with Parkinson's disease, iv) with pain in the back and v) with pain in the leg. Results show that naive Bayes needs more tags and less noise to reach classification accuracy of 98 % than support vector machines for 99 %.
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic pattern and classify them as normal or anomalous class. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Baye's Tree, Random Forest, Random Tree and Representative Tree model to perform the detection of anomalous network pattern is introduced. In particular, the proposed swarm optimisation-based approach selects instances that compose training set and optimised decision tree operate over this trained set producing classification rules with improved coverage, classification capability and generalisation ability. Experiment with the Knowledge Discovery and Data mining (KDD) data set which have information on traffic pattern, during normal and intrusive behaviour shows that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithm.
Classification of older adults with/without a fall history using machine learning methods.
Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D
2015-01-01
Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.
Ideal discrimination of discrete clinical endpoints using multilocus genotypes.
Hahn, Lance W; Moore, Jason H
2004-01-01
Multifactor Dimensionality Reduction (MDR) is a method for the classification and prediction of discrete clinical endpoints using attributes constructed from multilocus genotype data. Empirical studies with both real and simulated data suggest that MDR has good power for detecting gene-gene interactions in the absence of independent main effects. The purpose of this study is to develop an objective, theory-driven approach to evaluate the strengths and limitations of MDR. To accomplish this goal, we borrow concepts from ideal observer analysis used in visual perception to evaluate the theoretical limits of classifying and predicting discrete clinical endpoints using multilocus genotype data. We conclude that MDR ideally discriminates between low risk and high risk subjects using attributes constructed from multilocus genotype data. We also how that the classification approach used once a multilocus attribute is constructed is similar to that of a naive Bayes classifier. This study provides a theoretical foundation for the continued development, evaluation, and application of the MDR as a data mining tool in the domain of statistical genetics and genetic epidemiology.
Identifying Audiences of E-Infrastructures - Tools for Measuring Impact
van den Besselaar, Peter
2012-01-01
Research evaluation should take into account the intended scholarly and non-scholarly audiences of the research output. This holds too for research infrastructures, which often aim at serving a large variety of audiences. With research and research infrastructures moving to the web, new possibilities are emerging for evaluation metrics. This paper proposes a feasible indicator for measuring the scope of audiences who use web-based e-infrastructures, as well as the frequency of use. In order to apply this indicator, a method is needed for classifying visitors to e-infrastructures into relevant user categories. The paper proposes such a method, based on an inductive logic program and a Bayesian classifier. The method is tested, showing that the visitors are efficiently classified with 90% accuracy into the selected categories. Consequently, the method can be used to evaluate the use of the e-infrastructure within and outside academia. PMID:23239995
Mapping Land Cover Types in Amazon Basin Using 1km JERS-1 Mosaic
NASA Technical Reports Server (NTRS)
Saatchi, Sassan S.; Nelson, Bruce; Podest, Erika; Holt, John
2000-01-01
In this paper, the 100 meter JERS-1 Amazon mosaic image was used in a new classifier to generate a I km resolution land cover map. The inputs to the classifier were 1 km resolution mean backscatter and seven first order texture measures derived from the 100 m data by using a 10 x 10 independent sampling window. The classification approach included two interdependent stages: 1) a supervised maximum a posteriori Bayesian approach to classify the mean backscatter image into 5 general land cover categories of forest, savannah, inundated, white sand, and anthropogenic vegetation classes, and 2) a texture measure decision rule approach to further discriminate subcategory classes based on taxonomic information and biomass levels. Fourteen classes were successfully separated at 1 km scale. The results were verified by examining the accuracy of the approach by comparison with the IBGE and the AVHRR 1 km resolution land cover maps.
Single-accelerometer-based daily physical activity classification.
Long, Xi; Yin, Bin; Aarts, Ronald M
2009-01-01
In this study, a single tri-axial accelerometer placed on the waist was used to record the acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared a Bayesian classification with that of a Decision Tree based approach. A Bayes classifier has the advantage to be more extensible, requiring little effort in classifier retraining and software update upon further expansion or modification of the target activities. Principal components analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross validation protocols revealed a classification accuracy of approximately 80%, which was comparable with that obtained by a Decision Tree classifier.
NASA Astrophysics Data System (ADS)
Lee, Youngjoo; Seo, Joon Beom; Kang, Bokyoung; Kim, Dongil; Lee, June Goo; Kim, Song Soo; Kim, Namkug; Kang, Suk Ho
2007-03-01
The performance of classification algorithms for differentiating among obstructive lung diseases based on features from texture analysis using HRCT (High Resolution Computerized Tomography) images was compared. HRCT can provide accurate information for the detection of various obstructive lung diseases, including centrilobular emphysema, panlobular emphysema and bronchiolitis obliterans. Features on HRCT images can be subtle, however, particularly in the early stages of disease, and image-based diagnosis is subject to inter-observer variation. To automate the diagnosis and improve the accuracy, we compared three types of automated classification systems, naÃve Bayesian classifier, ANN (Artificial Neural Net) and SVM (Support Vector Machine), based on their ability to differentiate among normal lung and three types of obstructive lung diseases. To assess the performance and cross-validation of these three classifiers, 5 folding methods with 5 randomly chosen groups were used. For a more robust result, each validation was repeated 100 times. SVM showed the best performance, with 86.5% overall sensitivity, significantly different from the other classifiers (one way ANOVA, p<0.01). We address the characteristics of each classifier affecting performance and the issue of which classifier is the most suitable for clinical applications, and propose an appropriate method to choose the best classifier and determine its optimal parameters for optimal disease discrimination. These results can be applied to classifiers for differentiation of other diseases.
NASA Astrophysics Data System (ADS)
Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.
2016-09-01
In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.
Computer-aided diagnosis system: a Bayesian hybrid classification method.
Calle-Alonso, F; Pérez, C J; Arias-Nicolás, J P; Martín, J
2013-10-01
A novel method to classify multi-class biomedical objects is presented. The method is based on a hybrid approach which combines pairwise comparison, Bayesian regression and the k-nearest neighbor technique. It can be applied in a fully automatic way or in a relevance feedback framework. In the latter case, the information obtained from both an expert and the automatic classification is iteratively used to improve the results until a certain accuracy level is achieved, then, the learning process is finished and new classifications can be automatically performed. The method has been applied in two biomedical contexts by following the same cross-validation schemes as in the original studies. The first one refers to cancer diagnosis, leading to an accuracy of 77.35% versus 66.37%, originally obtained. The second one considers the diagnosis of pathologies of the vertebral column. The original method achieves accuracies ranging from 76.5% to 96.7%, and from 82.3% to 97.1% in two different cross-validation schemes. Even with no supervision, the proposed method reaches 96.71% and 97.32% in these two cases. By using a supervised framework the achieved accuracy is 97.74%. Furthermore, all abnormal cases were correctly classified. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Zhao, Di; Weng, Chunhua
2011-10-01
In this paper, we propose a novel method that combines PubMed knowledge and Electronic Health Records to develop a weighted Bayesian Network Inference (BNI) model for pancreatic cancer prediction. We selected 20 common risk factors associated with pancreatic cancer and used PubMed knowledge to weigh the risk factors. A keyword-based algorithm was developed to extract and classify PubMed abstracts into three categories that represented positive, negative, or neutral associations between each risk factor and pancreatic cancer. Then we designed a weighted BNI model by adding the normalized weights into a conventional BNI model. We used this model to extract the EHR values for patients with or without pancreatic cancer, which then enabled us to calculate the prior probabilities for the 20 risk factors in the BNI. The software iDiagnosis was designed to use this weighted BNI model for predicting pancreatic cancer. In an evaluation using a case-control dataset, the weighted BNI model significantly outperformed the conventional BNI and two other classifiers (k-Nearest Neighbor and Support Vector Machine). We conclude that the weighted BNI using PubMed knowledge and EHR data shows remarkable accuracy improvement over existing representative methods for pancreatic cancer prediction. Copyright © 2011 Elsevier Inc. All rights reserved.
Zhao, Di; Weng, Chunhua
2011-01-01
In this paper, we propose a novel method that combines PubMed knowledge and Electronic Health Records to develop a weighted Bayesian Network Inference (BNI) model for pancreatic cancer prediction. We selected 20 common risk factors associated with pancreatic cancer and used PubMed knowledge to weigh the risk factors. A keyword-based algorithm was developed to extract and classify PubMed abstracts into three categories that represented positive, negative, or neutral associations between each risk factor and pancreatic cancer. Then we designed a weighted BNI model by adding the normalized weights into a conventional BNI model. We used this model to extract the EHR values for patients with or without pancreatic cancer, which then enabled us to calculate the prior probabilities for the 20 risk factors in the BNI. The software iDiagnosis was designed to use this weighted BNI model for predicting pancreatic cancer. In an evaluation using a case-control dataset, the weighted BNI model significantly outperformed the conventional BNI and two other classifiers (k-Nearest Neighbor and Support Vector Machine). We conclude that the weighted BNI using PubMed knowledge and EHR data shows remarkable accuracy improvement over existing representative methods for pancreatic cancer prediction. PMID:21642013
A novel Bayesian framework for discriminative feature extraction in Brain-Computer Interfaces.
Suk, Heung-Il; Lee, Seong-Whan
2013-02-01
As there has been a paradigm shift in the learning load from a human subject to a computer, machine learning has been considered as a useful tool for Brain-Computer Interfaces (BCIs). In this paper, we propose a novel Bayesian framework for discriminative feature extraction for motor imagery classification in an EEG-based BCI in which the class-discriminative frequency bands and the corresponding spatial filters are optimized by means of the probabilistic and information-theoretic approaches. In our framework, the problem of simultaneous spatiospectral filter optimization is formulated as the estimation of an unknown posterior probability density function (pdf) that represents the probability that a single-trial EEG of predefined mental tasks can be discriminated in a state. In order to estimate the posterior pdf, we propose a particle-based approximation method by extending a factored-sampling technique with a diffusion process. An information-theoretic observation model is also devised to measure discriminative power of features between classes. From the viewpoint of classifier design, the proposed method naturally allows us to construct a spectrally weighted label decision rule by linearly combining the outputs from multiple classifiers. We demonstrate the feasibility and effectiveness of the proposed method by analyzing the results and its success on three public databases.
Human Naive T Cells Express Functional CXCL8 and Promote Tumorigenesis.
Crespo, Joel; Wu, Ke; Li, Wei; Kryczek, Ilona; Maj, Tomasz; Vatan, Linda; Wei, Shuang; Opipari, Anthony W; Zou, Weiping
2018-05-25
Naive T cells are thought to be functionally quiescent. In this study, we studied and compared the phenotype, cytokine profile, and potential function of human naive CD4 + T cells in umbilical cord and peripheral blood. We found that naive CD4 + T cells, but not memory T cells, expressed high levels of chemokine CXCL8. CXCL8 + naive T cells were preferentially enriched CD31 + T cells and did not express T cell activation markers or typical Th effector cytokines, including IFN-γ, IL-4, IL-17, and IL-22. In addition, upon activation, naive T cells retained high levels of CXCL8 expression. Furthermore, we showed that naive T cell-derived CXCL8 mediated neutrophil migration in the in vitro migration assay, supported tumor sphere formation, and promoted tumor growth in an in vivo human xenograft model. Thus, human naive T cells are phenotypically and functionally heterogeneous and can carry out active functions in immune responses. Copyright © 2018 by The American Association of Immunologists, Inc.
A Practical Philosophy of Complex Climate Modelling
NASA Technical Reports Server (NTRS)
Schmidt, Gavin A.; Sherwood, Steven
2014-01-01
We give an overview of the practice of developing and using complex climate models, as seen from experiences in a major climate modelling center and through participation in the Coupled Model Intercomparison Project (CMIP).We discuss the construction and calibration of models; their evaluation, especially through use of out-of-sample tests; and their exploitation in multi-model ensembles to identify biases and make predictions. We stress that adequacy or utility of climate models is best assessed via their skill against more naive predictions. The framework we use for making inferences about reality using simulations is naturally Bayesian (in an informal sense), and has many points of contact with more familiar examples of scientific epistemology. While the use of complex simulations in science is a development that changes much in how science is done in practice, we argue that the concepts being applied fit very much into traditional practices of the scientific method, albeit those more often associated with laboratory work.
Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam
2017-10-27
Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately.
NASA Astrophysics Data System (ADS)
Nakada, Tomohiro; Takadama, Keiki; Watanabe, Shigeyoshi
This paper proposes the classification method using Bayesian analytical method to classify the time series data in the international emissions trading market depend on the agent-based simulation and compares the case with Discrete Fourier transform analytical method. The purpose demonstrates the analytical methods mapping time series data such as market price. These analytical methods have revealed the following results: (1) the classification methods indicate the distance of mapping from the time series data, it is easier the understanding and inference than time series data; (2) these methods can analyze the uncertain time series data using the distance via agent-based simulation including stationary process and non-stationary process; and (3) Bayesian analytical method can show the 1% difference description of the emission reduction targets of agent.
Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar
2013-01-01
Background Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. Objective To develop a clinical decision–support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using intelligence artificial techniques. Methods A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as a criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by CDSS were compared with those made by physicians during patient consultations. Results The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision–support system. The gold-standard validation of CDSS achieved accuracy of 84.2% and k = 0.68 (p < 0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician’s diagnostic impression and the gold standard k = 0. 64 (p < 0.0001). There was moderate agreement between the physician’s diagnostic impression and CDSS k = 0.46 (p = 0.0008). Conclusions The study results suggest that CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards developing of a computer-assisted environment to support CD diagnosis. PMID:21917512
Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar
2011-11-01
Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. To develop a clinical decision-support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using intelligence artificial techniques. A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as a criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by CDSS were compared with those made by physicians during patient consultations. The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision-support system. The gold-standard validation of CDSS achieved accuracy of 84.2% and k=0.68 (p<0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician's diagnostic impression and the gold standard k=0. 64 (p<0.0001). There was moderate agreement between the physician's diagnostic impression and CDSS k=0.46 (p=0.0008). The study results suggest that CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards developing of a computer-assisted environment to support CD diagnosis. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad M; Qureshi, Waqas T; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J; Al-Mallah, Mouaz H
2017-12-19
Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how machine learning techniques can be applied on medical records of cardiorespiratory fitness and how the various techniques differ in terms of capabilities of predicting medical outcomes (e.g. mortality). We use data of 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems Between 1991 and 2009 and had a complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). In order to handle the imbalanced dataset used, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. Two set of experiments have been conducted with and without the SMOTE sampling technique. On average over different evaluation metrics, SVM Classifier has shown the lowest performance while other models like BN, BC and DT performed better. The RF classifier has shown the best performance (AUC = 0.97) among all models trained using the SMOTE sampling. The results show that various ML techniques can significantly vary in terms of its performance for the different evaluation metrics. It is also not necessarily that the more complex the ML model, the more prediction accuracy can be achieved. The prediction performance of all models trained with SMOTE is much better than the performance of models trained without SMOTE. The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness data.
NASA Astrophysics Data System (ADS)
Zaryab, Mohammad; Singh-Moon, Rajinder P.; Hendon, Christine P.
2017-02-01
Using light-based catheters for radiofrequency ablation (RFA) therapies grants the ability to accurately derive tissue properties such as lesion depth and overtreatment from spectroscopic information. However, this information is heavily reliant on contact quality with the treatment area and the orientation of the catheter. Thus to improve assessments of tissue properties, this work utilizes Bayesian modelling to classify whether the catheter is indeed in proper contact with the tissue. Initially in-laboratory experiments were conducted with ten fresh swine hearts submerged in blood. A total of 1555 unique near infrared spectra were collected from a spectrometer using a light-based catheter and manually tagged as "full perpendicular contact," "angled contact," and "no contact," between the catheter and heart tissue. Three features were prominent in all spectra for distinguishing purposes: area underneath the spectra, an intensity "valley" between 730 nm and 800 nm, along with the slope between 850 nm and 1150 nm. A classifier featuring bootstrapping, adaboost, and k-means techniques was thus created and achieved a 96.05% accuracy in classifying full contact, 98.33% accuracy in classifying angled contact, and 100% accuracy in classifying no contact.
Enhancing atlas based segmentation with multiclass linear classifiers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sdika, Michaël, E-mail: michael.sdika@creatis.insa-lyon.fr
Purpose: To present a method to enrich atlases for atlas based segmentation. Such enriched atlases can then be used as a single atlas or within a multiatlas framework. Methods: In this paper, machine learning techniques have been used to enhance the atlas based segmentation approach. The enhanced atlas defined in this work is a pair composed of a gray level image alongside an image of multiclass classifiers with one classifier per voxel. Each classifier embeds local information from the whole training dataset that allows for the correction of some systematic errors in the segmentation and accounts for the possible localmore » registration errors. The authors also propose to use these images of classifiers within a multiatlas framework: results produced by a set of such local classifier atlases can be combined using a label fusion method. Results: Experiments have been made on the in vivo images of the IBSR dataset and a comparison has been made with several state-of-the-art methods such as FreeSurfer and the multiatlas nonlocal patch based method of Coupé or Rousseau. These experiments show that their method is competitive with state-of-the-art methods while having a low computational cost. Further enhancement has also been obtained with a multiatlas version of their method. It is also shown that, in this case, nonlocal fusion is unnecessary. The multiatlas fusion can therefore be done efficiently. Conclusions: The single atlas version has similar quality as state-of-the-arts multiatlas methods but with the computational cost of a naive single atlas segmentation. The multiatlas version offers a improvement in quality and can be done efficiently without a nonlocal strategy.« less
NASA Astrophysics Data System (ADS)
Tahernezhad-Javazm, Farajollah; Azimirad, Vahid; Shoaran, Maryam
2018-04-01
Objective. Considering the importance and the near-future development of noninvasive brain-machine interface (BMI) systems, this paper presents a comprehensive theoretical-experimental survey on the classification and evolutionary methods for BMI-based systems in which EEG signals are used. Approach. The paper is divided into two main parts. In the first part, a wide range of different types of the base and combinatorial classifiers including boosting and bagging classifiers and evolutionary algorithms are reviewed and investigated. In the second part, these classifiers and evolutionary algorithms are assessed and compared based on two types of relatively widely used BMI systems, sensory motor rhythm-BMI and event-related potentials-BMI. Moreover, in the second part, some of the improved evolutionary algorithms as well as bi-objective algorithms are experimentally assessed and compared. Main results. In this study two databases are used, and cross-validation accuracy (CVA) and stability to data volume (SDV) are considered as the evaluation criteria for the classifiers. According to the experimental results on both databases, regarding the base classifiers, linear discriminant analysis and support vector machines with respect to CVA evaluation metric, and naive Bayes with respect to SDV demonstrated the best performances. Among the combinatorial classifiers, four classifiers, Bagg-DT (bagging decision tree), LogitBoost, and GentleBoost with respect to CVA, and Bagging-LR (bagging logistic regression) and AdaBoost (adaptive boosting) with respect to SDV had the best performances. Finally, regarding the evolutionary algorithms, single-objective invasive weed optimization (IWO) and bi-objective nondominated sorting IWO algorithms demonstrated the best performances. Significance. We present a general survey on the base and the combinatorial classification methods for EEG signals (sensory motor rhythm and event-related potentials) as well as their optimization methods through the evolutionary algorithms. In addition, experimental and statistical significance tests are carried out to study the applicability and effectiveness of the reviewed methods.
Blocking the recruitment of naive CD4+ T cells reverses immunosuppression in breast cancer
Su, Shicheng; Liao, Jianyou; Liu, Jiang; Huang, Di; He, Chonghua; Chen, Fei; Yang, LinBing; Wu, Wei; Chen, Jianing; Lin, Ling; Zeng, Yunjie; Ouyang, Nengtai; Cui, Xiuying; Yao, Herui; Su, Fengxi; Huang, Jian-dong; Lieberman, Judy; Liu, Qiang; Song, Erwei
2017-01-01
The origin of tumor-infiltrating Tregs, critical mediators of tumor immunosuppression, is unclear. Here, we show that tumor-infiltrating naive CD4+ T cells and Tregs in human breast cancer have overlapping TCR repertoires, while hardly overlap with circulating Tregs, suggesting that intratumoral Tregs mainly develop from naive T cells in situ rather than from recruited Tregs. Furthermore, the abundance of naive CD4+ T cells and Tregs is closely correlated, both indicating poor prognosis for breast cancer patients. Naive CD4+ T cells adhere to tumor slices in proportion to the abundance of CCL18-producing macrophages. Moreover, adoptively transferred human naive CD4+ T cells infiltrate human breast cancer orthotopic xenografts in a CCL18-dependent manner. In human breast cancer xenografts in humanized mice, blocking the recruitment of naive CD4+ T cells into tumor by knocking down the expression of PITPNM3, a CCL18 receptor, significantly reduces intratumoral Tregs and inhibits tumor progression. These findings suggest that breast tumor-infiltrating Tregs arise from chemotaxis of circulating naive CD4+ T cells that differentiate into Tregs in situ. Inhibiting naive CD4+ T cell recruitment into tumors by interfering with PITPNM3 recognition of CCL18 may be an attractive strategy for anticancer immunotherapy. PMID:28290464
Comparison of Co-Temporal Modeling Algorithms on Sparse Experimental Time Series Data Sets.
Allen, Edward E; Norris, James L; John, David J; Thomas, Stan J; Turkett, William H; Fetrow, Jacquelyn S
2010-01-01
Multiple approaches for reverse-engineering biological networks from time-series data have been proposed in the computational biology literature. These approaches can be classified by their underlying mathematical algorithms, such as Bayesian or algebraic techniques, as well as by their time paradigm, which includes next-state and co-temporal modeling. The types of biological relationships, such as parent-child or siblings, discovered by these algorithms are quite varied. It is important to understand the strengths and weaknesses of the various algorithms and time paradigms on actual experimental data. We assess how well the co-temporal implementations of three algorithms, continuous Bayesian, discrete Bayesian, and computational algebraic, can 1) identify two types of entity relationships, parent and sibling, between biological entities, 2) deal with experimental sparse time course data, and 3) handle experimental noise seen in replicate data sets. These algorithms are evaluated, using the shuffle index metric, for how well the resulting models match literature models in terms of siblings and parent relationships. Results indicate that all three co-temporal algorithms perform well, at a statistically significant level, at finding sibling relationships, but perform relatively poorly in finding parent relationships.
Bayes-LQAS: classifying the prevalence of global acute malnutrition
2010-01-01
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications. PMID:20534159
Bayes-LQAS: classifying the prevalence of global acute malnutrition.
Olives, Casey; Pagano, Marcello
2010-06-09
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
Vallejos, Catalina A.; Marioni, John C.; Richardson, Sylvia
2015-01-01
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach. PMID:26107944
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.
Vallejos, Catalina A; Marioni, John C; Richardson, Sylvia
2015-06-01
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.
NASA Astrophysics Data System (ADS)
Anitha, J.; Vijila, C. Kezi Selva; Hemanth, D. Jude
2010-02-01
Diabetic retinopathy (DR) is a chronic eye disease for which early detection is highly essential to avoid any fatal results. Image processing of retinal images emerge as a feasible tool for this early diagnosis. Digital image processing techniques involve image classification which is a significant technique to detect the abnormality in the eye. Various automated classification systems have been developed in the recent years but most of them lack high classification accuracy. Artificial neural networks are the widely preferred artificial intelligence technique since it yields superior results in terms of classification accuracy. In this work, Radial Basis function (RBF) neural network based bi-level classification system is proposed to differentiate abnormal DR Images and normal retinal images. The results are analyzed in terms of classification accuracy, sensitivity and specificity. A comparative analysis is performed with the results of the probabilistic classifier namely Bayesian classifier to show the superior nature of neural classifier. Experimental results show promising results for the neural classifier in terms of the performance measures.
Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.
Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S
2014-03-01
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.
Cinelli, Mattia; Sun, Yuxin; Best, Katharine; Heather, James M; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-04-01
Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone. The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund's Adjuvant. The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term¼SRP075893 . The Decombinator package is available at github.com/innate2adaptive/Decombinator . The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html . b.chain@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Sky Mining - Application to Photomorphic Redshift Estimation
NASA Astrophysics Data System (ADS)
Nayak, Pragyansmita
The field of astronomy has evolved from the ancient craft of observing the sky. In it's present form, astronomers explore the cosmos not just by observing through the tiny visible window used by our eyes, but also by exploiting the electromagnetic spectrum from radio waves to gamma rays. The domain is undoubtedly at the forefront of data-driven science. The data growth rate is expected to be around 50%--100% per year. This data explosion is attributed largely to the large-scale wide and deep surveys of the different regions of the sky at multiple wavelengths (both ground and space-based surveys). This dissertation describes the application of machine learning methods to the estimation of galaxy redshifts leveraging such a survey data. Galaxy is a large system of stars held together by mutual gravitation and isolated from similar systems by vast regions of space. Our view of the universe is closely tied to our understanding of galaxy formation. Thus, a better understanding of the relative location of the multitudes of galaxies is crucial. The position of each galaxy can be characterized using three coordinates. Right Ascension (ra) and Declination (dec) are the two coordinates that locate the galaxy in two dimensions on the plane of the sky. It is relatively straightforward to measure them. In contrast, fixing the third coordinate that is the galaxy's distance from the observer along the line of sight (redshift 'z') is considerably more challenging. "Spectroscopic redshift" method gives us accurate and precise measurements of z. However, it is extremely time-intensive and unusable for faint objects. Additionally, the rate at which objects are being identified via photometric surveys far exceeds the rate at which the spectroscopic redshift measurements can keep pace in determining their distance. As the surveys go deeper into the sky, the proportion of faint objects being identified also continues to increase. In order to tackle both these drawbacks increasing in severity every day, alternative method "Photometric redshift" has been studied in the past. It uses the brightness of the object viewed through various standard filters, each of which lets through a relatively broad spectrum of colors. However, these methods are bound by the degeneracy problem (objects with different color profiles have the same redshift) which leads to low predictive accuracy. As part of our study, we are looking beyond color attributes to identify other measured attributes as degeneracy resolvers as well as generate estimators that are highly accurate; termed as "Photomorphic redshift" estimators. The present study investigates the photometric information of the objects such as color and magnitude (= observed flux) and morphology attributes such as shape, size, orientation and concentration in the different wavelengths. The specific type of magnitude used in this study are the PSF, Fiber and Petrosian magnitude. The morphology attributes are the ratio of Fiber to Petrosian magnitude, concentration index and Petrosian radius. All these attributes are in the five bands ugriz of the Sloan Digital Sky Survey (SDSS). Machine learning techniques based on Naive Bayes (NB), Bayesian Network (BN) and Generalized Linear Model (GLM) are researched to better understand their applicability, advantages and resulting predictive performance in terms of efficiency and accuracy. Note: The SDSS Data Release (DR) 10 data was used in the executed experiments (total of 700,777 galaxies with forty-five attributes associated with each galaxy). The significant findings of the present work are as follows: 1. Magnitude and morphology attributes have been found to be successful degeneracy resolvers. 2. Magnitude and morphology attributes have been found to be better redshift estimators than color attributes alone. 3. Naive Bayes, Bayesian Network and GLM have been found to be viable redshift estimation methods. Attribute selection is an important factor in computational performance. 4. In addition to the redshift estimate, the likelihood distribution of the estimate is even more useful, and my Bayesian Network models provide that information. This is particularly useful in ensemble methods as well as the kernel for mass distribution in the universe. 5. The generated Bayesian Network models can be applied to any of the variables, not just limited to redshift. Example applications include quality analysis and missing value imputation. Different types of Bayesian Network learning algorithms---constraint-based, score-based and hybrid---were investigated in detail.
NASA Astrophysics Data System (ADS)
Ojha, Maheswar; Maiti, Saumen
2016-03-01
A novel approach based on the concept of Bayesian neural network (BNN) has been implemented for classifying sediment boundaries using downhole log data obtained during Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. The Bayesian framework in conjunction with Markov Chain Monte Carlo (MCMC)/hybrid Monte Carlo (HMC) learning paradigm has been applied to constrain the lithology boundaries using density, density porosity, gamma ray, sonic P-wave velocity and electrical resistivity at the Hole U1344A. We have demonstrated the effectiveness of our supervised classification methodology by comparing our findings with a conventional neural network and a Bayesian neural network optimized by scaled conjugate gradient method (SCG), and tested the robustness of the algorithm in the presence of red noise in the data. The Bayesian results based on the HMC algorithm (BNN.HMC) resolve detailed finer structures at certain depths in addition to main lithology such as silty clay, diatom clayey silt and sandy silt. Our method also recovers the lithology information from a depth ranging between 615 and 655 m Wireline log Matched depth below Sea Floor of no core recovery zone. Our analyses demonstrate that the BNN based approach renders robust means for the classification of complex lithology successions at the Hole U1344A, which could be very useful for other studies and understanding the oceanic crustal inhomogeneity and structural discontinuities.
Classification of iRBD and Parkinson's disease patients based on eye movements during sleep.
Christensen, Julie A E; Koch, Henriette; Frandsen, Rune; Kempfner, Jacob; Arvastson, Lars; Christensen, Soren R; Sorensen, Helge B D; Jennum, Poul
2013-01-01
Patients suffering from the sleep disorder idiopathic rapid-eye-movement sleep behavior disorder (iRBD) have been observed to be in high risk of developing Parkinson's disease (PD). This makes it essential to analyze them in the search for PD biomarkers. This study aims at classifying patients suffering from iRBD or PD based on features reflecting eye movements (EMs) during sleep. A Latent Dirichlet Allocation (LDA) topic model was developed based on features extracted from two electrooculographic (EOG) signals measured as parts in full night polysomnographic (PSG) recordings from ten control subjects. The trained model was tested on ten other control subjects, ten iRBD patients and ten PD patients, obtaining a EM topic mixture diagram for each subject in the test dataset. Three features were extracted from the topic mixture diagrams, reflecting "certainty", "fragmentation" and "stability" in the timely distribution of the EM topics. Using a Naive Bayes (NB) classifier and the features "certainty" and "stability" yielded the best classification result and the subjects were classified with a sensitivity of 95 %, a specificity of 80% and an accuracy of 90 %. This study demonstrates in a data-driven approach, that iRBD and PD patients may exhibit abnorm form and/or timely distribution of EMs during sleep.
The BANYAN-Sigma Bayesian classifier and the search for isolated planetary-mass objects
NASA Astrophysics Data System (ADS)
Gagné, Jonathan
2018-01-01
I will present new developments in the construction of a Bayesian classification tool to identify members of 22 young associations within 150 pc from partially complete kinematic data sets such as Gaia-DR1 and DR2. The new BANYAN-Sigma tool makes it possible to quickly analyze massive data sets and yields a better classification performance than all its predecessors. It will open the door to large-scale surveys to complete the stellar and substellar populations of nearby associations, which will provide deep insights in the low-mass end of the initial mass function and valuable age-calibrated targets for exoplanet surveys.I will also presents preliminary results of a search for T-type isolated planetary-mass objects in these young associations, based on BANYAN-Sigma and a cross-match between the AllWISE and 2MASS-Reject catalogs.
NASA Astrophysics Data System (ADS)
Gweon, Gey-Hong; Lee, Hee-Sun; Dorsey, Chad; Tinker, Robert; Finzer, William; Damelin, Daniel
2015-03-01
In tracking student learning in on-line learning systems, the Bayesian knowledge tracing (BKT) model is a popular model. However, the model has well-known problems such as the identifiability problem or the empirical degeneracy problem. Understanding of these problems remain unclear and solutions to them remain subjective. Here, we analyze the log data from an online physics learning program with our new model, a Monte Carlo BKT model. With our new approach, we are able to perform a completely unbiased analysis, which can then be used for classifying student learning patterns and performances. Furthermore, a theoretical analysis of the BKT model and our computational work shed new light on the nature of the aforementioned problems. This material is based upon work supported by the National Science Foundation under Grant REC-1147621 and REC-1435470.
Unmasking the masked Universe: the 2M++ catalogue through Bayesian eyes
NASA Astrophysics Data System (ADS)
Lavaux, Guilhem; Jasche, Jens
2016-01-01
This work describes a full Bayesian analysis of the Nearby Universe as traced by galaxies of the 2M++ survey. The analysis is run in two sequential steps. The first step self-consistently derives the luminosity-dependent galaxy biases, the power spectrum of matter fluctuations and matter density fields within a Gaussian statistic approximation. The second step makes a detailed analysis of the three-dimensional large-scale structures, assuming a fixed bias model and a fixed cosmology. This second step allows for the reconstruction of both the final density field and the initial conditions at z = 1000 assuming a fixed bias model. From these, we derive fields that self-consistently extrapolate the observed large-scale structures. We give two examples of these extrapolation and their utility for the detection of structures: the visibility of the Sloan Great Wall, and the detection and characterization of the Local Void using DIVA, a Lagrangian based technique to classify structures.
Feature selection for elderly faller classification based on wearable sensors.
Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D
2017-05-30
Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.
Multivariate Spectral Analysis to Extract Materials from Multispectral Data
1993-09-01
Euclidean minimum distance and conventional Bayesian classifier suggest some fundamental instabilities. Two candidate sources are (1) inadequate...Coacete Water 2 TOTAL Cetu¢t1te 0 0 0 0 34 0 0 34 TZC10 0 0 0 0 0 26 0 26 hpem ~d I 0 0 to 0 0 0 0 60 Seb~ s 0 0 0 0 4 24 0 28 Mwal 0 0 0 0 33 29 0 62 Ihwid
Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W
2006-03-01
Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
Bayesian Classification Models for Premature Ventricular Contraction Detection on ECG Traces.
Casas, Manuel M; Avitia, Roberto L; Gonzalez-Navarro, Felix F; Cardenas-Haro, Jose A; Reyna, Marco A
2018-01-01
According to the American Heart Association, in its latest commission about Ventricular Arrhythmias and Sudden Death 2006, the epidemiology of the ventricular arrhythmias ranges from a series of risk descriptors and clinical markers that go from ventricular premature complexes and nonsustained ventricular tachycardia to sudden cardiac death due to ventricular tachycardia in patients with or without clinical history. The premature ventricular complexes (PVCs) are known to be associated with malignant ventricular arrhythmias and sudden cardiac death (SCD) cases. Detecting this kind of arrhythmia has been crucial in clinical applications. The electrocardiogram (ECG) is a clinical test used to measure the heart electrical activity for inferences and diagnosis. Analyzing large ECG traces from several thousands of beats has brought the necessity to develop mathematical models that can automatically make assumptions about the heart condition. In this work, 80 different features from 108,653 ECG classified beats of the gold-standard MIT-BIH database were extracted in order to classify the Normal, PVC, and other kind of ECG beats. Three well-known Bayesian classification algorithms were trained and tested using these extracted features. Experimental results show that the F1 scores for each class were above 0.95, giving almost the perfect value for the PVC class. This gave us a promising path in the development of automated mechanisms for the detection of PVC complexes.
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
NASA Astrophysics Data System (ADS)
Liu, Lanxia; Bai, Yuanyuan; Zhu, Dunwan; Song, Liping; Wang, Hai; Dong, Xia; Zhang, Hailing; Leng, Xigang
2011-06-01
Chitosan (CS) is one of the most widely studied polymers in non-viral gene delivery since it is a cationic polysaccharide that forms nanoparticles with DNA and hence protects the DNA against digestion by DNase. However, the impact of CS/DNA nanoparticle on the immune system still remains poorly understood. Previous investigations did not found CS/DNA nanoparticles had any significant impact on the function of human and murine macrophages. To date, little is known about the interaction between CS/DNA nanoparticles and naive CD4+ T cells. This study was designed to investigate whether CS/DNA nanoparticles affect the initial differentiation direction of human naive CD4+ T cells. The indirect impact of CS/DNA nanoparticles on naive CD4+ T cell differentiation was investigated by incubating the nanoparticles with human macrophage THP-1 cells in one chamber of a transwell co-incubation system, with the enriched human naive CD4+ T cells being placed in the other chamber of the transwell. The nanoparticles were also co-incubated with the naive CD4+ T cells to explore their direct impact on naive CD4+ T cell differentiation by measuring the release of IL-4 and IFN-γ from the cells. It was demonstrated that CS/DNA nanoparticles induced slightly elevated production of IL-12 by THP-1 cells, possibly owing to the presence of CpG motifs in the plasmid. However, this macrophage stimulating activity was much less significant as compared with lipopolysaccharide and did not impact on the differentiation of the naive CD4+ T cells. It was also demonstrated that, when directly exposed to the naive CD4+ T cells, the nanoparticles induced neither the activation of the naive CD4+ T cells in the absence of recombinant cytokines (recombinant human IL-4 or IFN-γ) that induce naive CD4+ T cell polarization, nor any changes in the differentiation direction of naive CD4+ T cells in the presence of the corresponding cytokines.
Klement, William; Wilk, Szymon; Michalowski, Wojtek; Farion, Ken J; Osmond, Martin H; Verter, Vedat
2012-03-01
Using an automatic data-driven approach, this paper develops a prediction model that achieves more balanced performance (in terms of sensitivity and specificity) than the Canadian Assessment of Tomography for Childhood Head Injury (CATCH) rule, when predicting the need for computed tomography (CT) imaging of children after a minor head injury. CT is widely considered an effective tool for evaluating patients with minor head trauma who have potentially suffered serious intracranial injury. However, its use poses possible harmful effects, particularly for children, due to exposure to radiation. Safety concerns, along with issues of cost and practice variability, have led to calls for the development of effective methods to decide when CT imaging is needed. Clinical decision rules represent such methods and are normally derived from the analysis of large prospectively collected patient data sets. The CATCH rule was created by a group of Canadian pediatric emergency physicians to support the decision of referring children with minor head injury to CT imaging. The goal of the CATCH rule was to maximize the sensitivity of predictions of potential intracranial lesion while keeping specificity at a reasonable level. After extensive analysis of the CATCH data set, characterized by severe class imbalance, and after a thorough evaluation of several data mining methods, we derived an ensemble of multiple Naive Bayes classifiers as the prediction model for CT imaging decisions. In the first phase of the experiment we compared the proposed ensemble model to other ensemble models employing rule-, tree- and instance-based member classifiers. Our prediction model demonstrated the best performance in terms of AUC, G-mean and sensitivity measures. In the second phase, using a bootstrapping experiment similar to that reported by the CATCH investigators, we showed that the proposed ensemble model achieved a more balanced predictive performance than the CATCH rule with an average sensitivity of 82.8% and an average specificity of 74.4% (vs. 98.1% and 50.0% for the CATCH rule respectively). Automatically derived prediction models cannot replace a physician's acumen. However, they help establish reference performance indicators for the purpose of developing clinical decision rules so the trade-off between prediction sensitivity and specificity is better understood. Copyright © 2011 Elsevier B.V. All rights reserved.
Tislerova, Barbora; Brunovsky, Martin; Horacek, Jiri; Novak, Tomas; Kopecek, Miloslav; Mohr, Pavel; Krajca, Vladimír
2008-01-01
The aim of our study was to detect changes in the distribution of electrical brain activity in schizophrenic patients who were antipsychotic naive and those who received treatment with clozapine, olanzapine or risperidone. We included 41 subjects with schizophrenia (antipsychotic naive = 11; clozapine = 8; olanzapine = 10; risperidone = 12) and 20 healthy controls. Low-resolution brain electromagnetic tomography was computed from 19-channel electroencephalography for the frequency bands delta, theta, alpha-1, alpha-2, beta-1, beta-2 and beta-3. We compared antipsychotic-naive subjects with healthy controls and medicated patients. (1) Comparing antipsychotic-naive subjects and controls we found a general increase in the slow delta and theta frequencies over the fronto-temporo-occipital cortex, particularly in the temporolimbic structures, an increase in alpha-1 and alpha-2 in the temporal cortex and an increase in beta-1 and beta-2 in the temporo-occipital and posterior limbic structures. (2) Comparing patients who received clozapine and those who were antipsychotic naive, we found an increase in delta and theta frequencies in the anterior cingulate and medial frontal cortex, and a decrease in alpha-1 and beta-2 in the occipital structures. (3) Comparing patients taking olanzapine with those who were antipsychotic naive, there was an increase in theta frequencies in the anterior cingulum, a decrease in alpha-1, beta-2 and beta-3 in the occipital cortex and posterior limbic structures, and a decrease in beta-3 in the frontotemporal cortex and anterior cingulum. (4) In patients taking risperidone, we found no significant changes from those who were antipsychotic naive. Our results in antipsychotic-naive patients are in agreement with existing functional findings. Changes in those taking clozapine and olanzapine versus those who were antipsychotic naive suggest a compensatory mechanism in the neurobiological substrate for schizophrenia. The lack of difference in risperidone patients versus antipsychotic-naive subjects may relate to risperidone's different pharmacodynamic mechanism. Copyright 2008 S. Karger AG, Basel.
Improving diagnostic recognition of primary hyperparathyroidism with machine learning.
Somnay, Yash R; Craven, Mark; McCoy, Kelly L; Carty, Sally E; Wang, Tracy S; Greenberg, Caprice C; Schneider, David F
2017-04-01
Parathyroidectomy offers the only cure for primary hyperparathyroidism, but today only 50% of primary hyperparathyroidism patients are referred for operation, in large part, because the condition is widely under-recognized. The diagnosis of primary hyperparathyroidism can be especially challenging with mild biochemical indices. Machine learning is a collection of methods in which computers build predictive algorithms based on labeled examples. With the aim of facilitating diagnosis, we tested the ability of machine learning to distinguish primary hyperparathyroidism from normal physiology using clinical and laboratory data. This retrospective cohort study used a labeled training set and 10-fold cross-validation to evaluate accuracy of the algorithm. Measures of accuracy included area under the receiver operating characteristic curve, precision (sensitivity), and positive and negative predictive value. Several different algorithms and ensembles of algorithms were tested using the Weka platform. Among 11,830 patients managed operatively at 3 high-volume endocrine surgery programs from March 2001 to August 2013, 6,777 underwent parathyroidectomy for confirmed primary hyperparathyroidism, and 5,053 control patients without primary hyperparathyroidism underwent thyroidectomy. Test-set accuracies for machine learning models were determined using 10-fold cross-validation. Age, sex, and serum levels of preoperative calcium, phosphate, parathyroid hormone, vitamin D, and creatinine were defined as potential predictors of primary hyperparathyroidism. Mild primary hyperparathyroidism was defined as primary hyperparathyroidism with normal preoperative calcium or parathyroid hormone levels. After testing a variety of machine learning algorithms, Bayesian network models proved most accurate, classifying correctly 95.2% of all primary hyperparathyroidism patients (area under receiver operating characteristic = 0.989). Omitting parathyroid hormone from the model did not decrease the accuracy significantly (area under receiver operating characteristic = 0.985). In mild disease cases, however, the Bayesian network model classified correctly 71.1% of patients with normal calcium and 92.1% with normal parathyroid hormone levels preoperatively. Bayesian networking and AdaBoost improved the accuracy of all parathyroid hormone patients to 97.2% cases (area under receiver operating characteristic = 0.994), and 91.9% of primary hyperparathyroidism patients with mild disease. This was significantly improved relative to Bayesian networking alone (P < .0001). Machine learning can diagnose accurately primary hyperparathyroidism without human input even in mild disease. Incorporation of this tool into electronic medical record systems may aid in recognition of this under-diagnosed disorder. Copyright © 2016 Elsevier Inc. All rights reserved.
Martin, Summer L; Stohs, Stephen M; Moore, Jeffrey E
2015-03-01
Fisheries bycatch is a global threat to marine megafauna. Environmental laws require bycatch assessment for protected species, but this is difficult when bycatch is rare. Low bycatch rates, combined with low observer coverage, may lead to biased, imprecise estimates when using standard ratio estimators. Bayesian model-based approaches incorporate uncertainty, produce less volatile estimates, and enable probabilistic evaluation of estimates relative to management thresholds. Here, we demonstrate a pragmatic decision-making process that uses Bayesian model-based inferences to estimate the probability of exceeding management thresholds for bycatch in fisheries with < 100% observer coverage. Using the California drift gillnet fishery as a case study, we (1) model rates of rare-event bycatch and mortality using Bayesian Markov chain Monte Carlo estimation methods and 20 years of observer data; (2) predict unobserved counts of bycatch and mortality; (3) infer expected annual mortality; (4) determine probabilities of mortality exceeding regulatory thresholds; and (5) classify the fishery as having low, medium, or high bycatch impact using those probabilities. We focused on leatherback sea turtles (Dermochelys coriacea) and humpback whales (Megaptera novaeangliae). Candidate models included Poisson or zero-inflated Poisson likelihood, fishing effort, and a bycatch rate that varied with area, time, or regulatory regime. Regulatory regime had the strongest effect on leatherback bycatch, with the highest levels occurring prior to a regulatory change. Area had the strongest effect on humpback bycatch. Cumulative bycatch estimates for the 20-year period were 104-242 leatherbacks (52-153 deaths) and 6-50 humpbacks (0-21 deaths). The probability of exceeding a regulatory threshold under the U.S. Marine Mammal Protection Act (Potential Biological Removal, PBR) of 0.113 humpback deaths was 0.58, warranting a "medium bycatch impact" classification of the fishery. No PBR thresholds exist for leatherbacks, but the probability of exceeding an anticipated level of two deaths per year, stated as part of a U.S. Endangered Species Act assessment process, was 0.0007. The approach demonstrated here would allow managers to objectively and probabilistically classify fisheries with respect to bycatch impacts on species that have population-relevant mortality reference points, and declare with a stipulated level of certainty that bycatch did or did not exceed estimated upper bounds.
Semi-supervised vibration-based classification and condition monitoring of compressors
NASA Astrophysics Data System (ADS)
Potočnik, Primož; Govekar, Edvard
2017-09-01
Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.
Guijarro, María; Pajares, Gonzalo; Herrera, P. Javier
2009-01-01
The increasing technology of high-resolution image airborne sensors, including those on board Unmanned Aerial Vehicles, demands automatic solutions for processing, either on-line or off-line, the huge amountds of image data sensed during the flights. The classification of natural spectral signatures in images is one potential application. The actual tendency in classification is oriented towards the combination of simple classifiers. In this paper we propose a combined strategy based on the Deterministic Simulated Annealing (DSA) framework. The simple classifiers used are the well tested supervised parametric Bayesian estimator and the Fuzzy Clustering. The DSA is an optimization approach, which minimizes an energy function. The main contribution of DSA is its ability to avoid local minima during the optimization process thanks to the annealing scheme. It outperforms simple classifiers used for the combination and some combined strategies, including a scheme based on the fuzzy cognitive maps and an optimization approach based on the Hopfield neural network paradigm. PMID:22399989
Computer-aided detection of prostate cancer in T2-weighted MRI within the peripheral zone
NASA Astrophysics Data System (ADS)
Rampun, Andrik; Zheng, Ling; Malcolm, Paul; Tiddeman, Bernie; Zwiggelaar, Reyer
2016-07-01
In this paper we propose a prostate cancer computer-aided diagnosis (CAD) system and suggest a set of discriminant texture descriptors extracted from T2-weighted MRI data which can be used as a good basis for a multimodality system. For this purpose, 215 texture descriptors were extracted and eleven different classifiers were employed to achieve the best possible results. The proposed method was tested based on 418 T2-weighted MR images taken from 45 patients and evaluated using 9-fold cross validation with five patients in each fold. The results demonstrated comparable results to existing CAD systems using multimodality MRI. We achieved an area under the receiver operating curve (A z ) values equal to 90.0%+/- 7.6% , 89.5%+/- 8.9% , 87.9%+/- 9.3% and 87.4%+/- 9.2% for Bayesian networks, ADTree, random forest and multilayer perceptron classifiers, respectively, while a meta-voting classifier using average probability as a combination rule achieved 92.7%+/- 7.4% .
Arrogance analysis of several typical pattern recognition classifiers
NASA Astrophysics Data System (ADS)
Jing, Chen; Xia, Shengping; Hu, Weidong
2007-04-01
Various kinds of classification methods have been developed. However, most of these classical methods, such as Back-Propagation (BP), Bayesian method, Support Vector Machine(SVM), Self-Organizing Map (SOM) are arrogant. A so-called arrogance, for a human, means that his decision, which even is a mistake, overstates his actual experience. Accordingly, we say that he is a arrogant if he frequently makes arrogant decisions. Likewise, some classical pattern classifiers represent the similar characteristic of arrogance. Given an input feature vector, we say a classifier is arrogant in its classification if its veracity is high yet its experience is low. Typically, for a new sample which is distinguishable from original training samples, traditional classifiers recognize it as one of the known targets. Clearly, arrogance in classification is an undesirable attribute. Conversely, a classifier is non-arrogant in its classification if there is a reasonable balance between its veracity and its experience. Inquisitiveness is, in many ways, the opposite of arrogance. In nature, inquisitiveness is an eagerness for knowledge characterized by the drive to question, to seek a deeper understanding. The human capacity to doubt present beliefs allows us to acquire new experiences and to learn from our mistakes. Within the discrete world of computers, inquisitive pattern recognition is the constructive investigation and exploitation of conflict in information. Thus, we quantify this balance and discuss new techniques that will detect arrogance in a classifier.
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
Sentimental Analysis for Airline Twitter data
NASA Astrophysics Data System (ADS)
Dutta Das, Deb; Sharma, Sharan; Natani, Shubham; Khare, Neelu; Singh, Brijendra
2017-11-01
Social Media has taken the world by surprise at a swift and commendable pace. With the advent of any kind of circumstances may it be related to social, political or current affairs the sentiments of people throughout the world are expressed through their help, making them suitable candidates for sentiment mining. Sentimental analysis becomes highly resourceful for any organization who wants to analyse and enhance their products and services. In the airline industries it is much easier to get feedback from astute data source such as Twitter, for conducting a sentiment analysis on their respective customers. The beneficial factors relating to twitter sentiment analysis cannot be impeded by the consumers who want to know the who’s who and what’s what in everyday life. In this paper we are classifying sentiment of Twitter messages by exhibiting results of a machine learning algorithm using R and Rapid Miner. The tweets are extracted and pre-processed and then categorizing them in neutral, negative and positive sentiments finally summarising the results as a whole. The Naive Bayes algorithm has been used for classifying the sentiments of recent tweets done on the different airlines.
Implementation and performance evaluation of acoustic denoising algorithms for UAV
NASA Astrophysics Data System (ADS)
Chowdhury, Ahmed Sony Kamal
Unmanned Aerial Vehicles (UAVs) have become popular alternative for wildlife monitoring and border surveillance applications. Elimination of the UAV's background noise and classifying the target audio signal effectively are still a major challenge. The main goal of this thesis is to remove UAV's background noise by means of acoustic denoising techniques. Existing denoising algorithms, such as Adaptive Least Mean Square (LMS), Wavelet Denoising, Time-Frequency Block Thresholding, and Wiener Filter, were implemented and their performance evaluated. The denoising algorithms were evaluated for average Signal to Noise Ratio (SNR), Segmental SNR (SSNR), Log Likelihood Ratio (LLR), and Log Spectral Distance (LSD) metrics. To evaluate the effectiveness of the denoising algorithms on classification of target audio, we implemented Support Vector Machine (SVM) and Naive Bayes classification algorithms. Simulation results demonstrate that LMS and Discrete Wavelet Transform (DWT) denoising algorithm offered superior performance than other algorithms. Finally, we implemented the LMS and DWT algorithms on a DSP board for hardware evaluation. Experimental results showed that LMS algorithm's performance is robust compared to DWT for various noise types to classify target audio signals.
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
Link prediction in multiplex online social networks
NASA Astrophysics Data System (ADS)
Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž
2017-02-01
Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
Link prediction in multiplex online social networks.
Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž
2017-02-01
Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
NASA Astrophysics Data System (ADS)
Karsi, Redouane; Zaim, Mounia; El Alami, Jamila
2017-07-01
Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” is born to address the problem of automatically determining the polarity (Positive, negative, neutral,…) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions, thus, building a classifier, which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques: Support Vector Machines (SVM), Naive Bayes and K nearest neighbors(KNN) were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4,08.
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data
Zhao, Xin; Cheung, Leo Wang-Kit
2007-01-01
Background Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which however are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potentials to achieve this goal. Results A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case with a linear Bayesian classifier but also in the case with a very non-linear Bayesian classifier. This sheds light on its broader usability to microarray data analysis problems, especially to those that linear methods work awkwardly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods did in all of these cases. Conclusion Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811
Automated diagnosis of dry eye using infrared thermography images
NASA Astrophysics Data System (ADS)
Acharya, U. Rajendra; Tan, Jen Hong; Koh, Joel E. W.; Sudarshan, Vidya K.; Yeo, Sharon; Too, Cheah Loon; Chua, Chua Kuang; Ng, E. Y. K.; Tong, Louis
2015-07-01
Dry Eye (DE) is a condition of either decreased tear production or increased tear film evaporation. Prolonged DE damages the cornea causing the corneal scarring, thinning and perforation. There is no single uniform diagnosis test available to date; combinations of diagnostic tests are to be performed to diagnose DE. The current diagnostic methods available are subjective, uncomfortable and invasive. Hence in this paper, we have developed an efficient, fast and non-invasive technique for the automated identification of normal and DE classes using infrared thermography images. The features are extracted from nonlinear method called Higher Order Spectra (HOS). Features are ranked using t-test ranking strategy. These ranked features are fed to various classifiers namely, K-Nearest Neighbor (KNN), Nave Bayesian Classifier (NBC), Decision Tree (DT), Probabilistic Neural Network (PNN), and Support Vector Machine (SVM) to select the best classifier using minimum number of features. Our proposed system is able to identify the DE and normal classes automatically with classification accuracy of 99.8%, sensitivity of 99.8%, and specificity if 99.8% for left eye using PNN and KNN classifiers. And we have reported classification accuracy of 99.8%, sensitivity of 99.9%, and specificity if 99.4% for right eye using SVM classifier with polynomial order 2 kernel.
Zander, Axel; Gravel, Dominique; Bersier, Louis-Félix; Gray, Sarah M
2016-02-01
Introduced top predators have the potential to disrupt community dynamics when prey species are naive to predation. The impact of introduced predators may also vary depending on the stage of community development. Early-succession communities are likely to have small-bodied and fast-growing species, but are not necessarily good at defending against predators. In contrast, late-succession communities are typically composed of larger-bodied species that are more predator resistant relative to small-bodied species. Yet, these aspects are greatly neglected in invasion studies. We therefore tested the effect of top predator presence on early- and late-succession communities that were either naive or non-naive to top predators. We used the aquatic community held within the leaves of Sarracenia purpurea. In North America, communities have experienced the S. purpurea top predator and are therefore non-naive. In Europe, this predator is not present and its niche has not been filled, making these communities top-predator naive. We collected early- and late-succession communities from two non-naive and two naive sites, which are climatically similar. We then conducted a common-garden experiment, with and without the presence of the top predator, in which we recorded changes in community composition, body size spectra, bacterial density, and respiration. We found that the top predator had no statistical effect on global measures of community structure and functioning. However, it significantly altered protist composition, but only in naive, early-succession communities, highlighting that the state of community development is important for understanding the impact of invasion.
Iglicki, Matias; Busch, Catharina; Zur, Dinah; Okada, Mali; Mariussi, Miriana; Chhablani, Jay Kumar; Cebeci, Zafer; Fraser-Bell, Samantha; Chaikitmongkol, Voraporn; Couturier, Aude; Giancipoli, Ermete; Lupidi, Marco; Rodríguez-Valdés, Patricio J; Rehak, Matus; Fung, Adrian Tien-Chin; Goldstein, Michaella; Loewenstein, Anat
2018-04-24
To investigate efficacy and safety of repeated dexamethasone (DEX) implants over 24 months, in diabetic macular edema (DME) eyes that were treatment naive compared with eyes refractory to anti-vascular endothelial growth factor treatment, in a real-life environment. This multicenter international retrospective study assessed best-corrected visual acuity and central subfield thickness (CST) of naive and refractory eyes to anti-vascular endothelial growth factor injections treated with dexamethasone implants. Safety data (intraocular pressure rise and cataract surgery) were recorded. A total of 130 eyes from 125 patients were included. Baseline best-corrected visual acuity and CST were similar for naive (n = 71) and refractory eyes (n = 59). Both groups improved significantly in vision after 24 months (P < 0.001). However, naive eyes gained statistically significantly more vision than refractory eyes (+11.3 ± 10.0 vs. 7.3 ± 2.7 letters, P = 0.01) and were more likely to gain ≥10 letters (OR 3.31, 95% CI 1.19-9.24, P = 0.02). At 6, 12, and 24 months, CST was significantly decreased compared with baseline in both naive and refractory eyes; however, CST was higher in refractory eyes than in naive eyes (CST 279 ± 61 vs. 313 ± 125 μm, P = 0.10). Over a follow-up of 24 months, vision improved in diabetic macular edema eyes after treatment with dexamethasone implants, both in eyes that were treatment naive and eyes refractory to anti-vascular endothelial growth factor treatment; however, improvement was greater in naive eyes.
Introduction in IND and recursive partitioning
NASA Technical Reports Server (NTRS)
Buntine, Wray; Caruana, Rich
1991-01-01
This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, and lists the manual pages for the routines and instructions on installation.
Symbolic dynamic filtering and language measure for behavior identification of mobile robots.
Mallapragada, Goutham; Ray, Asok; Jin, Xin
2012-06-01
This paper presents a procedure for behavior identification of mobile robots, which requires limited or no domain knowledge of the underlying process. While the features of robot behavior are extracted by symbolic dynamic filtering of the observed time series, the behavior patterns are classified based on language measure theory. The behavior identification procedure has been experimentally validated on a networked robotic test bed by comparison with commonly used tools, namely, principal component analysis for feature extraction and Bayesian risk analysis for pattern classification.
Mycofier: a new machine learning-based classifier for fungal ITS sequences.
Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel
2016-08-11
The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git .
Predictive classification of self-paced upper-limb analytical movements with EEG.
Ibáñez, Jaime; Serrano, J I; del Castillo, M D; Minguez, J; Pons, J L
2015-11-01
The extent to which the electroencephalographic activity allows the characterization of movements with the upper limb is an open question. This paper describes the design and validation of a classifier of upper-limb analytical movements based on electroencephalographic activity extracted from intervals preceding self-initiated movement tasks. Features selected for the classification are subject specific and associated with the movement tasks. Further tests are performed to reject the hypothesis that other information different from the task-related cortical activity is being used by the classifiers. Six healthy subjects were measured performing self-initiated upper-limb analytical movements. A Bayesian classifier was used to classify among seven different kinds of movements. Features considered covered the alpha and beta bands. A genetic algorithm was used to optimally select a subset of features for the classification. An average accuracy of 62.9 ± 7.5% was reached, which was above the baseline level observed with the proposed methodology (30.2 ± 4.3%). The study shows how the electroencephalography carries information about the type of analytical movement performed with the upper limb and how it can be decoded before the movement begins. In neurorehabilitation environments, this information could be used for monitoring and assisting purposes.
Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery
Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S.; Pusey, Marc L.; Aygün, Ramazan S.
2015-01-01
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset. PMID:25914518
IL-7-Induced Proliferation of Human Naive CD4 T-Cells Relies on Continued Thymic Activity.
Silva, Susana L; Albuquerque, Adriana S; Matoso, Paula; Charmeteau-de-Muylder, Bénédicte; Cheynier, Rémi; Ligeiro, Dário; Abecasis, Miguel; Anjos, Rui; Barata, João T; Victorino, Rui M M; Sousa, Ana E
2017-01-01
Naive CD4 T-cell maintenance is critical for immune competence. We investigated here the fine-tuning of homeostatic mechanisms of the naive compartment to counteract the loss of de novo CD4 T-cell generation. Adults thymectomized in early childhood during corrective cardiac surgery were grouped based on presence or absence of thymopoiesis and compared with age-matched controls. We found that the preservation of the CD31 - subset was independent of the thymus and that its size is tightly controlled by peripheral mechanisms, including prolonged cell survival as attested by Bcl-2 levels. Conversely, a significant contraction of the CD31 + naive subset was observed in the absence of thymic activity. This was associated with impaired responses of purified naive CD4 T-cells to IL-7, namely, in vitro proliferation and upregulation of CD31 expression, which likely potentiated the decline in recent thymic emigrants. Additionally, we found no apparent constraint in the differentiation of naive cells into the memory compartment in individuals completely lacking thymic activity despite upregulation of DUSP6 , a phosphatase associated with increased TCR threshold. Of note, thymectomized individuals featuring some degree of thymopoiesis were able to preserve the size and diversity of the naive CD4 compartment, further arguing against complete thymectomy in infancy. Overall, our data suggest that robust peripheral mechanisms ensure the homeostasis of CD31 - naive CD4 pool and point to the requirement of continuous thymic activity to the maintenance of IL-7-driven homeostatic proliferation of CD31 + naive CD4 T-cells, which is essential to secure T-cell diversity throughout life.
Risk of Erectile Dysfunction in Transfusion-naive Thalassemia Men
Chen, Yu-Guang; Lin, Te-Yu; Lin, Cheng-Li; Dai, Ming-Shen; Ho, Ching-Liang; Kao, Chia-Hung
2015-01-01
Abstract Based on the mechanism of pathophysiology, thalassemia major or transfusion-dependent thalassemia patients may have an increased risk of developing organic erectile dysfunction resulting from hypogonadism. However, there have been few studies investigating the association between erectile dysfunction and transfusion-naive thalassemia populations. We constructed a population-based cohort study to elucidate the association between transfusion-naive thalassemia populations and organic erectile dysfunction This nationwide population-based cohort study involved analyzing data from 1998 to 2010 obtained from the Taiwanese National Health Insurance Research Database, with a follow-up period extending to the end of 2011. We identified men with transfusion-naive thalassemia and selected a comparison cohort that was frequency-matched with these according to age, and year of diagnosis thalassemia at a ratio of 1 thalassemia man to 4 control men. We analyzed the risks for transfusion-naive thalassemia men and organic erectile dysfunction by using Cox proportional hazards regression models. In this study, 588 transfusion-naive thalassemia men and 2337 controls were included. Total 12 patients were identified within the thalassaemia group and 10 within the control group. The overall risks for developing organic erectile dysfunction were 4.56-fold in patients with transfusion-naive thalassemia men compared with the comparison cohort after we adjusted for age and comorbidities. Our long-term cohort study results showed that in transfusion-naive thalassemia men, there was a higher risk for the development of organic erectile dysfunction, particularly in those patients with comorbidities. PMID:25837766
Chen, Yu-Guang; Lin, Te-Yu; Lin, Cheng-Li; Dai, Ming-Shen; Ho, Ching-Liang; Kao, Chia-Hung
2015-04-01
Based on the mechanism of pathophysiology, thalassemia major or transfusion-dependent thalassemia patients may have an increased risk of developing organic erectile dysfunction resulting from hypogonadism. However, there have been few studies investigating the association between erectile dysfunction and transfusion-naive thalassemia populations. We constructed a population-based cohort study to elucidate the association between transfusion-naive thalassemia populations and organic erectile dysfunction. This nationwide population-based cohort study involved analyzing data from 1998 to 2010 obtained from the Taiwanese National Health Insurance Research Database, with a follow-up period extending to the end of 2011. We identified men with transfusion-naive thalassemia and selected a comparison cohort that was frequency-matched with these according to age, and year of diagnosis thalassemia at a ratio of 1 thalassemia man to 4 control men. We analyzed the risks for transfusion-naive thalassemia men and organic erectile dysfunction by using Cox proportional hazards regression models. In this study, 588 transfusion-naive thalassemia men and 2337 controls were included. Total 12 patients were identified within the thalassaemia group and 10 within the control group. The overall risks for developing organic erectile dysfunction were 4.56-fold in patients with transfusion-naive thalassemia men compared with the comparison cohort after we adjusted for age and comorbidities. Our long-term cohort study results showed that in transfusion-naive thalassemia men, there was a higher risk for the development of organic erectile dysfunction, particularly in those patients with comorbidities.
Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro
2015-03-01
Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, [Formula: see text]-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of [Formula: see text]-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was [Formula: see text]-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.
Detection of Cardiovascular Disease Risk's Level for Adults Using Naive Bayes Classifier.
Miranda, Eka; Irwansyah, Edy; Amelga, Alowisius Y; Maribondang, Marco M; Salim, Mulyadi
2016-07-01
The number of deaths caused by cardiovascular disease and stroke is predicted to reach 23.3 million in 2030. As a contribution to support prevention of this phenomenon, this paper proposes a mining model using a naïve Bayes classifier that could detect cardiovascular disease and identify its risk level for adults. The process of designing the method began by identifying the knowledge related to the cardiovascular disease profile and the level of cardiovascular disease risk factors for adults based on the medical record, and designing a mining technique model using a naïve Bayes classifier. Evaluation of this research employed two methods: accuracy, sensitivity, and specificity calculation as well as an evaluation session with cardiologists and internists. The characteristics of cardiovascular disease are identified by its primary risk factors. Those factors are diabetes mellitus, the level of lipids in the blood, coronary artery function, and kidney function. Class labels were assigned according to the values of these factors: risk level 1, risk level 2 and risk level 3. The evaluation of the classifier performance (accuracy, sensitivity, and specificity) in this research showed that the proposed model predicted the class label of tuples correctly (above 80%). More than eighty percent of respondents (including cardiologists and internists) who participated in the evaluation session agree till strongly agreed that this research followed medical procedures and that the result can support medical analysis related to cardiovascular disease. The research showed that the proposed model achieves good performance for risk level detection of cardiovascular disease.
Miao, Minmin; Zeng, Hong; Wang, Aimin; Zhao, Changsen; Liu, Feixiang
2017-02-15
Common spatial pattern (CSP) is most widely used in motor imagery based brain-computer interface (BCI) systems. In conventional CSP algorithm, pairs of the eigenvectors corresponding to both extreme eigenvalues are selected to construct the optimal spatial filter. In addition, an appropriate selection of subject-specific time segments and frequency bands plays an important role in its successful application. This study proposes to optimize spatial-frequency-temporal patterns for discriminative feature extraction. Spatial optimization is implemented by channel selection and finding discriminative spatial filters adaptively on each time-frequency segment. A novel Discernibility of Feature Sets (DFS) criteria is designed for spatial filter optimization. Besides, discriminative features located in multiple time-frequency segments are selected automatically by the proposed sparse time-frequency segment common spatial pattern (STFSCSP) method which exploits sparse regression for significant features selection. Finally, a weight determined by the sparse coefficient is assigned for each selected CSP feature and we propose a Weighted Naïve Bayesian Classifier (WNBC) for classification. Experimental results on two public EEG datasets demonstrate that optimizing spatial-frequency-temporal patterns in a data-driven manner for discriminative feature extraction greatly improves the classification performance. The proposed method gives significantly better classification accuracies in comparison with several competing methods in the literature. The proposed approach is a promising candidate for future BCI systems. Copyright © 2016 Elsevier B.V. All rights reserved.
Detecting falls with wearable sensors using machine learning techniques.
Özdemir, Ahmet Turan; Barshan, Billur
2014-06-18
Falls are a serious public health problem and possibly life threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded.
Evaluation of Interruption Behavior by Naive Encoders.
ERIC Educational Resources Information Center
Coon, Christine A.; Schwanenflugel, Paula J.
1996-01-01
Determines the characteristics of interactions that influence judgments of interruption behavior in naive observers. Asks subjects to decide whether an example of an interruption was an interruption and then rate it in terms of how "good" or "bad" it was. Finds that naive observers use some of the same features described in…
Naive Probability: A Mental Model Theory of Extensional Reasoning.
ERIC Educational Resources Information Center
Johnson-Laird, P. N.; Legrenzi, Paolo; Girotto, Vittorio; Legrenzi, Maria Sonino; Caverni, Jean-Paul
1999-01-01
Outlines a theory of naive probability in which individuals who are unfamiliar with the probability calculus can infer the probabilities of events in an "extensional" way. The theory accommodates reasoning based on numerical premises, and explains how naive reasoners can infer posterior probabilities without relying on Bayes's theorem.…
Naive Juveniles Are More Likely to Become Breeders after Witnessing Predator Mobbing.
Griesser, Michael; Suzuki, Toshitaka N
2017-01-01
Responding appropriately during the first predatory attack in life is often critical for survival. In many social species, naive juveniles acquire this skill from conspecifics, but its fitness consequences remain virtually unknown. Here we experimentally demonstrate how naive juvenile Siberian jays (Perisoreus infaustus) derive a long-term fitness benefit from witnessing knowledgeable adults mobbing their principal predator, the goshawk (Accipiter gentilis). Siberian jays live in family groups of two to six individuals that also can include unrelated nonbreeders. Field observations showed that Siberian jays encounter predators only rarely, and, indeed, naive juveniles do not respond to predator models when on their own but do when observing other individuals mobbing them. Predator exposure experiments demonstrated that naive juveniles had a substantially higher first-winter survival after observing knowledgeable group members mobbing a goshawk model, increasing their likelihood of acquiring a breeding position later in life. Previous research showed that naive individuals may learn from others how to respond to predators, care for offspring, or choose mates, generally assuming that social learning has long-term fitness consequences without empirical evidence. Our results demonstrate a long-term fitness benefit of vertical social learning for naive individuals in the wild, emphasizing its evolutionary importance in animals, including humans.
Bayesian Integration of Isotope Ratio for Geographic Sourcing of Castor Beans
Webb-Robertson, Bobbie-Jo; Kreuzer, Helen; Hart, Garret; ...
2012-01-01
Recenmore » t years have seen an increase in the forensic interest associated with the poison ricin, which is extracted from the seeds of the Ricinus communis plant. Both light element (C, N, O, and H) and strontium (Sr) isotope ratios have previously been used to associate organic material with geographic regions of origin. We present a Bayesian integration methodology that can more accurately predict the region of origin for a castor bean than individual models developed independently for light element stable isotopes or Sr isotope ratios. Our results demonstrate a clear improvement in the ability to correctly classify regions based on the integrated model with a class accuracy of 60.9 ± 2.1 % versus 55.9 ± 2.1 % and 40.2 ± 1.8 % for the light element and strontium (Sr) isotope ratios, respectively. In addition, we show graphically the strengths and weaknesses of each dataset in respect to class prediction and how the integration of these datasets strengthens the overall model.« less
Bayesian Integration of Isotope Ratios for Geographic Sourcing of Castor Beans
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Kreuzer, Helen W.; Hart, Garret L.
Recent years have seen an increase in the forensic interest associated with the poison ricin, which is extracted from the seeds of the Ricinus communis plant. Both light element (C, N, O, and H) and strontium (Sr) isotope ratios have previously been used to associate organic material with geographic regions of origin. We present a Bayesian integration methodology that can more accurately predict the region of origin for a castor bean than individual models developed independently for light element stable isotopes or Sr isotope ratios. Our results demonstrate a clear improvement in the ability to correctly classify regions based onmore » the integrated model with a class accuracy of 6 0 . 9 {+-} 2 . 1 % versus 5 5 . 9 {+-} 2 . 1 % and 4 0 . 2 {+-} 1 . 8 % for the light element and strontium (Sr) isotope ratios, respectively. In addition, we show graphically the strengths and weaknesses of each dataset in respect to class prediction and how the integration of these datasets strengthens the overall model.« less
Bayesian Integration of Isotope Ratio for Geographic Sourcing of Castor Beans
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo; Kreuzer, Helen; Hart, Garret
Recenmore » t years have seen an increase in the forensic interest associated with the poison ricin, which is extracted from the seeds of the Ricinus communis plant. Both light element (C, N, O, and H) and strontium (Sr) isotope ratios have previously been used to associate organic material with geographic regions of origin. We present a Bayesian integration methodology that can more accurately predict the region of origin for a castor bean than individual models developed independently for light element stable isotopes or Sr isotope ratios. Our results demonstrate a clear improvement in the ability to correctly classify regions based on the integrated model with a class accuracy of 60.9 ± 2.1 % versus 55.9 ± 2.1 % and 40.2 ± 1.8 % for the light element and strontium (Sr) isotope ratios, respectively. In addition, we show graphically the strengths and weaknesses of each dataset in respect to class prediction and how the integration of these datasets strengthens the overall model.« less
NASA Astrophysics Data System (ADS)
Zhang, Peng; Peng, Jing; Sims, S. Richard F.
2005-05-01
In ATR applications, each feature is a convolution of an image with a filter. It is important to use most discriminant features to produce compact representations. We propose two novel subspace methods for dimension reduction to address limitations associated with Fukunaga-Koontz Transform (FKT). The first method, Scatter-FKT, assumes that target is more homogeneous, while clutter can be anything other than target and anywhere. Thus, instead of estimating a clutter covariance matrix, Scatter-FKT computes a clutter scatter matrix that measures the spread of clutter from the target mean. We choose dimensions along which the difference in variation between target and clutter is most pronounced. When the target follows a Gaussian distribution, Scatter-FKT can be viewed as a generalization of FKT. The second method, Optimal Bayesian Subspace, is derived from the optimal Bayesian classifier. It selects dimensions such that the minimum Bayes error rate can be achieved. When both target and clutter follow Gaussian distributions, OBS computes optimal subspace representations. We compare our methods against FKT using character image as well as IR data.
Bayesian Integration of Isotope Ratio for Geographic Sourcing of Castor Beans
Webb-Robertson, Bobbie-Jo; Kreuzer, Helen; Hart, Garret; Ehleringer, James; West, Jason; Gill, Gary; Duckworth, Douglas
2012-01-01
Recent years have seen an increase in the forensic interest associated with the poison ricin, which is extracted from the seeds of the Ricinus communis plant. Both light element (C, N, O, and H) and strontium (Sr) isotope ratios have previously been used to associate organic material with geographic regions of origin. We present a Bayesian integration methodology that can more accurately predict the region of origin for a castor bean than individual models developed independently for light element stable isotopes or Sr isotope ratios. Our results demonstrate a clear improvement in the ability to correctly classify regions based on the integrated model with a class accuracy of 60.9 ± 2.1% versus 55.9 ± 2.1% and 40.2 ± 1.8% for the light element and strontium (Sr) isotope ratios, respectively. In addition, we show graphically the strengths and weaknesses of each dataset in respect to class prediction and how the integration of these datasets strengthens the overall model. PMID:22919270
Data analysis using scale-space filtering and Bayesian probabilistic reasoning
NASA Technical Reports Server (NTRS)
Kulkarni, Deepak; Kutulakos, Kiriakos; Robinson, Peter
1991-01-01
This paper describes a program for analysis of output curves from Differential Thermal Analyzer (DTA). The program first extracts probabilistic qualitative features from a DTA curve of a soil sample, and then uses Bayesian probabilistic reasoning to infer the mineral in the soil. The qualifier module employs a simple and efficient extension of scale-space filtering suitable for handling DTA data. We have observed that points can vanish from contours in the scale-space image when filtering operations are not highly accurate. To handle the problem of vanishing points, perceptual organizations heuristics are used to group the points into lines. Next, these lines are grouped into contours by using additional heuristics. Probabilities are associated with these contours using domain-specific correlations. A Bayes tree classifier processes probabilistic features to infer the presence of different minerals in the soil. Experiments show that the algorithm that uses domain-specific correlation to infer qualitative features outperforms a domain-independent algorithm that does not.
Fuzzy Intervals for Designing Structural Signature: An Application to Graphic Symbol Recognition
NASA Astrophysics Data System (ADS)
Luqman, Muhammad Muzzamil; Delalandre, Mathieu; Brouard, Thierry; Ramel, Jean-Yves; Lladós, Josep
The motivation behind our work is to present a new methodology for symbol recognition. The proposed method employs a structural approach for representing visual associations in symbols and a statistical classifier for recognition. We vectorize a graphic symbol, encode its topological and geometrical information by an attributed relational graph and compute a signature from this structural graph. We have addressed the sensitivity of structural representations to noise, by using data adapted fuzzy intervals. The joint probability distribution of signatures is encoded by a Bayesian network, which serves as a mechanism for pruning irrelevant features and choosing a subset of interesting features from structural signatures of underlying symbol set. The Bayesian network is deployed in a supervised learning scenario for recognizing query symbols. The method has been evaluated for robustness against degradations & deformations on pre-segmented 2D linear architectural & electronic symbols from GREC databases, and for its recognition abilities on symbols with context noise i.e. cropped symbols.
Derivation of novel human ground state naive pluripotent stem cells.
Gafni, Ohad; Weinberger, Leehee; Mansour, Abed AlFatah; Manor, Yair S; Chomsky, Elad; Ben-Yosef, Dalit; Kalma, Yael; Viukov, Sergey; Maza, Itay; Zviran, Asaf; Rais, Yoach; Shipony, Zohar; Mukamel, Zohar; Krupalnik, Vladislav; Zerbib, Mirie; Geula, Shay; Caspi, Inbal; Schneir, Dan; Shwartz, Tamar; Gilad, Shlomit; Amann-Zalcenstein, Daniela; Benjamin, Sima; Amit, Ido; Tanay, Amos; Massarwa, Rada; Novershtern, Noa; Hanna, Jacob H
2013-12-12
Mouse embryonic stem (ES) cells are isolated from the inner cell mass of blastocysts, and can be preserved in vitro in a naive inner-cell-mass-like configuration by providing exogenous stimulation with leukaemia inhibitory factor (LIF) and small molecule inhibition of ERK1/ERK2 and GSK3β signalling (termed 2i/LIF conditions). Hallmarks of naive pluripotency include driving Oct4 (also known as Pou5f1) transcription by its distal enhancer, retaining a pre-inactivation X chromosome state, and global reduction in DNA methylation and in H3K27me3 repressive chromatin mark deposition on developmental regulatory gene promoters. Upon withdrawal of 2i/LIF, naive mouse ES cells can drift towards a primed pluripotent state resembling that of the post-implantation epiblast. Although human ES cells share several molecular features with naive mouse ES cells, they also share a variety of epigenetic properties with primed murine epiblast stem cells (EpiSCs). These include predominant use of the proximal enhancer element to maintain OCT4 expression, pronounced tendency for X chromosome inactivation in most female human ES cells, increase in DNA methylation and prominent deposition of H3K27me3 and bivalent domain acquisition on lineage regulatory genes. The feasibility of establishing human ground state naive pluripotency in vitro with equivalent molecular and functional features to those characterized in mouse ES cells remains to be defined. Here we establish defined conditions that facilitate the derivation of genetically unmodified human naive pluripotent stem cells from already established primed human ES cells, from somatic cells through induced pluripotent stem (iPS) cell reprogramming or directly from blastocysts. The novel naive pluripotent cells validated herein retain molecular characteristics and functional properties that are highly similar to mouse naive ES cells, and distinct from conventional primed human pluripotent cells. This includes competence in the generation of cross-species chimaeric mouse embryos that underwent organogenesis following microinjection of human naive iPS cells into mouse morulas. Collectively, our findings establish new avenues for regenerative medicine, patient-specific iPS cell disease modelling and the study of early human development in vitro and in vivo.
Wagner, Michael M.; Cooper, Gregory F.; Ferraro, Jeffrey P.; Su, Howard; Gesteland, Per H.; Haug, Peter J.; Millett, Nicholas E.; Aronis, John M.; Nowalk, Andrew J.; Ruiz, Victor M.; López Pineda, Arturo; Shi, Lingyun; Van Bree, Rudy; Ginter, Thomas; Tsui, Fuchiang
2017-01-01
Objectives This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from emergency department (ED) to detect influenza cases. Methods A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesain network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP and trained a Bayesain network classifier from over 40,000 ED encounters between Jan. 2008 and May. 2010 using feature selection, machine learning, and expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training Bayesian network classifier locally and increasing the accuracy of the NLP parser. PMID:28380048
Ye, Ye; Wagner, Michael M; Cooper, Gregory F; Ferraro, Jeffrey P; Su, Howard; Gesteland, Per H; Haug, Peter J; Millett, Nicholas E; Aronis, John M; Nowalk, Andrew J; Ruiz, Victor M; López Pineda, Arturo; Shi, Lingyun; Van Bree, Rudy; Ginter, Thomas; Tsui, Fuchiang
2017-01-01
This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from emergency department (ED) to detect influenza cases. A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesain network classifier (BN) to infer patients' diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP and trained a Bayesain network classifier from over 40,000 ED encounters between Jan. 2008 and May. 2010 using feature selection, machine learning, and expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution's cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training Bayesian network classifier locally and increasing the accuracy of the NLP parser.
Enhancement of force patterns classification based on Gaussian distributions.
Ertelt, Thomas; Solomonovs, Ilja; Gronwald, Thomas
2018-01-23
Description of the patterns of ground reaction force is a standard method in areas such as medicine, biomechanics and robotics. The fundamental parameter is the time course of the force, which is classified visually in particular in the field of clinical diagnostics. Here, the knowledge and experience of the diagnostician is relevant for its assessment. For an objective and valid discrimination of the ground reaction force pattern, a generic method, especially in the medical field, is absolutely necessary to describe the qualities of the time-course. The aim of the presented method was to combine the approaches of two existing procedures from the fields of machine learning and the Gauss approximation in order to take advantages of both methods for the classification of ground reaction force patterns. The current limitations of both methods could be eliminated by an overarching method. Twenty-nine male athletes from different sports were examined. Each participant was given the task of performing a one-legged stopping maneuver on a force plate from the maximum possible starting speed. The individual time course of the ground reaction force of each subject was registered and approximated on the basis of eight Gaussian distributions. The descriptive coefficients were then classified using Bayesian regulated neural networks. The different sports served as the distinguishing feature. Although the athletes were all given the same task, all sports referred to a different quality in the time course of ground reaction force. Meanwhile within each sport, the athletes were homogeneous. With an overall prediction (R = 0.938) all subjects/sports were classified correctly with 94.29% accuracy. The combination of the two methods: the mathematical description of the time course of ground reaction forces on the basis of Gaussian distributions and their classification by means of Bayesian regulated neural networks, seems an adequate and promising method to discriminate the ground reaction forces without any loss of information. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cinelli, Mattia; Sun, , Yuxin; Best, Katharine; Heather, James M.; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-01-01
Abstract Motivation: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund’s adjuvant (CFA) or CFA alone. Results: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. Summary: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund’s Adjuvant. Availability and implementation: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term¼SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. Contact: b.chain@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073756
ERIC Educational Resources Information Center
Williams, Joanne M.
2012-01-01
This paper aims to provide developmental data on two connected naive inheritance concepts and to explore the coherence of children's naive biology knowledge. Two tasks examined children and adolescents' (4, 7, 10, and 14 years) conceptions of phenotypic resemblance across kin (in physical characteristics, disabilities, and personality traits). The…
NASA Technical Reports Server (NTRS)
Christenson, D.; Gordon, M.; Kistler, R.; Kriegler, F.; Lampert, S.; Marshall, R.; Mclaughlin, R.
1977-01-01
A third-generation, fast, low cost, multispectral recognition system (MIDAS) able to keep pace with the large quantity and high rates of data acquisition from large regions with present and projected sensots is described. The program can process a complete ERTS frame in forty seconds and provide a color map of sixteen constituent categories in a few minutes. A principle objective of the MIDAS program is to provide a system well interfaced with the human operator and thus to obtain large overall reductions in turn-around time and significant gains in throughput. The hardware and software generated in the overall program is described. The system contains a midi-computer to control the various high speed processing elements in the data path, a preprocessor to condition data, and a classifier which implements an all digital prototype multivariate Gaussian maximum likelihood or a Bayesian decision algorithm. Sufficient software was developed to perform signature extraction, control the preprocessor, compute classifier coefficients, control the classifier operation, operate the color display and printer, and diagnose operation.
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.
Chen, Shizhi; Yang, Xiaodong; Tian, Yingli
2015-09-01
A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.
Pattern Recognition Approaches for Breast Cancer DCE-MRI Classification: A Systematic Review.
Fusco, Roberta; Sansone, Mario; Filice, Salvatore; Carone, Guglielmo; Amato, Daniela Maria; Sansone, Carlo; Petrillo, Antonella
2016-01-01
We performed a systematic review of several pattern analysis approaches for classifying breast lesions using dynamic, morphological, and textural features in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Several machine learning approaches, namely artificial neural networks (ANN), support vector machines (SVM), linear discriminant analysis (LDA), tree-based classifiers (TC), and Bayesian classifiers (BC), and features used for classification are described. The findings of a systematic review of 26 studies are presented. The sensitivity and specificity are respectively 91 and 83 % for ANN, 85 and 82 % for SVM, 96 and 85 % for LDA, 92 and 87 % for TC, and 82 and 85 % for BC. The sensitivity and specificity are respectively 82 and 74 % for dynamic features, 93 and 60 % for morphological features, 88 and 81 % for textural features, 95 and 86 % for a combination of dynamic and morphological features, and 88 and 84 % for a combination of dynamic, morphological, and other features. LDA and TC have the best performance. A combination of dynamic and morphological features gives the best performance.
Scarsi, Kimberly K; Darin, Kristin M; Nakalema, Shadia; Back, David J; Byakika-Kibwika, Pauline; Else, Laura J; Dilly Penchala, Sujan; Buzibye, Allan; Cohn, Susan E; Merry, Concepta; Lamorde, Mohammed
2016-03-15
Levonorgestrel subdermal implants are preferred contraceptives with an expected failure rate of <1% over 5 years. We assessed the effect of efavirenz- or nevirapine-based antiretroviral therapy (ART) coadministration on levonorgestrel pharmacokinetics. This nonrandomized, parallel group, pharmacokinetic evaluation was conducted in three groups of human immunodeficiency virus-infected Ugandan women: ART-naive (n = 17), efavirenz-based ART (n = 20), and nevirapine-based ART (n = 20). Levonorgestrel implants were inserted at baseline in all women. Blood was collected at 1, 4, 12, 24, 36, and 48 weeks. The primary endpoint was week 24 levonorgestrel concentrations, compared between the ART-naive group and each ART group by geometric mean ratio (GMR) with 90% confidence interval (CI). Secondary endpoints included week 48 levonorgestrel concentrations and unintended pregnancies. Week 24 geometric mean levonorgestrel concentrations were 528, 280, and 710 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.53; 90% CI, .50, .55 and nevirapine: ART-naive GMR, 1.35; 90% CI, 1.29, 1.43). Week 48 levonorgestrel concentrations were 580, 247, and 664 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.43; 90% CI, .42, .44 and nevirapine: ART-naive GMR, 1.14; 90% CI, 1.14, 1.16). Three pregnancies (3/20, 15%) occurred in the efavirenz group between weeks 36 and 48. No pregnancies occurred in the ART-naive or nevirapine groups. Within 1 year of combined use, levonorgestrel exposure was markedly reduced in participants who received efavirenz-based ART, accompanied by contraceptive failures. In contrast, nevirapine-based ART did not adversely affect levonorgestrel exposure or efficacy. NCT01789879. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America.
Scarsi, Kimberly K.; Darin, Kristin M.; Nakalema, Shadia; Back, David J.; Byakika-Kibwika, Pauline; Else, Laura J.; Dilly Penchala, Sujan; Buzibye, Allan; Cohn, Susan E.; Merry, Concepta; Lamorde, Mohammed
2016-01-01
Background. Levonorgestrel subdermal implants are preferred contraceptives with an expected failure rate of <1% over 5 years. We assessed the effect of efavirenz- or nevirapine-based antiretroviral therapy (ART) coadministration on levonorgestrel pharmacokinetics. Methods. This nonrandomized, parallel group, pharmacokinetic evaluation was conducted in three groups of human immunodeficiency virus–infected Ugandan women: ART-naive (n = 17), efavirenz-based ART (n = 20), and nevirapine-based ART (n = 20). Levonorgestrel implants were inserted at baseline in all women. Blood was collected at 1, 4, 12, 24, 36, and 48 weeks. The primary endpoint was week 24 levonorgestrel concentrations, compared between the ART-naive group and each ART group by geometric mean ratio (GMR) with 90% confidence interval (CI). Secondary endpoints included week 48 levonorgestrel concentrations and unintended pregnancies. Results. Week 24 geometric mean levonorgestrel concentrations were 528, 280, and 710 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.53; 90% CI, .50, .55 and nevirapine: ART-naive GMR, 1.35; 90% CI, 1.29, 1.43). Week 48 levonorgestrel concentrations were 580, 247, and 664 pg/mL in the ART-naive, efavirenz, and nevirapine groups, respectively (efavirenz: ART-naive GMR, 0.43; 90% CI, .42, .44 and nevirapine: ART-naive GMR, 1.14; 90% CI, 1.14, 1.16). Three pregnancies (3/20, 15%) occurred in the efavirenz group between weeks 36 and 48. No pregnancies occurred in the ART-naive or nevirapine groups. Conclusions. Within 1 year of combined use, levonorgestrel exposure was markedly reduced in participants who received efavirenz-based ART, accompanied by contraceptive failures. In contrast, nevirapine-based ART did not adversely affect levonorgestrel exposure or efficacy. Clinical Trials Registration. NCT01789879. PMID:26646680
NASA Astrophysics Data System (ADS)
Skataric, Maja; Bose, Sandip; Zeroug, Smaine; Tilke, Peter
2017-02-01
It is not uncommon in the field of non-destructive evaluation that multiple measurements encompassing a variety of modalities are available for analysis and interpretation for determining the underlying states of nature of the materials or parts being tested. Despite and sometimes due to the richness of data, significant challenges arise in the interpretation manifested as ambiguities and inconsistencies due to various uncertain factors in the physical properties (inputs), environment, measurement device properties, human errors, and the measurement data (outputs). Most of these uncertainties cannot be described by any rigorous mathematical means, and modeling of all possibilities is usually infeasible for many real time applications. In this work, we will discuss an approach based on Hierarchical Bayesian Graphical Models (HBGM) for the improved interpretation of complex (multi-dimensional) problems with parametric uncertainties that lack usable physical models. In this setting, the input space of the physical properties is specified through prior distributions based on domain knowledge and expertise, which are represented as Gaussian mixtures to model the various possible scenarios of interest for non-destructive testing applications. Forward models are then used offline to generate the expected distribution of the proposed measurements which are used to train a hierarchical Bayesian network. In Bayesian analysis, all model parameters are treated as random variables, and inference of the parameters is made on the basis of posterior distribution given the observed data. Learned parameters of the posterior distribution obtained after the training can therefore be used to build an efficient classifier for differentiating new observed data in real time on the basis of pre-trained models. We will illustrate the implementation of the HBGM approach to ultrasonic measurements used for cement evaluation of cased wells in the oil industry.
Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization.
Nishio, Mizuho; Nishizawa, Mitsuo; Sugiyama, Osamu; Kojima, Ryosuke; Yakami, Masahiro; Kuroda, Tomohiro; Togashi, Kaori
2018-01-01
We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification focussing on (i) usefulness of the conventional CADx system (hand-crafted imaging feature + machine learning algorithm), (ii) comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) effectiveness of parameter optimization using Bayesian optimization and random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were included from public databases of CT images. A variant of the local binary pattern was used for calculating a feature vector. SVM or XGBoost was trained using the feature vector and its corresponding label. Tree Parzen Estimator (TPE) was used as Bayesian optimization for parameters of SVM and XGBoost. Random search was done for comparison with TPE. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using area under the curve (AUC) of receiver operating characteristic analysis. AUC was calculated 10 times, and its average was obtained. The best averaged AUC of SVM and XGBoost was 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. Optimal parameters for achieving high AUC were obtained with fewer numbers of trials when using TPE, compared with random search. Bayesian optimization of SVM and XGBoost parameters was more efficient than random search. Based on observer study, AUC values of two board-certified radiologists were 0.898 and 0.822. The results show that diagnostic accuracy of our CADx system was comparable to that of radiologists with respect to classifying lung nodules.
BayeSED: A General Approach to Fitting the Spectral Energy Distribution of Galaxies
NASA Astrophysics Data System (ADS)
Han, Yunkun; Han, Zhanwen
2014-11-01
We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large Ks -selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has been performed for the first time. We found that the 2003 model by Bruzual & Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the Ks -selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.
BayeSED: A GENERAL APPROACH TO FITTING THE SPECTRAL ENERGY DISTRIBUTION OF GALAXIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Han, Yunkun; Han, Zhanwen, E-mail: hanyk@ynao.ac.cn, E-mail: zhanwenhan@ynao.ac.cn
2014-11-01
We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large K{sub s} -selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has beenmore » performed for the first time. We found that the 2003 model by Bruzual and Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the K{sub s} -selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.« less
Detection and recognition of simple spatial forms
NASA Technical Reports Server (NTRS)
Watson, A. B.
1983-01-01
A model of human visual sensitivity to spatial patterns is constructed. The model predicts the visibility and discriminability of arbitrary two-dimensional monochrome images. The image is analyzed by a large array of linear feature sensors, which differ in spatial frequency, phase, orientation, and position in the visual field. All sensors have one octave frequency bandwidths, and increase in size linearly with eccentricity. Sensor responses are processed by an ideal Bayesian classifier, subject to uncertainty. The performance of the model is compared to that of the human observer in detecting and discriminating some simple images.
Introduction to IND and recursive partitioning, version 1.0
NASA Technical Reports Server (NTRS)
Buntine, Wray; Caruana, Rich
1991-01-01
This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, lists the manual pages for the routines, and instructions on installation.
Beatty, William; Jay, Chadwick V.; Fischbach, Anthony S.
2016-01-01
State-space models offer researchers an objective approach to modeling complex animal location data sets, and state-space model behavior classifications are often assumed to have a link to animal behavior. In this study, we evaluated the behavioral classification accuracy of a Bayesian state-space model in Pacific walruses using Argos satellite tags with sensors to detect animal behavior in real time. We fit a two-state discrete-time continuous-space Bayesian state-space model to data from 306 Pacific walruses tagged in the Chukchi Sea. We matched predicted locations and behaviors from the state-space model (resident, transient behavior) to true animal behavior (foraging, swimming, hauled out) and evaluated classification accuracy with kappa statistics (κ) and root mean square error (RMSE). In addition, we compared biased random bridge utilization distributions generated with resident behavior locations to true foraging behavior locations to evaluate differences in space use patterns. Results indicated that the two-state model fairly classified true animal behavior (0.06 ≤ κ ≤ 0.26, 0.49 ≤ RMSE ≤ 0.59). Kernel overlap metrics indicated utilization distributions generated with resident behavior locations were generally smaller than utilization distributions generated with true foraging behavior locations. Consequently, we encourage researchers to carefully examine parameters and priors associated with behaviors in state-space models, and reconcile these parameters with the study species and its expected behaviors.
Bayesian data fusion for spatial prediction of categorical variables in environmental sciences
NASA Astrophysics Data System (ADS)
Gengler, Sarah; Bogaert, Patrick
2014-12-01
First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework in the context of space-time prediction since it has been extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts that were made for adapting the BME methodology to categorical variables and mixed random fields faced some limitations, as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is somehow a simplification of the BME method through the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, as Indicator CoKringing (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although BDF methodology for categorical variables is somehow a simplification of BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.
Zhang, Jingyang; Chaloner, Kathryn; McLinden, James H.; Stapleton, Jack T.
2013-01-01
Reconciling two quantitative ELISA tests for an antibody to an RNA virus, in a situation without a gold standard and where false negatives may occur, is the motivation for this work. False negatives occur when access of the antibody to the binding site is blocked. Based on the mechanism of the assay, a mixture of four bivariate normal distributions is proposed with the mixture probabilities depending on a two-stage latent variable model including the prevalence of the antibody in the population and the probabilities of blocking on each test. There is prior information on the prevalence of the antibody, and also on the probability of false negatives, and so a Bayesian analysis is used. The dependence between the two tests is modeled to be consistent with the biological mechanism. Bayesian decision theory is utilized for classification. The proposed method is applied to the motivating data set to classify the data into two groups: those with and those without the antibody. Simulation studies describe the properties of the estimation and the classification. Sensitivity to the choice of the prior distribution is also addressed by simulation. The same model with two levels of latent variables is applicable in other testing procedures such as quantitative polymerase chain reaction tests where false negatives occur when there is a mutation in the primer sequence. PMID:23592433
Yi, Min; Buchholz, Thomas A.; Meric-Bernstam, Funda; Bedrosian, Isabelle; Hwang, Rosa F.; Ross, Merrick I.; Kuerer, Henry M.; Luo, Sheng; Gonzalez-Angulo, Ana M.; Buzdar, Aman U.; Symmans, W. Fraser; Feig, Barry W.; Lucci, Anthony; Huang, Eugene H.; Hunt, Kelly K.
2015-01-01
Objective To classify ipsilateral breast tumor recurrences (IBTR) as either new primary tumors (NP) or true local recurrence (TR). We utilized two different methods and compared sensitivities and specificities between them. Our goal was to determine whether distinguishing NP from TR had prognostic value. Summary Background Data After breast-conservation therapy (BCT), IBTR may be classified into two distinct types (NP and TR). Studies have attempted to classify IBTR by using tumor location, histologic subtype, DNA flow cytometry data, or gene-expression profiling data. Methods 447 (7.9%) of 5660 patients undergoing BCT from 1970 to 2005 experienced IBTR. Clinical data from 397 patients were available for review. We classified IBTRs as NP or TR on the basis of either tumor location and histologic subtype (method 1) or tumor location, histologic subtype, estrogen receptor (ER) status and human epidermal growth factor receptor 2 (HER-2) status (method 2). Kaplan-Meier curves and log-rank tests were used to evaluate overall and disease-specific survival (DSS) differences between the two groups. Classification methods were validated by calculating sensitivity and specificity values using a Bayesian method. Results Of 397 patients, 196 (49.4%) were classified as NP by method 1 and 212 (53.4%) were classified as NP by method 2. The sensitivity and specificity values were 0.812 and 0.867 for method 1 and 0.870 and 0.800 for method 2, respectively. Regardless of method used, patients classified as NP developed contralateral breast carcinoma more often but had better 10-year overall and DSS rates than those classified as TR. Patients with TR were more likely to develop metastatic disease after IBTR. Conclusion IBTR classified as TR and NP had clinically different features, suggesting that classifying IBTR may provide clinically significant data for the management of IBTR. PMID:21209588
Do the Naive Know Best? The Predictive Power of Naive Ratings of Couple Interactions
ERIC Educational Resources Information Center
Baucom, Katherine J. W.; Baucom, Brian R.; Christensen, Andrew
2012-01-01
We examined the utility of naive ratings of communication patterns and relationship quality in a large sample of distressed couples. Untrained raters assessed 10-min videotaped interactions from 134 distressed couples who participated in both problem-solving and social support discussions at each of 3 time points (pre-therapy, post-therapy, and…
The Preference for Symmetry in Flower-Naive and Not-so-Naive Bumblebees
ERIC Educational Resources Information Center
Plowright, C. M. S.; Evans, S. A.; Leung, J. Chew; Collin, C. A.
2011-01-01
Truly flower-naive bumblebees, with no prior rewarded experience for visits on any visual patterns outside the colony, were tested for their choice of bilaterally symmetric over asymmetric patterns in a radial-arm maze. No preference for symmetry was found. Prior training with rewarded black and white disks did, however, lead to a significant…
Naive Theory of Biology: The Pre-School Child's Explanation of Death
ERIC Educational Resources Information Center
Vlok, Milandre; de Witt, Marike W.
2012-01-01
This article explains the naive theory of biology that the pre-school child uses to explain the cause of death. The empirical investigation showed that the young participants do use a naive theory of biology to explain function and do make reference to "vitalistic causality" in explaining organ function. Furthermore, most of these…
Classifier-Guided Sampling for Complex Energy System Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Backlund, Peter B.; Eddy, John P.
2015-09-01
This report documents the results of a Laboratory Directed Research and Development (LDRD) effort enti tled "Classifier - Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this proj ect was to develop, implement, and test major improvements to the classifier - guided sampling (CGS) algorithm. CGS is type of evolutionary algorithm for perform ing search and optimization over a set of discrete design variables in the face of one or more objective functions. E xisting evolutionary algorithms, such as genetic algorithms , may require a large number of omore » bjecti ve function evaluations to identify optimal or near - optimal solutions . Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduce s the evaluation count by us ing a Bayesian network classifier to filter out non - promising candidate designs , prior to evaluation, based on their posterior probabilit ies . In this project, b oth the single - objective and multi - objective version s of the CGS are developed and tested on a set of benchm ark problems. As a domain - specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.« less
A possible role of a cerebral energy gene in alcoholism.
Ribeiro, A F; Correia, D; Boerngen-Lacerda, R; Brunialti-Godard, A L
2012-02-17
We examined a possible relationship between genes responsible for energy metabolism of the brain and addictive behavior in an animal model. We used non-inbred, Swiss mice exposed to a three-bottle free-choice model [water, 5% (v/v) ethanol, and 10% (v/v) ethanol] over a 16-week period, consisting of four phases: acquisition, withdrawal, reexposure, and quinine-adulteration. The mice were then behaviorally classified into three groups: loss-of-control-drinker (preference for ethanol and high levels of consumption during all phases, N = 6), heavy-drinker (preference for ethanol and high levels of consumption during acquisition and reduction during quinine-adulteration, N = 7), and light-drinker (preference for water during all phases, N = 10). Another group only received tap water (ethanol-naive control mice, N = 9). Further analysis using quantitative real-time PCR showed that in mice behaviorally classified as loss-of-control-drinkers, there was a significant inverse correlation between transcript levels of the Hadh gene and those of other energy metabolism genes in the nucleus of the amygdala, suggesting that this pathway may contribute to ethanol consumption in these mice. We conclude that cerebral energy metabolism is involved with ethanol addiction, meriting further study.
Chen, Ce; Jiang, Wenhui; Zhong, Na; Wu, Jin; Jiang, Haifeng; Du, Jiang; Li, Ye; Ma, Xiancang; Zhao, Min; Hashimoto, Kenji; Gao, Chengge
2014-11-01
Although first-episode drug naive patients with schizophrenia are known to show cognitive impairment, the cognitive performances of these patients, who suffer deficit syndrome, compared with those who suffer non-deficit syndrome is undetermined. The aim of this study was to compare cognitive performances in first-episode drug-naive schizophrenia with deficit syndrome or non-deficit syndrome. First-episode drug naive patients (n=49) and medicated patients (n=108) with schizophrenia, and age, sex, and education matched healthy controls (n=57 for the first-episode group, and n=128 for the medicated group) were enrolled. Patients were divided into deficit or non-deficit syndrome groups, using the Schedule for Deficit Syndrome. Cognitive performance was assessed using the CogState computerized cognitive battery. All cognitive domains in first-episode drug naive and medicated patients showed significant impairment compared with their respective control groups. Furthermore, cognitive performance in first-episode drug naive patients was significantly worse than in medicated patients. Interestingly, the cognitive performance markers of processing speed and attention, in first-episode drug naive patients with deficit syndrome, were both significantly worse than in equivalent patients without deficit syndrome. In contrast, no differences in cognitive performance were found between the two groups of medicated patients. In conclusion, this study found that first-episode drug naive schizophrenia with deficit syndrome showed significantly impaired processing speed and attention, compared with patients with non-deficit syndrome. These findings highlight processing speed and attention as potential targets for pharmacological and psychosocial interventions in first-episode schizophrenia with deficit syndrome, since these domains are associated with social outcomes. Copyright © 2014 Elsevier B.V. All rights reserved.
Jaleco, Sara; Swainson, Louise; Dardalhon, Valérie; Burjanadze, Maryam; Kinet, Sandrina; Taylor, Naomi
2003-07-01
Cytokines play a crucial role in the maintenance of polyclonal naive and memory T cell populations. It has previously been shown that ex vivo, the IL-7 cytokine induces the proliferation of naive recent thymic emigrants (RTE) isolated from umbilical cord blood but not mature adult-derived naive and memory human CD4(+) T cells. We find that the combination of IL-2 and IL-7 strongly promotes the proliferation of RTE, whereas adult CD4(+) T cells remain relatively unresponsive. Immunological activity is controlled by a balance between proliferation and apoptotic cell death. However, the relative contributions of IL-2 and IL-7 in regulating these processes in the absence of MHC/peptide signals are not known. Following exposure to either IL-2 or IL-7 alone, RTE, as well as mature naive and memory CD4(+) T cells, are rendered only minimally sensitive to Fas-mediated cell death. However, in the presence of the two cytokines, Fas engagement results in a high level of caspase-dependent apoptosis in both RTE as well as naive adult CD4(+) T cells. In contrast, equivalently treated memory CD4(+) T cells are significantly less sensitive to Fas-induced cell death. The increased susceptibility of RTE and naive CD4(+) T cells to Fas-induced apoptosis correlates with a significantly higher IL-2/IL-7-induced Fas expression on these T cell subsets than on memory CD4(+) T cells. Thus, IL-2 and IL-7 regulate homeostasis by modulating the equilibrium between proliferation and apoptotic cell death in RTE and mature naive and memory T cell subsets.
Statistical approaches for the determination of cut points in anti-drug antibody bioassays.
Schaarschmidt, Frank; Hofmann, Matthias; Jaki, Thomas; Grün, Bettina; Hothorn, Ludwig A
2015-03-01
Cut points in immunogenicity assays are used to classify future specimens into anti-drug antibody (ADA) positive or negative. To determine a cut point during pre-study validation, drug-naive specimens are often analyzed on multiple microtiter plates taking sources of future variability into account, such as runs, days, analysts, gender, drug-spiked and the biological variability of un-spiked specimens themselves. Five phenomena may complicate the statistical cut point estimation: i) drug-naive specimens may contain already ADA-positives or lead to signals that erroneously appear to be ADA-positive, ii) mean differences between plates may remain after normalization of observations by negative control means, iii) experimental designs may contain several factors in a crossed or hierarchical structure, iv) low sample sizes in such complex designs lead to low power for pre-tests on distribution, outliers and variance structure, and v) the choice between normal and log-normal distribution has a serious impact on the cut point. We discuss statistical approaches to account for these complex data: i) mixture models, which can be used to analyze sets of specimens containing an unknown, possibly larger proportion of ADA-positive specimens, ii) random effects models, followed by the estimation of prediction intervals, which provide cut points while accounting for several factors, and iii) diagnostic plots, which allow the post hoc assessment of model assumptions. All methods discussed are available in the corresponding R add-on package mixADA. Copyright © 2015 Elsevier B.V. All rights reserved.
Breast cancer Ki67 expression preoperative discrimination by DCE-MRI radiomics features
NASA Astrophysics Data System (ADS)
Ma, Wenjuan; Ji, Yu; Qin, Zhuanping; Guo, Xinpeng; Jian, Xiqi; Liu, Peifang
2018-02-01
To investigate whether quantitative radiomics features extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) are associated with Ki67 expression of breast cancer. In this institutional review board approved retrospective study, we collected 377 cases Chinese women who were diagnosed with invasive breast cancer in 2015. This cohort included 53 low-Ki67 expression (Ki67 proliferation index less than 14%) and 324 cases with high-Ki67 expression (Ki67 proliferation index more than 14%). A binary-classification of low- vs. high- Ki67 expression was performed. A set of 52 quantitative radiomics features, including morphological, gray scale statistic, and texture features, were extracted from the segmented lesion area. Three most common machine learning classification methods, including Naive Bayes, k-Nearest Neighbor and support vector machine with Gaussian kernel, were employed for the classification and the least absolute shrink age and selection operator (LASSO) method was used to select most predictive features set for the classifiers. Classification performance was evaluated by the area under receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity. The model that used Naive Bayes classification method achieved the best performance than the other two methods, yielding 0.773 AUC value, 0.757 accuracy, 0.777 sensitivity and 0.769 specificity. Our study showed that quantitative radiomics imaging features of breast tumor extracted from DCE-MRI are associated with breast cancer Ki67 expression. Future larger studies are needed in order to further evaluate the findings.
A Face Attention Technique for a Robot Able to Interpret Facial Expressions
NASA Astrophysics Data System (ADS)
Simplício, Carlos; Prado, José; Dias, Jorge
Automatic facial expressions recognition using vision is an important subject towards human-robot interaction. Here is proposed a human face focus of attention technique and a facial expressions classifier (a Dynamic Bayesian Network) to incorporate in an autonomous mobile agent whose hardware is composed by a robotic platform and a robotic head. The focus of attention technique is based on the symmetry presented by human faces. By using the output of this module the autonomous agent keeps always targeting the human face frontally. In order to accomplish this, the robot platform performs an arc centered at the human; thus the robotic head, when necessary, moves synchronized. In the proposed probabilistic classifier the information is propagated, from the previous instant, in a lower level of the network, to the current instant. Moreover, to recognize facial expressions are used not only positive evidences but also negative.
Predicting Flavonoid UGT Regioselectivity
Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip
2011-01-01
Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of avonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
What Fits into a Mirror: Naive Beliefs about the Field of View
ERIC Educational Resources Information Center
Bianchi, Ivana; Savardi, Ugo
2012-01-01
Research on naive physics and naive optics have shown that people hold surprising beliefs about everyday phenomena that are in contrast with what they see. In this article, we investigated what adults expect to be the field of view of a mirror from various viewpoints. The studies presented here confirm that humans have difficulty dealing with the…
Telomerase Is Involved in IL-7-Mediated Differential Survival of Naive and Memory CD4+ T Cells1
Yang, Yinhua; An, Jie; Weng, Nan-ping
2008-01-01
IL-7 plays an essential role in T cell maintenance and survival. The survival effect of IL-7 is thought to be mediated through regulation of Bcl2 family proteins. After a comparative analysis of IL-7-induced growth and cell death of human naive and memory CD4+ T cells, we observed that more memory CD4+ T cells underwent cell division and proceeded to apoptosis than naive cells in response to IL-7. However, IL-7-induced expressions of Bcl2 family members (Bcl2, Bcl-xL, Bax, and Bad) were similar between naive and memory cells. Instead, we found that IL-7 induced higher levels of telomerase activity in naive cells than in memory cells, and the levels of IL-7-induced telomerase activity had a significant inverse correlation with cell death in CD4+ T cells. Furthermore, we showed that reducing expression of telomerase reverse transcriptase and telomerase activity significantly increased cell death of IL-7-cultured CD4+ T cells. Together, these findings demonstrate that telomerase is involved in IL-7-mediated differential survival of naive and memory CD4+ T cells. PMID:18322183
IL-21 sustains CD28 expression on IL-15-activated human naive CD8+ T cells.
Alves, Nuno L; Arosa, Fernando A; van Lier, René A W
2005-07-15
Human naive CD8+ T cells are able to respond in an Ag-independent manner to IL-7 and IL-15. Whereas IL-7 largely maintains CD8+ T cells in a naive phenotype, IL-15 drives these cells to an effector phenotype characterized, among other features, by down-regulation of the costimulatory molecule CD28. We evaluated the influence of the CD4+ Th cell-derived common gamma-chain cytokine IL-21 on cytokine-induced naive CD8+ T cell activation. Stimulation with IL-21 did not induce division and only slightly increased IL-15-induced proliferation of naive CD8+ T cells. Strikingly, however, IL-15-induced down-modulation of CD28 was completely prevented by IL-21 at the protein and transcriptional level. Subsequent stimulation via combined TCR/CD3 and CD28 triggering led to a markedly higher production of IL-2 and IFN-gamma in IL-15/IL-21-stimulated cells compared with IL-15-stimulated T cells. Our data show that IL-21 modulates the phenotype of naive CD8+ T cells that have undergone IL-15 induced homeostatic proliferation and preserves their responsiveness to CD28 ligands.
Detection of dechallenge in spontaneous reporting systems: a comparison of Bayes methods.
Banu, A Bazila; Alias Balamurugan, S Appavu; Thirumalaikolundusubramanian, Ponniah
2014-01-01
Dechallenge is a response observed for the reduction or disappearance of adverse drug reactions (ADR) on withdrawal of a drug from a patient. Currently available algorithms to detect dechallenge have limitations. Hence, there is a need to compare available new methods. To detect dechallenge in Spontaneous Reporting Systems, data-mining algorithms like Naive Bayes and Improved Naive Bayes were applied for comparing the performance of the algorithms in terms of accuracy and error. Analyzing the factors of dechallenge like outcome and disease category will help medical practitioners and pharmaceutical industries to determine the reasons for dechallenge in order to take essential steps toward drug safety. Adverse drug reactions of the year 2011 and 2012 were downloaded from the United States Food and Drug Administration's database. The outcome of classification algorithms showed that Improved Naive Bayes algorithm outperformed Naive Bayes with accuracy of 90.11% and error of 9.8% in detecting the dechallenge. Detecting dechallenge for unknown samples are essential for proper prescription. To overcome the issues exposed by Naive Bayes algorithm, Improved Naive Bayes algorithm can be used to detect dechallenge in terms of higher accuracy and minimal error.
Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks.
Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R; Nguyen, Tuan N; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T
2017-01-01
This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the features extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low-level prevents the network from overfitting and is able to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that using AR feature extractor and DBN classifiers, the classification performance achieves an improved classification performance with a of sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94 compared to ANN (sensitivity at 80.8%, specificity at 77.8%, accuracy at 79.3% with AUC-ROC of 0.83) and BNN classifiers (sensitivity at 84.3%, specificity at 83%, accuracy at 83.6% with AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further with sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5% over ANN, BNN, and DBN classifiers, respectively.
Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks
Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.
2017-01-01
This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the features extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low-level prevents the network from overfitting and is able to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that using AR feature extractor and DBN classifiers, the classification performance achieves an improved classification performance with a of sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94 compared to ANN (sensitivity at 80.8%, specificity at 77.8%, accuracy at 79.3% with AUC-ROC of 0.83) and BNN classifiers (sensitivity at 84.3%, specificity at 83%, accuracy at 83.6% with AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further with sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5% over ANN, BNN, and DBN classifiers, respectively. PMID:28326009
Classifying smoking urges via machine learning
Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin
2016-01-01
Background and objective Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. Methods To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. Results The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. Conclusions In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms’ performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. PMID:28110725
Classifying smoking urges via machine learning.
Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin
2016-12-01
Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Subsurface event detection and classification using Wireless Signal Networks.
Yoon, Suk-Un; Ghazanfari, Ehsan; Cheng, Liang; Pamukcu, Sibel; Suleiman, Muhannad T
2012-11-05
Subsurface environment sensing and monitoring applications such as detection of water intrusion or a landslide, which could significantly change the physical properties of the host soil, can be accomplished using a novel concept, Wireless Signal Networks (WSiNs). The wireless signal networks take advantage of the variations of radio signal strength on the distributed underground sensor nodes of WSiNs to monitor and characterize the sensed area. To characterize subsurface environments for event detection and classification, this paper provides a detailed list and experimental data of soil properties on how radio propagation is affected by soil properties in subsurface communication environments. Experiments demonstrated that calibrated wireless signal strength variations can be used as indicators to sense changes in the subsurface environment. The concept of WSiNs for the subsurface event detection is evaluated with applications such as detection of water intrusion, relative density change, and relative motion using actual underground sensor nodes. To classify geo-events using the measured signal strength as a main indicator of geo-events, we propose a window-based minimum distance classifier based on Bayesian decision theory. The window-based classifier for wireless signal networks has two steps: event detection and event classification. With the event detection, the window-based classifier classifies geo-events on the event occurring regions that are called a classification window. The proposed window-based classification method is evaluated with a water leakage experiment in which the data has been measured in laboratory experiments. In these experiments, the proposed detection and classification method based on wireless signal network can detect and classify subsurface events.
NASA Astrophysics Data System (ADS)
Li, Zheng; Jiang, Yi-han; Duan, Lian; Zhu, Chao-zhe
2017-08-01
Objective. Functional near infra-red spectroscopy (fNIRS) is a promising brain imaging technology for brain-computer interfaces (BCI). Future clinical uses of fNIRS will likely require operation over long time spans, during which neural activation patterns may change. However, current decoders for fNIRS signals are not designed to handle changing activation patterns. The objective of this study is to test via simulations a new adaptive decoder for fNIRS signals, the Gaussian mixture model adaptive classifier (GMMAC). Approach. GMMAC can simultaneously classify and track activation pattern changes without the need for ground-truth labels. This adaptive classifier uses computationally efficient variational Bayesian inference to label new data points and update mixture model parameters, using the previous model parameters as priors. We test GMMAC in simulations in which neural activation patterns change over time and compare to static decoders and unsupervised adaptive linear discriminant analysis classifiers. Main results. Our simulation experiments show GMMAC can accurately decode under time-varying activation patterns: shifts of activation region, expansions of activation region, and combined contractions and shifts of activation region. Furthermore, the experiments show the proposed method can track the changing shape of the activation region. Compared to prior work, GMMAC performed significantly better than the other unsupervised adaptive classifiers on a difficult activation pattern change simulation: 99% versus <54% in two-choice classification accuracy. Significance. We believe GMMAC will be useful for clinical fNIRS-based brain-computer interfaces, including neurofeedback training systems, where operation over long time spans is required.
Subsurface Event Detection and Classification Using Wireless Signal Networks
Yoon, Suk-Un; Ghazanfari, Ehsan; Cheng, Liang; Pamukcu, Sibel; Suleiman, Muhannad T.
2012-01-01
Subsurface environment sensing and monitoring applications such as detection of water intrusion or a landslide, which could significantly change the physical properties of the host soil, can be accomplished using a novel concept, Wireless Signal Networks (WSiNs). The wireless signal networks take advantage of the variations of radio signal strength on the distributed underground sensor nodes of WSiNs to monitor and characterize the sensed area. To characterize subsurface environments for event detection and classification, this paper provides a detailed list and experimental data of soil properties on how radio propagation is affected by soil properties in subsurface communication environments. Experiments demonstrated that calibrated wireless signal strength variations can be used as indicators to sense changes in the subsurface environment. The concept of WSiNs for the subsurface event detection is evaluated with applications such as detection of water intrusion, relative density change, and relative motion using actual underground sensor nodes. To classify geo-events using the measured signal strength as a main indicator of geo-events, we propose a window-based minimum distance classifier based on Bayesian decision theory. The window-based classifier for wireless signal networks has two steps: event detection and event classification. With the event detection, the window-based classifier classifies geo-events on the event occurring regions that are called a classification window. The proposed window-based classification method is evaluated with a water leakage experiment in which the data has been measured in laboratory experiments. In these experiments, the proposed detection and classification method based on wireless signal network can detect and classify subsurface events. PMID:23202191
Astrand, Elaine; Enel, Pierre; Ibos, Guilhem; Dominey, Peter Ford; Baraduc, Pierre; Ben Hamed, Suliann
2014-01-01
Decoding neuronal information is important in neuroscience, both as a basic means to understand how neuronal activity is related to cerebral function and as a processing stage in driving neuroprosthetic effectors. Here, we compare the readout performance of six commonly used classifiers at decoding two different variables encoded by the spiking activity of the non-human primate frontal eye fields (FEF): the spatial position of a visual cue, and the instructed orientation of the animal's attention. While the first variable is exogenously driven by the environment, the second variable corresponds to the interpretation of the instruction conveyed by the cue; it is endogenously driven and corresponds to the output of internal cognitive operations performed on the visual attributes of the cue. These two variables were decoded using either a regularized optimal linear estimator in its explicit formulation, an optimal linear artificial neural network estimator, a non-linear artificial neural network estimator, a non-linear naïve Bayesian estimator, a non-linear Reservoir recurrent network classifier or a non-linear Support Vector Machine classifier. Our results suggest that endogenous information such as the orientation of attention can be decoded from the FEF with the same accuracy as exogenous visual information. All classifiers did not behave equally in the face of population size and heterogeneity, the available training and testing trials, the subject's behavior and the temporal structure of the variable of interest. In most situations, the regularized optimal linear estimator and the non-linear Support Vector Machine classifiers outperformed the other tested decoders. PMID:24466019
Back, Seung Keun; Lee, Hyunkyoung; Lee, JaeHee; Kim, Hye young; Kim, Hee Jin; Na, Heung Sik
2016-01-01
Atopic dermatitis is a complex disease of heterogeneous pathogenesis, in particular, genetic predisposition, environmental triggers, and their interactions. Indoor air pollution, increasing with urbanization, plays a role as environmental risk factor in the development of AD. However, we still lack a detailed picture of the role of air pollution in the development of the disease. Here, we examined the effect of formaldehyde (FA) exposure on the manifestation of atopic dermatitis and the underlying molecular mechanism in naive rats and in a rat model of atopic dermatitis (AD) produced by neonatal capsaicin treatment. The AD and naive rats were exposed to 0.8 ppm FA, 1.2 ppm FA, or fresh air (Air) for 6 weeks (2 hours/day and 5 days/week). So, six groups, namely the 1.2 FA-AD, 0.8 FA-AD, Air-AD, 1.2 FA-naive, 0.8 FA-naive and Air-naive groups, were established. Pruritus and dermatitis, two major symptoms of atopic dermatitis, were evaluated every week for 6 weeks. After that, samples of the blood, the skin and the thymus were collected from the 1.2 FA-AD, the Air-AD, the 1.2 FA-naive and the Air-naive groups. Serum IgE levels were quantified with ELISA, and mRNA expression levels of inflammatory cytokines from extracts of the skin and the thymus were calculated with qRT-PCR. The dermatitis and pruritus significantly worsened in 1.2 FA-AD group, but not in 0.8 FA-AD, compared to the Air-AD animals, whereas FA didn't induce any symptoms in naive rats. Consistently, the levels of serum IgE were significantly higher in 1.2 FA-AD than in air-AD, however, there was no significant difference following FA exposure in naive animals. In the skin, mRNA expression levels of Th1 cytokines such as TNF-α and IL-1β were significantly higher in the 1.2 FA-AD rats compared to the air-AD rats, whereas mRNA expression levels of Th2 cytokines (IL-4, IL-5, IL-13), IL-17A and TSLP were significantly higher in 1.2 FA-naive group than in the Air-naive group. These results suggested that 1.2 ppm of FA penetrated the injured skin barrier, and exacerbated Th1 responses and serum IgE level in the AD rats so that dermatitis and pruritus were aggravated, while the elevated expression of Th2 cytokines by 1.2 ppm of FA in naive rats was probably insufficient for clinical manifestation. In conclusion, in a rat model of atopic dermatitis, exposure to 1.2 ppm of FA aggravated pruritus and skin inflammation, which was associated with the elevated expression of Th1 cytokines. PMID:28005965
Maiti, Saumen; Erram, V C; Gupta, Gautam; Tiwari, Ram Krishna; Kulkarni, U D; Sangpal, R R
2013-04-01
Deplorable quality of groundwater arising from saltwater intrusion, natural leaching and anthropogenic activities is one of the major concerns for the society. Assessment of groundwater quality is, therefore, a primary objective of scientific research. Here, we propose an artificial neural network-based method set in a Bayesian neural network (BNN) framework and employ it to assess groundwater quality. The approach is based on analyzing 36 water samples and inverting up to 85 Schlumberger vertical electrical sounding data. We constructed a priori model by suitably parameterizing geochemical and geophysical data collected from the western part of India. The posterior model (post-inversion) was estimated using the BNN learning procedure and global hybrid Monte Carlo/Markov Chain Monte Carlo optimization scheme. By suitable parameterization of geochemical and geophysical parameters, we simulated 1,500 training samples, out of which 50 % samples were used for training and remaining 50 % were used for validation and testing. We show that the trained model is able to classify validation and test samples with 85 % and 80 % accuracy respectively. Based on cross-correlation analysis and Gibb's diagram of geochemical attributes, the groundwater qualities of the study area were classified into following three categories: "Very good", "Good", and "Unsuitable". The BNN model-based results suggest that groundwater quality falls mostly in the range of "Good" to "Very good" except for some places near the Arabian Sea. The new modeling results powered by uncertainty and statistical analyses would provide useful constrain, which could be utilized in monitoring and assessment of the groundwater quality.
Segmentation schema for enhancing land cover identification: A case study using Sentinel 2 data
NASA Astrophysics Data System (ADS)
Mongus, Domen; Žalik, Borut
2018-04-01
Land monitoring is performed increasingly using high and medium resolution optical satellites, such as the Sentinel-2. However, optical data is inevitably subjected to the variable operational conditions under which it was acquired. Overlapping of features caused by shadows, soft transitions between shadowed and non-shadowed regions, and temporal variability of the observed land-cover types require radiometric corrections. This study examines a new approach to enhancing the accuracy of land cover identification that resolves this problem. The proposed method constructs an ensemble-type classification model with weak classifiers tuned to the particular operational conditions under which the data was acquired. Iterative segmentation over the learning set is applied for this purpose, where feature space is partitioned according to the likelihood of misclassifications introduced by the classification model. As these are a consequence of overlapping features, such partitioning avoids the need for radiometric corrections of the data, and divides land cover types implicitly into subclasses. As a result, improved performance of all tested classification approaches were measured during the validation that was conducted on Sentinel-2 data. The highest accuracies in terms of F1-scores were achieved using the Naive Bayes Classifier as the weak classifier, while supplementing original spectral signatures with normalised difference vegetation index and texture analysis features, namely, average intensity, contrast, homogeneity, and dissimilarity. In total, an F1-score of nearly 95% was achieved in this way, with F1-scores of each particular land cover type reaching above 90%.
Meaningful Assessment of Robotic Surgical Style using the Wisdom of Crowds.
Ershad, M; Rege, R; Fey, A Majewicz
2018-07-01
Quantitative assessment of surgical skills is an important aspect of surgical training; however, the proposed metrics are sometimes difficult to interpret and may not capture the stylistic characteristics that define expertise. This study proposes a methodology for evaluating the surgical skill, based on metrics associated with stylistic adjectives, and evaluates the ability of this method to differentiate expertise levels. We recruited subjects from different expertise levels to perform training tasks on a surgical simulator. A lexicon of contrasting adjective pairs, based on important skills for robotic surgery, inspired by the global evaluative assessment of robotic skills tool, was developed. To validate the use of stylistic adjectives for surgical skill assessment, posture videos of the subjects performing the task, as well as videos of the task were rated by crowd-workers. Metrics associated with each adjective were found using kinematic and physiological measurements through correlation with the crowd-sourced adjective assignment ratings. To evaluate the chosen metrics' ability in distinguishing expertise levels, two classifiers were trained and tested using these metrics. Crowd-assignment ratings for all adjectives were significantly correlated with expertise levels. The results indicate that naive Bayes classifier performs the best, with an accuracy of [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] when classifying into four, three, and two levels of expertise, respectively. The proposed method is effective at mapping understandable adjectives of expertise to the stylistic movements and physiological response of trainees.
ERIC Educational Resources Information Center
Jennings, Amanda Brooke
2017-01-01
Children construct meaning from their economic experiences in the form of naive theories and use these theories to explain the relationships between their actions and the outcomes. Inevitably, due to their lack of economic literacy, these theories will be incomplete. Through curriculum design that acknowledges and addresses these naive theories,…
Effector CD8 T cells dedifferentiate into long-lived memory cells.
Youngblood, Ben; Hale, J Scott; Kissick, Haydn T; Ahn, Eunseon; Xu, Xiaojin; Wieland, Andreas; Araki, Koichi; West, Erin E; Ghoneim, Hazem E; Fan, Yiping; Dogra, Pranay; Davis, Carl W; Konieczny, Bogumila T; Antia, Rustom; Cheng, Xiaodong; Ahmed, Rafi
2017-12-21
Memory CD8 T cells that circulate in the blood and are present in lymphoid organs are an essential component of long-lived T cell immunity. These memory CD8 T cells remain poised to rapidly elaborate effector functions upon re-exposure to pathogens, but also have many properties in common with naive cells, including pluripotency and the ability to migrate to the lymph nodes and spleen. Thus, memory cells embody features of both naive and effector cells, fuelling a long-standing debate centred on whether memory T cells develop from effector cells or directly from naive cells. Here we show that long-lived memory CD8 T cells are derived from a subset of effector T cells through a process of dedifferentiation. To assess the developmental origin of memory CD8 T cells, we investigated changes in DNA methylation programming at naive and effector cell-associated genes in virus-specific CD8 T cells during acute lymphocytic choriomeningitis virus infection in mice. Methylation profiling of terminal effector versus memory-precursor CD8 T cell subsets showed that, rather than retaining a naive epigenetic state, the subset of cells that gives rise to memory cells acquired de novo DNA methylation programs at naive-associated genes and became demethylated at the loci of classically defined effector molecules. Conditional deletion of the de novo methyltransferase Dnmt3a at an early stage of effector differentiation resulted in reduced methylation and faster re-expression of naive-associated genes, thereby accelerating the development of memory cells. Longitudinal phenotypic and epigenetic characterization of the memory-precursor effector subset of virus-specific CD8 T cells transferred into antigen-free mice revealed that differentiation to memory cells was coupled to erasure of de novo methylation programs and re-expression of naive-associated genes. Thus, epigenetic repression of naive-associated genes in effector CD8 T cells can be reversed in cells that develop into long-lived memory CD8 T cells while key effector genes remain demethylated, demonstrating that memory T cells arise from a subset of fate-permissive effector T cells.
Nuutinen, Mikko; Leskelä, Riikka-Leena; Suojalehto, Ella; Tirronen, Anniina; Komssi, Vesa
2017-04-13
In previous years a substantial number of studies have identified statistically important predictors of nursing home admission (NHA). However, as far as we know, the analyses have been done at the population-level. No prior research has analysed the prediction accuracy of a NHA model for individuals. This study is an analysis of 3056 longer-term home care customers in the city of Tampere, Finland. Data were collected from the records of social and health service usage and RAI-HC (Resident Assessment Instrument - Home Care) assessment system during January 2011 and September 2015. The aim was to find out the most efficient variable subsets to predict NHA for individuals and validate the accuracy. The variable subsets of predicting NHA were searched by sequential forward selection (SFS) method, a variable ranking metric and the classifiers of logistic regression (LR), support vector machine (SVM) and Gaussian naive Bayes (GNB). The validation of the results was guaranteed using randomly balanced data sets and cross-validation. The primary performance metrics for the classifiers were the prediction accuracy and AUC (average area under the curve). The LR and GNB classifiers achieved 78% accuracy for predicting NHA. The most important variables were RAI MAPLE (Method for Assigning Priority Levels), functional impairment (RAI IADL, Activities of Daily Living), cognitive impairment (RAI CPS, Cognitive Performance Scale), memory disorders (diagnoses G30-G32 and F00-F03) and the use of community-based health-service and prior hospital use (emergency visits and periods of care). The accuracy of the classifier for individuals was high enough to convince the officials of the city of Tampere to integrate the predictive model based on the findings of this study as a part of home care information system. Further work need to be done to evaluate variables that are modifiable and responsive to interventions.
A multiple-point spatially weighted k-NN method for object-based classification
NASA Astrophysics Data System (ADS)
Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.
2016-10-01
Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.
Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.
Jiménez, Fernando; Sánchez, Gracia; Juárez, José M
2014-03-01
This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severe burnt patients. Due to the ethical aspects involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given. Therefore, any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization of a patient's data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker. If no classifier is satisfactory for the decision maker, the process starts again in step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, niched pre-selection multi-objective algorithm, elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA) and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient's data set from an intensive care burn unit and a standard machine learning data set from an standard machine learning repository. The results are compared using the hypervolume multi-objective metric. Besides, the results have been compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique. Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case-based reasoning) obtaining with ENORA a classification rate of 0.9298, specificity of 0.9385, and sensitivity of 0.9364, with 14.2 interpretable fuzzy rules on average. Our proposal improves the accuracy and interpretability of the classifiers, compared with other non-evolutionary techniques. We also conclude that ENORA outperforms niched pre-selection and NSGA-II algorithms. Moreover, given that our multi-objective evolutionary methodology is non-combinational based on real parameter optimization, the time cost is significantly reduced compared with other evolutionary approaches existing in literature based on combinational optimization. Copyright © 2014 Elsevier B.V. All rights reserved.
Social learning of a brood parasite by its host
Feeney, William E.; Langmore, Naomi E.
2013-01-01
Arms races between brood parasites and their hosts provide model systems for studying the evolutionary repercussions of species interactions. However, how naive hosts identify brood parasites as enemies remains poorly understood, despite its ecological and evolutionary significance. Here, we investigate whether young, cuckoo-naive superb fairy-wrens, Malurus cyaneus, can learn to recognize cuckoos as a threat through social transmission of information. Naive individuals were initially unresponsive to a cuckoo specimen, but after observing conspecifics mob a cuckoo, they made more whining and mobbing alarm calls, and spent more time physically mobbing the cuckoo. This is the first direct evidence that naive hosts can learn to identify brood parasites as enemies via social learning. PMID:23760171
Social learning of a brood parasite by its host.
Feeney, William E; Langmore, Naomi E
2013-08-23
Arms races between brood parasites and their hosts provide model systems for studying the evolutionary repercussions of species interactions. However, how naive hosts identify brood parasites as enemies remains poorly understood, despite its ecological and evolutionary significance. Here, we investigate whether young, cuckoo-naive superb fairy-wrens, Malurus cyaneus, can learn to recognize cuckoos as a threat through social transmission of information. Naive individuals were initially unresponsive to a cuckoo specimen, but after observing conspecifics mob a cuckoo, they made more whining and mobbing alarm calls, and spent more time physically mobbing the cuckoo. This is the first direct evidence that naive hosts can learn to identify brood parasites as enemies via social learning.
Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics
Belo, David; Gamboa, Hugo
2017-01-01
The paper presents results of machine learning approach accuracy applied analysis of cardiac activity. The study evaluates the diagnostics possibilities of the arterial hypertension by means of the short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from the arterial hypertension of II-III degree. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, in the study, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of noncorrelated feature set search achieved higher results than data set based on the principal components. PMID:28831239
Big data analytics for early detection of breast cancer based on machine learning
NASA Astrophysics Data System (ADS)
Ivanova, Desislava
2017-12-01
This paper presents the concept and the modern advances in personalized medicine that rely on technology and review the existing tools for early detection of breast cancer. The breast cancer types and distribution worldwide is discussed. It is spent time to explain the importance of identifying the normality and to specify the main classes in breast cancer, benign or malignant. The main purpose of the paper is to propose a conceptual model for early detection of breast cancer based on machine learning for processing and analysis of medical big dataand further knowledge discovery for personalized treatment. The proposed conceptual model is realized by using Naive Bayes classifier. The software is written in python programming language and for the experiments the Wisconsin breast cancer database is used. Finally, the experimental results are presented and discussed.
Calcium-mediated shaping of naive CD4 T-cell phenotype and function
Guichard, Vincent; Bonilla, Nelly; Durand, Aurélie; Audemard-Verger, Alexandra; Guilbert, Thomas; Martin, Bruno
2017-01-01
Continuous contact with self-major histocompatibility complex ligands is essential for the survival of naive CD4 T cells. We have previously shown that the resulting tonic TCR signaling also influences their fate upon activation by increasing their ability to differentiate into induced/peripheral regulatory T cells. To decipher the molecular mechanisms governing this process, we here focus on the TCR signaling cascade and demonstrate that a rise in intracellular calcium levels is sufficient to modulate the phenotype of mouse naive CD4 T cells and to increase their sensitivity to regulatory T-cell polarization signals, both processes relying on calcineurin activation. Accordingly, in vivo calcineurin inhibition leads the most self-reactive naive CD4 T cells to adopt the phenotype of their less self-reactive cell-counterparts. Collectively, our findings demonstrate that calcium-mediated activation of the calcineurin pathway acts as a rheostat to shape both the phenotype and effector potential of naive CD4 T cells in the steady-state. PMID:29239722
'Educated' dendritic cells act as messengers from memory to naive T helper cells.
Alpan, Oral; Bachelder, Eric; Isil, Eda; Arnheiter, Heinz; Matzinger, Polly
2004-06-01
Ingested antigens lead to the generation of effector T cells that secrete interleukin 4 (IL-4) rather than interferon-gamma (IFN-gamma) and are capable of influencing naive T cells in their immediate environment to do the same. Using chimeric mice generated by aggregation of two genotypically different embryos, we found that the conversion of a naive T cell occurs only if it can interact with the same antigen-presenting cell, although not necessarily the same antigen, as the effector T cell. Using a two-step culture system in vitro, we found that antigen-presenting dendritic cells can act as 'temporal bridges' to relay information from orally immunized memory CD4 T cells to naive CD4 T cells. The orally immunized T cells use IL-4 and IL-10 (but not CD40 ligand) to 'educate' dendritic cells, which in turn induce naive T cells to produce the same cytokines as those produced by the orally immunized memory T cells.
Rapid estimation of high-parameter auditory-filter shapes
Shen, Yi; Sivakumar, Rajeswari; Richards, Virginia M.
2014-01-01
A Bayesian adaptive procedure, the quick-auditory-filter (qAF) procedure, was used to estimate auditory-filter shapes that were asymmetric about their peaks. In three experiments, listeners who were naive to psychoacoustic experiments detected a fixed-level, pure-tone target presented with a spectrally notched noise masker. The qAF procedure adaptively manipulated the masker spectrum level and the position of the masker notch, which was optimized for the efficient estimation of the five parameters of an auditory-filter model. Experiment I demonstrated that the qAF procedure provided a convergent estimate of the auditory-filter shape at 2 kHz within 150 to 200 trials (approximately 15 min to complete) and, for a majority of listeners, excellent test-retest reliability. In experiment II, asymmetric auditory filters were estimated for target frequencies of 1 and 4 kHz and target levels of 30 and 50 dB sound pressure level. The estimated filter shapes were generally consistent with published norms, especially at the low target level. It is known that the auditory-filter estimates are narrower for forward masking than simultaneous masking due to peripheral suppression, a result replicated in experiment III using fewer than 200 qAF trials. PMID:25324086
Phillips, Lawrence; Pearl, Lisa
2015-11-01
The informativity of a computational model of language acquisition is directly related to how closely it approximates the actual acquisition task, sometimes referred to as the model's cognitive plausibility. We suggest that though every computational model necessarily idealizes the modeled task, an informative language acquisition model can aim to be cognitively plausible in multiple ways. We discuss these cognitive plausibility checkpoints generally and then apply them to a case study in word segmentation, investigating a promising Bayesian segmentation strategy. We incorporate cognitive plausibility by using an age-appropriate unit of perceptual representation, evaluating the model output in terms of its utility, and incorporating cognitive constraints into the inference process. Our more cognitively plausible model shows a beneficial effect of cognitive constraints on segmentation performance. One interpretation of this effect is as a synergy between the naive theories of language structure that infants may have and the cognitive constraints that limit the fidelity of their inference processes, where less accurate inference approximations are better when the underlying assumptions about how words are generated are less accurate. More generally, these results highlight the utility of incorporating cognitive plausibility more fully into computational models of language acquisition. Copyright © 2015 Cognitive Science Society, Inc.
Cui, Jian; Liu, Jinghua; Li, Yuhua; Shi, Tieliu
2011-01-01
Mitochondria are major players on the production of energy, and host several key reactions involved in basic metabolism and biosynthesis of essential molecules. Currently, the majority of nucleus-encoded mitochondrial proteins are unknown even for model plant Arabidopsis. We reported a computational framework for predicting Arabidopsis mitochondrial proteins based on a probabilistic model, called Naive Bayesian Network, which integrates disparate genomic data generated from eight bioinformatics tools, multiple orthologous mappings, protein domain properties and co-expression patterns using 1,027 microarray profiles. Through this approach, we predicted 2,311 candidate mitochondrial proteins with 84.67% accuracy and 2.53% FPR performances. Together with those experimental confirmed proteins, 2,585 mitochondria proteins (named CoreMitoP) were identified, we explored those proteins with unknown functions based on protein-protein interaction network (PIN) and annotated novel functions for 26.65% CoreMitoP proteins. Moreover, we found newly predicted mitochondrial proteins embedded in particular subnetworks of the PIN, mainly functioning in response to diverse environmental stresses, like salt, draught, cold, and wound etc. Candidate mitochondrial proteins involved in those physiological acitivites provide useful targets for further investigation. Assigned functions also provide comprehensive information for Arabidopsis mitochondrial proteome. PMID:21297957
Naive B cells generate regulatory T cells in the presence of a mature immunologic synapse.
Reichardt, Peter; Dornbach, Bastian; Rong, Song; Beissert, Stefan; Gueler, Faikah; Loser, Karin; Gunzer, Matthias
2007-09-01
Naive B cells are ineffective antigen-presenting cells and are considered unable to activate naive T cells. However, antigen-specific contact of these cells leads to stable cell pairs that remain associated over hours in vivo. The physiologic role of such pairs has not been evaluated. We show here that antigen-specific conjugates between naive B cells and naive T cells display a mature immunologic synapse in the contact zone that is absent in T-cell-dendritic-cell (DC) pairs. B cells induce substantial proliferation but, contrary to DCs, no loss of L-selectin in T cells. Surprisingly, while DC-triggered T cells develop into normal effector cells, B-cell stimulation over 72 hours induces regulatory T cells inhibiting priming of fresh T cells in a contact-dependent manner in vitro. In vivo, the regulatory T cells home to lymph nodes where they potently suppress immune responses such as in cutaneous hypersensitivity and ectopic allogeneic heart transplant rejection. Our finding might help to explain old observations on tolerance induction by B cells, identify the mature immunologic synapse as a central functional module of this process, and suggest the use of naive B-cell-primed regulatory T cells, "bTregs," as a useful approach for therapeutic intervention in adverse adaptive immune responses.
Depaoli, Sarah; van de Schoot, Rens; van Loey, Nancy; Sijbrandij, Marit
2015-01-01
After traumatic events, such as disaster, war trauma, and injuries including burns (which is the focus here), the risk to develop posttraumatic stress disorder (PTSD) is approximately 10% (Breslau & Davis, 1992). Latent Growth Mixture Modeling can be used to classify individuals into distinct groups exhibiting different patterns of PTSD (Galatzer-Levy, 2015). Currently, empirical evidence points to four distinct trajectories of PTSD patterns in those who have experienced burn trauma. These trajectories are labeled as: resilient, recovery, chronic, and delayed onset trajectories (e.g., Bonanno, 2004; Bonanno, Brewin, Kaniasty, & Greca, 2010; Maercker, Gäbler, O'Neil, Schützwohl, & Müller, 2013; Pietrzak et al., 2013). The delayed onset trajectory affects only a small group of individuals, that is, about 4-5% (O'Donnell, Elliott, Lau, & Creamer, 2007). In addition to its low frequency, the later onset of this trajectory may contribute to the fact that these individuals can be easily overlooked by professionals. In this special symposium on Estimating PTSD trajectories (Van de Schoot, 2015a), we illustrate how to properly identify this small group of individuals through the Bayesian estimation framework using previous knowledge through priors (see, e.g., Depaoli & Boyajian, 2014; Van de Schoot, Broere, Perryck, Zondervan-Zwijnenburg, & Van Loey, 2015). We used latent growth mixture modeling (LGMM) (Van de Schoot, 2015b) to estimate PTSD trajectories across 4 years that followed a traumatic burn. We demonstrate and compare results from traditional (maximum likelihood) and Bayesian estimation using priors (see, Depaoli, 2012, 2013). Further, we discuss where priors come from and how to define them in the estimation process. We demonstrate that only the Bayesian approach results in the desired theory-driven solution of PTSD trajectories. Since the priors are chosen subjectively, we also present a sensitivity analysis of the Bayesian results to illustrate how to check the impact of the prior knowledge integrated into the model. We conclude with recommendations and guidelines for researchers looking to implement theory-driven LGMM, and we tailor this discussion to the context of PTSD research.
Jackola, D R; Hallgren, H M
1998-11-16
In healthy humans, phenotypic restructuring occurs with age within the CD3+ T-lymphocyte complement. This is characterized by a non-linear decrease of the percentage of 'naive' (CD45RA+) cells and a corresponding non-linear increase of the percentage of 'memory' (CD45R0+) cells among both the CD4+ and CD8+ T-cell subsets. We devised a simple compartmental model to study the age-dependent kinetics of phenotypic restructuring. We also derived differential equations whose parameters determined yearly gains minus losses of the percentage and absolute numbers of circulating naive cells, yearly gains minus losses of the percentage and absolute numbers of circulating memory cells, and the yearly rate of conversion of naive to memory cells. Solutions of these evaluative differential equations demonstrate the following: (1) the memory cell complement 'resides' within its compartment for a longer time than the naive cell complement within its compartment for both CD4 and CD8 cells; (2) the average, annual 'turnover rate' is the same for CD4 and CD8 naive cells. In contrast, the average, annual 'turnover rate' for memory CD8 cells is 1.5 times that of memory CD4 cells; (3) the average, annual conversion rate of CD4 naive cells to memory cells is twice that of the CD8 conversion rate; (4) a transition in dynamic restructuring occurs during the third decade of life that is due to these differences in turnover and conversion rates, between and from naive to memory cells.
Mukai, Tetsu; Maeda, Yumi; Tamura, Toshiki; Matsuoka, Masanori; Tsukamoto, Yumiko; Makino, Masahiko
2009-11-15
Because Mycobacterium bovis bacillus Calmette-Guérin (BCG) unconvincingly activates human naive CD8(+) T cells, a rBCG (BCG-70M) that secretes a fusion protein comprising BCG-derived heat shock protein (HSP)70 and Mycobacterium leprae-derived major membrane protein (MMP)-II, one of the immunodominant Ags of M. leprae, was newly constructed to potentiate the ability of activating naive CD8(+) T cells through dendritic cells (DC). BCG-70M secreted HSP70-MMP-II fusion protein in vitro, which stimulated DC to produce IL-12p70 through TLR2. BCG-70M-infected DC activated not only memory and naive CD8(+) T cells, but also CD4(+) T cells of both types to produce IFN-gamma. The activation of these naive T cells by BCG-70M was dependent on the MHC and CD86 molecules on BCG-70M-infected DC, and was significantly inhibited by pretreatment of DC with chloroquine. Both brefeldin A and lactacystin significantly inhibited the activation of naive CD8(+) T cells by BCG-70M through DC. Thus, the CD8(+) T cell activation may be induced by cross-presentation of Ags through a TAP- and proteosome-dependent cytosolic pathway. When naive CD8(+) T cells were stimulated by BCG-70M-infected DC in the presence of naive CD4(+) T cells, CD62L(low)CD8(+) T cells and perforin-producing CD8(+) T cells were efficiently produced. MMP-II-reactive CD4(+) and CD8(+) memory T cells were efficiently produced in C57BL/6 mice by infection with BCG-70M. These results indicate that BCG-70M activated DC, CD4(+) T cells, and CD8(+) T cells, and the combination of HSP70 and MMP-II may be useful for inducing better T cell activation.
LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS HOPKINS SUMMER WORKSHOP.
Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah; Chen, Ken; Coogan, Emily; Greenberg, Steven; Juneja, Amit; Kirchhoff, Katrin; Livescu, Karen; Mohan, Srividya; Muller, Jennifer; Sonmez, Kemal; Wang, Tianyu
2005-01-01
Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
Buried landmine detection using multivariate normal clustering
NASA Astrophysics Data System (ADS)
Duston, Brian M.
2001-10-01
A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
Bayesian learning for spatial filtering in an EEG-based brain-computer interface.
Zhang, Haihong; Yang, Huijuan; Guan, Cuntai
2013-07-01
Spatial filtering for EEG feature extraction and classification is an important tool in brain-computer interface. However, there is generally no established theory that links spatial filtering directly to Bayes classification error. To address this issue, this paper proposes and studies a Bayesian analysis theory for spatial filtering in relation to Bayes error. Following the maximum entropy principle, we introduce a gamma probability model for describing single-trial EEG power features. We then formulate and analyze the theoretical relationship between Bayes classification error and the so-called Rayleigh quotient, which is a function of spatial filters and basically measures the ratio in power features between two classes. This paper also reports our extensive study that examines the theory and its use in classification, using three publicly available EEG data sets and state-of-the-art spatial filtering techniques and various classifiers. Specifically, we validate the positive relationship between Bayes error and Rayleigh quotient in real EEG power features. Finally, we demonstrate that the Bayes error can be practically reduced by applying a new spatial filter with lower Rayleigh quotient.
Supernova Cosmology Inference with Probabilistic Photometric Redshifts (SCIPPR)
NASA Astrophysics Data System (ADS)
Peters, Christina; Malz, Alex; Hlozek, Renée
2018-01-01
The Bayesian Estimation Applied to Multiple Species (BEAMS) framework employs probabilistic supernova type classifications to do photometric SN cosmology. This work extends BEAMS to replace high-confidence spectroscopic redshifts with photometric redshift probability density functions, a capability that will be essential in the era the Large Synoptic Survey Telescope and other next-generation photometric surveys where it will not be possible to perform spectroscopic follow up on every SN. We present the Supernova Cosmology Inference with Probabilistic Photometric Redshifts (SCIPPR) Bayesian hierarchical model for constraining the cosmological parameters from photometric lightcurves and host galaxy photometry, which includes selection effects and is extensible to uncertainty in the redshift-dependent supernova type proportions. We create a pair of realistic mock catalogs of joint posteriors over supernova type, redshift, and distance modulus informed by photometric supernova lightcurves and over redshift from simulated host galaxy photometry. We perform inference under our model to obtain a joint posterior probability distribution over the cosmological parameters and compare our results with other methods, namely: a spectroscopic subset, a subset of high probability photometrically classified supernovae, and reducing the photometric redshift probability to a single measurement and error bar.
NASA Astrophysics Data System (ADS)
Fytilis, N.; Rizzo, D. M.
2012-12-01
Environmental managers are increasingly required to forecast the long-term effects and the resilience or vulnerability of biophysical systems to human-generated stresses. Mitigation strategies for hydrological and environmental systems need to be assessed in the presence of uncertainty. An important aspect of such complex systems is the assessment of variable uncertainty on the model response outputs. We develop a new classification tool that couples a Naïve Bayesian Classifier with a modified Kohonen Self-Organizing Map to tackle this challenge. For proof-of-concept, we use rapid geomorphic and reach-scale habitat assessments data from over 2500 Vermont stream reaches (~1371 stream miles) assessed by the Vermont Agency of Natural Resources (VTANR). In addition, the Vermont Department of Environmental Conservation (VTDEC) estimates stream habitat biodiversity indices (macro-invertebrates and fish) and a variety of water quality data. Our approach fully utilizes the existing VTANR and VTDEC data sets to improve classification of stream-reach habitat and biological integrity. The combined SOM-Naïve Bayesian architecture is sufficiently flexible to allow for continual updates and increased accuracy associated with acquiring new data. The Kohonen Self-Organizing Map (SOM) is an unsupervised artificial neural network that autonomously analyzes properties inherent in a given a set of data. It is typically used to cluster data vectors into similar categories when a priori classes do not exist. The ability of the SOM to convert nonlinear, high dimensional data to some user-defined lower dimension and mine large amounts of data types (i.e., discrete or continuous, biological or geomorphic data) makes it ideal for characterizing the sensitivity of river networks in a variety of contexts. The procedure is data-driven, and therefore does not require the development of site-specific, process-based classification stream models, or sets of if-then-else rules associated with expert systems. This has the potential to save time and resources, while enabling a truly adaptive management approach using existing knowledge (expressed as prior probabilities) and new information (expressed as likelihood functions) to update estimates (i.e., in this case, improved stream classifications expressed as posterior probabilities). The distribution parameters of these posterior probabilities are used to quantify uncertainty associated with environmental data. Since classification plays a leading role in the future development of data-enabled science and engineering, such a computational tool is applicable to a variety of engineering applications. The ability of the new classification neural network to characterize streams with high environmental risk is essential for a proactive adaptive watershed management approach.
Modular Bayesian Networks with Low-Power Wearable Sensors for Recognizing Eating Activities.
Kim, Kee-Hoon; Cho, Sung-Bae
2017-12-11
Recently, recognizing a user's daily activity using a smartphone and wearable sensors has become a popular issue. However, in contrast with the ideal definition of an experiment, there could be numerous complex activities in real life with respect to its various background and contexts: time, space, age, culture, and so on. Recognizing these complex activities with limited low-power sensors, considering the power and memory constraints of the wearable environment and the user's obtrusiveness at once is not an easy problem, although it is very crucial for the activity recognizer to be practically useful. In this paper, we recognize activity of eating, which is one of the most typical examples of a complex activity, using only daily low-power mobile and wearable sensors. To organize the related contexts systemically, we have constructed the context model based on activity theory and the "Five W's", and propose a Bayesian network with 88 nodes to predict uncertain contexts probabilistically. The structure of the proposed Bayesian network is designed by a modular and tree-structured approach to reduce the time complexity and increase the scalability. To evaluate the proposed method, we collected the data with 10 different activities from 25 volunteers of various ages, occupations, and jobs, and have obtained 79.71% accuracy, which outperforms other conventional classifiers by 7.54-14.4%. Analyses of the results showed that our probabilistic approach could also give approximate results even when one of contexts or sensor values has a very heterogeneous pattern or is missing.
A two-component Bayesian mixture model to identify implausible gestational age.
Mohammadian-Khoshnoud, Maryam; Moghimbeigi, Abbas; Faradmal, Javad; Yavangi, Mahnaz
2016-01-01
Background: Birth weight and gestational age are two important variables in obstetric research. The primary measure of gestational age is based on a mother's recall of her last menstrual period. This recall may cause random or systematic errors. Therefore, the objective of this study is to utilize Bayesian mixture model in order to identify implausible gestational age. Methods: In this cross-sectional study, medical documents of 502 preterm infants born and hospitalized in Hamadan Fatemieh Hospital from 2009 to 2013 were gathered. Preterm infants were classified to less than 28 weeks and 28 to 31 weeks. A two-component Bayesian mixture model was utilized to identify implausible gestational age; the first component shows the probability of correct and the second one shows the probability of incorrect classification of gestational ages. The data were analyzed through OpenBUGS 3.2.2 and 'coda' package of R 3.1.1. Results: The mean (SD) of the second component of less than 28 weeks and 28 to 31 weeks were 1179 (0.0123) and 1620 (0.0074), respectively. These values were larger than the mean of the first component for both groups which were 815.9 (0.0123) and 1061 (0.0074), respectively. Conclusion: Errors occurred in recording the gestational ages of these two groups of preterm infants included recording the gestational age less than the actual value at birth. Therefore, developing scientific methods to correct these errors is essential to providing desirable health services and adjusting accurate health indicators.
Bayesian methods for the design and interpretation of clinical trials in very rare diseases
Hampson, Lisa V; Whitehead, John; Eleftheriou, Despina; Brogan, Paul
2014-01-01
This paper considers the design and interpretation of clinical trials comparing treatments for conditions so rare that worldwide recruitment efforts are likely to yield total sample sizes of 50 or fewer, even when patients are recruited over several years. For such studies, the sample size needed to meet a conventional frequentist power requirement is clearly infeasible. Rather, the expectation of any such trial has to be limited to the generation of an improved understanding of treatment options. We propose a Bayesian approach for the conduct of rare-disease trials comparing an experimental treatment with a control where patient responses are classified as a success or failure. A systematic elicitation from clinicians of their beliefs concerning treatment efficacy is used to establish Bayesian priors for unknown model parameters. The process of determining the prior is described, including the possibility of formally considering results from related trials. As sample sizes are small, it is possible to compute all possible posterior distributions of the two success rates. A number of allocation ratios between the two treatment groups can be considered with a view to maximising the prior probability that the trial concludes recommending the new treatment when in fact it is non-inferior to control. Consideration of the extent to which opinion can be changed, even by data from the best feasible design, can help to determine whether such a trial is worthwhile. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24957522
Hummel, A D; Maciel, R F; Sousa, F S; Cohrs, F M; Falcão, A E J; Teixeira, F; Baptista, R; Mancini, F; da Costa, T M; Alves, D; Rodrigues, R G D S; Miranda, R; Pisa, I T
2011-05-01
The gold standard for nephrotoxicity and acute cellular rejection (ACR) is a biopsy, an invasive and expensive procedure. More efficient strategies to screen patients for biopsy are important from the clinical and financial points of view. The aim of this study was to evaluate various artificial intelligence techniques to screen for the need for a biopsy among patients suspected of nephrotoxicity or ACR during the first year after renal transplantation. We used classifiers like artificial neural networks (ANN), support vector machines (SVM), and Bayesian inference (BI) to indicate if the clinical course of the event suggestive of the need for a biopsy. Each classifier was evaluated by values of sensitivity and area under the ROC curve (AUC) for each of the classifiers. The technique that showed the best sensitivity value as an indicator for biopsy was SVM with an AUC of 0.79 and an accuracy rate of 79.86%. The results were better than those described in previous works. The accuracy for an indication of biopsy screening was efficient enough to become useful in clinical practice. Copyright © 2011 Elsevier Inc. All rights reserved.
Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter Je; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong
2017-11-01
Feature selection is essential in medical area; however, its process becomes complicated with the presence of censoring which is the unique character of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring which prevents them from directly being used to survival data. Among the few work that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and perform feature selection for survival data. However, it depends on data replication to handle censoring which leads to unbalanced and biased prediction results especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on survival metric to construct a multiple classifier system. The new hybrid feature selection process uses multiple classifier system as a wrapper method and merges it with iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model such as Akaike and Bayesian information criterions and least absolute shrinkage and selector operator in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention enabling doctor in selecting patients' future follow-up plan.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns.
Haghnegahdar, A A; Kolahi, S; Khojastepour, L; Tajeripour, F
2018-03-01
Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cut prepared from each condyle, although images were limited to head of mandibular condyle. In order to extract features of images, first we use LBP and then histogram of oriented gradients. To reduce dimensionality, the linear algebra Singular Value Decomposition (SVD) is applied to the feature vectors matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. K nearest neighbor classifier achieves a very good accuracy (0.9242), moreover, it has desirable sensitivity (0.9470) and specificity (0.9015) results, when other classifiers have lower accuracy, sensitivity and specificity. We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN has been the best classifier for our experiments in detecting patients from healthy individuals, by 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages.
Feature Selection for Chemical Sensor Arrays Using Mutual Information
Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.
2014-01-01
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
Misbah, Mohammad; Roy, Gaurav; Shahid, Mudassar; Nag, Nalin; Kumar, Suresh; Husain, Mohammad
2016-05-01
Drug resistance mutations in the Pol gene of human immunodeficiency virus 1 (HIV-1) are one of the critical factors associated with antiretroviral therapy (ART) failure in HIV-1 patients. The issue of resistance to reverse transcriptase inhibitors (RTIs) in HIV infection has not been adequately addressed in the Indian subcontinent. We compared HIV-1 reverse transcriptase (RT) gene sequences to identify mutations present in HIV-1 patients who were ART non-responders, ART responders and drug naive. Genotypic drug resistance testing was performed by sequencing a 655-bp region of the RT gene from 102 HIV-1 patients, consisting of 30 ART-non-responding, 35 ART-responding and 37 drug-naive patients. The Stanford HIV Resistance Database (HIVDBv 6.2), IAS-USA mutation list, ANRS_09/2012 algorithm, and Rega v8.02 algorithm were used to interpret the pattern of drug resistance. The majority of the sequences (96 %) belonged to subtype C, and a few of them (3.9 %) to subtype A1. The frequency of drug resistance mutations observed in ART-non-responding, ART-responding and drug-naive patients was 40.1 %, 10.7 % and 20.58 %, respectively. It was observed that in non-responders, multiple mutations were present in the same patient, while in responders, a single mutation was found. Some of the drug-naive patients had more than one mutation. Thymidine analogue mutations (TAMs), however, were found in non-responders and naive patients but not in responders. Although drug resistance mutations were widely distributed among ART non-responders, the presence of resistance mutations in the viruses of drug-naive patients poses a big concern in the absence of a genotyping resistance test.
Ali, Ramadan A; Camick, Christina; Wiles, Katherine; Walseth, Timothy F; Slama, James T; Bhattacharya, Sumit; Giovannucci, David R; Wall, Katherine A
2016-02-26
Nicotinic acid adenine dinucleotide phosphate (NAADP), the most potent Ca(2+) mobilizing second messenger discovered to date, has been implicated in Ca(2+) signaling in some lymphomas and T cell clones. In contrast, the role of NAADP in Ca(2+) signaling or the identity of the Ca(2+) stores targeted by NAADP in conventional naive T cells is less clear. In the current study, we demonstrate the importance of NAADP in the generation of Ca(2+) signals in murine naive T cells. Combining live-cell imaging methods and a pharmacological approach using the NAADP antagonist Ned-19, we addressed the involvement of NAADP in the generation of Ca(2+) signals evoked by TCR stimulation and the role of this signal in downstream physiological end points such as proliferation, cytokine production, and other responses to stimulation. We demonstrated that acidic compartments in addition to the endoplasmic reticulum were the Ca(2+) stores that were sensitive to NAADP in naive T cells. NAADP was shown to evoke functionally relevant Ca(2+) signals in both naive CD4 and naive CD8 T cells. Furthermore, we examined the role of this signal in the activation, proliferation, and secretion of effector cytokines by Th1, Th2, Th17, and CD8 effector T cells. Overall, NAADP exhibited a similar profile in mediating Ca(2+) release in effector T cells as in their counterpart naive T cells and seemed to be equally important for the function of these different subsets of effector T cells. This profile was not observed for natural T regulatory cells. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Ali, Ramadan A.; Camick, Christina; Wiles, Katherine; Walseth, Timothy F.; Slama, James T.; Bhattacharya, Sumit; Giovannucci, David R.; Wall, Katherine A.
2016-01-01
Nicotinic acid adenine dinucleotide phosphate (NAADP), the most potent Ca2+ mobilizing second messenger discovered to date, has been implicated in Ca2+ signaling in some lymphomas and T cell clones. In contrast, the role of NAADP in Ca2+ signaling or the identity of the Ca2+ stores targeted by NAADP in conventional naive T cells is less clear. In the current study, we demonstrate the importance of NAADP in the generation of Ca2+ signals in murine naive T cells. Combining live-cell imaging methods and a pharmacological approach using the NAADP antagonist Ned-19, we addressed the involvement of NAADP in the generation of Ca2+ signals evoked by TCR stimulation and the role of this signal in downstream physiological end points such as proliferation, cytokine production, and other responses to stimulation. We demonstrated that acidic compartments in addition to the endoplasmic reticulum were the Ca2+ stores that were sensitive to NAADP in naive T cells. NAADP was shown to evoke functionally relevant Ca2+ signals in both naive CD4 and naive CD8 T cells. Furthermore, we examined the role of this signal in the activation, proliferation, and secretion of effector cytokines by Th1, Th2, Th17, and CD8 effector T cells. Overall, NAADP exhibited a similar profile in mediating Ca2+ release in effector T cells as in their counterpart naive T cells and seemed to be equally important for the function of these different subsets of effector T cells. This profile was not observed for natural T regulatory cells. PMID:26728458
Graessel, Anke; Hauck, Stefanie M.; von Toerne, Christine; Kloppmann, Edda; Goldberg, Tatyana; Koppensteiner, Herwig; Schindler, Michael; Knapp, Bettina; Krause, Linda; Dietz, Katharina; Schmidt-Weber, Carsten B.; Suttner, Kathrin
2015-01-01
Naive CD4+ T cells are the common precursors of multiple effector and memory T-cell subsets and possess a high plasticity in terms of differentiation potential. This stem-cell-like character is important for cell therapies aiming at regeneration of specific immunity. Cell surface proteins are crucial for recognition and response to signals mediated by other cells or environmental changes. Knowledge of cell surface proteins of human naive CD4+ T cells and their changes during the early phase of T-cell activation is urgently needed for a guided differentiation of naive T cells and may support the selection of pluripotent cells for cell therapy. Periodate oxidation and aniline-catalyzed oxime ligation technology was applied with subsequent quantitative liquid chromatography-tandem MS to generate a data set describing the surface proteome of primary human naive CD4+ T cells and to monitor dynamic changes during the early phase of activation. This led to the identification of 173 N-glycosylated surface proteins. To independently confirm the proteomic data set and to analyze the cell surface by an alternative technique a systematic phenotypic expression analysis of surface antigens via flow cytometry was performed. This screening expanded the previous data set, resulting in 229 surface proteins, which were expressed on naive unstimulated and activated CD4+ T cells. Furthermore, we generated a surface expression atlas based on transcriptome data, experimental annotation, and predicted subcellular localization, and correlated the proteomics result with this transcriptional data set. This extensive surface atlas provides an overall naive CD4+ T cell surface resource and will enable future studies aiming at a deeper understanding of mechanisms of T-cell biology allowing the identification of novel immune targets usable for the development of therapeutic treatments. PMID:25991687
Highly efficient gene transfer in naive human T cells with a murine leukemia virus-based vector.
Dardalhon, V; Jaleco, S; Rebouissou, C; Ferrand, C; Skander, N; Swainson, L; Tiberghien, P; Spits, H; Noraz, N; Taylor, N
2000-08-01
Retroviral vectors based on the Moloney murine leukemia virus (MuLV) have become the primary tool for gene delivery into hematopoietic cells, but clinical trials have been hampered by low transduction efficiencies. Recently, we and others have shown that gene transfer of MuLV-based vectors into T cells can be significantly augmented using a fibronectin-facilitated protocol. Nevertheless, the relative abilities of naive (CD45RA(+)) and memory (CD45RO(+)) lymphocyte subsets to be transduced has not been assessed. Although naive T cells demonstrate a restricted cytokine profile following antigen stimulation and a decreased susceptibility to infection with human immunodeficiency virus, it was not clear whether they could be efficiently infected with a MuLV vector. This study describes conditions that permitted gene transfer of an enhanced green fluorescent protein-expressing retroviral vector in more than 50% of naive umbilical cord (UC) blood and peripheral blood (PB) T cells following CD3/CD28 ligation. Moreover, treatment of naive T cells with interleukin-7 resulted in the maintenance of a CD45RA phenotype and gene transfer levels approached 20%. Finally, it was determined that parameters for optimal transduction of CD45RA(+) T cells isolated from PB and UC blood differed: transduction of the UC cells was significantly increased by the presence of autologous mononuclear cells (24.5% versus 56.5%). Because naive T cells harbor a receptor repertoire that allows them to respond to novel antigens, the development of protocols targeting their transduction is crucial for gene therapy applications. This approach will also allow the functions of exogenous genes to be evaluated in primary nontransformed naive T cells.
2014-01-01
Background Left bundle branch block (LBBB) and right bundle branch block (RBBB) not only mask electrocardiogram (ECG) changes that reflect diseases but also indicate important underlying pathology. The timely detection of LBBB and RBBB is critical in the treatment of cardiac diseases. Inter-patient heartbeat classification is based on independent training and testing sets to construct and evaluate a heartbeat classification system. Therefore, a heartbeat classification system with a high performance evaluation possesses a strong predictive capability for unknown data. The aim of this study was to propose a method for inter-patient classification of heartbeats to accurately detect LBBB and RBBB from the normal beat (NORM). Methods This study proposed a heartbeat classification method through a combination of three different types of classifiers: a minimum distance classifier constructed between NORM and LBBB; a weighted linear discriminant classifier between NORM and RBBB based on Bayesian decision making using posterior probabilities; and a linear support vector machine (SVM) between LBBB and RBBB. Each classifier was used with matching features to obtain better classification performance. The final types of the test heartbeats were determined using a majority voting strategy through the combination of class labels from the three classifiers. The optimal parameters for the classifiers were selected using cross-validation on the training set. The effects of different lead configurations on the classification results were assessed, and the performance of these three classifiers was compared for the detection of each pair of heartbeat types. Results The study results showed that a two-lead configuration exhibited better classification results compared with a single-lead configuration. The construction of a classifier with good performance between each pair of heartbeat types significantly improved the heartbeat classification performance. The results showed a sensitivity of 91.4% and a positive predictive value of 37.3% for LBBB and a sensitivity of 92.8% and a positive predictive value of 88.8% for RBBB. Conclusions A multi-classifier ensemble method was proposed based on inter-patient data and demonstrated a satisfactory classification performance. This approach has the potential for application in clinical practice to distinguish LBBB and RBBB from NORM of unknown patients. PMID:24903422
Huang, Huifang; Liu, Jie; Zhu, Qiang; Wang, Ruiping; Hu, Guangshu
2014-06-05
Left bundle branch block (LBBB) and right bundle branch block (RBBB) not only mask electrocardiogram (ECG) changes that reflect diseases but also indicate important underlying pathology. The timely detection of LBBB and RBBB is critical in the treatment of cardiac diseases. Inter-patient heartbeat classification is based on independent training and testing sets to construct and evaluate a heartbeat classification system. Therefore, a heartbeat classification system with a high performance evaluation possesses a strong predictive capability for unknown data. The aim of this study was to propose a method for inter-patient classification of heartbeats to accurately detect LBBB and RBBB from the normal beat (NORM). This study proposed a heartbeat classification method through a combination of three different types of classifiers: a minimum distance classifier constructed between NORM and LBBB; a weighted linear discriminant classifier between NORM and RBBB based on Bayesian decision making using posterior probabilities; and a linear support vector machine (SVM) between LBBB and RBBB. Each classifier was used with matching features to obtain better classification performance. The final types of the test heartbeats were determined using a majority voting strategy through the combination of class labels from the three classifiers. The optimal parameters for the classifiers were selected using cross-validation on the training set. The effects of different lead configurations on the classification results were assessed, and the performance of these three classifiers was compared for the detection of each pair of heartbeat types. The study results showed that a two-lead configuration exhibited better classification results compared with a single-lead configuration. The construction of a classifier with good performance between each pair of heartbeat types significantly improved the heartbeat classification performance. The results showed a sensitivity of 91.4% and a positive predictive value of 37.3% for LBBB and a sensitivity of 92.8% and a positive predictive value of 88.8% for RBBB. A multi-classifier ensemble method was proposed based on inter-patient data and demonstrated a satisfactory classification performance. This approach has the potential for application in clinical practice to distinguish LBBB and RBBB from NORM of unknown patients.
Antoniou, Tony; Gillis, Jennifer; Loutfy, Mona R; Cooper, Curtis; Hogg, Robert S; Klein, Marina B; Machouf, Nima; Montaner, Julio S G; Rourke, Sean B; Tsoukas, Chris; Raboud, Janet M
2014-01-01
To evaluate the trends in abacavir (ABC) prescription among antiretroviral (ARV) medication-naive individuals following the presentation of the Data Collection on Adverse Events of Anti-HIV Drugs (DAD) cohort study. We conducted a retrospective cohort study of ARV medication-naive individuals in the Canadian Observational Cohort (CANOC). Between January 1, 2000, and February 28, 2010, a total of 7280 ARV medication-naive patients were included in CANOC. We observed a significant change in the proportion of new ABC prescriptions immediately following the release of DAD (-11%; 95% confidence interval [CI]: -20% to -2.4%) and in the months following the presentation of these data (-0.66% per month; 95% CI: -1.2% to -0.073%). A post-DAD presentation decrease in the odds of being prescribed ABC versus tenofovir (TDF) was observed (adjusted odds ratio, 0.72 per year, 95% CI: 0.54-0.97). Presentation of the DAD was associated with a significant decrease in ABC use among ARV medication-naive, HIV-positive patients initiating therapy.
Quinn, Kylie M; Fox, Annette; Harland, Kim L; Russ, Brendan E; Li, Jasmine; Nguyen, Thi H O; Loh, Liyen; Olshanksy, Moshe; Naeem, Haroon; Tsyganov, Kirill; Wiede, Florian; Webster, Rosela; Blyth, Chantelle; Sng, Xavier Y X; Tiganis, Tony; Powell, David; Doherty, Peter C; Turner, Stephen J; Kedzierska, Katherine; La Gruta, Nicole L
2018-06-19
Age-associated decreases in primary CD8 + T cell responses occur, in part, due to direct effects on naive CD8 + T cells to reduce intrinsic functionality, but the precise nature of this defect remains undefined. Aging also causes accumulation of antigen-naive but semi-differentiated "virtual memory" (T VM ) cells, but their contribution to age-related functional decline is unclear. Here, we show that T VM cells are poorly proliferative in aged mice and humans, despite being highly proliferative in young individuals, while conventional naive T cells (T N cells) retain proliferative capacity in both aged mice and humans. Adoptive transfer experiments in mice illustrated that naive CD8 T cells can acquire a proliferative defect imposed by the aged environment but age-related proliferative dysfunction could not be rescued by a young environment. Molecular analyses demonstrate that aged T VM cells exhibit a profile consistent with senescence, marking an observation of senescence in an antigenically naive T cell population. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Mizumachi, Hideyuki; Sakuma, Megumi; Ikezumi, Mayu; Saito, Kazutoshi; Takeyoshi, Midori; Imai, Noriyasu; Okutomi, Hiroko; Umetsu, Asami; Motohashi, Hiroko; Watanabe, Mika; Miyazawa, Masaaki
2018-05-03
The epidermal sensitization assay (EpiSensA) is an in vitro skin sensitization test method based on gene expression of four markers related to the induction of skin sensitization; the assay uses commercially available reconstructed human epidermis. EpiSensA has exhibited an accuracy of 90% for 72 chemicals, including lipophilic chemicals and pre-/pro-haptens, when compared with the results of the murine local lymph node assay. In this work, a ring study was performed by one lead and two naive laboratories to evaluate the transferability, as well as within- and between-laboratory reproducibilities, of EpiSensA. Three non-coded chemicals (two lipophilic sensitizers and one non-sensitizer) were tested for the assessment of transferability and 10 coded chemicals (seven sensitizers and three non-sensitizers, including four lipophilic chemicals) were tested for the assessment of reproducibility. In the transferability phase, the non-coded chemicals (two sensitizers and one non-sensitizer) were correctly classified at the two naive laboratories, indicating that the EpiSensA protocol was transferred successfully. For the within-laboratory reproducibility, the data generated with three coded chemicals tested in three independent experiments in each laboratory gave consistent predictions within laboratories. For the between-laboratory reproducibility, 9 of the 10 coded chemicals tested once in each laboratory provided consistent predictions among the three laboratories. These results suggested that EpiSensA has good transferability, as well as within- and between-laboratory reproducibility. Copyright © 2018 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias
2018-03-01
This paper discusses the problem of feature selection using genetic algorithms on a dataset for classification problems. The classification model used is the decicion tree (DT), and Naive Bayes. In this paper we will discuss how the Naive Bayes and Decision Tree models to overcome the classification problem in the dataset, where the dataset feature is selectively selected using GA. Then both models compared their performance, whether there is an increase in accuracy or not. From the results obtained shows an increase in accuracy if the feature selection using GA. The proposed model is referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.
Supervised Detection of Anomalous Light Curves in Massive Astronomical Catalogs
NASA Astrophysics Data System (ADS)
Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won
2014-09-01
The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered as an outlier insofar it has a low joint probability. By leaving out one of the classes on the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nun, Isadora; Pichara, Karim; Protopapas, Pavlos
The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each ofmore » the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered as an outlier insofar it has a low joint probability. By leaving out one of the classes on the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.« less
Deetjen, Ulrike; Powell, John A
2016-05-01
This research examines the extent to which informational and emotional elements are employed in online support forums for 14 purposively sampled chronic medical conditions and the factors that influence whether posts are of a more informational or emotional nature. Large-scale qualitative data were obtained from Dailystrength.org. Based on a hand-coded training dataset, all posts were classified into informational or emotional using a Bayesian classification algorithm to generalize the findings. Posts that could not be classified with a probability of at least 75% were excluded. The overall tendency toward emotional posts differs by condition: mental health (depression, schizophrenia) and Alzheimer's disease consist of more emotional posts, while informational posts relate more to nonterminal physical conditions (irritable bowel syndrome, diabetes, asthma). There is no gender difference across conditions, although prostate cancer forums are oriented toward informational support, whereas breast cancer forums rather feature emotional support. Across diseases, the best predictors for emotional content are lower age and a higher number of overall posts by the support group member. The results are in line with previous empirical research and unify empirical findings from single/2-condition research. Limitations include the analytical restriction to predefined categories (informational, emotional) through the chosen machine-learning approach. Our findings provide an empirical foundation for building theory on informational versus emotional support across conditions, give insights for practitioners to better understand the role of online support groups for different patients, and show the usefulness of machine-learning approaches to analyze large-scale qualitative health data from online settings. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
Measuring Cosmological Parameters with Photometrically Classified Pan-STARRS Supernovae
NASA Astrophysics Data System (ADS)
Jones, David; Scolnic, Daniel; Riess, Adam; Rest, Armin; Kirshner, Robert; Berger, Edo; Kessler, Rick; Pan, Yen-Chen; Foley, Ryan; Chornock, Ryan; Ortega, Carolyn; Challis, Peter; Burgett, William; Chambers, Kenneth; Draper, Peter; Flewelling, Heather; Huber, Mark; Kaiser, Nick; Kudritzki, Rolf; Metcalfe, Nigel; Tonry, John; Wainscoat, Richard J.; Waters, Chris; Gall, E. E. E.; Kotak, Rubina; McCrum, Matt; Smartt, Stephen; Smith, Ken
2018-01-01
We use nearly 1,200 supernovae (SNe) from Pan-STARRS and ~200 low-z (z < 0.1) SNe Ia to measure cosmological parameters. Though most of these SNe lack spectroscopic classifications, in a previous paper we demonstrated that photometrically classified SNe can still be used to infer unbiased cosmological parameters by using a Bayesian methodology that marginalizes over core-collapse (CC) SN contamination. Our sample contains nearly twice as many SNe as the largest previous compilation of SNe Ia. Combining SNe with Cosmic Microwave Background (CMB) constraints from the Planck satellite, we measure the dark energy equation of state parameter w to be -0.986±0.058 (stat+sys). If we allow w to evolve with redshift as w(a) = w0 + wa(1-a), we find w0 = -0.923±0.148 and wa = -0.404±0.797. These results are consistent with measurements of cosmological parameters from the JLA and from a new analysis of 1049 spectroscopically confirmed SNe Ia (Scolnic et al. 2017). We try four different photometric classification priors for Pan-STARRS SNe and two alternate ways of modeling the CC SN contamination, finding that none of these variants gives a w that differs by more than 1% from the baseline measurement. The systematic uncertainty on w due to marginalizing over the CC SN contamination, σwCC = 0.019, is approximately equal to the photometric calibration uncertainty and is lower than the systematic uncertainty in the SN\\,Ia dispersion model (σwdisp = 0.024). Our data provide one of the best current constraints on w, demonstrating that samples with ~5% CC SN contamination can give competitive cosmological constraints when the contaminating distribution is marginalized over in a Bayesian framework.
NASA Astrophysics Data System (ADS)
Jones, D. O.; Scolnic, D. M.; Riess, A. G.; Rest, A.; Kirshner, R. P.; Berger, E.; Kessler, R.; Pan, Y.-C.; Foley, R. J.; Chornock, R.; Ortega, C. A.; Challis, P. J.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Huber, M. E.; Kaiser, N.; Kudritzki, R.-P.; Metcalfe, N.; Tonry, J.; Wainscoat, R. J.; Waters, C.; Gall, E. E. E.; Kotak, R.; McCrum, M.; Smartt, S. J.; Smith, K. W.
2018-04-01
We use 1169 Pan-STARRS supernovae (SNe) and 195 low-z (z < 0.1) SNe Ia to measure cosmological parameters. Though most Pan-STARRS SNe lack spectroscopic classifications, in a previous paper we demonstrated that photometrically classified SNe can be used to infer unbiased cosmological parameters by using a Bayesian methodology that marginalizes over core-collapse (CC) SN contamination. Our sample contains nearly twice as many SNe as the largest previous SN Ia compilation. Combining SNe with cosmic microwave background (CMB) constraints from Planck, we measure the dark energy equation-of-state parameter w to be ‑0.989 ± 0.057 (stat+sys). If w evolves with redshift as w(a) = w 0 + w a (1 ‑ a), we find w 0 = ‑0.912 ± 0.149 and w a = ‑0.513 ± 0.826. These results are consistent with cosmological parameters from the Joint Light-curve Analysis and the Pantheon sample. We try four different photometric classification priors for Pan-STARRS SNe and two alternate ways of modeling CC SN contamination, finding that no variant gives a w differing by more than 2% from the baseline measurement. The systematic uncertainty on w due to marginalizing over CC SN contamination, {σ }wCC}=0.012, is the third-smallest source of systematic uncertainty in this work. We find limited (1.6σ) evidence for evolution of the SN color-luminosity relation with redshift, a possible systematic that could constitute a significant uncertainty in future high-z analyses. Our data provide one of the best current constraints on w, demonstrating that samples with ∼5% CC SN contamination can give competitive cosmological constraints when the contaminating distribution is marginalized over in a Bayesian framework.
Non-invasive Fetal ECG Signal Quality Assessment for Multichannel Heart Rate Estimation.
Andreotti, Fernando; Graser, Felix; Malberg, Hagen; Zaunseder, Sebastian
2017-12-01
The noninvasive fetal ECG (NI-FECG) from abdominal recordings offers novel prospects for prenatal monitoring. However, NI-FECG signals are corrupted by various nonstationary noise sources, making the processing of abdominal recordings a challenging task. In this paper, we present an online approach that dynamically assess the quality of NI-FECG to improve fetal heart rate (FHR) estimation. Using a naive Bayes classifier, state-of-the-art and novel signal quality indices (SQIs), and an existing adaptive Kalman filter, FHR estimation was improved. For the purpose of training and validating the proposed methods, a large annotated private clinical dataset was used. The suggested classification scheme demonstrated an accuracy of Krippendorff's alpha in determining the overall quality of NI-FECG signals. The proposed Kalman filter outperformed alternative methods for FHR estimation achieving accuracy. The proposed algorithm was able to reliably reflect changes of signal quality and can be used in improving FHR estimation. NI-ECG signal quality estimation and multichannel information fusion are largely unexplored topics. Based on previous works, multichannel FHR estimation is a field that could strongly benefit from such methods. The developed SQI algorithms as well as resulting classifier were made available under a GNU GPL open-source license and contributed to the FECGSYN toolbox.
Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka
2016-01-01
The contiguous sequences of the terms (N-grams) in the documents are symmetrically distributed among different classes. The symmetrical distribution of the N-Grams raises uncertainty in the belongings of the N-Grams towards the class. In this paper, we focused on the selection of most discriminating N-Grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method named as the symmetrical strength of the N-Grams (SSNG) is proposed using a two pass filtering based feature selection (TPF) approach. Initially, in the first pass of the TPF, the SSNG method chooses various informative N-Grams from the entire extracted N-Grams of the corpus. Subsequently, in the second pass the well-known Chi Square (χ(2)) method is being used to select few most informative N-Grams. Further, to classify the documents the two standard classifiers Multinomial Naive Bayes and Linear Support Vector Machine have been applied on the ten standard text data sets. In most of the datasets, the experimental results state the performance and success rate of SSNG method using TPF approach is superior to the state-of-the-art methods viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ(2).
Automation of motor dexterity assessment.
Heyer, Patrick; Castrejon, Luis R; Orihuela-Espina, Felipe; Sucar, Luis Enrique
2017-07-01
Motor dexterity assessment is regularly performed in rehabilitation wards to establish patient status and automatization for such routinary task is sought. A system for automatizing the assessment of motor dexterity based on the Fugl-Meyer scale and with loose restrictions on sensing technologies is presented. The system consists of two main elements: 1) A data representation that abstracts the low level information obtained from a variety of sensors, into a highly separable low dimensionality encoding employing t-distributed Stochastic Neighbourhood Embedding, and, 2) central to this communication, a multi-label classifier that boosts classification rates by exploiting the fact that the classes corresponding to the individual exercises are naturally organized as a network. Depending on the targeted therapeutic movement class labels i.e. exercises scores, are highly correlated-patients who perform well in one, tends to perform well in related exercises-; and critically no node can be used as proxy of others - an exercise does not encode the information of other exercises. Over data from a cohort of 20 patients, the novel classifier outperforms classical Naive Bayes, random forest and variants of support vector machines (ANOVA: p < 0.001). The novel multi-label classification strategy fulfills an automatic system for motor dexterity assessment, with implications for lessening therapist's workloads, reducing healthcare costs and providing support for home-based virtual rehabilitation and telerehabilitation alternatives.
Context-aware adaptive spelling in motor imagery BCI
NASA Astrophysics Data System (ADS)
Perdikis, S.; Leeb, R.; Millán, J. d. R.
2016-06-01
Objective. This work presents a first motor imagery-based, adaptive brain-computer interface (BCI) speller, which is able to exploit application-derived context for improved, simultaneous classifier adaptation and spelling. Online spelling experiments with ten able-bodied users evaluate the ability of our scheme, first, to alleviate non-stationarity of brain signals for restoring the subject’s performances, second, to guide naive users into BCI control avoiding initial offline BCI calibration and, third, to outperform regular unsupervised adaptation. Approach. Our co-adaptive framework combines the BrainTree speller with smooth-batch linear discriminant analysis adaptation. The latter enjoys contextual assistance through BrainTree’s language model to improve online expectation-maximization maximum-likelihood estimation. Main results. Our results verify the possibility to restore single-sample classification and BCI command accuracy, as well as spelling speed for expert users. Most importantly, context-aware adaptation performs significantly better than its unsupervised equivalent and similar to the supervised one. Although no significant differences are found with respect to the state-of-the-art PMean approach, the proposed algorithm is shown to be advantageous for 30% of the users. Significance. We demonstrate the possibility to circumvent supervised BCI recalibration, saving time without compromising the adaptation quality. On the other hand, we show that this type of classifier adaptation is not as efficient for BCI training purposes.
Context-aware adaptive spelling in motor imagery BCI.
Perdikis, S; Leeb, R; Millán, J D R
2016-06-01
This work presents a first motor imagery-based, adaptive brain-computer interface (BCI) speller, which is able to exploit application-derived context for improved, simultaneous classifier adaptation and spelling. Online spelling experiments with ten able-bodied users evaluate the ability of our scheme, first, to alleviate non-stationarity of brain signals for restoring the subject's performances, second, to guide naive users into BCI control avoiding initial offline BCI calibration and, third, to outperform regular unsupervised adaptation. Our co-adaptive framework combines the BrainTree speller with smooth-batch linear discriminant analysis adaptation. The latter enjoys contextual assistance through BrainTree's language model to improve online expectation-maximization maximum-likelihood estimation. Our results verify the possibility to restore single-sample classification and BCI command accuracy, as well as spelling speed for expert users. Most importantly, context-aware adaptation performs significantly better than its unsupervised equivalent and similar to the supervised one. Although no significant differences are found with respect to the state-of-the-art PMean approach, the proposed algorithm is shown to be advantageous for 30% of the users. We demonstrate the possibility to circumvent supervised BCI recalibration, saving time without compromising the adaptation quality. On the other hand, we show that this type of classifier adaptation is not as efficient for BCI training purposes.
Barnabe, Cheryl; Tomlinson, George; Marshall, Deborah; Devoe, Dan; Bombardier, Claire
2016-01-01
Objective To compare methotrexate based disease modifying antirheumatic drug (DMARD) treatments for rheumatoid arthritis in patients naive to or with an inadequate response to methotrexate. Design Systematic review and Bayesian random effects network meta-analysis of trials assessing methotrexate used alone or in combination with other conventional synthetic DMARDs, biologic drugs, or tofacitinib in adult patients with rheumatoid arthritis. Data sources Trials were identified from Medline, Embase, and Central databases from inception to 19 January 2016; abstracts from two major rheumatology meetings from 2009 to 2015; two trial registers; and hand searches of Cochrane reviews. Study selection criteria Randomized or quasi-randomized trials that compared methotrexate with any other DMARD or combination of DMARDs and contributed to the network of evidence between the treatments of interest. Main outcomes American College of Rheumatology (ACR) 50 response (major clinical improvement), radiographic progression, and withdrawals due to adverse events. A comparison between two treatments was considered statistically significant if its credible interval excluded the null effect, indicating >97.5% probability that one treatment was superior. Results 158 trials were included, with between 10 and 53 trials available for each outcome. In methotrexate naive patients, several treatments were statistically superior to oral methotrexate for ACR50 response: sulfasalazine and hydroxychloroquine (“triple therapy”), several biologics (abatacept, adalimumab, etanercept, infliximab, rituximab, tocilizumab), and tofacitinib. The estimated probability of ACR50 response was similar between these treatments (range 56-67%), compared with 41% with methotrexate. Methotrexate combined with adalimumab, etanercept, certolizumab, or infliximab was statistically superior to oral methotrexate for inhibiting radiographic progression, but the estimated mean change over one year with all treatments was less than the minimal clinically important difference of 5 units on the Sharp-van der Heijde scale. Triple therapy had statistically fewer withdrawals due to adverse events than methotrexate plus infliximab. After an inadequate response to methotrexate, several treatments were statistically superior to oral methotrexate for ACR50 response: triple therapy, methotrexate plus hydroxychloroquine, methotrexate plus leflunomide, methotrexate plus intramuscular gold, methotrexate plus most biologics, and methotrexate plus tofacitinib. The probability of response was 61% with triple therapy and ranged widely (27-70%) with other treatments. No treatment was statistically superior to oral methotrexate for inhibiting radiographic progression. Methotrexate plus abatacept had a statistically lower rate of withdrawals due to adverse events than several treatments. Conclusions Triple therapy (methotrexate plus sulfasalazine plus hydroxychloroquine) and most regimens combining biologic DMARDs with methotrexate were effective in controlling disease activity, and all were generally well tolerated in both methotrexate naive and methotrexate exposed patients. PMID:27102806
IND - THE IND DECISION TREE PACKAGE
NASA Technical Reports Server (NTRS)
Buntine, W.
1994-01-01
A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The generation routines are used to build classifiers. The test routines are used to evaluate classifiers and to classify data using a classifier. And the display routines are used to display classifiers in various formats. IND is written in C-language for Sun4 series computers. It consists of several programs with controlling shell scripts. Extensive UNIX man entries are included. IND is designed to be used on any UNIX system, although it has only been thoroughly tested on SUN platforms. The standard distribution medium for IND is a .25 inch streaming magnetic tape cartridge in UNIX tar format. An electronic copy of the documentation in PostScript format is included on the distribution medium. IND was developed in 1992.
Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.
Sriraam, N; Raghu, S
2017-09-02
Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients having pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from Bern Barcelona database. From the dataset, five different classification tasks were formed. Total 26 features were extracted from focal and non focal EEG. Significant features were selected using Wilcoxon rank sum test by setting p-value (p < 0.05) and z-score (-1.96 > z > 1.96) at 95% significance interval. Hypothesis was made that the effect of removing outliers improves the classification accuracy. Turkey's range test was adopted for pruning outliers from feature set. Finally, 21 features were classified using optimized support vector machine (SVM) classifier with 10-fold cross validation. Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15% achieved respectively and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensures the suitability of the proposed multi-features with optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
Eberle, P; Berger, C; Junge, S; Dougoud, S; Büchel, E Valsangiacomo; Riegel, M; Schinzel, A; Seger, R; Güngör, T
2009-01-01
A subgroup of patients with 22q11·2 microdeletion and partial DiGeorge syndrome (pDGS) appears to be susceptible to non-cardiac mortality (NCM) despite sufficient overall CD4+ T cells. To detect these patients, 20 newborns with 22q11·2 microdeletion and congenital heart disease were followed prospectively for 6 years. Besides detailed clinical assessment, longitudinal monitoring of naive CD4+ and cytotoxic CD3+CD8+ T cells (CTL) was performed. To monitor thymic activity, we analysed naive platelet endothelial cell adhesion molecule-1 (CD31+) expressing CD45RA+RO−CD4+ cells containing high numbers of T cell receptor excision circle (TREC)-bearing lymphocytes and compared them with normal values of healthy children (n = 75). Comparing two age periods, low overall CD4+ and naive CD4+ T cell numbers were observed in 65%/75%, respectively, of patients in period A (< 1 year) declining to 22%/50%, respectively, of patients in period B (> 1/< 7 years). The percentage of patients with low CTLs (< P10) remained robust until school age (period A: 60%; period B: 50%). Low numbers of CTLs were associated with abnormally low naive CD45RA+RO−CD4+ T cells. A high-risk (HR) group (n = 11) and a standard-risk (SR) (n = 9) group were identified. HR patients were characterized by low numbers of both naive CD4+ and CTLs and were prone to lethal infectious and lymphoproliferative complications (NCM: four of 11; cardiac mortality: one of 11) while SR patients were not (NCM: none of nine; cardiac mortality: two of nine). Naive CD31+CD45RA+RO−CD4+, naive CD45RA+RO−CD4+ T cells as well as TRECs/106 mononuclear cells were abnormally low in HR and normal in SR patients. Longitudinal monitoring of naive CD4+ and cytotoxic T cells may help to discriminate pDGS patients at increased risk for NCM. PMID:19040613
Identifying Wrist Fracture Patients with High Accuracy by Automatic Categorization of X-ray Reports
de Bruijn, Berry; Cranney, Ann; O’Donnell, Siobhan; Martin, Joel D.; Forster, Alan J.
2006-01-01
The authors performed this study to determine the accuracy of several text classification methods to categorize wrist x-ray reports. We randomly sampled 751 textual wrist x-ray reports. Two expert reviewers rated the presence (n = 301) or absence (n = 450) of an acute fracture of wrist. We developed two information retrieval (IR) text classification methods and a machine learning method using a support vector machine (TC-1). In cross-validation on the derivation set (n = 493), TC-1 outperformed the two IR based methods and six benchmark classifiers, including Naive Bayes and a Neural Network. In the validation set (n = 258), TC-1 demonstrated consistent performance with 93.8% accuracy; 95.5% sensitivity; 92.9% specificity; and 87.5% positive predictive value. TC-1 was easy to implement and superior in performance to the other classification methods. PMID:16929046
Monitoring eating habits using a piezoelectric sensor-based necklace.
Kalantarian, Haik; Alshurafa, Nabil; Le, Tuan; Sarrafzadeh, Majid
2015-03-01
Maintaining appropriate levels of food intake and developing regularity in eating habits is crucial to weight loss and the preservation of a healthy lifestyle. Moreover, awareness of eating habits is an important step towards portion control and weight loss. In this paper, we introduce a novel food-intake monitoring system based around a wearable wireless-enabled necklace. The proposed necklace includes an embedded piezoelectric sensor, small Arduino-compatible microcontroller, Bluetooth LE transceiver, and Lithium-Polymer battery. Motion in the throat is captured and transmitted to a mobile application for processing and user guidance. Results from data collected from 30 subjects indicate that it is possible to detect solid and liquid foods, with an F-measure of 0.837 and 0.864, respectively, using a naive Bayes classifier. Furthermore, identification of extraneous motions such as head turns and walking are shown to significantly reduce the false positive rate of swallow detection. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ding, Huijun; He, Qing; Zhou, Yongjin; Dan, Guo; Cui, Song
2017-01-01
Motion-intent-based finger gesture recognition systems are crucial for many applications such as prosthesis control, sign language recognition, wearable rehabilitation system, and human–computer interaction. In this article, a motion-intent-based finger gesture recognition system is designed to correctly identify the tapping of every finger for the first time. Two auto-event annotation algorithms are firstly applied and evaluated for detecting the finger tapping frame. Based on the truncated signals, the Wavelet packet transform (WPT) coefficients are calculated and compressed as the features, followed by a feature selection method that is able to improve the performance by optimizing the feature set. Finally, three popular classifiers including naive Bayes (NBC), K-nearest neighbor (KNN), and support vector machine (SVM) are applied and evaluated. The recognition accuracy can be achieved up to 94%. The design and the architecture of the system are presented with full system characterization results. PMID:29167655
Innate recognition of water bodies in echolocating bats.
Greif, Stefan; Siemers, Björn M
2010-11-02
In the course of their lives, most animals must find different specific habitat and microhabitat types for survival and reproduction. Yet, in vertebrates, little is known about the sensory cues that mediate habitat recognition. In free flying bats the echolocation of insect-sized point targets is well understood, whereas how they recognize and classify spatially extended echo targets is currently unknown. In this study, we show how echolocating bats recognize ponds or other water bodies that are crucial for foraging, drinking and orientation. With wild bats of 15 different species (seven genera from three phylogenetically distant, large bat families), we found that bats perceived any extended, echo-acoustically smooth surface to be water, even in the presence of conflicting information from other sensory modalities. In addition, naive juvenile bats that had never before encountered a water body showed spontaneous drinking responses from smooth plates. This provides the first evidence for innate recognition of a habitat cue in a mammal.
Wu, S.-S.; Qiu, X.; Usery, E.L.; Wang, L.
2009-01-01
Detailed urban land use data are important to government officials, researchers, and businesspeople for a variety of purposes. This article presents an approach to classifying detailed urban land use based on geometrical, textural, and contextual information of land parcels. An area of 6 by 14 km in Austin, Texas, with land parcel boundaries delineated by the Travis Central Appraisal District of Travis County, Texas, is tested for the approach. We derive fifty parcel attributes from relevant geographic information system (GIS) and remote sensing data and use them to discriminate among nine urban land uses: single family, multifamily, commercial, office, industrial, civic, open space, transportation, and undeveloped. Half of the 33,025 parcels in the study area are used as training data for land use classification and the other half are used as testing data for accuracy assessment. The best result with a decision tree classification algorithm has an overall accuracy of 96 percent and a kappa coefficient of 0.78, and two naive, baseline models based on the majority rule and the spatial autocorrelation rule have overall accuracy of 89 percent and 79 percent, respectively. The algorithm is relatively good at classifying single-family, multifamily, commercial, open space, and undeveloped land uses and relatively poor at classifying office, industrial, civic, and transportation land uses. The most important attributes for land use classification are the geometrical attributes, particularly those related to building areas. Next are the contextual attributes, particularly those relevant to the spatial relationship between buildings, then the textural attributes, particularly the semivariance texture statistic from 0.61-m resolution images.
Silva, Fabrício R; Vidotti, Vanessa G; Cremasco, Fernanda; Dias, Marcelo; Gomi, Edson S; Costa, Vital P
2013-01-01
To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using Spectral Domain OCT (SD-OCT) and standard automated perimetry (SAP). Observational cross-sectional study. Sixty two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP) and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from the SD-OCT and SAP: Bagging (BAG), Naive-Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), Ada Boost M1(ADA),Support Vector Machine Linear (SVML) and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with MLCs using OCT+SAP data. Combining OCT and SAP data, MLCs' aROCs varied from 0.777(CTREE) to 0.946 (RAN).The best OCT+SAP aROC obtained with RAN (0.946) was significantly larger the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.
Brain-Computer Interface Based on Generation of Visual Images
Bobrov, Pavel; Frolov, Alexander; Cantor, Charles; Fedulova, Irina; Bakhnyan, Mikhail; Zhavoronkov, Alexander
2011-01-01
This paper examines the task of recognizing EEG patterns that correspond to performing three mental tasks: relaxation and imagining of two types of pictures: faces and houses. The experiments were performed using two EEG headsets: BrainProducts ActiCap and Emotiv EPOC. The Emotiv headset becomes widely used in consumer BCI application allowing for conducting large-scale EEG experiments in the future. Since classification accuracy significantly exceeded the level of random classification during the first three days of the experiment with EPOC headset, a control experiment was performed on the fourth day using ActiCap. The control experiment has shown that utilization of high-quality research equipment can enhance classification accuracy (up to 68% in some subjects) and that the accuracy is independent of the presence of EEG artifacts related to blinking and eye movement. This study also shows that computationally-inexpensive Bayesian classifier based on covariance matrix analysis yields similar classification accuracy in this problem as a more sophisticated Multi-class Common Spatial Patterns (MCSP) classifier. PMID:21695206
Hand gesture recognition in confined spaces with partial observability and occultation constraints
NASA Astrophysics Data System (ADS)
Shirkhodaie, Amir; Chan, Alex; Hu, Shuowen
2016-05-01
Human activity detection and recognition capabilities have broad applications for military and homeland security. These tasks are very complicated, however, especially when multiple persons are performing concurrent activities in confined spaces that impose significant obstruction, occultation, and observability uncertainty. In this paper, our primary contribution is to present a dedicated taxonomy and kinematic ontology that are developed for in-vehicle group human activities (IVGA). Secondly, we describe a set of hand-observable patterns that represents certain IVGA examples. Thirdly, we propose two classifiers for hand gesture recognition and compare their performance individually and jointly. Finally, we present a variant of Hidden Markov Model for Bayesian tracking, recognition, and annotation of hand motions, which enables spatiotemporal inference to human group activity perception and understanding. To validate our approach, synthetic (graphical data from virtual environment) and real physical environment video imagery are employed to verify the performance of these hand gesture classifiers, while measuring their efficiency and effectiveness based on the proposed Hidden Markov Model for tracking and interpreting dynamic spatiotemporal IVGA scenarios.
Data-driven classification of bipolar I disorder from longitudinal course of mood.
Cochran, A L; McInnis, M G; Forger, D B
2016-10-11
The Diagnostic and Statistical Manual of Mental Disorder (DSM) classification of bipolar disorder defines categories to reflect common understanding of mood symptoms rather than scientific evidence. This work aimed to determine whether bipolar I can be objectively classified from longitudinal mood data and whether resulting classes have clinical associations. Bayesian nonparametric hierarchical models with latent classes and patient-specific models of mood are fit to data from Longitudinal Interval Follow-up Evaluations (LIFE) of bipolar I patients (N=209). Classes are tested for clinical associations. No classes are justified using the time course of DSM-IV mood states. Three classes are justified using the course of subsyndromal mood symptoms. Classes differed in attempted suicides (P=0.017), disability status (P=0.012) and chronicity of affective symptoms (P=0.009). Thus, bipolar I disorder can be objectively classified from mood course, and individuals in the resulting classes share clinical features. Data-driven classification from mood course could be used to enrich sample populations for pharmacological and etiological studies.
Renkema, Kristin R; Li, Gang; Wu, Angela; Smithey, Megan J; Nikolich-Žugich, Janko
2014-01-01
Naive T cell responses are eroded with aging. We and others have recently shown that unimmunized old mice lose ≥ 70% of Ag-specific CD8 T cell precursors and that many of the remaining precursors acquire a virtual (central) memory (VM; CD44(hi)CD62L(hi)) phenotype. In this study, we demonstrate that unimmunized TCR transgenic (TCRTg) mice also undergo massive VM conversion with age, exhibiting rapid effector function upon both TCR and cytokine triggering. Age-related VM conversion in TCRTg mice directly depended on replacement of the original TCRTg specificity by endogenous TCRα rearrangements, indicating that TCR signals must be critical in VM conversion. Importantly, we found that VM conversion had adverse functional effects in both old wild-type and old TCRTg mice; that is, old VM, but not old true naive, T cells exhibited blunted TCR-mediated, but not IL-15-mediated, proliferation. This selective proliferative senescence correlated with increased apoptosis in old VM cells in response to peptide, but decreased apoptosis in response to homeostatic cytokines IL-7 and IL-15. Our results identify TCR as the key factor in differential maintenance and function of Ag-specific precursors in unimmunized mice with aging, and they demonstrate that two separate age-related defects--drastic reduction in true naive T cell precursors and impaired proliferative capacity of their VM cousins--combine to reduce naive T cell responses with aging.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Managlia, Elizabeth Z.; Landay, Alan; Al-Harthi, Lena
2006-07-05
Interleukin (IL)-7 plays several roles critical to T cell maturation, survival, and homeostasis. Because of these functions, IL-7 is under investigation as an immune-modulator for therapeutic use in lymphopenic clinical conditions, including HIV. We reported that naive T cells, typically not permissive to HIV, can be productively infected when pre-treated with IL-7. We evaluated the mechanism by which IL-7-mediates this effect. IL-7 potently up-regulated the transcriptional factor NFAT, but had no effect on NF{kappa}B. Blocking NFAT activity using a number of reagents, such as Cyclosporin A, FK-506, or the NFAT-specific inhibitor known as VIVIT peptide, all markedly reduced IL-7-mediated inductionmore » of HIV replication in naive T cells. Additional neutralization of cytokines present in IL-7-treated cultures and/or those that have NFAT-binding sequences within their promotors indicated that IL-10, IL-4, and most significantly IFN{gamma}, all contribute to IL-7-induction of HIV productive replication in naive T cells. These data clarify the mechanism by which IL-7 can overcome the block to HIV productive infection in naive T cells, despite their quiescent cell status. These findings are relevant to the treatment of HIV disease and understanding HIV pathogenesis in the naive CD4+ T cell compartment, especially in light of the vigorous pursuit of IL-7 as an in vivo immune modulator.« less
Naive T-cell receptor transgenic T cells help memory B cells produce antibody
Duffy, Darragh; Yang, Chun-Ping; Heath, Andrew; Garside, Paul; Bell, Eric B
2006-01-01
Injection of the same antigen following primary immunization induces a classic secondary response characterized by a large quantity of high-affinity antibody of an immunoglobulin G class produced more rapidly than in the initial response – the products of memory B cells are qualitatively distinct from that of the original naive B lymphocytes. Very little is known of the help provided by the CD4 T cells that stimulate memory B cells. Using antigen-specific T-cell receptor transgenic CD4 T cells (DO11.10) as a source of help, we found that naive transgenic T cells stimulated memory B cells almost as well (in terms of quantity and speed) as transgenic T cells that had been recently primed. There was a direct correlation between serum antibody levels and the number of naive transgenic T cells transferred. Using T cells from transgenic interleukin-2-deficient mice we showed that interleukin-2 was not required for a secondary response, although it was necessary for a primary response. The results suggested that the signals delivered by CD4 T cells and required by memory B cells for their activation were common to both antigen-primed and naive CD4 T cells. PMID:17067314
Alves, Nuno L; Hooibrink, Berend; Arosa, Fernando A; van Lier, René A W
2003-10-01
Recent studies in mice have shown that although interleukin 15 (IL-15) plays an important role in regulating homeostasis of memory CD8+ T cells, it has no apparent function in controlling homeostatic proliferation of naive T cells. We here assessed the influence of IL-15 on antigen-independent expansion and differentiation of human CD8+ T cells. Both naive and primed human T cells divided in response to IL-15. In this process, naive CD8+ T cells successively down-regulated CD45RA and CD28 but maintained CD27 expression. Concomitant with these phenotypic changes, naive cells acquired the ability to produce interferon gamma (IFN-gamma) and tumor necrosis factor alpha (TNF-alpha), expressed perforin and granzyme B, and acquired cytotoxic properties. Primed CD8+ T cells, from both noncytotoxic (CD45RA-CD27+) and cytotoxic (CD45RA+CD27-) subsets, responded to IL-15 and yielded ample numbers of cytokine-secreting and cytotoxic effector cells. In summary, all human CD8+ T-cell subsets had the ability to respond to IL-15, which suggests a generic influence of this cytokine on CD8+ T-cell homeostasis in man.
Akiyama, K; Tsuchida, K; Kanzaki, A; Ujike, H; Hamamura, T; Kondo, K; Mutoh, S; Miyanagi, K; Kuroda, S; Otsuki, S
1995-11-15
Plasma homovanillic acid (pHVA) levels were measured and the Brief Psychiatric Rating Scale (BPRS) scores were evaluated in 26 schizophrenic patients who had either never been medicated (neuroleptic-naive, first-episode subjects) or whose condition had become exacerbated following neuroleptic discontinuance (exacerbated subjects). All the subjects received medication with a fixed dose of a neuroleptic (haloperidol or fluphenazine, both 9 mg/day) for the first week and variable doses for the subsequent 4 weeks. In the neuroleptic-naive subjects, pHVA levels increased significantly 1 week after starting the protocol; this increase correlated significantly with clinical improvement of the BPRS positive symptom scores at week 5. In the neuroleptic-naive subjects, pHVA levels had declined to the baseline level by week 5. In the exacerbated subjects, there were no significant correlations between pHVA level changes at week 1 and later improvements of the BPRS positive symptom scores. These results suggest that the rise in pHVA levels occurring within 1 week after starting a fixed neuroleptic dose may predict a favorable clinical response in neuroleptic-naive schizophrenic patients.
Assessment and Classification of Service Learning: A Case Study of CS/EE Students
Wang, Yu-Tseng; Lai, Pao-Lien; Chen, Jen-Yeu
2014-01-01
This study investigates the undergraduate students in computer science/electric engineering (CS/EE) in Taiwan to measure their perceived benefits from the experiences in service learning coursework. In addition, the confidence of their professional disciplines and its correlation with service learning experiences are examined. The results show that students take positive attitudes toward service learning and their perceived benefits from service learning are correlated with their confidence in professional disciplines. Furthermore, this study designs the knowledge model by Bayesian network (BN) classifiers and term frequency-inverse document frequency (TFIDF) for counseling students on the optimal choice of service learning. PMID:25295294
Classifying and Tracking Dust Plumes from Passive Remote Sensing
NASA Astrophysics Data System (ADS)
Bachl, Fabian E.; Garbe, Christoph S.
2012-03-01
Recent studies emphasize the role mineral dust aerosols play in terms of the earth's climate system, its radiation budget and microbial nutrition cycles. In order to gain further insight into the genesis and long term characteristics of dust events, processing setellite imagery is inevitable. We propose a fully Bayesian multispectral classification method that significantly facilitates this task. Using MSG-SEVIRI imagery we show that our technique allows to extract dust activity well enough to pave the way for a tracking scheme. Based on this procedure we derive an approach to identify regions that are likely to be the origin of emerging dust plumes.
Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S
2015-01-16
Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
Modular Bayesian Networks with Low-Power Wearable Sensors for Recognizing Eating Activities
Kim, Kee-Hoon
2017-01-01
Recently, recognizing a user’s daily activity using a smartphone and wearable sensors has become a popular issue. However, in contrast with the ideal definition of an experiment, there could be numerous complex activities in real life with respect to its various background and contexts: time, space, age, culture, and so on. Recognizing these complex activities with limited low-power sensors, considering the power and memory constraints of the wearable environment and the user’s obtrusiveness at once is not an easy problem, although it is very crucial for the activity recognizer to be practically useful. In this paper, we recognize activity of eating, which is one of the most typical examples of a complex activity, using only daily low-power mobile and wearable sensors. To organize the related contexts systemically, we have constructed the context model based on activity theory and the “Five W’s”, and propose a Bayesian network with 88 nodes to predict uncertain contexts probabilistically. The structure of the proposed Bayesian network is designed by a modular and tree-structured approach to reduce the time complexity and increase the scalability. To evaluate the proposed method, we collected the data with 10 different activities from 25 volunteers of various ages, occupations, and jobs, and have obtained 79.71% accuracy, which outperforms other conventional classifiers by 7.54–14.4%. Analyses of the results showed that our probabilistic approach could also give approximate results even when one of contexts or sensor values has a very heterogeneous pattern or is missing. PMID:29232937
Arbitrary norm support vector machines.
Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R
2009-02-01
Support vector machines (SVM) are state-of-the-art classifiers. Typically L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy but with a reduced number of support vectors, -9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.
Using Bayes Model Averaging for Wind Power Forecasts
NASA Astrophysics Data System (ADS)
Preede Revheim, Pål; Beyer, Hans Georg
2014-05-01
For operational purposes predictions of the forecasts of the lumped output of groups of wind farms spread over larger geographic areas will often be of interest. A naive approach is to make forecasts for each individual site and sum them up to get the group forecast. It is however well documented that a better choice is to use a model that also takes advantage of spatial smoothing effects. It might however be the case that some sites tends to more accurately reflect the total output of the region, either in general or for certain wind directions. It will then be of interest giving these a greater influence over the group forecast. Bayesian model averaging (BMA) is a statistical post-processing method for producing probabilistic forecasts from ensembles. Raftery et al. [1] show how BMA can be used for statistical post processing of forecast ensembles, producing PDFs of future weather quantities. The BMA predictive PDF of a future weather quantity is a weighted average of the ensemble members' PDFs, where the weights can be interpreted as posterior probabilities and reflect the ensemble members' contribution to overall forecasting skill over a training period. In Revheim and Beyer [2] the BMA procedure used in Sloughter, Gneiting and Raftery [3] were found to produce fairly accurate PDFs for the future mean wind speed of a group of sites from the single sites wind speeds. However, when the procedure was attempted applied to wind power it resulted in either problems with the estimation of the parameters (mainly caused by longer consecutive periods of no power production) or severe underestimation (mainly caused by problems with reflecting the power curve). In this paper the problems that arose when applying BMA to wind power forecasting is met through two strategies. First, the BMA procedure is run with a combination of single site wind speeds and single site wind power production as input. This solves the problem with longer consecutive periods where the input data does not contain information, but it has the disadvantage of nearly doubling the number of model parameters to be estimated. Second, the BMA procedure is run with group mean wind power as the response variable instead of group mean wind speed. This also solves the problem with longer consecutive periods without information in the input data, but it leaves the power curve to also be estimated from the data. [1] Raftery, A. E., et al. (2005). Using Bayesian Model Averaging to Calibrate Forecast Ensembles. Monthly Weather Review, 133, 1155-1174. [2]Revheim, P. P. and H. G. Beyer (2013). Using Bayesian Model Averaging for wind farm group forecasts. EWEA Wind Power Forecasting Technology Workshop,Rotterdam, 4-5 December 2013. [3]Sloughter, J. M., T. Gneiting and A. E. Raftery (2010). Probabilistic Wind Speed Forecasting Using Ensembles and Bayesian Model Averaging. Journal of the American Statistical Association, Vol. 105, No. 489, 25-35
Ben-Ari, Yehezkel; Crepel, Valérie; Represa, Alfonso
2008-01-01
Do temporal lobe epilepsy (TLE) seizures in adults promote further seizures? Clinical and experimental data suggest that new synapses are formed after an initial episode of status epilepticus, however their contribution to the transformation of a naive network to an epileptogenic one has been debated. Recent experimental data show that newly formed aberrant excitatory synapses on the granule cells of the fascia dentate operate by means of kainate receptor-operated signals that are not present on naive granule cells. Therefore, genuine epileptic networks rely on signaling cascades that differentiate them from naive networks. Recurrent limbic seizures generated by the activation of kainate receptors and synapses in naive animals lead to the formation of novel synapses that facilitate the emergence of further seizures. This negative, vicious cycle illustrates the central role of reactive plasticity in neurological disorders.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns
Haghnegahdar, A.A.; Kolahi, S.; Khojastepour, L.; Tajeripour, F.
2018-01-01
Background: Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. Material and Methods: CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cut prepared from each condyle, although images were limited to head of mandibular condyle. In order to extract features of images, first we use LBP and then histogram of oriented gradients. To reduce dimensionality, the linear algebra Singular Value Decomposition (SVD) is applied to the feature vectors matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. Results: K nearest neighbor classifier achieves a very good accuracy (0.9242), moreover, it has desirable sensitivity (0.9470) and specificity (0.9015) results, when other classifiers have lower accuracy, sensitivity and specificity. Conclusion: We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN has been the best classifier for our experiments in detecting patients from healthy individuals, by 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages. PMID:29732343
Guided filter and convolutional network based tracking for infrared dim moving target
NASA Astrophysics Data System (ADS)
Qian, Kun; Zhou, Huixin; Qin, Hanlin; Rong, Shenghui; Zhao, Dong; Du, Juan
2017-09-01
The dim moving target usually submerges in strong noise, and its motion observability is debased by numerous false alarms for low signal-to-noise ratio. A tracking algorithm that integrates the Guided Image Filter (GIF) and the Convolutional neural network (CNN) into the particle filter framework is presented to cope with the uncertainty of dim targets. First, the initial target template is treated as a guidance to filter incoming templates depending on similarities between the guidance and candidate templates. The GIF algorithm utilizes the structure in the guidance and performs as an edge-preserving smoothing operator. Therefore, the guidance helps to preserve the detail of valuable templates and makes inaccurate ones blurry, alleviating the tracking deviation effectively. Besides, the two-layer CNN method is adopted to obtain a powerful appearance representation. Subsequently, a Bayesian classifier is trained with these discriminative yet strong features. Moreover, an adaptive learning factor is introduced to prevent the update of classifier's parameters when a target undergoes sever background. At last, classifier responses of particles are utilized to generate particle importance weights and a re-sample procedure preserves samples according to the weight. In the predication stage, a 2-order transition model considers the target velocity to estimate current position. Experimental results demonstrate that the presented algorithm outperforms several relative algorithms in the accuracy.
Advances in Doppler recognition for ground moving target indication
NASA Astrophysics Data System (ADS)
Kealey, Paul G.; Jahangir, Mohammed
2006-05-01
Ground Moving Target Indication (GMTI) radar provides a day/night, all-weather, wide-area surveillance capability to detect moving vehicles and personnel. Current GMTI radar sensors are limited to only detecting and tracking targets. The exploitation of GMTI data would be greatly enhanced by a capability to recognize accurately the detections as significant classes of target. Doppler classification exploits the differential internal motion of targets, e.g. due to the tracks, limbs and rotors. Recently, the QinetiQ Bayesian Doppler classifier has been extended to include a helicopter class in addition to wheeled, tracked and personnel classes. This paper presents the performance for these four classes using a traditional low-resolution GMTI surveillance waveform with an experimental radar system. We have determined the utility of an "unknown output decision" for enhancing the accuracy of the declared target classes. A confidence method has been derived, using a threshold of the difference in certainties, to assign uncertain classifications into an "unknown class". The trade-off between fraction of targets declared and accuracy of the classifier has been measured. To determine the operating envelope of a Doppler classification algorithm requires a detailed understanding of the Signal-to-Noise Ratio (SNR) performance of the algorithm. In this study the SNR dependence of the QinetiQ classifier has been determined.
Mercury⊕: An evidential reasoning image classifier
NASA Astrophysics Data System (ADS)
Peddle, Derek R.
1995-12-01
MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package is described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating system. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
Classification of Maize and Weeds by Bayesian Networks
NASA Astrophysics Data System (ADS)
Chapron, Michel; Oprea, Alina; Sultana, Bogdan; Assemat, Louis
2007-11-01
Precision Agriculture is concerned with all sorts of within-field variability, spatially and temporally, that reduces the efficacy of agronomic practices applied in a uniform way all over the field. Because of these sources of heterogeneity, uniform management actions strongly reduce the efficiency of the resource input to the crop (i.e. fertilization, water) or for the agrochemicals use for pest control (i.e. herbicide). Moreover, this low efficacy means high environmental cost (pollution) and reduced economic return for the farmer. Weed plants are one of these sources of variability for the crop, as they occur in patches in the field. Detecting the location, size and internal density of these patches, along with identification of main weed species involved, open the way to a site-specific weed control strategy, where only patches of weeds would receive the appropriate herbicide (type and dose). Herein, an automatic recognition method of vegetal species is described. First, the pixels of soil and vegetation are classified in two classes, then the vegetation part of the input image is segmented from the distance image by using the watershed method and finally the leaves of the vegetation are partitioned in two parts maize and weeds thanks to the two Bayesian networks.
Ramanujam, Nedunchelian; Kaliappan, Manivannan
2016-01-01
Nowadays, automatic multidocument text summarization systems can successfully retrieve the summary sentences from the input documents. But, it has many limitations such as inaccurate extraction to essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a new concept of timestamp approach with Naïve Bayesian Classification approach for multidocument text summarization. The timestamp provides the summary an ordered look, which achieves the coherent looking summary. It extracts the more relevant information from the multiple documents. Here, scoring strategy is also used to calculate the score for the words to obtain the word frequency. The higher linguistic quality is estimated in terms of readability and comprehensibility. In order to show the efficiency of the proposed method, this paper presents the comparison between the proposed methods with the existing MEAD algorithm. The timestamp procedure is also applied on the MEAD algorithm and the results are examined with the proposed method. The results show that the proposed method results in lesser time than the existing MEAD algorithm to execute the summarization process. Moreover, the proposed method results in better precision, recall, and F-score than the existing clustering with lexical chaining approach. PMID:27034971
Strong Bayesian evidence for the normal neutrino hierarchy
NASA Astrophysics Data System (ADS)
Simpson, Fergus; Jimenez, Raul; Pena-Garay, Carlos; Verde, Licia
2017-06-01
The configuration of the three neutrino masses can take two forms, known as the normal and inverted hierarchies. We compute the Bayesian evidence associated with these two hierarchies. Previous studies found a mild preference for the normal hierarchy, and this was driven by the asymmetric manner in which cosmological data has confined the available parameter space. Here we identify the presence of a second asymmetry, which is imposed by data from neutrino oscillations. By combining constraints on the squared-mass splittings [1] with the limit on the sum of neutrino masses of Σmν < 0.13 eV [2], and using a minimally informative prior on the masses, we infer odds of 42:1 in favour of the normal hierarchy, which is classified as "strong" in the Jeffreys' scale. We explore how these odds may evolve in light of higher precision cosmological data, and discuss the implications of this finding with regards to the nature of neutrinos. Finally the individual masses are inferred to be m1=3.80+26.2-3.73meV; m2=8.8+18-1.2meV; m3=50.4+5.8-1.2meV (95% credible intervals).
Assessment of accident severity in the construction industry using the Bayesian theorem.
Alizadeh, Seyed Shamseddin; Mortazavi, Seyed Bagher; Mehdi Sepehri, Mohammad
2015-01-01
Construction is a major source of employment in many countries. In construction, workers perform a great diversity of activities, each one with a specific associated risk. The aim of this paper is to identify workers who are at risk of accidents with severe consequences and classify these workers to determine appropriate control measures. We defined 48 groups of workers and used the Bayesian theorem to estimate posterior probabilities about the severity of accidents at the level of individuals in construction sector. First, the posterior probabilities of injuries based on four variables were provided. Then the probabilities of injury for 48 groups of workers were determined. With regard to marginal frequency of injury, slight injury (0.856), fatal injury (0.086) and severe injury (0.058) had the highest probability of occurrence. It was observed that workers with <1 year's work experience (0.168) had the highest probability of injury occurrence. The first group of workers, who were extensively exposed to risk of severe and fatal accidents, involved workers ≥ 50 years old, married, with 1-5 years' work experience, who had no past accident experience. The findings provide a direction for more effective safety strategies and occupational accident prevention and emergency programmes.
Experience modulates the influence of gonadal hormones on sexual orientation of male rats.
Matuszczyk, J V; Larsson, K
1994-03-01
The aim of this study was to investigate whether heterosexual experience influences the male's partner preference and the role of gonadal hormones, including several androgens and estradiol, on female-oriented behavior. In the first experiment, the animals were exposed to a four-stimuli test situation including a sexually active male, a castrated male, an estrous female, and an ovariectomized (ovx) female. Castrated naive and experienced male rats were implanted with either testosterone (T), estradiol (E2), or an empty tubing (blank) and compared to intact naive and experienced male rats. None of the naive animals, whether castrated or intact, showed a consistent preference for any of the four stimuli. Whereas when sexually experienced, only intact and T-treated males showed a female-oriented preference. In the second experiment, the animals were allowed to choose between an active male and an estrous female. Castrated naive male rats were implanted with either T, E2, or DHT, or injected daily SC with the synthetic nonaromatizable androgen, methyltrienelone (R 1881). Two groups of intact males, one consisting of experienced and the other of naive animals, were also included in this experiment. The experienced intact males showed a significant preference for the estrous female, while the intact naive males showed no preference for either of the two stimuli. After the animals had gained heterosexual experience, intact and androgen-treated males showed a significant preference for the female. Neither the administration of R 1881 nor E2 promoted a female-oriented behavior.(ABSTRACT TRUNCATED AT 250 WORDS)
The Hayflick Limit May Determine the Effective Clonal Diversity of Naive T Cells.
Ndifon, Wilfred; Dushoff, Jonathan
2016-06-15
Having a large number of sufficiently abundant T cell clones is important for adequate protection against diseases. However, as shown in this paper and elsewhere, between young adulthood and >70 y of age the effective clonal diversity of naive CD4/CD8 T cells found in human blood declines by a factor of >10. (Effective clonal diversity accounts for both the number and the abundance of T cell clones.) The causes of this observation are incompletely understood. A previous study proposed that it might result from the emergence of certain rare, replication-enhancing mutations in T cells. In this paper, we propose an even simpler explanation: that it results from the loss of T cells that have attained replicative senescence (i.e., the Hayflick limit). Stochastic numerical simulations of naive T cell population dynamics, based on experimental parameters, show that the rate of homeostatic T cell proliferation increases after the age of ∼60 y because naive T cells collectively approach replicative senescence. This leads to a sharp decline of effective clonal diversity after ∼70 y, in agreement with empirical data. A mathematical analysis predicts that, without an increase in the naive T cell proliferation rate, this decline will occur >50 yr later than empirically observed. These results are consistent with a model in which exhaustion of the proliferative capacity of naive T cells causes a sharp decline of their effective clonal diversity and imply that therapeutic potentiation of thymopoiesis might either prevent or reverse this outcome. Copyright © 2016 by The American Association of Immunologists, Inc.
El Bouzidi, Kate; White, Ellen; Mbisa, Jean L; Sabin, Caroline A; Phillips, Andrew N; Mackie, Nicola; Pozniak, Anton L; Tostevin, Anna; Pillay, Deenan; Dunn, David T
2016-12-01
Darunavir is considered to have a high genetic barrier to resistance. Most darunavir-associated drug resistance mutations (DRMs) have been identified through correlation of baseline genotype with virological response in clinical trials. However, there is little information on DRMs that are directly selected by darunavir in clinical settings. We examined darunavir DRMs emerging in clinical practice in the UK. Baseline and post-exposure protease genotypes were compared for individuals in the UK Collaborative HIV Cohort Study who had received darunavir; analyses were stratified for PI history. A selection analysis was used to compare the evolution of subtype B proteases in darunavir recipients and matched PI-naive controls. Of 6918 people who had received darunavir, 386 had resistance tests pre- and post-exposure. Overall, 2.8% (11/386) of these participants developed emergent darunavir DRMs. The prevalence of baseline DRMs was 1.0% (2/198) among PI-naive participants and 13.8% (26/188) among PI-experienced participants. Emergent DRMs developed in 2.0% of the PI-naive group (4 mutations) and 3.7% of the PI-experienced group (12 mutations). Codon 77 was positively selected in the PI-naive darunavir cases, but not in the control group. Our findings suggest that although emergent darunavir resistance is rare, it may be more common among PI-experienced patients than those who are PI-naive. Further investigation is required to explore whether codon 77 is a novel site involved in darunavir susceptibility. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.
Limsreng, Setha; Him, Sovanvatey; Nouhin, Janin; Hak, Chanroeurn; Srun, Chanvatey; Viretto, Gerald; Ouk, Vara; Delfraissy, Jean Francois; Ségéral, Olivier
2014-01-01
Introduction In resource limited settings, patients entering an antiretroviral therapy (ART) program comprise ART naive and ART pre-treated patients who may show differential virological outcomes. Methods This retrospective study, conducted in 2010–2012 in the HIV clinic of Calmette Hospital located in Phnom Penh (Cambodia) assessed virological failure (VF) rates and patterns of drug resistance of naive and pre-treated patients. Naive and ART pre-treated patients were included when a Viral Load (VL) was performed during the first year of ART for naive subjects or at the first consultation for pre-treated individuals. Patients showing Virological failure (VF) (>1,000 copies/ml) underwent HIV DR genotyping testing. Interpretation of drug resistance mutations was done according to 2013 version 23 ANRS algorithms. Results On a total of 209 patients, 164 (78.4%) were naive and 45 (21.5%) were ART pre-treated. Their median initial CD4 counts were 74 cells/mm3 (IQR: 30–194) and 279 cells/mm3 (IQR: 103–455) (p<0.001), respectively. Twenty seven patients (12.9%) exhibited VF (95% CI: 8.6–18.2%), including 10 naive (10/164, 6.0%) and 17 pre-treated (17/45, 37.8%) patients (p<0.001). Among these viremic patients, twenty-two (81.4%) were sequenced in reverse transcriptase and protease coding regions. Overall, 19 (86.3%) harbored ≥1 drug resistance mutations (DRMs) whereas 3 (all belonging to pre-treated patients) harbored wild-types viruses. The most frequent DRMs were M184V (86.3%), K103N (45.5%) and thymidine analog mutations (TAMs) (40.9%). Two (13.3%) pre-treated patients harbored viruses that showed a multi-nucleos(t)ide resistance including Q151M, K65R, E33A/D, E44A/D mutations. Conclusion In Cambodia, VF rates were low for naive patients but the emergence of DRMs to NNRTI and 3TC occurred relatively quickly in this subgroup. In pre-treated patients, VF rates were much higher and TAMs were relatively common. HIV genotypic assays before ART initiation and for ART pre-treated patients infection should be considered as well. PMID:25166019
NASA Astrophysics Data System (ADS)
Strolger, Louis-Gregory; Porter, Sophia; Lagerstrom, Jill; Weissman, Sarah; Reid, I. Neill; Garcia, Michael
2017-04-01
The Proposal Auto-Categorizer and Manager (PACMan) tool was written to respond to concerns about subjective flaws and potential biases in some aspects of the proposal review process for time allocation for the Hubble Space Telescope (HST), and to partially alleviate some of the anticipated additional workload from the James Webb Space Telescope (JWST) proposal review. PACMan is essentially a mixed-method Naive Bayesian spam filtering routine, with multiple pools representing scientific categories, that utilizes the Robinson method for combining token (or word) probabilities. PACMan was trained to make similar programmatic decisions in science category sorting, panelist selection, and proposal-to-panelists assignments to those made by individuals and committees in the Science Policies Group (SPG) at the Space Telescope Science Institute. Based on training from the previous cycle’s proposals, at an average of 87%, PACMan made the same science category assignments for proposals in Cycle 24 as the SPG. Tests for similar science categorizations, based on training using proposals from additional cycles, show that this accuracy can be further improved, to the > 95 % level. This tool will be used to augment or replace key functions in the Time Allocation Committee review processes in future HST and JWST cycles.
SECURE INTERNET OF THINGS-BASED CLOUD FRAMEWORK TO CONTROL ZIKA VIRUS OUTBREAK.
Sareen, Sanjay; Sood, Sandeep K; Gupta, Sunil Kumar
2017-01-01
Zika virus (ZikaV) is currently one of the most important emerging viruses in the world which has caused outbreaks and epidemics and has also been associated with severe clinical manifestations and congenital malformations. Traditional approaches to combat the ZikaV outbreak are not effective for detection and control. The aim of this study is to propose a cloud-based system to prevent and control the spread of Zika virus disease using integration of mobile phones and Internet of Things (IoT). A Naive Bayesian Network (NBN) is used to diagnose the possibly infected users, and Google Maps Web service is used to provide the geographic positioning system (GPS)-based risk assessment to prevent the outbreak. It is used to represent each ZikaV infected user, mosquito-dense sites, and breeding sites on the Google map that helps the government healthcare authorities to control such risk-prone areas effectively and efficiently. The performance and accuracy of the proposed system are evaluated using dataset for 2 million users. Our system provides high accuracy for initial diagnosis of different users according to their symptoms and appropriate GPS-based risk assessment. The cloud-based proposed system contributed to the accurate NBN-based classification of infected users and accurate identification of risk-prone areas using Google Maps.
A New Tool for Classifying Small Solar System Objects
NASA Astrophysics Data System (ADS)
Desfosses, Ryan; Arel, D.; Walker, M. E.; Ziffer, J.; Harvell, T.; Campins, H.; Fernandez, Y. R.
2011-05-01
An artificial intelligence program, AutoClass, which was developed by NASA's Artificial Intelligence Branch, uses Bayesian classification theory to automatically choose the most probable classification distribution to describe a dataset. To investigate its usefulness to the Planetary Science community, we tested its ability to reproduce the taxonomic classes as defined by Tholen and Barucci (1989). Of the 406 asteroids from the Eight Color Asteroid Survey (ECAS) we chose for our test, 346 were firmly classified and all but 3 (<1%) were classified by Autoclass as they had been in the previous classification system (Walker et al., 2011). We are now applying it to larger datasets to improve the taxonomy of currently unclassified objects. Having demonstrated AutoClass's ability to recreate existing classification effectively, we extended this work to investigations of albedo-based classification systems. To determine how predictive albedo can be, we used data from the Infrared Astronomical Satellite (IRAS) database in conjunction with the large Sloan Digital Sky Survey (SDSS), which contains color and position data for over 200,000 classified and unclassified asteroids (Ivesic et al., 2001). To judge our success we compared our results with a similar approach to classifying objects using IRAS albedo and asteroid color by Tedesco et al. (1989). Understanding the distribution of the taxonomic classes is important to understanding the history and evolution of our Solar System. AutoClass's success in categorizing ECAS, IRAS and SDSS asteroidal data highlights its potential to scan large domains for natural classes in small solar system objects. Based upon our AutoClass results, we intend to make testable predictions about asteroids observed with the Wide-field Infrared Survey Explorer (WISE).
Cain, Lauren E; Phillips, Andrew; Lodi, Sara; Sabin, Caroline; Bansi, Loveleen; Justice, Amy; Tate, Janet; Logan, Roger; Robins, James M; Sterne, Jonathan A C; van Sighem, Ard; de Wolf, Frank; Bucher, Heiner C; Elzi, Luigia; Touloumi, Giota; Vourli, Georgia; Esteve, Anna; Casabona, Jordi; del Amo, Julia; Moreno, Santiago; Seng, Rémonie; Meyer, Laurence; Pérez-Hoyos, Santiago; Muga, Roberto; Abgrall, Sophie; Costagliola, Dominique; Hernán, Miguel A
2012-08-24
To compare regimens consisting of either efavirenz or nevirapine and two or more nucleoside reverse transcriptase inhibitors (NRTIs) among HIV-infected, antiretroviral-naive, and AIDS-free individuals with respect to clinical, immunologic, and virologic outcomes. Prospective studies of HIV-infected individuals in Europe and the US included in the HIV-CAUSAL Collaboration. Antiretroviral therapy-naive and AIDS-free individuals were followed from the time they started an NRTI, efavirenz or nevirapine, classified as following one or both types of regimens at baseline, and censored when they started an ineligible drug or at 6 months if their regimen was not yet complete. We estimated the 'intention-to-treat' effect for nevirapine versus efavirenz regimens on clinical, immunologic, and virologic outcomes. Our models included baseline covariates and adjusted for potential bias introduced by censoring via inverse probability weighting. A total of 15 336 individuals initiated an efavirenz regimen (274 deaths, 774 AIDS-defining illnesses) and 8129 individuals initiated a nevirapine regimen (203 deaths, 441 AIDS-defining illnesses). The intention-to-treat hazard ratios [95% confidence interval (CI)] for nevirapine versus efavirenz regimens were 1.59 (1.27, 1.98) for death and 1.28 (1.09, 1.50) for AIDS-defining illness. Individuals on nevirapine regimens experienced a smaller 12-month increase in CD4 cell count by 11.49 cells/μl and were 52% more likely to have virologic failure at 12 months as those on efavirenz regimens. Our intention-to-treat estimates are consistent with a lower mortality, a lower incidence of AIDS-defining illness, a larger 12-month increase in CD4 cell count, and a smaller risk of virologic failure at 12 months for efavirenz compared with nevirapine. © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins
Bak, N; Ebdrup, B H; Oranje, B; Fagerlund, B; Jensen, M H; Düring, S W; Nielsen, M Ø; Glenthøj, B Y; Hansen, L K
2017-01-01
Deficits in information processing and cognition are among the most robust findings in schizophrenia patients. Previous efforts to translate group-level deficits into clinically relevant and individualized information have, however, been non-successful, which is possibly explained by biologically different disease subgroups. We applied machine learning algorithms on measures of electrophysiology and cognition to identify potential subgroups of schizophrenia. Next, we explored subgroup differences regarding treatment response. Sixty-six antipsychotic-naive first-episode schizophrenia patients and sixty-five healthy controls underwent extensive electrophysiological and neurocognitive test batteries. Patients were assessed on the Positive and Negative Syndrome Scale (PANSS) before and after 6 weeks of monotherapy with the relatively selective D2 receptor antagonist, amisulpride (280.3±159 mg per day). A reduced principal component space based on 19 electrophysiological variables and 26 cognitive variables was used as input for a Gaussian mixture model to identify subgroups of patients. With support vector machines, we explored the relation between PANSS subscores and the identified subgroups. We identified two statistically distinct subgroups of patients. We found no significant baseline psychopathological differences between these subgroups, but the effect of treatment in the groups was predicted with an accuracy of 74.3% (P=0.003). In conclusion, electrophysiology and cognition data may be used to classify subgroups of schizophrenia patients. The two distinct subgroups, which we identified, were psychopathologically inseparable before treatment, yet their response to dopaminergic blockade was predicted with significant accuracy. This proof of principle encourages further endeavors to apply data-driven, multivariate and multimodal models to facilitate progress from symptom-based psychiatry toward individualized treatment regimens. PMID:28398342
Jiang, Weiqin; Shen, Yifei; Ding, Yongfeng; Ye, Chuyu; Zheng, Yi; Zhao, Peng; Liu, Lulu; Tong, Zhou; Zhou, Linfu; Sun, Shuo; Zhang, Xingchen; Teng, Lisong; Timko, Michael P; Fan, Longjiang; Fang, Weijia
2018-01-15
Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system but because of similarities in their histological features, oncologists have difficulty in identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on naive Bayes algorithm (TOD-Bayes) using ubiquitous RNA-Seq data. Massive tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA) and ∼1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross validation by the data from TCGA. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation and 17 of the 18 samples were classified correctly in our study (94.4%). Furthermore, we included as cases studies seven tumor samples, taken from two individuals who suffered from synchronous multifocal tumors across tissues, where the efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using our TOD-Bayes analysis, the two clinical test cases were successfully diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology to accurately identify the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data and an important step toward more precision-based medicine in cancer diagnosis and treatment. © 2017 UICC.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa
2018-07-01
Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction. For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall. From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Vidotti, Vanessa G; Costa, Vital P; Silva, Fabrício R; Resende, Graziela M; Cremasco, Fernanda; Dias, Marcelo; Gomi, Edson S
2012-06-15
Purpose. To investigate the sensitivity and specificity of machine learning classifiers (MLC) and spectral domain optical coherence tomography (SD-OCT) for the diagnosis of glaucoma. Methods. Sixty-two patients with early to moderate glaucomatous visual field damage and 48 healthy individuals were included. All subjects underwent a complete ophthalmologic examination, achromatic standard automated perimetry, and RNFL imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec, Inc., Dublin, California, USA). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters. Subsequently, the following MLCs were tested: Classification Tree (CTREE), Random Forest (RAN), Bagging (BAG), AdaBoost M1 (ADA), Ensemble Selection (ENS), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Naive-Bayes (NB), and Support Vector Machine (SVM). Areas under the ROC curves (aROCs) obtained for each parameter and each MLC were compared. Results. The mean age was 57.0±9.2 years for healthy individuals and 59.9±9.0 years for glaucoma patients (p=0.103). Mean deviation values were -4.1±2.4 dB for glaucoma patients and -1.5±1.6 dB for healthy individuals (p<0.001). The SD-OCT parameters with the greater aROCs were inferior quadrant (0.813), average thickness (0.807), 7 o'clock position (0.765), and 6 o'clock position (0.754). The aROCs from classifiers varied from 0.785 (ADA) to 0.818 (BAG). The aROC obtained with BAG was not significantly different from the aROC obtained with the best single SD-OCT parameter (p=0.93). Conclusions. The SD-OCT showed good diagnostic accuracy in a group of patients with early glaucoma. In this series, MLCs did not improve the sensitivity and specificity of SD-OCT for the diagnosis of glaucoma.
McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne
2018-04-01
Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogleNet convolutional neural networks (CNNs), initially trained using ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with MatConvNet package, to extract features from food image datasets; Food 5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from CNNs and used to train machine learning classifiers including artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with SVM with RBF kernel can accurately detect food items with 99.4% accuracy using Food-5K validation food image dataset and 98.8% with Food-5K evaluation dataset using ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, ANN can achieve 91.34%, 99.28% when applied to Food-11 and RawFooT-DB food image datasets respectively and SVM with RBF kernel can achieve 64.98% with Food-101 image dataset. From this research it is clear that using deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.
Mead, Reginald; Paxton, John; Sojda, Richard S.; Swayne, David A.; Yang, Wanhong; Voinov, A.A.; Rizzoli, A.; Filatova, T.
2010-01-01
Radar ornithology has provided tools for studying the movement of birds, especially related to migration. Researchers have presented qualitative evidence suggesting that birds, or at least migration events, can be identified using large broad scale radars such as the WSR-88D used in the NEXRAD weather surveillance system. This is potentially a boon for ornithologists because such data cover a large portion of the United States, are constantly being produced, are freely available, and have been archived since the early 1990s. A major obstacle to this research, however, has been that identifying birds in NEXRAD data has required a trained technician to manually inspect a graphically rendered radar sweep. A single site completes one volume scan every five to ten minutes, producing over 52,000 volume scans in one year. This is an immense amount of data, and manual classification is infeasible. We have developed a system that identifies biological echoes using machine learning techniques. This approach begins with training data using scans that have been classified by experts, or uses bird data collected in the field. The data are preprocessed to ensure quality and to emphasize relevant features. A classifier is then trained using this data and cross validation is used to measure performance. We compared neural networks, naive Bayes, and k-nearest neighbor classifiers. Empirical evidence is provided showing that this system can achieve classification accuracies in the 80th to 90th percentile. We propose to apply these methods to studying bird migration phenology and how it is affected by climate variability and change over multiple temporal scales.
Jochems, Anouk; Leeneman, Brenda; Franken, Margreet G; Schouwenburg, Maartje G; Aarts, Maureen J B; van Akkooi, Alexander C J; van den Berkmortel, Franchette W P J; van den Eertwegh, Alfonsus J M; Groenewegen, Gerard; de Groot, Jan Willem B; Haanen, John B A G; Hospers, Geke A P; Kapiteijn, Ellen; Koornstra, Rutger H; Kruit, Wim H J; Louwman, Marieke W J; Piersma, Djura; van Rijn, Rozemarijn S; Ten Tije, Albert J; Vreugdenhil, Gerard; Wouters, Michel W J M; Uyl-de Groot, Carin A; van der Hoeven, Koos J M
2018-07-01
Phase III trials with ipilimumab showed an improved survival in patients with metastatic melanoma. We evaluated the use and safety of ipilimumab, and the survival of all patients with metastatic cutaneous melanoma (N=807) receiving ipilimumab in real-world clinical practice in The Netherlands using data from the Dutch Melanoma Treatment Registry. Patients who were registered between July 2012 and July 2015 were included and analyzed according to their treatment status: treatment-naive (N=344) versus previously-treated (N=463). Overall, 70% of treatment-naive patients and 62% of previously-treated patients received all four planned doses of ipilimumab. Grade 3 and 4 immune-related adverse events occurred in 29% of treatment-naive patients and 21% of previously-treated patients. No treatment-related deaths occurred. Median time to first event was 5.4 months [95% confidence interval (CI): 4.7-6.5 months] in treatment-naive patients and 4.4 months (95% CI: 4.0-4.7 months) in previously-treated patients. Median overall survival was 14.3 months (95% CI: 11.6-16.7 months) in treatment-naive patients and 8.7 months (95% CI: 7.6-9.6 months) in previously-treated patients. In both patient groups, an elevated lactate dehydrogenase level (hazard ratio: 2.25 and 1.70 in treatment-naive and previously-treated patients, respectively) and American Joint Committee on Cancer M1c-stage disease (hazard ratio: 1.81 and 1.83, respectively) were negatively associated with overall survival. These real-world outcomes of ipilimumab slightly differed from outcomes in phase III trials. Although phase III trials are crucial for establishing efficacy, real-world data are of great added value enhancing the generalizability of outcomes of ipilimumab in clinical practice.
Developmental Origin Governs CD8+ T Cell Fate Decisions during Infection.
Smith, Norah L; Patel, Ravi K; Reynaldi, Arnold; Grenier, Jennifer K; Wang, Jocelyn; Watson, Neva B; Nzingha, Kito; Yee Mon, Kristel J; Peng, Seth A; Grimson, Andrew; Davenport, Miles P; Rudd, Brian D
2018-06-06
Heterogeneity is a hallmark feature of the adaptive immune system in vertebrates. Following infection, naive T cells differentiate into various subsets of effector and memory T cells, which help to eliminate pathogens and maintain long-term immunity. The current model suggests there is a single lineage of naive T cells that give rise to different populations of effector and memory T cells depending on the type and amounts of stimulation they encounter during infection. Here, we have discovered that multiple sub-populations of cells exist in the naive CD8 + T cell pool that are distinguished by their developmental origin, unique transcriptional profiles, distinct chromatin landscapes, and different kinetics and phenotypes after microbial challenge. These data demonstrate that the naive CD8 + T cell pool is not as homogeneous as previously thought and offers a new framework for explaining the remarkable heterogeneity in the effector and memory T cell subsets that arise after infection. Copyright © 2018 Elsevier Inc. All rights reserved.
Coleman, Jamison; Williams, Ashley; Phan, Tam-Hao T.; Mummalaneni, Shobha; Melone, Pamela; Ren, ZuoJun; Zhou, Huiping; Mahavadi, Sunila; Murthy, Karnam S.; Katsumata, Tadayoshi; DeSimone, John A.
2011-01-01
Strain differences between naive, sucrose- and ethanol-exposed alcohol-preferring (P) and alcohol-nonpreferring (NP) rats were investigated in their consumption of ethanol, sucrose, and NaCl; chorda tympani (CT) nerve responses to sweet and salty stimuli; and gene expression in the anterior tongue of T1R3 and TRPV1/TRPV1t. Preference for 5% ethanol and 10% sucrose, CT responses to sweet stimuli, and T1R3 expression were greater in naive P rats than NP rats. The enhancement of the CT response to 0.5 M sucrose in the presence of varying ethanol concentrations (0.5–40%) in naive P rats was higher and shifted to lower ethanol concentrations than NP rats. Chronic ingestion of 5% sucrose or 5% ethanol decreased T1R3 mRNA in NP and P rats. Naive P rats also demonstrated bigger CT responses to NaCl+benzamil and greater TRPV1/TRPV1t expression. TRPV1t agonists produced biphasic effects on NaCl+benzamil CT responses, enhancing the response at low concentrations and inhibiting it at high concentrations. The concentration of a TRPV1/TRPV1t agonist (Maillard reacted peptides conjugated with galacturonic acid) that produced a maximum enhancement in the NaCl+benzamil CT response induced a decrease in NaCl intake and preference in P rats. In naive P rats and NP rats exposed to 5% ethanol in a no-choice paradigm, the biphasic TRPV1t agonist vs. NaCl+benzamil CT response profiles were higher and shifted to lower agonist concentrations than in naive NP rats. TRPV1/TRPV1t mRNA expression increased in NP rats but not in P rats exposed to 5% ethanol in a no-choice paradigm. We conclude that P and NP rats differ in T1R3 and TRPV1/TRPV1t expression and neural and behavioral responses to sweet and salty stimuli and to chronic sucrose and ethanol exposure. PMID:21849614
Coleman, Jamison; Williams, Ashley; Phan, Tam-Hao T; Mummalaneni, Shobha; Melone, Pamela; Ren, Zuojun; Zhou, Huiping; Mahavadi, Sunila; Murthy, Karnam S; Katsumata, Tadayoshi; DeSimone, John A; Lyall, Vijay
2011-11-01
Strain differences between naive, sucrose- and ethanol-exposed alcohol-preferring (P) and alcohol-nonpreferring (NP) rats were investigated in their consumption of ethanol, sucrose, and NaCl; chorda tympani (CT) nerve responses to sweet and salty stimuli; and gene expression in the anterior tongue of T1R3 and TRPV1/TRPV1t. Preference for 5% ethanol and 10% sucrose, CT responses to sweet stimuli, and T1R3 expression were greater in naive P rats than NP rats. The enhancement of the CT response to 0.5 M sucrose in the presence of varying ethanol concentrations (0.5-40%) in naive P rats was higher and shifted to lower ethanol concentrations than NP rats. Chronic ingestion of 5% sucrose or 5% ethanol decreased T1R3 mRNA in NP and P rats. Naive P rats also demonstrated bigger CT responses to NaCl+benzamil and greater TRPV1/TRPV1t expression. TRPV1t agonists produced biphasic effects on NaCl+benzamil CT responses, enhancing the response at low concentrations and inhibiting it at high concentrations. The concentration of a TRPV1/TRPV1t agonist (Maillard reacted peptides conjugated with galacturonic acid) that produced a maximum enhancement in the NaCl+benzamil CT response induced a decrease in NaCl intake and preference in P rats. In naive P rats and NP rats exposed to 5% ethanol in a no-choice paradigm, the biphasic TRPV1t agonist vs. NaCl+benzamil CT response profiles were higher and shifted to lower agonist concentrations than in naive NP rats. TRPV1/TRPV1t mRNA expression increased in NP rats but not in P rats exposed to 5% ethanol in a no-choice paradigm. We conclude that P and NP rats differ in T1R3 and TRPV1/TRPV1t expression and neural and behavioral responses to sweet and salty stimuli and to chronic sucrose and ethanol exposure.
Cho, Jae-Ho; Sprent, Jonathan
2018-05-01
After selection in the thymus, the post-thymic T cell compartments comprise heterogenous subsets of naive and memory T cells that make continuous T cell receptor (TCR) contact with self-ligands bound to major histocompatibility complex (MHC) molecules. T cell recognition of self-MHC ligands elicits covert TCR signaling and is particularly important for controlling survival of naive T cells. Such tonic TCR signaling is tightly controlled and maintains the cells in a quiescent state to avoid autoimmunity. Here, we review how naive and memory T cells are differentially tuned and wired for TCR sensitivity to self and foreign ligands. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Torres, Craig; Jones, Rachael; Boelter, Fred; Poole, James; Dell, Linda; Harper, Paul
2014-01-01
Bayesian Decision Analysis (BDA) uses Bayesian statistics to integrate multiple types of exposure information and classify exposures within the exposure rating categorization scheme promoted in American Industrial Hygiene Association (AIHA) publications. Prior distributions for BDA may be developed from existing monitoring data, mathematical models, or professional judgment. Professional judgments may misclassify exposures. We suggest that a structured qualitative risk assessment (QLRA) method can provide consistency and transparency in professional judgments. In this analysis, we use a structured QLRA method to define prior distributions (priors) for BDA. We applied this approach at three semiconductor facilities in South Korea, and present an evaluation of the performance of structured QLRA for determination of priors, and an evaluation of occupational exposures using BDA. Specifically, the structured QLRA was applied to chemical agents in similar exposure groups to identify provisional risk ratings. Standard priors were developed for each risk rating before review of historical monitoring data. Newly collected monitoring data were used to update priors informed by QLRA or historical monitoring data, and determine the posterior distribution. Exposure ratings were defined by the rating category with the highest probability--i.e., the most likely. We found the most likely exposure rating in the QLRA-informed priors to be consistent with historical and newly collected monitoring data, and the posterior exposure ratings developed with QLRA-informed priors to be equal to or greater than those developed with data-informed priors in 94% of comparisons. Overall, exposures at these facilities are consistent with well-controlled work environments. That is, the 95th percentile of exposure distributions are ≤50% of the occupational exposure limit (OEL) for all chemical-SEG combinations evaluated; and are ≤10% of the limit for 94% of chemical-SEG combinations evaluated.
A Bayesian network approach for modeling local failure in lung cancer
NASA Astrophysics Data System (ADS)
Oh, Jung Hun; Craft, Jeffrey; Lozi, Rawan Al; Vaidya, Manushka; Meng, Yifan; Deasy, Joseph O.; Bradley, Jeffrey D.; El Naqa, Issam
2011-03-01
Locally advanced non-small cell lung cancer (NSCLC) patients suffer from a high local failure rate following radiotherapy. Despite many efforts to develop new dose-volume models for early detection of tumor local failure, there was no reported significant improvement in their application prospectively. Based on recent studies of biomarker proteins' role in hypoxia and inflammation in predicting tumor response to radiotherapy, we hypothesize that combining physical and biological factors with a suitable framework could improve the overall prediction. To test this hypothesis, we propose a graphical Bayesian network framework for predicting local failure in lung cancer. The proposed approach was tested using two different datasets of locally advanced NSCLC patients treated with radiotherapy. The first dataset was collected retrospectively, which comprises clinical and dosimetric variables only. The second dataset was collected prospectively in which in addition to clinical and dosimetric information, blood was drawn from the patients at various time points to extract candidate biomarkers as well. Our preliminary results show that the proposed method can be used as an efficient method to develop predictive models of local failure in these patients and to interpret relationships among the different variables in the models. We also demonstrate the potential use of heterogeneous physical and biological variables to improve the model prediction. With the first dataset, we achieved better performance compared with competing Bayesian-based classifiers. With the second dataset, the combined model had a slightly higher performance compared to individual physical and biological models, with the biological variables making the largest contribution. Our preliminary results highlight the potential of the proposed integrated approach for predicting post-radiotherapy local failure in NSCLC patients.
Luan, Hui; Law, Jane; Quick, Matthew
2015-12-30
Obesity and other adverse health outcomes are influenced by individual- and neighbourhood-scale risk factors, including the food environment. At the small-area scale, past research has analysed spatial patterns of food environments for one time period, overlooking how food environments change over time. Further, past research has infrequently analysed relative healthy food access (RHFA), a measure that is more representative of food purchasing and consumption behaviours than absolute outlet density. This research applies a Bayesian hierarchical model to analyse the spatio-temporal patterns of RHFA in the Region of Waterloo, Canada, from 2011 to 2014 at the small-area level. RHFA is calculated as the proportion of healthy food outlets (healthy outlets/healthy + unhealthy outlets) within 4-km from each small-area. This model measures spatial autocorrelation of RHFA, temporal trend of RHFA for the study region, and spatio-temporal trends of RHFA for small-areas. For the study region, a significant decreasing trend in RHFA is observed (-0.024), suggesting that food swamps have become more prevalent during the study period. For small-areas, significant decreasing temporal trends in RHFA were observed for all small-areas. Specific small-areas located in south Waterloo, north Kitchener, and southeast Cambridge exhibited the steepest decreasing spatio-temporal trends and are classified as spatio-temporal food swamps. This research demonstrates a Bayesian spatio-temporal modelling approach to analyse RHFA at the small-area scale. Results suggest that food swamps are more prevalent than food deserts in the Region of Waterloo. Analysing spatio-temporal trends of RHFA improves understanding of local food environment, highlighting specific small-areas where policies should be targeted to increase RHFA and reduce risk factors of adverse health outcomes such as obesity.
Children's Conceptions of Mental Illness: A Naive Theory Approach
ERIC Educational Resources Information Center
Fox, Claudine; Buchanan-Barrow, Eithne; Barrett, Martyn
2010-01-01
This paper reports two studies that investigated children's conceptions of mental illness using a naive theory approach, drawing upon a conceptual framework for analysing illness representations which distinguishes between the identity, causes, consequences, curability, and timeline of an illness. The studies utilized semi-structured interviewing…
Naive vs. Sophisticated Methods of Forecasting Public Library Circulations.
ERIC Educational Resources Information Center
Brooks, Terrence A.
1984-01-01
Two sophisticated--autoregressive integrated moving average (ARIMA), straight-line regression--and two naive--simple average, monthly average--forecasting techniques were used to forecast monthly circulation totals of 34 public libraries. Comparisons of forecasts and actual totals revealed that ARIMA and monthly average methods had smallest mean…
Laksono, Brigitta M; Grosserichter-Wagener, Christina; de Vries, Rory D; Langeveld, Simone A G; Brem, Maarten D; van Dongen, Jacques J M; Katsikis, Peter D; Koopmans, Marion P G; van Zelm, Menno C; de Swart, Rik L
2018-04-15
Measles is characterized by a transient immune suppression, leading to an increased risk of opportunistic infections. Measles virus (MV) infection of immune cells is mediated by the cellular receptor CD150, expressed by subsets of lymphocytes, dendritic cells, macrophages, and thymocytes. Previous studies showed that human and nonhuman primate memory T cells express higher levels of CD150 than naive cells and are more susceptible to MV infection. However, limited information is available about the CD150 expression and relative susceptibility to MV infection of B-cell subsets. In this study, we assessed the susceptibility and permissiveness of naive and memory T- and B-cell subsets from human peripheral blood or tonsils to in vitro MV infection. Our study demonstrates that naive and memory B cells express CD150, but at lower frequencies than memory T cells. Nevertheless, both naive and memory B cells proved to be highly permissive to MV infection. Furthermore, we assessed the susceptibility and permissiveness of various functionally distinct T and B cells, such as helper T (T H ) cell subsets and IgG- and IgA-positive memory B cells, in peripheral blood and tonsils. We demonstrated that T H 1T H 17 cells and plasma and germinal center B cells were the subsets most susceptible and permissive to MV infection. Our study suggests that both naive and memory B cells, along with several other antigen-experienced lymphocytes, are important target cells of MV infection. Depletion of these cells potentially contributes to the pathogenesis of measles immune suppression. IMPORTANCE Measles is associated with immune suppression and is often complicated by bacterial pneumonia, otitis media, or gastroenteritis. Measles virus infects antigen-presenting cells and T and B cells, and depletion of these cells may contribute to lymphopenia and immune suppression. Measles has been associated with follicular exhaustion in lymphoid tissues in humans and nonhuman primates, emphasizing the importance of MV infection of B cells in vivo However, information on the relative susceptibility of B-cell subsets is scarce. Here, we compared the susceptibility and permissiveness to in vitro MV infection of human naive and memory T- and B-cell subsets isolated from peripheral blood or tonsils. Our results demonstrate that both naive and memory B cells are more permissive to MV infection than T cells. The highest infection levels were detected in plasma cells and germinal center B cells, suggesting that infection and depletion of these populations contribute to reduced host resistance. Copyright © 2018 Laksono et al.
Laksono, Brigitta M.; Grosserichter-Wagener, Christina; de Vries, Rory D.; Langeveld, Simone A. G.; Brem, Maarten D.; van Dongen, Jacques J. M.; Koopmans, Marion P. G.
2018-01-01
ABSTRACT Measles is characterized by a transient immune suppression, leading to an increased risk of opportunistic infections. Measles virus (MV) infection of immune cells is mediated by the cellular receptor CD150, expressed by subsets of lymphocytes, dendritic cells, macrophages, and thymocytes. Previous studies showed that human and nonhuman primate memory T cells express higher levels of CD150 than naive cells and are more susceptible to MV infection. However, limited information is available about the CD150 expression and relative susceptibility to MV infection of B-cell subsets. In this study, we assessed the susceptibility and permissiveness of naive and memory T- and B-cell subsets from human peripheral blood or tonsils to in vitro MV infection. Our study demonstrates that naive and memory B cells express CD150, but at lower frequencies than memory T cells. Nevertheless, both naive and memory B cells proved to be highly permissive to MV infection. Furthermore, we assessed the susceptibility and permissiveness of various functionally distinct T and B cells, such as helper T (TH) cell subsets and IgG- and IgA-positive memory B cells, in peripheral blood and tonsils. We demonstrated that TH1TH17 cells and plasma and germinal center B cells were the subsets most susceptible and permissive to MV infection. Our study suggests that both naive and memory B cells, along with several other antigen-experienced lymphocytes, are important target cells of MV infection. Depletion of these cells potentially contributes to the pathogenesis of measles immune suppression. IMPORTANCE Measles is associated with immune suppression and is often complicated by bacterial pneumonia, otitis media, or gastroenteritis. Measles virus infects antigen-presenting cells and T and B cells, and depletion of these cells may contribute to lymphopenia and immune suppression. Measles has been associated with follicular exhaustion in lymphoid tissues in humans and nonhuman primates, emphasizing the importance of MV infection of B cells in vivo. However, information on the relative susceptibility of B-cell subsets is scarce. Here, we compared the susceptibility and permissiveness to in vitro MV infection of human naive and memory T- and B-cell subsets isolated from peripheral blood or tonsils. Our results demonstrate that both naive and memory B cells are more permissive to MV infection than T cells. The highest infection levels were detected in plasma cells and germinal center B cells, suggesting that infection and depletion of these populations contribute to reduced host resistance. PMID:29437964
Automatic classification of spectra from the Infrared Astronomical Satellite (IRAS)
NASA Technical Reports Server (NTRS)
Cheeseman, Peter; Stutz, John; Self, Matthew; Taylor, William; Goebel, John; Volk, Kevin; Walker, Helen
1989-01-01
A new classification of Infrared spectra collected by the Infrared Astronomical Satellite (IRAS) is presented. The spectral classes were discovered automatically by a program called Auto Class 2. This program is a method for discovering (inducing) classes from a data base, utilizing a Bayesian probability approach. These classes can be used to give insight into the patterns that occur in the particular domain, in this case, infrared astronomical spectroscopy. The classified spectra are the entire Low Resolution Spectra (LRS) Atlas of 5,425 sources. There are seventy-seven classes in this classification and these in turn were meta-classified to produce nine meta-classes. The classification is presented as spectral plots, IRAS color-color plots, galactic distribution plots and class commentaries. Cross-reference tables, listing the sources by IRAS name and by Auto Class class, are also given. These classes show some of the well known classes, such as the black-body class, and silicate emission classes, but many other classes were unsuspected, while others show important subtle differences within the well known classes.
Automated classification of radiology reports to facilitate retrospective study in radiology.
Zhou, Yihua; Amundson, Per K; Yu, Fang; Kessler, Marcus M; Benzinger, Tammie L S; Wippold, Franz J
2014-12-01
Retrospective research is an import tool in radiology. Identifying imaging examinations appropriate for a given research question from the unstructured radiology reports is extremely useful, but labor-intensive. Using the machine learning text-mining methods implemented in LingPipe [1], we evaluated the performance of the dynamic language model (DLM) and the Naïve Bayesian (NB) classifiers in classifying radiology reports to facilitate identification of radiological examinations for research projects. The training dataset consisted of 14,325 sentences from 11,432 radiology reports randomly selected from a database of 5,104,594 reports in all disciplines of radiology. The training sentences were categorized manually into six categories (Positive, Differential, Post Treatment, Negative, Normal, and History). A 10-fold cross-validation [2] was used to evaluate the performance of the models, which were tested in classification of radiology reports for cases of sellar or suprasellar masses and colloid cysts. The average accuracies for the DLM and NB classifiers were 88.5% with 95% confidence interval (CI) of 1.9% and 85.9% with 95% CI of 2.0%, respectively. The DLM performed slightly better and was used to classify 1,397 radiology reports containing the keywords "sellar or suprasellar mass", or "colloid cyst". The DLM model produced an accuracy of 88.2% with 95% CI of 2.1% for 959 reports that contain "sellar or suprasellar mass" and an accuracy of 86.3% with 95% CI of 2.5% for 437 reports of "colloid cyst". We conclude that automated classification of radiology reports using machine learning techniques can effectively facilitate the identification of cases suitable for retrospective research.
Classification of clinically useful sentences in clinical evidence resources.
Morid, Mohammad Amin; Fiszman, Marcelo; Raja, Kalpana; Jonnalagadda, Siddhartha R; Del Fiol, Guilherme
2016-04-01
Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p=0.62). The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts. Copyright © 2016 Elsevier Inc. All rights reserved.
Study of Lurasidone in Treating Antipsychotic Naive or Quasi-Naive Children and Adolescents
2017-05-18
Schizophrenia; Schizoaffective Disorder; Schizophreniform Disorder; Psychosis NOS; Autistic Disorder; Asperger Syndrome; Child Development Disorders, Pervasive; Bipolar I Disorder; Bipolar II Disorder; Mood Disorder NOS; Severe Major Depression With Psychotic Features; Single Episode Major Depression Without Psychotic Symptoms; Severe Mood Disorder With Psychotic Features
Naive Theories of Social Groups
ERIC Educational Resources Information Center
Rhodes, Marjorie
2012-01-01
Four studies examined children's (ages 3-10, Total N = 235) naive theories of social groups, in particular, their expectations about how group memberships constrain social interactions. After introduction to novel groups of people, preschoolers (ages 3-5) reliably expected agents from one group to harm members of the other group (rather than…
The Effect of Naive Ideas on Students' Reasoning about Electricity and Magnetism
ERIC Educational Resources Information Center
Leppavirta, Johanna
2012-01-01
Traditional multiple-choice concept inventories measure students' critical conceptual understanding and are designed to reveal students' naive or alternate ideas. The overall scores, however, give little information about the state of students' knowledge and the consistency of reasoning. This study investigates whether students have consistent…
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real-time geological hazards/phenomenon, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related with hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining between "aurora-event" or "no-aurora-event" categories, we compared two state-of-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by Scikit-Learn library using our training dataset to build the SVM classifier. The results shown that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework.Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer. The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used TensorFlow framework for applying CNN classifier to the same collection of tweets.In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real-time.
On the late-time behavior of Virasoro blocks and a classification of semiclassical saddles
Fitzpatrick, A. Liam; Kaplan, Jared
2017-04-12
Recent work has demonstrated that black hole thermodynamics and information loss/restoration in AdS 3/CFT 2 can be derived almost entirely from the behavior of the Virasoro conformal blocks at large central charge, with relatively little dependence on the precise details of the CFT spectrum or OPE coefficients. Here, we elaborate on the non-perturbative behavior of Virasoro blocks by classifying all ‘saddles’ that can contribute for arbitrary values of external and internal operator dimensions in the semiclassical large central charge limit. The leading saddles, which determine the naive semiclassical behavior of the Virasoro blocks, all decay exponentially at late times, andmore » at a rate that is independent of internal operator dimensions. Consequently, the semiclassical contribution of a finite number of high-energy states cannot resolve a well-known version of the information loss problem in AdS 3. Furthermore, we identify two infinite classes of sub-leading saddles, and one of these classes does not decay at late times.« less