Ramsey, Elijah W.; Nelson, Gene A.; Sapkota, Sijan
1998-01-01
A progressive classification of a marsh and forest system using Landsat Thematic Mapper (TM), color infrared (CIR) photograph, and ERS-1 synthetic aperture radar (SAR) data improved classification accuracy when compared to classification using solely TM reflective band data. The classification resulted in a detailed identification of differences within a nearly monotypic black needlerush marsh. Accuracy percentages of these classes were surprisingly high given the complexities of classification. The detailed classification resulted in a more accurate portrayal of the marsh transgressive sequence than was obtainable with TM data alone. Individual sensor contribution to the improved classification was compared to that using only the six reflective TM bands. Individually, the green reflective CIR and SAR data identified broad categories of water, marsh, and forest. In combination with TM, SAR and the green CIR band each improved overall accuracy by about 3% and 15% respectively. The SAR data improved the TM classification accuracy mostly in the marsh classes. The green CIR data also improved the marsh classification accuracy and accuracies in some water classes. The final combination of all sensor data improved almost all class accuracies from 2% to 70% with an overall improvement of about 20% over TM data alone. Not only was the identification of vegetation types improved, but the spatial detail of the classification approached 10 m in some areas.
Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery
LI, GUIYING; LU, DENGSHENG; MORAN, EMILIO; HETRICK, SCOTT
2011-01-01
This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms – maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes. PMID:22368311
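As a rough illustration of adding a vegetation index as an extra band, the sketch below computes NDVI from red and near-infrared reflectance and appends it to each pixel's band vector. The abstract does not say which indices were used; NDVI, the band order, the sample reflectances, and the names `ndvi` and `add_index_band` are all assumptions made for illustration.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on no-data pixels
    return (nir - red) / total


def add_index_band(pixels, red_idx=2, nir_idx=3):
    """Append NDVI to every pixel's band vector.

    pixels: list of per-pixel reflectance lists. The default indices assume
    the six reflective TM bands ordered 1, 2, 3, 4, 5, 7, so red (band 3)
    sits at position 2 and near infrared (band 4) at position 3.
    """
    return [list(p) + [ndvi(p[red_idx], p[nir_idx])] for p in pixels]


# Made-up reflectances, for illustration only.
pixels = [
    [0.05, 0.08, 0.06, 0.40, 0.20, 0.10],  # vegetation-like spectrum
    [0.10, 0.12, 0.15, 0.18, 0.25, 0.22],  # bare-soil-like spectrum
]
stacked = add_index_band(pixels)
```

The stacked vectors can then be passed to any classifier in place of the original six-band vectors; textural bands could be appended the same way.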
A Classification of Remote Sensing Image Based on Improved Compound Kernels of SVM
NASA Astrophysics Data System (ADS)
Zhao, Jianing; Gao, Wanlin; Liu, Zili; Mou, Guifen; Lu, Lin; Yu, Lina
Remote sensing (RS) classification based on the support vector machine (SVM), which is developed from statistical learning theory, achieves high accuracy with a small number of training samples, which makes SVM methods well suited to RS classification. The traditional RS classification method combines visual interpretation with computer classification. The SVM method, however, greatly improves RS classification accuracy and saves much of the labor and time otherwise spent interpreting images and collecting training samples. Kernel functions play an important part in the SVM algorithm. The proposed method uses an improved compound kernel function and therefore achieves higher classification accuracy on RS images. Moreover, the compound kernel improves the generalization and learning ability of the kernel.
[Accuracy improvement of spectral classification of crop using microwave backscatter data].
Jia, Kun; Li, Qiang-Zi; Tian, Yi-Chen; Wu, Bing-Fang; Zhang, Fei-Fei; Meng, Ji-Hua
2011-02-01
In the present study, the use of VV polarization microwave backscatter data to improve the accuracy of spectral crop classification is investigated. Classification accuracies of different classifiers applied to fused HJ satellite multi-spectral and Envisat ASAR VV backscatter data are compared. The results indicate that the fused data take full advantage of the spectral information of the HJ multi-spectral data and the structural sensitivity of the ASAR VV polarization data. The fused data enlarge the spectral differences among classes and improve crop classification accuracy. Classification accuracy using the fused data can be increased by 5 percent compared to HJ data alone. Furthermore, ASAR VV polarization data are sensitive to non-agrarian areas within planted fields, and including VV polarization data in the classification can effectively distinguish field borders. Using VV polarization data together with multi-spectral data in crop classification broadens the application of satellite data and has potential for wider use in agriculture.
Austin, Peter C; Lee, Douglas S
2011-01-01
Purpose: Classification trees are increasingly being used to classify patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared to when conventional classification trees were used. Minor to modest improvements to sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity, but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. PMID:22254181
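The reweighting and weighted majority vote described above can be sketched with discrete AdaBoost, one common boosting algorithm. This is a minimal illustration, not the authors' implementation: it uses one-split trees (stumps) on a single made-up feature, and the function names and toy data are assumptions.

```python
import math


def fit_stump(xs, ys, weights):
    """Find the threshold and polarity on a 1-D feature with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """Discrete AdaBoost on labels in {-1, +1}; returns (alpha, threshold, polarity) triples."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, threshold, polarity))
        # Up-weight misclassified subjects, down-weight correct ones, renormalize.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Weighted majority vote over all stumps."""
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in model)
    return 1 if score >= 0 else -1


# Toy data (made up): no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

On this toy set a single stump must misclassify at least two points, while the three-stump vote can fit all six; on real clinical data, as the abstract reports, the gain may be much smaller.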
Yang, Xiaoyan; Chen, Longgao; Li, Yingkui; Xi, Wenjia; Chen, Longqian
2015-07-01
Land use/land cover (LULC) inventory provides an important dataset in regional planning and environmental assessment. To efficiently obtain the LULC inventory, we compared the LULC classifications based on single satellite imagery with a rule-based classification based on multi-seasonal imagery in Lianyungang City, a coastal city in China, using CBERS-02 (the 2nd China-Brazil Environmental Resource Satellites) images. The overall accuracies of the classification based on single imagery are 78.9, 82.8, and 82.0% in winter, early summer, and autumn, respectively. The rule-based classification improves the accuracy to 87.9% (kappa 0.85), suggesting that combining multi-seasonal images can considerably improve the classification accuracy over any single image-based classification. This method could also be used to analyze seasonal changes of LULC types, especially for those associated with tidal changes in coastal areas. The distribution and inventory of LULC types with an overall accuracy of 87.9% and a spatial resolution of 19.5 m can assist regional planning and environmental assessment efficiently in Lianyungang City. This rule-based classification provides a guidance to improve accuracy for coastal areas with distinct LULC temporal spectral features.
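Overall accuracy and the kappa coefficient quoted in abstracts like this one are both derived from a confusion matrix. The sketch below shows the standard calculation; the 3-class matrix is invented for illustration and is not data from the study.

```python
def overall_accuracy_and_kappa(matrix):
    """matrix[i][j] = number of samples of reference class i assigned to class j."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix (rows: reference, columns: classified).
confusion = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
```

Kappa is lower than overall accuracy because it discounts the agreement that the class proportions alone would produce by chance.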
Derivation of an artificial gene to improve classification accuracy upon gene selection.
Seo, Minseok; Oh, Sejong
2012-02-01
Classification analysis has been developed continuously since 1936. This research field has advanced as a result of development of classifiers such as KNN, ANN, and SVM, as well as through data preprocessing areas. Feature (gene) selection is required for very high dimensional data such as microarray before classification work. The goal of feature selection is to choose a subset of informative features that reduces processing time and provides higher classification accuracy. In this study, we devised a method of artificial gene making (AGM) for microarray data to improve classification accuracy. Our artificial gene was derived from a whole microarray dataset, and combined with a result of gene selection for classification analysis. We experimentally confirmed a clear improvement of classification accuracy after inserting artificial gene. Our artificial gene worked well for popular feature (gene) selection algorithms and classifiers. The proposed approach can be applied to any type of high dimensional dataset. Copyright © 2011 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Getman, Daniel J
2008-01-01
Many attempts to observe changes in terrestrial systems over time would be significantly enhanced if it were possible to improve the accuracy of classifications of low-resolution historic satellite data. In an effort to examine improving the accuracy of historic satellite image classification by combining satellite and air photo data, two experiments were undertaken in which low-resolution multispectral data and high-resolution panchromatic data were combined and then classified using the ECHO spectral-spatial image classification algorithm and the Maximum Likelihood technique. The multispectral data consisted of 6 multispectral channels (30-meter pixel resolution) from Landsat 7. These data were augmented with panchromatic data (15m pixel resolution) from Landsat 7 in the first experiment, and with a mosaic of digital aerial photography (1m pixel resolution) in the second. The addition of the Landsat 7 panchromatic data provided a significant improvement in the accuracy of classifications made using the ECHO algorithm. Although the inclusion of aerial photography provided an improvement in accuracy, this improvement was only statistically significant at a 40-60% level. These results suggest that once error levels associated with combining aerial photography and multispectral satellite data are reduced, this approach has the potential to significantly enhance the precision and accuracy of classifications made using historic remotely sensed data, as a way to extend the time range of efforts to track temporal changes in terrestrial systems.
Improving crop classification through attention to the timing of airborne radar acquisitions
NASA Technical Reports Server (NTRS)
Brisco, B.; Ulaby, F. T.; Protz, R.
1984-01-01
Radar remote sensors may provide valuable input to crop classification procedures because of (1) their independence of weather conditions and solar illumination, and (2) their ability to respond to differences in crop type. Manual classification of multidate synthetic aperture radar (SAR) imagery resulted in an overall accuracy of 83 percent for corn, forest, grain, and 'other' cover types. Forests and corn fields were identified with accuracies approaching or exceeding 90 percent. Grain fields and 'other' fields were often confused with each other, resulting in classification accuracies of 51 and 66 percent, respectively. The 83 percent correct classification represents a 10 percent improvement when compared to similar SAR data for the same area collected at alternate time periods in 1978. These results demonstrate that improvements in crop classification accuracy can be achieved with SAR data by synchronizing data collection times with crop growth stages in order to maximize differences in the geometric and dielectric properties of the cover types of interest.
Global Optimization Ensemble Model for Classification Methods
Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab
2014-01-01
Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, each with its own advantages and drawbacks. Some basic issues affect the accuracy of a classifier when solving a supervised learning problem, such as the bias-variance tradeoff, the dimensionality of the input space, and noise in the input data space. All these problems affect classifier accuracy and are the reason that there is no globally optimal method for classification. There is no generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382
Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok
2016-12-05
High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.
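To make the idea of a binary-encoded mask concrete, here is a small sketch of a genetic algorithm that searches over feature masks. It is not the authors' method: the nearest-centroid classifier, truncation selection, one-point crossover, bit-flip mutation, the small penalty on mask size, and the synthetic data are all assumptions chosen to keep the example short.

```python
import random


def centroid_accuracy(data, labels, mask):
    """Training accuracy of a nearest-centroid classifier on the unmasked features."""
    keep = [i for i, bit in enumerate(mask) if bit]
    if not keep:
        return 0.0
    centroids = {}
    for label in set(labels):
        rows = [r for r, l in zip(data, labels) if l == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in keep]
    correct = 0
    for row, label in zip(data, labels):
        dist = {l: sum((row[i] - c[k]) ** 2 for k, i in enumerate(keep))
                for l, c in centroids.items()}
        correct += min(dist, key=dist.get) == label
    return correct / len(data)


def gene_mask(data, labels, pop_size=20, generations=30, mutation=0.1, seed=0):
    """Evolve a 0/1 mask over features; 1 keeps a feature, 0 masks it out."""
    rng = random.Random(seed)
    n = len(data[0])

    def fitness(mask):
        # Reward accuracy first, then prefer masks that keep fewer features.
        return centroid_accuracy(data, labels, mask) - 0.01 * sum(mask) / n

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]   # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Synthetic data: feature 0 carries the class signal, features 1-5 are noise.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
data = [[2.0 * l + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(5)]
        for l in labels]
best = gene_mask(data, labels)
```

In practice the fitness would be estimated with held-out or cross-validated accuracy rather than training accuracy, since a mask tuned on training accuracy alone can overfit.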
The study of vehicle classification equipment with solutions to improve accuracy in Oklahoma.
DOT National Transportation Integrated Search
2014-12-01
The accuracy of vehicle counting and classification data is vital for appropriate future highway and road design, including determining pavement characteristics, eliminating traffic jams, and improving safety. Organizations relying on vehicle cla...
Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks
Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.
2017-01-01
This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the feature extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi-supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low level, which prevents the network from overfitting and allows it to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that, using the AR feature extractor and DBN classifier, the classification performance improves, with a sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94, compared to ANN (sensitivity of 80.8%, specificity of 77.8%, accuracy of 79.3%, and AUROC of 0.83) and BNN classifiers (sensitivity of 84.3%, specificity of 83%, accuracy of 83.6%, and AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further, with a sensitivity of 93.9%, a specificity of 92.3%, an accuracy of 93.1%, and an AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5 percentage points over ANN, BNN, and DBN classifiers, respectively. PMID:28326009
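For readers less familiar with the metrics quoted above, the sketch below shows how sensitivity, specificity, and accuracy follow from confusion counts, and checks that the reported accuracy gains are simple differences between the reported accuracies. The counts passed to `binary_metrics` are hypothetical; only the four accuracy values come from the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy for a two-class problem."""
    sensitivity = tp / (tp + fn)   # share of fatigue cases detected
    specificity = tn / (tn + fp)   # share of alert cases detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only (not taken from the study).
sens, spec, acc = binary_metrics(tp=90, fn=10, tn=85, fp=15)

# Accuracies reported in the abstract, in percent.
reported = {"ANN": 79.3, "BNN": 83.6, "DBN": 90.6, "sparse-DBN": 93.1}
gains = {name: round(reported["sparse-DBN"] - value, 1)
         for name, value in reported.items() if name != "sparse-DBN"}
```

The resulting `gains` are differences in accuracy between sparse-DBN and each baseline, which is how the 13.8, 9.5, and 2.5 figures arise.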
NASA Astrophysics Data System (ADS)
Tamimi, E.; Ebadi, H.; Kiani, A.
2017-09-01
Automatic building detection from High Spatial Resolution (HSR) images is one of the most important issues in Remote Sensing (RS). Because HSR images have a limited number of spectral bands, using additional features can improve accuracy. Adding these features, however, increases the probability of including dependent features, which reduces accuracy. In addition, some parameters must be determined in Support Vector Machine (SVM) classification. Therefore, it is necessary to simultaneously determine classification parameters and select independent features according to image type. An optimization algorithm is an efficient way to solve this problem. On the other hand, pixel-based classification faces several challenges, such as producing salt-and-pepper results and high computational time on high-dimensional data. Hence, in this paper, a novel method is proposed to optimize object-based SVM classification by applying a continuous Ant Colony Optimization (ACO) algorithm. The advantages of the proposed method are a relatively high level of automation, independence from image scene and type, reduced post-processing for building edge reconstruction, and improved accuracy. The proposed method was compared with pixel-based SVM and Random Forest (RF) classification in terms of accuracy. Compared with optimized pixel-based SVM classification, the proposed method improved the quality factor and overall accuracy by 17% and 10%, respectively. The proposed method also improved the Kappa coefficient by 6% relative to RF classification. The processing time of the proposed method was relatively low because the unit of image analysis is the image object. These results show the superiority of the proposed method in terms of time and accuracy.
Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei
2018-02-01
Diagnosis of Parkinson's disease (PD) based on speech data has proved effective in recent years. However, current research focuses on feature extraction and classifier design and does not consider instance selection. Earlier research by the authors showed that instance selection can improve classification accuracy. However, the relationship between speech samples and features has received no attention until now. Therefore, this paper proposes a new PD diagnosis algorithm that simultaneously selects speech samples and features, based on a relevant feature weighting algorithm and a multiple kernel method, to find their synergy effects and thereby improve classification accuracy. Experimental results showed that the proposed algorithm clearly improved classification accuracy. It obtained a mean classification accuracy of 82.5%, which was 30.5% higher than the relevant algorithm. In addition, the proposed algorithm detected the synergy effects of speech samples and features, which is valuable for speech marker extraction.
Zhou, Tao; Li, Zhaofu; Pan, Jianjun
2018-01-27
This paper focuses on evaluating the ability and contribution of using backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification and comparing different multi-sensor land cover mapping methods to improve classification accuracy. Both Landsat-8 OLI and Hyperion images were also acquired, in combination with Sentinel-1A data, to explore the potential of different multi-sensor urban land cover mapping methods to improve classification accuracy. The classification was performed using a random forest (RF) method. The results showed that the optimal window size of the combination of all texture features was 9 × 9, and the optimal window size was different for each individual texture feature. For the four different feature types, the texture features contributed the most to the classification, followed by the coherence and backscatter intensity features; and the color features had the least impact on the urban land cover classification. Satisfactory classification results can be obtained using only the combination of texture and coherence features, with an overall accuracy up to 91.55% and a kappa coefficient up to 0.8935, respectively. Among all combinations of Sentinel-1A-derived features, the combination of the four features had the best classification result. Multi-sensor urban land cover mapping obtained higher classification accuracy. The combination of Sentinel-1A and Hyperion data achieved higher classification accuracy compared to the combination of Sentinel-1A and Landsat-8 OLI images, with an overall accuracy of up to 99.12% and a kappa coefficient up to 0.9889. When Sentinel-1A data was added to Hyperion images, the overall accuracy and kappa coefficient were increased by 4.01% and 0.0519, respectively.
NASA Technical Reports Server (NTRS)
Mulligan, P. J.; Gervin, J. C.; Lu, Y. C.
1985-01-01
An area bordering the Eastern Shore of the Chesapeake Bay was selected for study and classified using unsupervised techniques applied to LANDSAT-2 MSS data and several band combinations of LANDSAT-4 TM data. The accuracies of these Level I land cover classifications were verified using the Taylor's Island USGS 7.5 minute topographic map, which was photointerpreted, digitized and rasterized. For the Taylor's Island map, comparing the MSS and TM three band (2 3 4) classifications, the increased resolution of TM produced a small improvement in overall accuracy of 1% correct, due primarily to small improvements of 1% and 3% in areas such as water and woodland. This was expected, as the MSS data typically produce high accuracies for categories which cover large contiguous areas. However, in the categories covering smaller areas within the map there was generally an improvement of at least 10%. Classification of the important residential category improved 12%, and wetlands were mapped with 11% greater accuracy.
NASA Astrophysics Data System (ADS)
Bangs, Corey F.; Kruse, Fred A.; Olsen, Chris R.
2013-05-01
Hyperspectral data were assessed to determine the effect of integrating spectral data and extracted texture feature data on classification accuracy. Four separate spectral ranges (hundreds of spectral bands total) were used from the Visible and Near Infrared (VNIR) and Shortwave Infrared (SWIR) portions of the electromagnetic spectrum. Haralick texture features (contrast, entropy, and correlation) were extracted from the average gray-level image for each of the four spectral ranges studied. A maximum likelihood classifier was trained using a set of ground truth regions of interest (ROIs) and applied separately to the spectral data, texture data, and a fused dataset containing both. Classification accuracy was measured by comparison of results to a separate verification set of test ROIs. Analysis indicates that the spectral range (source of the gray-level image) used to extract the texture feature data has a significant effect on the classification accuracy. This result applies to texture-only classifications as well as the classification of integrated spectral data and texture feature data sets. Overall classification improvement for the integrated data sets was near 1%. Individual improvement for integrated spectral and texture classification of the "Urban" class showed approximately 9% accuracy increase over spectral-only classification. Texture-only classification accuracy was highest for the "Dirt Path" class at approximately 92% for the spectral range from 947 to 1343nm. This research demonstrates the effectiveness of texture feature data for more accurate analysis of hyperspectral data and the importance of selecting the correct spectral range to be used for the gray-level image source to extract these features.
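The three Haralick features named above are computed from a gray-level co-occurrence matrix (GLCM). The sketch below uses the common textbook definitions on a tiny quantized image; the abstract does not give the pixel offset, number of gray levels, or logarithm base, so the horizontal offset of one pixel, the two gray levels, the natural logarithm, and the function names are assumptions.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick(p):
    """Contrast, entropy, and correlation of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mu_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mu_j) ** 2 for _, j, v in cells)
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    correlation = cov / math.sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else 0.0
    return contrast, entropy, correlation


# A 4 x 4 checkerboard quantized to two gray levels (made-up test pattern).
checkerboard = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
contrast, entropy, correlation = haralick(glcm(checkerboard, levels=2))
```

Computed in a moving window over the average gray-level image of a spectral range, such values would give one texture band per feature that can be stacked with the spectral bands before classification.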
NASA Astrophysics Data System (ADS)
Chen, Y.; Luo, M.; Xu, L.; Zhou, X.; Ren, J.; Zhou, J.
2018-04-01
The RF method based on grid-search parameter optimization achieved a classification accuracy of 88.16% in the classification of images with multiple feature variables. This classification accuracy was higher than that of SVM and ANN under the same feature variables. In terms of efficiency, the RF classification method performs better than SVM and ANN and is more capable of handling multidimensional feature variables. Combining the RF method with an object-based analysis approach raised the classification accuracy further. A multiresolution segmentation approach based on ESP scale parameter optimization was used to obtain six scales for image segmentation; at a segmentation scale of 49, the classification accuracy reached its highest value of 89.58%. The classification accuracy of object-based RF classification was 1.42% higher than that of pixel-based classification (88.16%). Therefore, the RF classification method combined with an object-based analysis approach could achieve relatively high accuracy in the classification and extraction of land use information for industrial and mining reclamation areas. Moreover, the interpretation of remotely sensed imagery using the proposed method could provide technical support and a theoretical reference for remote sensing monitoring of land reclamation.
Real-time, resource-constrained object classification on a micro-air vehicle
NASA Astrophysics Data System (ADS)
Buck, Louis; Ray, Laura
2013-12-01
A real-time embedded object classification algorithm is developed through the novel combination of binary feature descriptors, a bag-of-visual-words object model and the cortico-striatal loop (CSL) learning algorithm. The BRIEF, ORB and FREAK binary descriptors are tested and compared to SIFT descriptors with regard to their respective classification accuracies, execution times, and memory requirements when used with CSL on a 12.6 g ARM Cortex embedded processor running at 800 MHz. Additionally, the effect of χ² feature mapping and opponent-color representations used with these descriptors is examined. These tests are performed on four data sets of varying sizes and difficulty, and the BRIEF descriptor is found to yield the best combination of speed and classification accuracy. Its use with CSL achieves accuracies between 67% and 95% of those achieved with SIFT descriptors and allows for the embedded classification of a 128 × 192 pixel image in 0.15 seconds, 60 times faster than classification with SIFT. χ² mapping is found to provide substantial improvements in classification accuracy for all of the descriptors at little cost, while opponent-color descriptors offer accuracy improvements only on colorful datasets.
The use of Landsat data to inventory cotton and soybean acreage in North Alabama
NASA Technical Reports Server (NTRS)
Downs, S. W., Jr.; Faust, N. L.
1980-01-01
This study was performed to determine if Landsat data could be used to improve the accuracy of the estimation of cotton acreage. A linear classification algorithm and a maximum likelihood algorithm were used for computer classification of the area, and the classification was compared with ground truth. The classification accuracy for some fields was greater than 90 percent; however, the overall accuracy was 71 percent for cotton and 56 percent for soybeans. The results of this research indicate that computer analysis of Landsat data has potential for improving upon the methods presently being used to determine cotton acreage; however, additional experiments and refinements are needed before the method can be used operationally.
NASA Astrophysics Data System (ADS)
Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco
2016-10-01
The classification of remote sensing hyperspectral images for land cover applications is a very intensive topic. In the case of supervised classification, Support Vector Machines (SVMs) play a dominant role. Recently, the Extreme Learning Machine algorithm (ELM) has been extensively used. The classification scheme previously published by the authors, and called WT-EMP, introduces spatial information in the classification process by means of an Extended Morphological Profile (EMP) that is created from features extracted by wavelets. In addition, the hyperspectral image is denoised in the 2-D spatial domain, also using wavelets and it is joined to the EMP via a stacked vector. In this paper, the scheme is improved achieving two goals. The first one is to reduce the classification time while preserving the accuracy of the classification by using ELM instead of SVM. The second one is to improve the accuracy results by performing not only a 2-D denoising for every spectral band, but also a previous additional 1-D spectral signature denoising applied to each pixel vector of the image. For each denoising the image is transformed by applying a 1-D or 2-D wavelet transform, and then a NeighShrink thresholding is applied. Improvements in terms of classification accuracy are obtained, especially for images with close regions in the classification reference map, because in these cases the accuracy of the classification in the edges between classes is more relevant.
Singha, Mrinal; Wu, Bingfang; Zhang, Miao
2016-01-01
Accurate and timely mapping of paddy rice is vital for food security and environmental sustainability. This study evaluates the utility of temporal features extracted from coarse resolution data for object-based paddy rice classification of fine resolution data. The coarse resolution vegetation index data is first fused with the fine resolution data to generate the time series fine resolution data. Temporal features are extracted from the fused data and added with the multi-spectral data to improve the classification accuracy. Temporal features provided the crop growth information, while multi-spectral data provided the pattern variation of paddy rice. The achieved overall classification accuracy and kappa coefficient were 84.37% and 0.68, respectively. The results indicate that the use of temporal features improved the overall classification accuracy of a single-date multi-spectral image by 18.75% from 65.62% to 84.37%. The minimum sensitivity (MS) of the paddy rice classification has also been improved. The comparison showed that the mapped paddy area was analogous to the agricultural statistics at the district level. This work also highlighted the importance of feature selection to achieve higher classification accuracies. These results demonstrate the potential of the combined use of temporal and spectral features for accurate paddy rice classification. PMID:28025525
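The two summary statistics quoted above, overall accuracy and the kappa coefficient, can both be derived from a confusion matrix. The sketch below shows the standard formulas; the 2×2 matrix is a made-up example (it is not the study's data), and the function names are assumptions for illustration.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what the row/column totals predict by chance."""
    n = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(len(cm))) for j in range(len(cm))]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows = reference class, columns = mapped class.
example = [[40, 10],
           [5, 45]]
accuracy = overall_accuracy(example)
kappa = cohen_kappa(example)

# The reported gain is a difference in percentage points between the two accuracies.
gain = round(84.37 - 65.62, 2)
print(accuracy, kappa, gain)
```

Kappa is lower than overall accuracy because it discounts the agreement expected by chance, which is why the abstract reports both figures.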
Janousova, Eva; Schwarz, Daniel; Kasparek, Tomas
2015-06-30
We investigated a combination of three classification algorithms, namely the modified maximum uncertainty linear discriminant analysis (mMLDA), the centroid method, and the average linkage, with three types of features extracted from three-dimensional T1-weighted magnetic resonance (MR) brain images, specifically MR intensities, grey matter densities, and local deformations for distinguishing 49 first episode schizophrenia male patients from 49 healthy male subjects. The feature sets were reduced using intersubject principal component analysis before classification. By combining the classifiers, we were able to obtain slightly improved results when compared with single classifiers. The best classification performance (81.6% accuracy, 75.5% sensitivity, and 87.8% specificity) was significantly better than classification by chance. We also showed that classifiers based on features calculated using more computation-intensive image preprocessing perform better; mMLDA with classification boundary calculated as weighted mean discriminative scores of the groups had improved sensitivity but similar accuracy compared to the original MLDA; reducing a number of eigenvectors during data reduction did not always lead to higher classification accuracy, since noise as well as the signal important for classification were removed. Our findings provide important information for schizophrenia research and may improve accuracy of computer-aided diagnostics of neuropsychiatric diseases. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
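As a quick consistency check on the reported figures, the sketch below computes accuracy, sensitivity and specificity from confusion-matrix counts. The counts used (37 of 49 patients and 43 of 49 controls correctly classified) are inferred here because they reproduce the reported percentages; they are not stated in the abstract.

```python
def rates(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # patients correctly identified
    specificity = tn / (tn + fp)   # healthy subjects correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Assumed counts for 49 patients and 49 controls (inferred, not reported).
acc, sens, spec = rates(tp=37, fn=12, tn=43, fp=6)
print(round(acc * 100, 1), round(sens * 100, 1), round(spec * 100, 1))
```

With balanced groups, accuracy is simply the mean of sensitivity and specificity, which matches the three reported values.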
Singha, Mrinal; Wu, Bingfang; Zhang, Miao
2016-12-22
Accurate and timely mapping of paddy rice is vital for food security and environmental sustainability. This study evaluates the utility of temporal features extracted from coarse resolution data for object-based paddy rice classification of fine resolution data. The coarse resolution vegetation index data is first fused with the fine resolution data to generate the time series fine resolution data. Temporal features are extracted from the fused data and added with the multi-spectral data to improve the classification accuracy. Temporal features provided the crop growth information, while multi-spectral data provided the pattern variation of paddy rice. The achieved overall classification accuracy and kappa coefficient were 84.37% and 0.68, respectively. The results indicate that the use of temporal features improved the overall classification accuracy of a single-date multi-spectral image by 18.75% from 65.62% to 84.37%. The minimum sensitivity (MS) of the paddy rice classification has also been improved. The comparison showed that the mapped paddy area was analogous to the agricultural statistics at the district level. This work also highlighted the importance of feature selection to achieve higher classification accuracies. These results demonstrate the potential of the combined use of temporal and spectral features for accurate paddy rice classification.
Multi-source remotely sensed data fusion for improving land cover classification
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Bo; Xu, Bing
2017-02-01
Although many advances have been made in past decades, land cover classification of fine-resolution remotely sensed (RS) data integrating multiple temporal, angular, and spectral features remains limited, and the contribution of different RS features to land cover classification accuracy remains uncertain. We proposed to improve land cover classification accuracy by integrating multi-source RS features through data fusion. We further investigated the effect of different RS features on classification performance. The results of fusing Landsat-8 Operational Land Imager (OLI) data with Moderate Resolution Imaging Spectroradiometer (MODIS), China Environment 1A series (HJ-1A), and Advanced Spaceborne Thermal Emission and Reflection (ASTER) digital elevation model (DEM) data, showed that the fused data integrating temporal, spectral, angular, and topographic features achieved better land cover classification accuracy than the original RS data. Compared with the topographic feature, the temporal and angular features extracted from the fused data played more important roles in classification performance, especially those temporal features containing abundant vegetation growth information, which markedly increased the overall classification accuracy. In addition, the multispectral and hyperspectral fusion successfully discriminated detailed forest types. Our study provides a straightforward strategy for hierarchical land cover classification by making full use of available RS data. All of these methods and findings could be useful for land cover classification at both regional and global scales.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data.
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification
NASA Astrophysics Data System (ADS)
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
2018-03-01
Attribute splitting is a major process in Decision Tree C4.5 classification. However, this process has no significant effect on building the decision tree in terms of removing irrelevant features. A major problem in decision tree classification is over-fitting, which results from noisy data and irrelevant features. In turn, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is one of the important issues in classification modeling; it is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high-dimensional data to low-dimensional data with non-correlated attributes. In this research, we propose a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for classification. In experiments conducted using the cervical cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework robustly enhances classification accuracy, achieving a 90.70% accuracy rate.
Pan, Jianjun
2018-01-01
This paper focuses on evaluating the ability and contribution of using backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification and comparing different multi-sensor land cover mapping methods to improve classification accuracy. Both Landsat-8 OLI and Hyperion images were also acquired, in combination with Sentinel-1A data, to explore the potential of different multi-sensor urban land cover mapping methods to improve classification accuracy. The classification was performed using a random forest (RF) method. The results showed that the optimal window size of the combination of all texture features was 9 × 9, and the optimal window size was different for each individual texture feature. For the four different feature types, the texture features contributed the most to the classification, followed by the coherence and backscatter intensity features; and the color features had the least impact on the urban land cover classification. Satisfactory classification results can be obtained using only the combination of texture and coherence features, with an overall accuracy up to 91.55% and a kappa coefficient up to 0.8935, respectively. Among all combinations of Sentinel-1A-derived features, the combination of the four features had the best classification result. Multi-sensor urban land cover mapping obtained higher classification accuracy. The combination of Sentinel-1A and Hyperion data achieved higher classification accuracy compared to the combination of Sentinel-1A and Landsat-8 OLI images, with an overall accuracy of up to 99.12% and a kappa coefficient up to 0.9889. When Sentinel-1A data was added to Hyperion images, the overall accuracy and kappa coefficient were increased by 4.01% and 0.0519, respectively. PMID:29382073
Lu, Dengsheng; Batistella, Mateus; de Miranda, Evaristo E; Moran, Emilio
2008-01-01
Complex forest structure and abundant tree species in the moist tropical regions often cause difficulties in classifying vegetation classes with remotely sensed data. This paper explores improvement in vegetation classification accuracies through a comparative study of different image combinations based on the integration of Landsat Thematic Mapper (TM) and SPOT High Resolution Geometric (HRG) instrument data, as well as the combination of spectral signatures and textures. A maximum likelihood classifier was used to classify the different image combinations into thematic maps. This research indicated that data fusion based on HRG multispectral and panchromatic data slightly improved vegetation classification accuracies: a 3.1 to 4.6 percent increase in the kappa coefficient compared with the classification results based on original HRG or TM multispectral images. A combination of HRG spectral signatures and two textural images improved the kappa coefficient by 6.3 percent compared with pure HRG multispectral images. The textural images based on entropy or second-moment texture measures with a window size of 9 pixels × 9 pixels played an important role in improving vegetation classification accuracy. Overall, optical remote-sensing data are still insufficient for accurate vegetation classifications in the Amazon basin.
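The entropy and second-moment texture measures mentioned above are commonly computed from a grey-level co-occurrence matrix (GLCM) within a moving window. The sketch below assumes that GLCM formulation, a single horizontal pixel offset, and a 9 × 9 window of already quantized grey levels; the abstract does not spell out these details, so treat them as illustrative assumptions.

```python
import math
from collections import Counter


def glcm_probabilities(window, dx=1, dy=0):
    """Normalized co-occurrence frequencies of grey-level pairs at offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(window[r][c], window[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def entropy(p):
    """GLCM entropy: high for heterogeneous texture, zero for a uniform window."""
    return -sum(v * math.log(v) for v in p.values())


def second_moment(p):
    """Angular second moment: high for homogeneous texture."""
    return sum(v * v for v in p.values())


# Two hypothetical 9 x 9 windows: a flat patch and a checkerboard.
flat = [[3] * 9 for _ in range(9)]
checker = [[(r + c) % 2 for c in range(9)] for r in range(9)]

flat_p = glcm_probabilities(flat)
checker_p = glcm_probabilities(checker)
print(entropy(flat_p), second_moment(flat_p))
print(entropy(checker_p), second_moment(checker_p))
```

In practice the window slides over the image and the measure at each position becomes one pixel of the textural image that is stacked with the spectral bands.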
Lu, Dengsheng; Batistella, Mateus; de Miranda, Evaristo E.; Moran, Emilio
2009-01-01
Complex forest structure and abundant tree species in the moist tropical regions often cause difficulties in classifying vegetation classes with remotely sensed data. This paper explores improvement in vegetation classification accuracies through a comparative study of different image combinations based on the integration of Landsat Thematic Mapper (TM) and SPOT High Resolution Geometric (HRG) instrument data, as well as the combination of spectral signatures and textures. A maximum likelihood classifier was used to classify the different image combinations into thematic maps. This research indicated that data fusion based on HRG multispectral and panchromatic data slightly improved vegetation classification accuracies: a 3.1 to 4.6 percent increase in the kappa coefficient compared with the classification results based on original HRG or TM multispectral images. A combination of HRG spectral signatures and two textural images improved the kappa coefficient by 6.3 percent compared with pure HRG multispectral images. The textural images based on entropy or second-moment texture measures with a window size of 9 pixels × 9 pixels played an important role in improving vegetation classification accuracy. Overall, optical remote-sensing data are still insufficient for accurate vegetation classifications in the Amazon basin. PMID:19789716
A neural network approach to cloud classification
NASA Technical Reports Server (NTRS)
Lee, Jonathan; Weger, Ronald C.; Sengupta, Sailes K.; Welch, Ronald M.
1990-01-01
It is shown that, using high-spatial-resolution data, very high cloud classification accuracies can be obtained with a neural network approach. A texture-based neural network classifier using only single-channel visible Landsat MSS imagery achieves an overall cloud identification accuracy of 93 percent. Cirrus can be distinguished from boundary layer cloudiness with an accuracy of 96 percent, without the use of an infrared channel. Stratocumulus is retrieved with an accuracy of 92 percent, cumulus at 90 percent. The use of the neural network does not improve cirrus classification accuracy. Rather, its main effect is in the improved separation between stratocumulus and cumulus cloudiness. While most cloud classification algorithms rely on linear parametric schemes, the present study is based on a nonlinear, nonparametric four-layer neural network approach. A three-layer neural network architecture, the nonparametric K-nearest neighbor approach, and the linear stepwise discriminant analysis procedure are compared. A significant finding is that significantly higher accuracies are attained with the nonparametric approaches using only 20 percent of the database as training data, compared to 67 percent of the database in the linear approach.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.
Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its classification accuracy for basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its classification accuracy for basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its classification accuracy for basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Tahmasian, Masoud; Jamalabadi, Hamidreza; Abedini, Mina; Ghadami, Mohammad R; Sepehry, Amir A; Knight, David C; Khazaie, Habibolah
2017-05-22
Sleep disturbance is common in chronic post-traumatic stress disorder (PTSD). However, prior work has demonstrated that there are inconsistencies between subjective and objective assessments of sleep disturbance in PTSD. Therefore, we investigated whether subjective or objective sleep assessment has greater clinical utility to differentiate PTSD patients from healthy subjects. Further, we evaluated whether the combination of subjective and objective methods improves the accuracy of classification into patient versus healthy groups, which has important diagnostic implications. We recruited 32 chronic war-induced PTSD patients and 32 age- and gender-matched healthy subjects to participate in this study. Subjective (i.e. from three self-reported sleep questionnaires) and objective sleep-related data (i.e. from actigraphy scores) were collected from each participant. Subjective, objective, and combined (subjective and objective) sleep data were then analyzed using support vector machine classification. The classification accuracy, sensitivity, and specificity for subjective variables were 89.2%, 89.3%, and 89%, respectively. The classification accuracy, sensitivity, and specificity for objective variables were 65%, 62.3%, and 67.8%, respectively. The classification accuracy, sensitivity, and specificity for the aggregate variables (combination of subjective and objective variables) were 91.6%, 93.0%, and 90.3%, respectively. Our findings indicate that classification accuracy using subjective measurements is superior to objective measurements and the combination of both assessments appears to improve the classification accuracy for differentiating PTSD patients from healthy individuals. Copyright © 2017 Elsevier B.V. All rights reserved.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network’s initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data. PMID:27304987
Hao, Pengyu; Wang, Li; Niu, Zheng
2015-01-01
A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multi-single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series were tested with different training sample sizes at both pixel and object levels, and two representative counties in north Xinjiang were selected as study area. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See 5 (C 5.0). The results indicated that classification performance improved (increased the mean overall accuracy by 5%~10%, and reduced standard deviation of overall accuracy by around 1%) substantially with the training sample number, and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers with higher mean overall accuracy (1%~2%). However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern. PMID:26360597
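The two hybrid strategies can be sketched roughly as follows. The abstract does not define M-voting or P-fusion in detail, so this sketch assumes that M-voting takes the most frequent label across classifiers and that P-fusion averages per-class probabilities and picks the largest; the class names and probability values are invented for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """M-voting (assumed form): the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]


def probabilistic_fusion(prob_dicts):
    """P-fusion (assumed form): average class probabilities, then take the argmax."""
    classes = list(prob_dicts[0].keys())
    mean = {c: sum(p[c] for p in prob_dicts) / len(prob_dicts) for c in classes}
    return max(mean, key=mean.get)


# Hypothetical per-classifier outputs (e.g. RF, SVM, C5.0) for one pixel or object.
outputs = [
    {"wheat": 0.55, "corn": 0.45},
    {"wheat": 0.55, "corn": 0.45},
    {"wheat": 0.10, "corn": 0.90},
]
hard_labels = [max(p, key=p.get) for p in outputs]

voted = majority_vote(hard_labels)
fused = probabilistic_fusion(outputs)
print(voted, fused)
```

The example is chosen so the two rules disagree: two weakly confident votes for one class outvote a single strongly confident classifier under voting, but not once the probabilities are averaged.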
On-line analysis of algae in water by discrete three-dimensional fluorescence spectroscopy.
Zhao, Nanjing; Zhang, Xiaoling; Yin, Gaofang; Yang, Ruifang; Hu, Li; Chen, Shuang; Liu, Jianguo; Liu, Wenqing
2018-03-19
In view of the problem of the on-line measurement of algae classification, a method of algae classification and concentration determination based on the discrete three-dimensional fluorescence spectra was studied in this work. The discrete three-dimensional fluorescence spectra of twelve common species of algae belonging to five categories were analyzed, the discrete three-dimensional standard spectra of five categories were built, and the recognition, classification and concentration prediction of algae categories were realized by the discrete three-dimensional fluorescence spectra coupled with non-negative weighted least squares linear regression analysis. The results show that similarities between discrete three-dimensional standard spectra of different categories were reduced and the accuracies of recognition, classification and concentration prediction of the algae categories were significantly improved. Compared with the chlorophyll a fluorescence excitation spectra method, the recognition accuracy rate in pure samples by discrete three-dimensional fluorescence spectra is improved by 1.38%, and the recovery rate and classification accuracy in pure diatom samples by 34.1% and 46.8%, respectively; the recognition accuracy rate of mixed samples by discrete three-dimensional fluorescence spectra is enhanced by 26.1%, the recovery rate of mixed samples with Chlorophyta by 37.8%, and the classification accuracy of mixed samples with diatoms by 54.6%.
NASA Astrophysics Data System (ADS)
Liu, Wanjun; Liang, Xuejian; Qu, Haicheng
2017-11-01
Hyperspectral image (HSI) classification is one of the most popular topics in the remote sensing community. Traditional and deep learning-based classification methods have been proposed constantly in recent years. In order to improve the classification accuracy and robustness, a dimensionality-varied convolutional neural network (DVCNN) was proposed in this paper. DVCNN was a novel deep architecture based on convolutional neural network (CNN). The input of DVCNN was a set of 3D patches selected from HSI which contained spectral-spatial joint information. In the following feature extraction process, each patch was transformed into some different 1D vectors by 3D convolution kernels, which were able to extract features from spectral-spatial data. The rest of DVCNN was about the same as a general CNN and processed a 2D matrix constituted by all of the 1D data. As a result, the DVCNN could not only extract more accurate and richer features than CNN, but also fuse spectral-spatial information to improve classification accuracy. Moreover, the robustness of network on water-absorption bands was enhanced in the process of spectral-spatial fusion by 3D convolution, and the calculation was simplified by dimensionality varied convolution. Experiments were performed on both Indian Pines and Pavia University scene datasets, and the results showed that the classification accuracy of DVCNN improved by 32.87% on Indian Pines and 19.63% on Pavia University scene compared with a spectral-only CNN. The maximum accuracy improvement achieved by DVCNN was 13.72% compared with other state-of-the-art HSI classification methods, and the robustness of DVCNN on water-absorption bands noise was demonstrated.
Selective classification for improved robustness of myoelectric control under nonideal conditions.
Scheme, Erik J; Englehart, Kevin B; Hudgins, Bernard S
2011-06-01
Recent literature in pattern recognition-based myoelectric control has highlighted a disparity between classification accuracy and the usability of upper limb prostheses. This paper suggests that the conventionally defined classification accuracy may be idealistic and may not reflect true clinical performance. Herein, a novel myoelectric control system based on a selective multiclass one-versus-one classification scheme, capable of rejecting unknown data patterns, is introduced. This scheme is shown to outperform nine other popular classifiers when compared using conventional classification accuracy as well as a form of leave-one-out analysis that may be more representative of real prosthetic use. Additionally, the classification scheme allows for real-time, independent adjustment of individual class-pair boundaries making it flexible and intuitive for clinical use.
Fuzzy Classification of High Resolution Remote Sensing Scenes Using Visual Attention Features.
Li, Linyi; Xu, Tingbao; Chen, Yun
2017-01-01
In recent years the spatial resolutions of remote sensing images have been improved greatly. However, a higher spatial resolution image does not always lead to a better result of automatic scene classification. Visual attention is an important characteristic of the human visual system, which can effectively help to classify remote sensing scenes. In this study, a novel visual attention feature extraction algorithm was proposed, which extracted visual attention features through a multiscale process. And a fuzzy classification method using visual attention features (FC-VAF) was developed to perform high resolution remote sensing scene classification. FC-VAF was evaluated by using remote sensing scenes from widely used high resolution remote sensing images, including IKONOS, QuickBird, and ZY-3 images. FC-VAF achieved more accurate classification results than the others according to the quantitative accuracy evaluation indices. We also discussed the role and impacts of different decomposition levels and different wavelets on the classification accuracy. FC-VAF improves the accuracy of high resolution scene classification and therefore advances the research of digital image analysis and the applications of high resolution remote sensing images. PMID:28761440
Use of collateral information to improve LANDSAT classification accuracies
NASA Technical Reports Server (NTRS)
Strahler, A. H. (Principal Investigator)
1981-01-01
Methods to improve LANDSAT classification accuracies were investigated including: (1) the use of prior probabilities in maximum likelihood classification as a methodology to integrate discrete collateral data with continuously measured image density variables; (2) the use of the logit classifier as an alternative to multivariate normal classification that permits mixing both continuous and categorical variables in a single model and fits empirical distributions of observations more closely than the multivariate normal density function; and (3) the use of collateral data in a geographic information system as exercised to model a desired output information layer as a function of input layers of raster format collateral and image data base layers.
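Item (1) above, the use of prior probabilities in maximum likelihood classification, amounts to weighting each class likelihood by a prior before choosing the most probable class. The following is a minimal single-band sketch, assuming Gaussian class densities; the class names, statistics and priors are hypothetical, and the report itself works with multivariate densities.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def classify_with_priors(x, class_stats, priors):
    """Return (label, posteriors) for one pixel value x.

    class_stats: {label: (mean, std)} estimated from training pixels.
    priors: {label: prior probability}, e.g. derived from collateral data
            such as an elevation or soil stratum (an assumption of this sketch).
    """
    scores = {c: priors[c] * gaussian_pdf(x, m, s) for c, (m, s) in class_stats.items()}
    total = sum(scores.values())
    posteriors = {c: v / total for c, v in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical statistics: a pixel value of 50 is equally likely under both classes,
# so the decision is made entirely by the collateral-data prior.
stats = {"forest": (40.0, 10.0), "grass": (60.0, 10.0)}
label, post = classify_with_priors(50.0, stats, {"forest": 0.2, "grass": 0.8})
```

With equal likelihoods the posterior simply reproduces the prior, which shows how discrete collateral information can break ties that the spectral data alone cannot resolve.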
Application of Sensor Fusion to Improve UAV Image Classification
NASA Astrophysics Data System (ADS)
Jabari, S.; Fathollahi, F.; Zhang, Y.
2017-08-01
Image classification is one of the most important tasks of remote sensing projects, including those based on UAV images. Improving the quality of UAV images directly affects the classification results and can save a huge amount of time and effort in this area. In this study, we show that sensor fusion can improve image quality, which in turn increases the accuracy of image classification. Here, we tested two sensor fusion configurations by using a Panchromatic (Pan) camera along with either a colour camera or a four-band multi-spectral (MS) camera. We use the Pan camera to benefit from its higher sensitivity and the colour or MS camera to benefit from its spectral properties. The resulting images are then compared to the ones acquired by a high resolution single Bayer-pattern colour camera (here referred to as HRC). We assessed the quality of the output images by performing image classification tests. The outputs show that the proposed sensor fusion configurations can achieve higher accuracies compared to the images of the single Bayer-pattern colour camera. Therefore, incorporating a Pan camera on board in UAV missions and performing image fusion can help achieve higher quality images and accordingly higher accuracy classification results.
NASA Astrophysics Data System (ADS)
Park, M.; Stenstrom, M. K.
2004-12-01
Recognizing urban information from the satellite imagery is problematic due to the diverse features and dynamic changes of urban landuse. The use of Landsat imagery for urban land use classification involves inherent uncertainty due to its spatial resolution and the low separability among land uses. To resolve the uncertainty problem, we investigated the performance of Bayesian networks to classify urban land use since Bayesian networks provide a quantitative way of handling uncertainty and have been successfully used in many areas. In this study, we developed the optimized networks for urban land use classification from Landsat ETM+ images of Marina del Rey area based on USGS land cover/use classification level III. The networks started from a tree structure based on mutual information between variables and added the links to improve accuracy. This methodology offers several advantages: (1) The network structure shows the dependency relationships between variables. The class node value can be predicted even with particular band information missing due to sensor system error. The missing information can be inferred from other dependent bands. (2) The network structure provides information of variables that are important for the classification, which is not available from conventional classification methods such as neural networks and maximum likelihood classification. In our case, for example, bands 1, 5 and 6 are the most important inputs in determining the land use of each pixel. (3) The networks can be reduced with those input variables important for classification. This minimizes the problem without considering all possible variables. We also examined the effect of incorporating ancillary data: geospatial information such as X and Y coordinate values of each pixel and DEM data, and vegetation indices such as NDVI and Tasseled Cap transformation. 
The results showed that the locational information improved overall accuracy (81%) and kappa coefficient (76%), and lowered the omission and commission errors compared with using only spectral data (accuracy 71%, kappa coefficient 62%). Incorporating DEM data did not significantly improve overall accuracy (74%) and kappa coefficient (66%) but lowered the omission and commission errors. Incorporating NDVI did not much improve the overall accuracy (72%) and kappa coefficient (65%). Including Tasseled Cap transformation reduced the accuracy (accuracy 70%, kappa 61%). Therefore, additional information from the DEM and vegetation indices was not as useful as locational ancillary data.
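The overall accuracy and kappa coefficient quoted in this abstract are both computed from a confusion matrix. A minimal sketch of that arithmetic follows, assuming rows are reference classes and columns are predicted classes; the function name and the example matrix are illustrative and not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of reference class i assigned to class j.
    """
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class matrix with 100 samples.
accuracy, kappa = overall_accuracy_and_kappa([[40, 10], [5, 45]])
```

For this matrix the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is about 0.7; kappa is lower than accuracy because it discounts agreement that would occur by chance.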
The effect of finite field size on classification and atmospheric correction
NASA Technical Reports Server (NTRS)
Kaufman, Y. J.; Fraser, R. S.
1981-01-01
The atmospheric effect on the upward radiance of sunlight scattered from the Earth-atmosphere system is strongly influenced by the contrasts between fields and their sizes. For a given atmospheric turbidity, the atmospheric effect on classification of surface features is much stronger for nonuniform surfaces than for uniform surfaces. Therefore, the classification accuracy of agricultural fields and urban areas is dependent not only on the optical characteristics of the atmosphere, but also on the size of the surface [...] do not account for the nonuniformity of the surface have only a slight effect on the classification accuracy; in other cases the classification accuracy decreases. The radiances above finite fields were computed to simulate radiances measured by a satellite. A simulation case including 11 agricultural fields and four natural fields (water, soil, savanna, and forest) was used to test the effect of the size of the background reflectance and the optical thickness of the atmosphere on classification accuracy. It is concluded that new atmospheric correction methods, which take into account the finite size of the fields, have to be developed to significantly improve the classification accuracy.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech-based data in the classification of Parkinson disease (PD) has been shown in recent years to provide an effective, non-invasive mode of classification. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
Research on aviation unsafe incidents classification with improved TF-IDF algorithm
NASA Astrophysics Data System (ADS)
Wang, Yanhua; Zhang, Zhiyuan; Huo, Weigang
2016-05-01
The text content of Aviation Safety Confidential Reports contains a large amount of valuable information. The term frequency-inverse document frequency (TF-IDF) algorithm is commonly used in text analysis, but it does not take into account the sequential relationship of the words in the text and its role in semantic expression. According to the seven category labels of civil aviation unsafe incidents, and aiming to solve these problems of the TF-IDF algorithm, this paper improved the TF-IDF algorithm based on a co-occurrence network and established feature word extraction and word sequential relations for classified incidents. An aviation domain lexicon was used to improve the accuracy rate of classification. A feature word network model was designed for multi-document unsafe incident classification, and it was used in the experiment. Finally, the classification accuracy of the improved algorithm was verified by the experiments.
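For orientation, the baseline TF-IDF weighting that this paper sets out to improve can be sketched in a few lines. This is the standard formulation only (relative term frequency times the logarithm of inverse document frequency); it does not include the co-occurrence network or word-order information the authors add, and the example reports are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    documents: list of token lists. Weight = (count / doc length) * ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenised incident reports.
reports = [
    ["runway", "incursion", "runway"],
    ["engine", "failure"],
    ["runway", "engine"],
]
report_weights = tf_idf(reports)
```

Because each document is treated as a bag of words, "runway incursion" and "incursion runway" receive identical weights, which is the limitation the abstract points to.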
[Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].
Zhou, Jinzhi; Tang, Xiaofang
2015-08-01
In order to improve the accuracy of classification with a small amount of motor imagery training data in the development of brain-computer interface (BCI) systems, we proposed an analysis method to automatically select the characteristic parameters based on correlation coefficient analysis. Using the five sample data sets of dataset IVa from the 2005 BCI Competition, we utilized short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the dimensionality of the primitive electroencephalogram, then introduced feature extraction based on common spatial pattern (CSP) and classified by linear discriminant analysis (LDA). Simulation results showed that the average classification accuracy was higher with the correlation coefficient feature selection method than without it. Compared with the support vector machine (SVM) feature optimization algorithm, correlation coefficient analysis can lead to better parameter selection and thus improve the accuracy of classification.
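The selection idea can be sketched as ranking candidate features by the absolute Pearson correlation between each feature and the class label, then keeping the top few. This is only an illustration under that assumption; the abstract does not state exactly which quantities are correlated, and the feature names and values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def select_features(feature_columns, labels, k):
    """Return the names of the k features most correlated (in magnitude) with labels."""
    scores = {name: abs(pearson(col, labels)) for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-trial band-power features for four trials of two classes.
features = {
    "c3_mu_power": [1.0, 1.2, 3.0, 3.1],
    "noise": [2.0, 1.0, 2.0, 1.0],
    "c4_mu_power": [3.0, 2.5, 1.0, 2.0],
}
selected = select_features(features, [0, 0, 1, 1], k=2)
```

The retained features would then be passed on to the spatial filtering and classification stages (CSP and LDA in the abstract).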
Di-codon Usage for Gene Classification
NASA Astrophysics Data System (ADS)
Nguyen, Minh N.; Ma, Jianmin; Fogel, Gary B.; Rajapakse, Jagath C.
Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has been described previously as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve classification of genes. By using both codon and di-codon features, we achieve near perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences which are classified into two major classes, HLA-I and HLA-II. Major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in the classification of HLA genes into major HLA classes; and multi-class SVM achieved accuracy rates of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usages, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major class classification, and for sub-class classification of HLA-I and HLA-II genes, respectively.
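As an illustration of the features involved, codon and di-codon usage can be computed as relative frequencies over an in-frame coding sequence. The sketch below assumes that a di-codon is a pair of adjacent codons and that pairs are counted with a sliding window of one codon; the abstract does not spell out these details, and the example sequence is invented.

```python
from collections import Counter


def relative_frequencies(items):
    """Map each distinct item to its share of the list."""
    counts = Counter(items)
    total = len(items)
    return {key: value / total for key, value in counts.items()}


def codon_features(sequence):
    """Return (codon usage, di-codon usage) for an in-frame DNA sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Adjacent codon pairs, overlapping by one codon (assumption of this sketch).
    dicodons = [codons[i] + codons[i + 1] for i in range(len(codons) - 1)]
    return relative_frequencies(codons), relative_frequencies(dicodons)


codon_usage, dicodon_usage = codon_features("ATGGCCATGGCC")
```

Each sequence then yields a fixed-length vector (up to 64 codon and 4,096 di-codon frequencies) that a classifier such as an SVM can consume.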
NASA Technical Reports Server (NTRS)
Cibula, William G.; Nyquist, Maurice O.
1987-01-01
An unsupervised computer classification of vegetation/landcover of Olympic National Park and surrounding environs was initially carried out using four bands of Landsat MSS data. The primary objective of the project was to derive a level of landcover classifications useful for park management applications while maintaining an acceptably high level of classification accuracy. Initially, nine generalized vegetation/landcover classes were derived. Overall classification accuracy was 91.7 percent. In an attempt to refine the level of classification, a geographic information system (GIS) approach was employed. Topographic data and watershed boundaries (inferred precipitation/temperature) data were registered with the Landsat MSS data. The resultant boolean operations yielded 21 vegetation/landcover classes while maintaining the same level of classification accuracy. The final classification provided much better identification and location of the major forest types within the park at the same high level of accuracy, and these met the project objective. This classification could now become inputs into a GIS system to help provide answers to park management coupled with other ancillary data programs such as fire management.
Stinchfield, Randy; McCready, John; Turner, Nigel E; Jimenez-Murcia, Susana; Petry, Nancy M; Grant, Jon; Welte, John; Chapman, Heather; Winters, Ken C
2016-09-01
The DSM-5 was published in 2013 and it included two substantive revisions for gambling disorder (GD). These changes are the reduction in the threshold from five to four criteria and elimination of the illegal activities criterion. The purpose of this study was twofold: first, to assess the reliability, validity and classification accuracy of the DSM-5 diagnostic criteria for GD; second, to compare the DSM-5 and DSM-IV on reliability, validity, and classification accuracy, including an examination of the effect of the elimination of the illegal acts criterion on diagnostic accuracy. To compare DSM-5 and DSM-IV, eight datasets from three different countries (Canada, USA, and Spain; total N = 3247) were used. All datasets were based on similar research methods. Participants were recruited from outpatient gambling treatment services to represent the group with a GD and from the community to represent the group without a GD. All participants were administered a standardized measure of diagnostic criteria. The DSM-5 yielded satisfactory reliability, validity and classification accuracy. In comparing the DSM-5 to the DSM-IV, most comparisons of reliability, validity and classification accuracy showed more similarities than differences. There was evidence of modest improvements in classification accuracy for DSM-5 over DSM-IV, particularly in reduction of false negative errors. This reduction in false negative errors was largely a function of lowering the cut score from five to four and this revision is an improvement over DSM-IV. From a statistical standpoint, eliminating the illegal acts criterion did not make a significant impact on diagnostic accuracy. From a clinical standpoint, illegal acts can still be addressed in the context of the DSM-5 criterion of lying to others.
Bahadure, Nilesh Bhaskarrao; Ray, Arun Kumar; Thethi, Har Pal
2018-01-17
The detection of a brain tumor and its classification from modern imaging modalities is a primary concern, but it is time-consuming and tedious work when performed by radiologists or clinical supervisors. The accuracy of detection and classification of tumor stages performed by radiologists depends only on their experience, so computer-aided technology is very important for improving diagnostic accuracy. In this study, to improve the performance of tumor detection, we compared different segmentation techniques and selected the best one by their segmentation scores. Further, to improve the classification accuracy, a genetic algorithm is employed for the automatic classification of tumor stage. The decision of classification stage is supported by extracting relevant features and area calculation. The experimental results of the proposed technique are evaluated and validated for performance and quality analysis on magnetic resonance brain images, based on segmentation score, accuracy, sensitivity, specificity, and dice similarity index coefficient. The experimental results achieved 92.03% accuracy, 91.42% specificity, 92.36% sensitivity, and an average segmentation score between 0.82 and 0.93, demonstrating the effectiveness of the proposed technique for identifying normal and abnormal tissues from brain MR images. The experimental results also obtained an average dice similarity index coefficient of 93.79%, which indicates good overlap between the automatically extracted tumor regions and the tumor regions manually extracted by radiologists.
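The evaluation measures named in this abstract have standard definitions in terms of true and false positives and negatives. A minimal sketch follows, assuming the automated and manual segmentations are available as flat binary masks of equal length; the function name and the toy masks are illustrative only.

```python
def segmentation_metrics(predicted, reference):
    """Compare two binary masks (1 = tumor, 0 = background), given as flat lists.

    Assumes both classes occur in the reference mask, so no denominator is zero.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # tumor in both masks
    tn = sum(1 for p, r in pairs if not p and not r)  # background in both masks
    fp = sum(1 for p, r in pairs if p and not r)      # predicted tumor only
    fn = sum(1 for p, r in pairs if not p and r)      # missed tumor
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }


# Toy eight-pixel masks: two pixels agree as tumor, one false alarm, one miss.
metrics = segmentation_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

The Dice coefficient ignores true negatives, so it is not inflated by the large background area that typically dominates an MR slice.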
Gastric precancerous diseases classification using CNN with a concise model.
Zhang, Xu; Hu, Weiling; Chen, Fei; Liu, Jiquan; Yang, Yuanhang; Wang, Liangjing; Duan, Huilong; Si, Jianmin
2017-01-01
Gastric precancerous diseases (GPD) may deteriorate into early gastric cancer if misdiagnosed, so it is important to help doctors recognize GPD accurately and quickly. In this paper, we realize the classification of 3-class GPD, namely, polyp, erosion, and ulcer using convolutional neural networks (CNN) with a concise model called the Gastric Precancerous Disease Network (GPDNet). GPDNet introduces fire modules from SqueezeNet to reduce the model size and parameters about 10 times while improving speed for quick classification. To maintain classification accuracy with fewer parameters, we propose an innovative method called iterative reinforced learning (IRL). After training GPDNet from scratch, we apply IRL to fine-tune the parameters whose values are close to 0, and then we take the modified model as a pretrained model for the next training. The result shows that IRL can improve the accuracy about 9% after 6 iterations. The final classification accuracy of our GPDNet was 88.90%, which is promising for clinical GPD recognition.
Trakoolwilaiwan, Thanawin; Behboodi, Bahareh; Lee, Jaeseok; Kim, Kyungsoo; Choi, Ji-Woong
2018-01-01
The aim of this work is to develop an effective brain-computer interface (BCI) method based on functional near-infrared spectroscopy (fNIRS). In order to improve the performance of the BCI system in terms of accuracy, the ability to discriminate features from input signals and proper classification are desired. Previous studies have mainly extracted features from the signal manually, but proper features need to be selected carefully. To avoid performance degradation caused by manual feature selection, we applied convolutional neural networks (CNNs) as the automatic feature extractor and classifier for fNIRS-based BCI. In this study, the hemodynamic responses evoked by performing rest, right-, and left-hand motor execution tasks were measured on eight healthy subjects to compare performances. Our CNN-based method provided improvements in classification accuracy over conventional methods employing the most commonly used features of mean, peak, slope, variance, kurtosis, and skewness, classified by support vector machine (SVM) and artificial neural network (ANN). Specifically, up to 6.49% and 3.33% improvement in classification accuracy was achieved by CNN compared with SVM and ANN, respectively.
A stereo remote sensing feature selection method based on artificial bee colony algorithm
NASA Astrophysics Data System (ADS)
Yan, Yiming; Liu, Pigang; Zhang, Ye; Su, Nan; Tian, Shu; Gao, Fengjiao; Shen, Yi
2014-05-01
To improve the efficiency of stereo information for remote sensing classification, a stereo remote sensing feature selection method based on the artificial bee colony algorithm is proposed in this paper. Remote sensing stereo information can be described by a digital surface model (DSM) and an optical image, which contain information on the three-dimensional structure and optical characteristics, respectively. Firstly, the three-dimensional structure characteristic can be analyzed by 3D-Zernike descriptors (3DZD). However, different parameters of 3DZD describe different complexities of three-dimensional structure, and they need to be selected optimally for the various objects on the ground. Secondly, features for representing the optical characteristic also need to be optimized. If not properly handled, a stereo feature vector composed of 3DZD and image features contains a lot of redundant information, which may not improve the classification accuracy and may even cause adverse effects. To reduce information redundancy while maintaining or improving the classification accuracy, an optimization framework for this stereo feature selection problem is created, and the artificial bee colony algorithm is introduced to solve this optimization problem. Experimental results show that the proposed method can effectively improve both the computational efficiency and the classification accuracy.
Hong, Keum-Shik; Khan, Muhammad Jawad
2017-01-01
In this article, non-invasive hybrid brain–computer interface (hBCI) technologies for improving classification accuracy and increasing the number of commands are reviewed. Hybridization combining more than two modalities is a new trend in brain imaging and prosthesis control. Electroencephalography (EEG), due to its easy use and fast temporal resolution, is most widely utilized in combination with other brain/non-brain signal acquisition modalities, for instance, functional near infrared spectroscopy (fNIRS), electromyography (EMG), electrooculography (EOG), and eye tracker. Three main purposes of hybridization are to increase the number of control commands, improve classification accuracy and reduce the signal detection time. Currently, such combinations of EEG + fNIRS and EEG + EOG are most commonly employed. Four principal components (i.e., hardware, paradigm, classifiers, and features) relevant to accuracy improvement are discussed. In the case of brain signals, motor imagination/movement tasks are combined with cognitive tasks to increase active brain–computer interface (BCI) accuracy. Active and reactive tasks sometimes are combined: motor imagination with steady-state evoked visual potentials (SSVEP) and motor imagination with P300. In the case of reactive tasks, SSVEP is most widely combined with P300 to increase the number of commands. Passive BCIs, however, are rare. After discussing the hardware and strategies involved in the development of hBCI, the second part examines the approaches used to increase the number of control commands and to enhance classification accuracy. The future prospects and the extension of hBCI in real-time applications for daily life scenarios are provided. PMID:28790910
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.
NASA Astrophysics Data System (ADS)
Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher
2012-10-01
Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics were used as a feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as user's and producer's accuracy.
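One way such a combination rule could look is sketched below. The abstract does not give the actual rules, so this is an assumption made purely for illustration: for each pixel the classifier whose posterior distribution has the lower Shannon entropy is trusted, and that entropy is kept as a per-pixel uncertainty value. The crop names and probabilities are invented.

```python
import math


def entropy(probabilities):
    """Shannon entropy (natural log) of a {label: probability} distribution."""
    return -sum(p * math.log(p) for p in probabilities.values() if p > 0.0)


def combine_predictions(rf_probs, svm_probs):
    """Return (label, uncertainty) for one pixel.

    Hypothetical rule: take the label from whichever classifier is more
    confident, i.e. has the lower-entropy posterior; ties go to the RF.
    """
    rf_entropy = entropy(rf_probs)
    svm_entropy = entropy(svm_probs)
    chosen = rf_probs if rf_entropy <= svm_entropy else svm_probs
    label = max(chosen, key=chosen.get)
    return label, min(rf_entropy, svm_entropy)


# The RF is undecided for this pixel while the SVM is fairly confident.
label, uncertainty = combine_predictions(
    {"cotton": 0.5, "rice": 0.5},
    {"cotton": 0.1, "rice": 0.9},
)
```

Writing the returned uncertainty to a raster alongside the class label would give the kind of spatial map-uncertainty layer the abstract mentions.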
Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.
Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens
2016-01-01
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
Knauer, Uwe; Matros, Andrea; Petrovic, Tijana; Zanker, Timothy; Scott, Eileen S; Seiffert, Udo
2017-01-01
Hyperspectral imaging is an emerging means of assessing plant vitality, stress parameters, nutrition status, and diseases. Extraction of target values from the high-dimensional datasets either relies on pixel-wise processing of the full spectral information, appropriate selection of individual bands, or calculation of spectral indices. Limitations of such approaches are reduced classification accuracy, reduced robustness due to spatial variation of the spectral information across the surface of the objects measured as well as a loss of information intrinsic to band selection and use of spectral indices. In this paper we present an improved spatial-spectral segmentation approach for the analysis of hyperspectral imaging data and its application for the prediction of powdery mildew infection levels (disease severity) of intact Chardonnay grape bunches shortly before veraison. Instead of calculating texture features (spatial features) for the huge number of spectral bands independently, dimensionality reduction by means of Linear Discriminant Analysis (LDA) was applied first to derive a few descriptive image bands. Subsequent classification was based on modified Random Forest classifiers and selective extraction of texture parameters from the integral image representation of the image bands generated. Dimensionality reduction, integral images, and the selective feature extraction led to improved classification accuracies of up to [Formula: see text] for detached berries used as a reference sample (training dataset). Our approach was validated by predicting infection levels for a sample of 30 intact bunches. Classification accuracy improved with the number of decision trees of the Random Forest classifier. These results corresponded with qPCR results. An accuracy of 0.87 was achieved in classification of healthy, infected, and severely diseased bunches. 
However, discrimination between visually healthy and infected bunches proved to be challenging for a few samples, perhaps due to colonized berries or sparse mycelia hidden within the bunch or airborne conidia on the berries that were detected by qPCR. An advanced approach to hyperspectral image classification based on combined spatial and spectral image features, potentially applicable to many available hyperspectral sensor technologies, has been developed and validated to improve the detection of powdery mildew infection levels of Chardonnay grape bunches. The spatial-spectral approach improved especially the detection of light infection levels compared with pixel-wise spectral data analysis. This approach is expected to improve the speed and accuracy of disease detection once the thresholds for fungal biomass detected by hyperspectral imaging are established; it can also facilitate monitoring in plant phenotyping of grapevine and additional crops.
High Accuracy Human Activity Recognition Based on Sparse Locality Preserving Projections.
Zhu, Xiangbin; Qiu, Huiling
2016-01-01
Human activity recognition (HAR) from the temporal streams of sensory data has been applied to many fields, such as healthcare services, intelligent environments and cyber security. However, the classification accuracy of most existing methods is not sufficient for some applications, especially healthcare services. In order to improve accuracy, it is necessary to develop a novel method that takes full account of the intrinsic sequential characteristics of time-series sensory data. Moreover, each human activity may have correlated feature relationships at different levels. Therefore, in this paper, we propose a three-stage continuous hidden Markov model (TSCHMM) approach to recognize human activities. The proposed method contains coarse, fine and accurate classification. Feature reduction is an important step in classification processing. In this paper, sparse locality preserving projections (SpLPP) is exploited to determine the optimal feature subsets for accurate classification of the stationary-activity data. It can extract more discriminative activity features from the sensor data compared with locality preserving projections. Furthermore, all of the gyro-based features are used for accurate classification of the moving data. Compared with other methods, our method uses significantly fewer features, and the overall accuracy is clearly improved. PMID:27893761
NASA Technical Reports Server (NTRS)
Stoner, E. R.; May, G. A.; Kalcic, M. T. (Principal Investigator)
1981-01-01
Sample segments of ground-verified land cover data collected in conjunction with the USDA/ESS June Enumerative Survey were merged with LANDSAT data and served as a focus for unsupervised spectral class development and accuracy assessment. Multitemporal data sets were created from single-date LANDSAT MSS acquisitions from a nominal scene covering an eleven-county area in north central Missouri. Classification accuracies for the four land cover types predominant in the test site showed significant improvement in going from unitemporal to multitemporal data sets. Transformed LANDSAT data sets did not significantly improve classification accuracies. Regression estimators yielded mixed results for different land covers. Misregistration of two LANDSAT data sets by as much as one half pixel did not significantly alter overall classification accuracies. Existing algorithms for scene-to-scene overlay proved adequate for multitemporal data analysis as long as statistical class development and accuracy assessment were restricted to field interior pixels.
Ozcift, Akin; Gulten, Arif
2011-12-01
Improving accuracies of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Research has shown that base classifier performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart disease datasets from the literature. In the experiments, first the feature dimension of the three datasets is reduced using the correlation based feature selection (CFS) algorithm. Second, classification performances of 30 machine learning algorithms are calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess performances of the respective classifiers with the same disease data. All the experiments are carried out with a leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved 72.15%, 77.52% and 84.43% average accuracies for the diabetes, heart and Parkinson's datasets, respectively. As for RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve accuracy of miscellaneous machine learning algorithms to design advanced CADx systems.
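Two of the metrics named in this abstract can be computed directly from a confusion matrix. The sketch below assumes that the kappa measure refers to Cohen's kappa statistic (the abstract does not define it), and the confusion matrix is a made-up two-class example rather than data from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 2-class result: 85 of 100 samples classified correctly.
acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
```

Kappa discounts the agreement that would occur by chance, which is why it can be noticeably lower than the raw accuracy on the same predictions.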
NASA Technical Reports Server (NTRS)
Lillesand, T. M.; Werth, L. F. (Principal Investigator)
1980-01-01
A 25% improvement in average classification accuracy was realized by processing double-date vs. single-date data. Under the spectrally and spatially complex site conditions characterizing the geographical area used, further improvement in wetland classification accuracy is apparently precluded by the spectral and spatial resolution restrictions of the LANDSAT MSS. Full scene analysis of scanning densitometer data extracted from scale infrared photography failed to permit discrimination of many wetland and nonwetland cover types. When classification of photographic data was limited to wetland areas only, much more detailed and accurate classification could be made. The integration of conventional image interpretation (to simply delineate wetland boundaries) and machine assisted classification (to discriminate among cover types present within the wetland areas) appears to warrant further research to study the feasibility and cost of extending this methodology over a large area using LANDSAT and/or small scale photography.
Link prediction boosted psychiatry disorder classification for functional connectivity network
NASA Astrophysics Data System (ADS)
Li, Weiwei; Mei, Xue; Wang, Hao; Zhou, Yu; Huang, Jiashuang
2017-02-01
Functional connectivity network (FCN) is an effective tool in psychiatry disorders classification, and represents cross-correlation of the regional blood oxygenation level dependent signal. However, FCN is often incomplete because it suffers from missing and spurious edges. To accurately classify psychiatry disorders and healthy controls with the incomplete FCN, we first `repair' the FCN with link prediction, and then extract the clustering coefficients as features to build a weak classifier for every FCN. Finally, we apply a boosting algorithm to combine these weak classifiers to improve classification accuracy. Our method was tested on three datasets of psychiatry disorder, including Alzheimer's Disease, Schizophrenia and Attention Deficit Hyperactivity Disorder. The experimental results show our method not only significantly improves the classification accuracy, but also efficiently reconstructs the incomplete FCN.
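The clustering coefficients used as features here have a standard definition for undirected graphs: for each node, the fraction of pairs of its neighbours that are themselves connected. A minimal sketch follows; the adjacency-set representation and the toy four-region network are assumptions for illustration, and the link-prediction and boosting steps are not shown.

```python
def clustering_coefficients(adj):
    """Local clustering coefficient of every node in an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # Fewer than two neighbours: no neighbour pairs, define as 0.
            coeffs[node] = 0.0
            continue
        nbr_list = sorted(nbrs)
        # Count edges that exist among the neighbours of this node.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in adj[u]
        )
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


# Toy network: regions 0, 1, 2 form a triangle; region 3 hangs off region 2.
toy_fcn = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
features = clustering_coefficients(toy_fcn)
```

A missing or spurious edge changes these values for every node adjacent to it, which suggests why repairing the network before feature extraction could matter.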
A GSA-SVM Hybrid System for Classification of Binary Problems
NASA Astrophysics Data System (ADS)
Sarafrazi, Soroor; Nezamabadi-pour, Hossein; Barahman, Mojgan
2011-06-01
This paper hybridizes the gravitational search algorithm (GSA) with the support vector machine (SVM) to build a novel GSA-SVM hybrid system that improves classification accuracy in binary problems. GSA is a heuristic optimization tool used to optimize the value of the SVM kernel parameter (in this paper, the radial basis function (RBF) is chosen as the kernel function). The experimental results show that this new approach can achieve high classification accuracy and is comparable to or better than the particle swarm optimization (PSO)-SVM and genetic algorithm (GA)-SVM, which are two hybrid systems for classification.
Classifying four-category visual objects using multiple ERP components in single-trial ERP.
Qin, Yu; Zhan, Yu; Wang, Changming; Zhang, Jiacai; Yao, Li; Guo, Xiaojuan; Wu, Xia; Hu, Bin
2016-08-01
Object categorization using single-trial electroencephalography (EEG) data measured while participants view images has been studied intensively. In previous studies, multiple event-related potential (ERP) components (e.g., P1, N1, P2, and P3) were used to improve the performance of object categorization of visual stimuli. In this study, we introduce a novel method that uses multiple-kernel support vector machine to fuse multiple ERP component features. We investigate whether fusing the potential complementary information of different ERP components (e.g., P1, N1, P2a, and P2b) can improve the performance of four-category visual object classification in single-trial EEGs. We also compare the classification accuracy of different ERP component fusion methods. Our experimental results indicate that the classification accuracy increases through multiple ERP fusion. Additional comparative analyses indicate that the multiple-kernel fusion method can achieve a mean classification accuracy higher than 72 %, which is substantially better than that achieved with any single ERP component feature (55.07 % for the best single ERP component, N1). We compare the classification results with those of other fusion methods and determine that the accuracy of the multiple-kernel fusion method is 5.47, 4.06, and 16.90 % higher than those of feature concatenation, feature extraction, and decision fusion, respectively. Our study shows that our multiple-kernel fusion method outperforms other fusion methods and thus provides a means to improve the classification performance of single-trial ERPs in brain-computer interface research.
Findeisen, Peter; Peccerella, Teresa; Post, Stefan; Wenz, Frederik; Neumaier, Michael
2008-04-01
Serum is a difficult matrix for the identification of biomarkers by mass spectrometry (MS). This is due to high-abundance proteins and their complex processing by a multitude of endogenous proteases making rigorous standardisation difficult. Here, we have investigated the use of defined exogenous reporter peptides as substrates for disease-specific proteases with respect to improved standardisation and disease classification accuracy. A recombinant N-terminal fragment of the Adenomatous Polyposis Coli (APC) protein was digested with trypsin to yield a peptide mixture for subsequent Reporter Peptide Spiking (RPS) of serum. Different preanalytical handling of serum samples was simulated by storage of serum samples for up to 6 h at ambient temperature, followed by RPS, further incubation under standardised conditions and testing for stability of protease-generated MS profiles. To demonstrate the superior classification accuracy achieved by RPS, a pilot profiling experiment was performed using serum specimens from pancreatic cancer patients (n = 50) and healthy controls (n = 50). After RPS six different peak categories could be defined, two of which (categories C and D) are modulated by endogenous proteases. These latter are relevant for improved classification accuracy as shown by enhanced disease-specific classification from 78% to 87% in unspiked and spiked samples, respectively. Peaks of these categories presented with unchanged signal intensities regardless of preanalytical conditions. The use of RPS generally improved the signal intensities of protease-generated peptide peaks. RPS circumvents preanalytical variabilities and improves classification accuracies. Our approach will be helpful to introduce MS-based proteomic profiling into routine laboratory testing.
ERIC Educational Resources Information Center
Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.
2016-01-01
In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…
Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos
2017-04-13
Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition, factors that influence pain perception. We select 52 adult subjects for the study divided into three groups: control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance imaging with diffusion tensor to assess white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gather data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce features of the first dataset and classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform a classification of migraine group. Moreover, we implement a committee method to improve the classification accuracy based on feature selection algorithms. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, and from 93 to 94% with boosting. The features determined to be most useful for classification are related to pain, analgesics and the left uncinate brain (connected with pain and emotions).
The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, like Neural Networks, Support Vector Machines and Random Forests, can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64).
The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043
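The contrast this abstract draws between overall accuracy, sensitivity and specificity follows directly from how the three are computed. The sketch below uses hypothetical confusion counts, chosen only to show that high specificity and low sensitivity can coexist with a respectable overall accuracy when negative cases dominate; they are not the study's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from binary confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # all correct / all cases
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts: 10 positive cases, 30 negative cases.
m = binary_metrics(tp=3, fn=7, tn=30, fp=0)
```

With these counts the classifier labels 33 of 40 cases correctly while detecting only 3 of the 10 positive cases, which is why accuracy alone can hide poor detection of the condition of interest.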
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
Determining successional stage of temperate coniferous forests with Landsat satellite data
NASA Technical Reports Server (NTRS)
Fiorella, Maria; Ripple, William J.
1993-01-01
Thematic Mapper (TM) digital imagery was used to map forest successional stages and to evaluate spectral differences between old-growth and mature forests in the central Cascade Range of Oregon. Relative sun incidence values were incorporated into the successional stage classification to compensate for topographic induced variation. Relative sun incidence improved the classification accuracy of young successional stages, but did not improve the classification accuracy of older, closed canopy forest classes or overall accuracy. TM bands 1, 2, and 4; the normalized difference vegetation index; and TM 4/3, 4/5, and 4/7 band ratio values for old-growth forests were found to be significantly lower than the values of mature forests. The Tasseled Cap features of brightness, greenness, and wetness also had significantly lower old-growth values as compared to mature forest values.
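The spectral features compared in this abstract are simple per-pixel functions of the TM bands. The sketch below computes the normalized difference vegetation index, using TM band 4 as near-infrared and TM band 3 as red, together with the three band ratios named above. The pixel values are hypothetical digital numbers chosen for easy arithmetic.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)


def band_ratios(tm3, tm4, tm5, tm7):
    """The TM 4/3, 4/5 and 4/7 band ratios for one pixel."""
    return {"4/3": tm4 / tm3, "4/5": tm4 / tm5, "4/7": tm4 / tm7}


# Hypothetical pixel values, not taken from the study.
pixel = {"tm3": 20.0, "tm4": 80.0, "tm5": 40.0, "tm7": 16.0}
pixel_ndvi = ndvi(nir=pixel["tm4"], red=pixel["tm3"])
pixel_ratios = band_ratios(**pixel)
```

Because all four features share TM band 4 in the numerator, a lower near-infrared response in one forest class would lower them together, which is consistent with the pattern the abstract reports.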
Madison, Matthew J; Bradshaw, Laine P
2015-06-01
Diagnostic classification models are psychometric models that aim to classify examinees according to their mastery or non-mastery of specified latent characteristics. These models are well-suited for providing diagnostic feedback on educational assessments because of their practical efficiency and increased reliability when compared with other multidimensional measurement models. A priori specifications of which latent characteristics or attributes are measured by each item are a core element of the diagnostic assessment design. This item-attribute alignment, expressed in a Q-matrix, precedes and supports any inference resulting from the application of the diagnostic classification model. This study investigates the effects of Q-matrix design on classification accuracy for the log-linear cognitive diagnosis model. Results indicate that classification accuracy, reliability, and convergence rates improve when the Q-matrix contains isolated information from each measured attribute.
NASA Astrophysics Data System (ADS)
Dou, P.
2017-12-01
Guangzhou has experienced a rapid urbanization period called "small change in three years and big change in five years" since the reform of China, resulting in significant land use/cover changes (LUC). To overcome the limitations of a single classifier in remote sensing image classification accuracy, a multiple classifier system (MCS) is proposed to improve the quality of remote sensing image classification. The new method combines advantages of different learning algorithms, and achieves higher accuracy (88.12%) than any single classifier did. With the proposed MCS, land use/cover (LUC) on Landsat images from 1987 to 2015 was obtained, and the LUCs were used on three watersheds (Shijing river, Chebei stream, and Shahe stream) to estimate the impact of urbanization on flooding. The results show that with the high accuracy LUC, the uncertainty in flood simulations is reduced effectively (by 15.5%, 17.3% and 19.8% for Shijing river, Chebei stream, and Shahe stream, respectively).
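The abstract does not say how the multiple classifier system fuses its members, so the following is only an illustrative sketch of one common fusion rule, per-pixel majority voting. The classifier names and land-cover labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-pixel labels from several classifiers by majority vote.

    predictions: list of per-classifier label lists, all the same length.
    Ties are resolved in favour of the label that appears first among the
    classifiers, in the order they are listed.
    """
    fused = []
    for labels in zip(*predictions):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three classifiers for three pixels.
svm_labels = ["urban", "water", "forest"]
rf_labels = ["urban", "forest", "forest"]
knn_labels = ["crop", "water", "forest"]
fused_labels = majority_vote([svm_labels, rf_labels, knn_labels])
```

Voting only helps when the member classifiers make different mistakes, so the benefit depends on how diverse the learning algorithms are.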
HEp-2 cell image classification method based on very deep convolutional networks with small datasets
NASA Astrophysics Data System (ADS)
Lu, Mengchi; Gao, Long; Guo, Xifeng; Liu, Qiang; Yin, Jianping
2017-07-01
Classification of staining patterns in Human Epithelial-2 (HEp-2) cell images has been widely used to identify autoimmune diseases by the anti-Nuclear antibodies (ANA) test in the Indirect Immunofluorescence (IIF) protocol. Because manual testing is time consuming, subjective and labor intensive, image-based Computer Aided Diagnosis (CAD) systems for HEp-2 cell classification are being developed. However, recently proposed methods mostly rely on manual feature extraction and have low accuracy. Besides, the available benchmark datasets are small, which is not well suited to deep learning methods. This issue directly affects the accuracy of cell classification even after data augmentation. To address these issues, this paper presents a high accuracy automatic HEp-2 cell classification method with small datasets, by utilizing very deep convolutional networks (VGGNet). Specifically, the proposed method consists of three main phases, namely image preprocessing, feature extraction and classification. Moreover, an improved VGGNet is presented to address the challenges of small-scale datasets. Experimental results over two benchmark datasets demonstrate that the proposed method achieves superior performance in terms of accuracy compared with existing methods.
NASA Astrophysics Data System (ADS)
Müller-Putz, Gernot R.; Scherer, Reinhold; Brauneis, Christian; Pfurtscheller, Gert
2005-12-01
Brain-computer interfaces (BCIs) can be realized on the basis of steady-state evoked potentials (SSEPs). These types of brain signals resulting from repetitive stimulation have the same fundamental frequency as the stimulation but also include higher harmonics. This study investigated how the classification accuracy of a 4-class BCI system can be improved by incorporating visually evoked harmonic oscillations. The current study revealed that the use of three SSVEP harmonics yielded a significantly higher classification accuracy than was the case for one or two harmonics. During feedback experiments, the five subjects investigated reached a classification accuracy between 42.5% and 94.4%.
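The idea of using harmonics can be illustrated with a minimal sketch: score each candidate stimulation frequency by the summed signal power at its fundamental and higher harmonics, then pick the best-scoring frequency. The abstract does not state the stimulation frequencies or the classifier used, so the four frequencies, the sampling rate, and the synthetic signal below are all hypothetical.

```python
import math


def power_at(signal, fs, freq):
    """Normalized squared magnitude of the signal's projection onto freq (Hz)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2


def classify_ssvep(signal, fs, stim_freqs, n_harmonics=3):
    """Return (best frequency, scores) using power summed over harmonics."""
    scores = {
        f: sum(power_at(signal, fs, h * f) for h in range(1, n_harmonics + 1))
        for f in stim_freqs
    }
    return max(scores, key=scores.get), scores


# Synthetic 1-second recording: 8 Hz response with two weaker harmonics.
fs = 256
t = [n / fs for n in range(fs)]
eeg = [
    math.sin(2 * math.pi * 8 * x)
    + 0.5 * math.sin(2 * math.pi * 16 * x)
    + 0.25 * math.sin(2 * math.pi * 24 * x)
    for x in t
]
winner, scores = classify_ssvep(eeg, fs, stim_freqs=[6, 7, 8, 13])
```

Setting `n_harmonics=1` in this sketch scores the fundamental alone, which makes it easy to compare against the multi-harmonic score on the same signal.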
UAV-Based Crops Classification with Joint Features from Orthoimage and DSM Data
NASA Astrophysics Data System (ADS)
Liu, B.; Shi, Y.; Duan, Y.; Wu, W.
2018-04-01
Accurate crops classification remains a challenging task because the same crop can show different spectra and different crops can show the same spectrum. Recently, UAV-based remote sensing has gained popularity not only for its high spatial and temporal resolution, but also for its ability to obtain spectral and spatial data at the same time. This paper focuses on how to take full advantage of spatial and spectral features to improve crops classification accuracy, based on a UAV platform equipped with a general digital camera. Texture and spatial features extracted from the RGB orthoimage and the digital surface model of the monitoring area are analysed and integrated within a SVM classification framework. Extensive experimental results indicate that the overall classification accuracy improves drastically, from 72.9 % to 94.5 %, when the spatial features are combined, which verifies the feasibility and effectiveness of the proposed method.
Classification of weld defect based on information fusion technology for radiographic testing system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Hongquan; Liang, Zeming; Gao, Jianmin; Dang, Changying
2016-03-01
Improving the efficiency and accuracy of weld defect classification is an important technical problem in developing the radiographic testing system. This paper proposes a novel weld defect classification method based on information fusion technology, Dempster-Shafer evidence theory. First, to characterize weld defects and improve the accuracy of their classification, 11 weld defect features were defined based on the sub-pixel level edges of radiographic images, four of which are presented for the first time in this paper. Second, we applied information fusion technology to combine different features for weld defect classification, including a mass function defined based on the weld defect feature information and the quartile-method-based calculation of standard weld defect class which is to solve a sample problem involving a limited number of training samples. A steam turbine weld defect classification case study is also presented herein to illustrate our technique. The results show that the proposed method can increase the correct classification rate with limited training samples and address the uncertainties associated with weld defect classification.
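The fusion step in Dempster-Shafer evidence theory is Dempster's rule of combination: masses of intersecting hypothesis sets are multiplied and accumulated, and the result is renormalized by the non-conflicting mass. The sketch below implements only that generic rule. The defect classes and mass values are hypothetical, and the paper's own feature-based mass functions are not reproduced.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                # Empty intersection: this product is conflicting evidence.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical evidence from two features about two defect classes.
CRACK, PORE = frozenset({"crack"}), frozenset({"porosity"})
EITHER = CRACK | PORE
feature_1 = {CRACK: 0.6, EITHER: 0.4}
feature_2 = {PORE: 0.5, EITHER: 0.5}
fused = dempster_combine(feature_1, feature_2)
```

Mass assigned to the whole set (`EITHER` here) expresses ignorance rather than support for a class, which is how the theory represents uncertain evidence.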
Ralston, Barbara E.; Davis, Philip A.; Weber, Robert M.; Rundall, Jill M.
2008-01-01
A vegetation database of the riparian vegetation located within the Colorado River ecosystem (CRE), a subsection of the Colorado River between Glen Canyon Dam and the western boundary of Grand Canyon National Park, was constructed using four-band image mosaics acquired in May 2002. A digital line scanner was flown over the Colorado River corridor in Arizona by ISTAR Americas, using a Leica ADS-40 digital camera to acquire a digital surface model and four-band image mosaics (blue, green, red, and near-infrared) for vegetation mapping. The primary objective of this mapping project was to develop a digital inventory map of vegetation to enable patch- and landscape-scale change detection, and to establish randomized sampling points for ground surveys of terrestrial fauna (principally, but not exclusively, birds). The vegetation base map was constructed through a combination of ground surveys to identify vegetation classes, image processing, and automated supervised classification procedures. Analysis of the imagery and subsequent supervised classification involved multiple steps to evaluate band quality, band ratios, and vegetation texture and density. Identification of vegetation classes involved collection of cover data throughout the river corridor and subsequent analysis using two-way indicator species analysis (TWINSPAN). Vegetation was classified into six vegetation classes, following the National Vegetation Classification Standard, based on cover dominance. This analysis indicated that total area covered by all vegetation within the CRE was 3,346 ha. Considering the six vegetation classes, the sparse shrub (SS) class accounted for the greatest amount of vegetation (627 ha) followed by Pluchea (PLSE) and Tamarix (TARA) at 494 and 366 ha, respectively. The wetland (WTLD) and Prosopis-Acacia (PRGL) classes both had similar areal cover values (227 and 213 ha, respectively). Baccharis-Salix (BAXX) was the least represented at 94 ha. 
Accuracy assessment of the supervised classification determined that accuracies varied among vegetation classes from 90% to 49%. Low accuracies were caused by similar spectral signatures among vegetation classes. Fuzzy accuracy assessment improved classification accuracies such that Federal mapping standards of 80% accuracies for all classes were met. The scale used to quantify vegetation adequately meets the needs of the stakeholder group. Increasing the scale to meet the U.S. Geological Survey (USGS)-National Park Service (NPS) National Mapping Program's minimum mapping unit of 0.5 ha is unwarranted because this scale would reduce the resolution of some classes (e.g., seep willow/coyote willow would likely be combined with tamarisk). While this would undoubtedly improve classification accuracies, it would not provide the community-level information about vegetation change that would benefit stakeholders. The identification of vegetation classes should follow NPS mapping approaches to complement the national effort and should incorporate the alternative analysis for community identification that is being incorporated into newer NPS mapping efforts. National Vegetation Classification is followed in this report for association- to formation-level categories. Accuracies could be improved by including more environmental variables such as stage elevation in the classification process and incorporating object-based classification methods. Another approach that may address the heterogeneous species issue and classification is to use spectral mixing analysis to estimate the fractional cover of species within each pixel and better quantify the cover of individual species that compose a cover class. Varying flights to capture vegetation at different times of the year might also help separate some vegetation classes, though the cost may be prohibitive. Lastly, photointerpretation instead of automated mapping could be tried. 
Photointerpretation would likely not improve accuracies in this case, howev
Classification of ECG beats using deep belief network and active learning.
G, Sayantan; T, Kien P; V, Kadambari K
2018-04-12
A new semi-supervised approach based on deep learning and active learning for classification of electrocardiogram (ECG) signals is proposed. The objective of the proposed work is to model a scientific method for classification of cardiac irregularities using electrocardiogram beats. The model follows the Association for the Advancement of Medical Instrumentation (AAMI) standards and consists of three phases. In phase I, a feature representation of the ECG is learnt using a Gaussian-Bernoulli deep belief network, followed by linear support vector machine (SVM) training in the subsequent phase. It yields three deep models which are based on AAMI-defined classes, namely N, V, S, and F. In the last phase, a query generator is introduced to interact with the expert to label a few beats to improve accuracy and sensitivity. The proposed approach shows significant improvement in accuracy with minimal queries posed to the expert and fast online training, as tested on the MIT-BIH Arrhythmia Database and the MIT-BIH Supra-ventricular Arrhythmia Database (SVDB). With 100 queries labeled by the expert in phase III, the method achieves an accuracy of 99.5% in "S" versus all classifications (SVEB) and 99.4% accuracy in "V" versus all classifications (VEB) on the MIT-BIH Arrhythmia Database. Similarly, it achieves accuracies of 97.5% for SVEB and 98.6% for VEB, respectively, on SVDB. Graphical abstract: Deep belief network augmented by active learning for efficient prediction of arrhythmia.
Blob-level active-passive data fusion for Benthic classification
NASA Astrophysics Data System (ADS)
Park, Joong Yong; Kalluri, Hemanth; Mathur, Abhinav; Ramnath, Vinod; Kim, Minsu; Aitken, Jennifer; Tuell, Grady
2012-06-01
We extend data fusion from the pixel level to the more semantically meaningful blob level, using the mean-shift algorithm to form labeled blobs having high similarity in the feature domain, and connectivity in the spatial domain. We have also developed Bhattacharyya Distance (BD) and rule-based classifiers, and have implemented these higher-level data fusion algorithms into the CZMIL Data Processing System. Applying these new algorithms to recent SHOALS and CASI data at Plymouth Harbor, Massachusetts, we achieved improved benthic classification accuracies over those produced with either single sensor, or pixel-level fusion strategies. These results appear to validate the hypothesis that classification accuracy may be generally improved by adopting higher spatial and semantic levels of fusion.
NASA Astrophysics Data System (ADS)
Zhu, L.; Radeloff, V.; Ives, A. R.; Barton, B.
2015-12-01
Deriving crop pattern with high accuracy is of great importance for characterizing landscape diversity, which affects the resilience of food webs in agricultural systems in the face of climatic and land cover changes. Landsat sensors were originally designed to monitor agricultural areas, and both radiometric and spatial resolution are optimized for monitoring large agricultural fields. Unfortunately, few clear Landsat images per year are available, which has limited the use of Landsat for making crop classification, and this situation is worse in cloudy areas of the Earth. Meanwhile, the MODerate Resolution Imaging Spectroradiometer (MODIS) data has better temporal resolution but cannot capture fine spatial heterogeneity of agricultural systems. Our question was to what extent fusing imagery from both sensors could improve crop classifications. We utilized the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) algorithm to simulate Landsat-like images at MODIS temporal resolution. Based on Random Forests (RF) classifier, we tested whether and by what degree crop maps from 2000 to 2014 of the Arlington Agricultural Research Station (Wisconsin, USA) were improved by integrating available clear Landsat images each year with synthetic images. We predicted that the degree to which classification accuracy can be improved by incorporating synthetic imagery depends on the number and acquisition time of clear Landsat images. Moreover, multi-season data are essential for mapping crop types by capturing their phenological dynamics, and STARFM-simulated images can be used to compensate for missing Landsat observations. Our study is helpful for eliminating the limits of the use of Landsat data in mapping crop patterns, and can provide a benchmark of accuracy when choosing STARFM-simulated images to make crop classification at broader scales.
Research on cardiovascular disease prediction based on distance metric learning
NASA Astrophysics Data System (ADS)
Ni, Zhuang; Liu, Kui; Kang, Guixia
2018-04-01
Distance metric learning algorithms have been widely applied to medical diagnosis and have shown their strengths in classification problems. The k-nearest neighbour (KNN) method is efficient but treats each feature equally. Large margin nearest neighbour classification (LMNN) improves the accuracy of KNN by learning a global distance metric, but it does not consider the locality of data distributions. In this paper, we propose a new distance metric algorithm named COS-SUBLMNN, which combines a cosine metric with LMNN and pays more attention to local features of the data, to overcome this shortcoming of LMNN and improve classification accuracy. The proposed method is verified on cardiovascular disease (CVD) patient vectors derived from real-world medical data. The experimental results show that our method provides higher accuracy than KNN and LMNN, which demonstrates the effectiveness of the CVD risk prediction model based on COS-SUBLMNN.
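To make the role of the distance metric concrete, here is a minimal sketch of a k-nearest-neighbour vote that uses cosine distance instead of Euclidean distance. It is not COS-SUBLMNN: no metric is learned and no LMNN step is included. The two-feature vectors and the "low"/"high" risk labels are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; depends on direction, not vector length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: cosine_distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical patient vectors (two features each) with risk labels.
train = [
    ([1.0, 0.1], "low"),
    ([2.0, 0.3], "low"),
    ([0.9, 0.2], "low"),
    ([0.1, 1.0], "high"),
    ([0.3, 2.2], "high"),
    ([0.2, 0.9], "high"),
]

# This query is far from every training point in Euclidean terms,
# but it points in the same direction as the "low" group.
print(knn_predict(train, [10.0, 1.0]))
print(knn_predict(train, [0.5, 5.0]))
```

Because cosine distance ignores magnitude, vectors are grouped by the proportions of their features, which is one reason a cosine metric can behave differently from plain KNN on the same data.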
Transportation Modes Classification Using Sensors on Smartphones.
Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu
2016-08-19
This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user's transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes.
Transportation Modes Classification Using Sensors on Smartphones
Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu
2016-01-01
This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user’s transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes. PMID:27548182
Rifai Chai; Naik, Ganesh R; Tran, Yvonne; Sai Ho Ling; Craig, Ashley; Nguyen, Hung T
2015-08-01
An electroencephalography (EEG)-based countermeasure device could be used for fatigue detection during driving. This paper explores the classification of fatigue and alert states using power spectral density (PSD) as a feature extractor and fuzzy swarm based-artificial neural network (ANN) as a classifier. An independent component analysis of entropy rate bound minimization (ICA-ERBM) is investigated as a novel source separation technique for fatigue classification using EEG analysis. A comparison of the classification accuracy of source separator versus no source separator is presented. Classification performance based on 43 participants without the inclusion of the source separator resulted in an overall sensitivity of 71.67%, a specificity of 75.63% and an accuracy of 73.65%. However, these results were improved after the inclusion of a source separator module, resulting in an overall sensitivity of 78.16%, a specificity of 79.60% and an accuracy of 78.88% (p < 0.05).
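The three figures reported here are linked through the confusion matrix. The sketch below shows the standard definitions; the confusion counts are hypothetical, since the abstract does not give them. It also checks an arithmetic observation: each reported accuracy equals the mean of the reported sensitivity and specificity, which is what one would expect if the fatigue and alert classes were equally sized. That equal-size reading is an inference from the numbers, not a statement in the abstract.

```python
def rates(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from binary confusion counts."""
    sensitivity = tp / (tp + fn)   # fatigue cases correctly detected
    specificity = tn / (tn + fp)   # alert cases correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts with equally sized classes (100 fatigue, 100 alert).
sens, spec, acc = rates(tp=78, fn=22, tn=80, fp=20)
print(sens, spec, acc)

# With equal class sizes, accuracy is the mean of the other two rates.
balanced = (sens + spec) / 2

# The same relation holds for the percentages reported in the abstract.
without_separator = (71.67 + 75.63) / 2   # reported accuracy: 73.65
with_separator = (78.16 + 79.60) / 2      # reported accuracy: 78.88
print(round(without_separator, 2), round(with_separator, 2))
```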
Study design requirements for RNA sequencing-based breast cancer diagnostics.
Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias
2016-02-01
Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.
Classification with spatio-temporal interpixel class dependency contexts
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, David A.
1992-01-01
A contextual classifier which can utilize both spatial and temporal interpixel dependency contexts is investigated. After spatial and temporal neighbors are defined, a general form of maximum a posterior spatiotemporal contextual classifier is derived. This contextual classifier is simplified under several assumptions. Joint prior probabilities of the classes of each pixel and its spatial neighbors are modeled by the Gibbs random field. The classification is performed in a recursive manner to allow a computationally efficient contextual classification. Experimental results with bitemporal TM data show significant improvement of classification accuracy over noncontextual pixelwise classifiers. This spatiotemporal contextual classifier should find use in many applications of remote sensing, especially when the classification accuracy is important.
Empirical evaluation of data normalization methods for molecular classification.
Huang, Huei-Chung; Qin, Li-Xuan
2018-01-01
Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers-an increasingly important application of microarrays in the era of personalized medicine. In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated in an independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.
Erdodi, Laszlo A; Tyson, Bradley T; Shahein, Ayman G; Lichtenstein, Jonathan D; Abeare, Christopher A; Pelletier, Chantalle L; Zuccato, Brandon G; Kucharski, Brittany; Roth, Robert M
2017-05-01
The Recognition Memory Test (RMT) and Word Choice Test (WCT) are structurally similar, but psychometrically different. Previous research demonstrated that adding a time-to-completion cutoff improved the classification accuracy of the RMT. However, the contribution of WCT time-cutoffs to improve the detection of invalid responding has not been investigated. The present study was designed to evaluate the classification accuracy of time-to-completion on the WCT compared to the accuracy score and the RMT. Both tests were administered to 202 adults (M age = 45.3 years, SD = 16.8; 54.5% female) clinically referred for neuropsychological assessment in counterbalanced order as part of a larger battery of cognitive tests. Participants obtained lower and more variable scores on the RMT (M = 44.1, SD = 7.6) than on the WCT (M = 46.9, SD = 5.7). Similarly, they took longer to complete the recognition trial on the RMT (M = 157.2 s,SD = 71.8) than the WCT (M = 137.2 s, SD = 75.7). The optimal cutoff on the RMT (≤43) produced .60 sensitivity at .87 specificity. The optimal cutoff on the WCT (≤47) produced .57 sensitivity at .87 specificity. Time-cutoffs produced comparable classification accuracies for both RMT (≥192 s; .48 sensitivity at .88 specificity) and WCT (≥171 s; .49 sensitivity at .91 specificity). They also identified an additional 6-10% of the invalid profiles missed by accuracy score cutoffs, while maintaining good specificity (.93-.95). Functional equivalence was reached at accuracy scores ≤43 (RMT) and ≤47 (WCT) or time-to-completion ≥192 s (RMT) and ≥171 s (WCT). Time-to-completion cutoffs are valuable additions to both tests. They can function as independent validity indicators or enhance the sensitivity of accuracy scores without requiring additional measures or extending standard administration time.
NASA Astrophysics Data System (ADS)
Gutierrez-Velez, V. H.; DeFries, R. S.
2011-12-01
Oil palm expansion has led to clearing of extensive forest areas in the tropics. However, quantitative assessments of the magnitude of oil palm expansion to deforestation have been challenging due in large part to the limitations presented by conventional optical data sets for discriminating plantations from forests and other tree cover vegetations. Recently available information from active remote sensors has opened the possibility of using these data sources to overcome these limitations. The purpose of this analysis is to evaluate the accuracy of oil palm classification when using ALOS/PALSAR active satellite data in conjunction with Landsat information, compared to the use of Landsat data only. The analysis takes place in a focused region around the city of Pucallpa in the Ucayali province of the Peruvian Amazon for the year 2010. Oil palm plantations were separated in five categories consisting of four age classes (0-3, 3-5, 5-10 and > 10 yrs) and an additional class accounting for degraded plantations older than 15 yr. Other land covers were water bodies, unvegetated land, short and tall grass, fallow, secondary vegetation, and forest. Classifications were performed using random forests. Training points for calibration and validation consisted of 411 polygons measured in areas representative of the land covers of interest and totaled 6,367 ha. Overall classification accuracy increased from 89.9% using only Landsat data sets to 94.3% using both Landsat and ALOS/PALSAR. Both user's and producer's accuracy increased in all classes when using both data sets except for producer's accuracy in short grass which decreased by 1%. The largest increase in user's accuracy was obtained in oil palm plantations older than 10 years from 62 to 80% while producer's accuracy improved the most in plantations in age class 3-5 from 63 to 80%. 
Results demonstrate the suitability of data from ALOS/PALSAR and other active remote sensors to improve classification of oil palm plantations in age classes and discriminate them from other land covers. Results suggest a potential for improving discrimination of other tree cover types using a combination of active and conventional optical remote sensors.
Developing collaborative classifiers using an expert-based model
Mountrakis, G.; Watts, R.; Luo, L.; Wang, Jingyuan
2009-01-01
This paper presents a hierarchical, multi-stage adaptive strategy for image classification. We iteratively apply various classification methods (e.g., decision trees, neural networks), identify regions of parametric and geographic space where accuracy is low, and in these regions, test and apply alternate methods repeating the process until the entire image is classified. Currently, classifiers are evaluated through human input using an expert-based system; therefore, this paper acts as the proof of concept for collaborative classifiers. Because we decompose the problem into smaller, more manageable sub-tasks, our classification exhibits increased flexibility compared to existing methods since classification methods are tailored to the idiosyncrasies of specific regions. A major benefit of our approach is its scalability and collaborative support since selected low-accuracy classifiers can be easily replaced with others without affecting classification accuracy in high accuracy areas. At each stage, we develop spatially explicit accuracy metrics that provide straightforward assessment of results by non-experts and point to areas that need algorithmic improvement or ancillary data. Our approach is demonstrated in the task of detecting impervious surface areas, an important indicator for human-induced alterations to the environment, using a 2001 Landsat scene from Las Vegas, Nevada. © 2009 American Society for Photogrammetry and Remote Sensing.
An ant colony optimization based feature selection for web page classification.
Saraç, Esra; Özel, Selma Ayşe
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
Gómez-Valdés, Jorge A; Menéndez Garmendia, Antinea; García-Barzola, Lizbeth; Sánchez-Mejorada, Gabriela; Karam, Carlos; Baraybar, José Pablo; Klales, Alexandra
2017-03-01
The aim of this study was to test the accuracy of the Klales et al. (2012) equation for sex estimation in a contemporary Mexican population. Our investigation was carried out on a sample of 203 left innominates of identified adult skeletons from the UNAM-Collection and the Santa María Xigui Cemetery, in Central Mexico. The original Klales equation produces a sex bias in sex estimation against males (86-92% accuracy versus 100% accuracy in females). Based on these results, the Klales et al. (2012) method was recalibrated with a new cutoff point for sex estimation in contemporary Mexican populations. The results show cross-validated classification accuracy rates as high as 100% after recalibrating the original logistic regression equation. Recalibration improved classification accuracy and eliminated sex bias. This new formula will improve sex estimation for Mexican contemporary populations. © 2017 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Wrigley, R. C.; Acevedo, W.; Alexander, D.; Buis, J.; Card, D.
1984-01-01
An experiment of a factorial design was conducted to test the effects on classification accuracy of land cover types due to the improved spatial, spectral and radiometric characteristics of the Thematic Mapper (TM) in comparison to the Multispectral Scanner (MSS). High altitude aircraft scanner data from the Airborne Thematic Mapper instrument was acquired over central California in August, 1983 and used to simulate Thematic Mapper data as well as all combinations of the three characteristics for eight data sets in all. Results for the training sites (field center pixels) showed better classification accuracies for MSS spatial resolution, TM spectral bands and TM radiometry in order of importance.
Post-boosting of classification boundary for imbalanced data using geometric mean.
Du, Jie; Vong, Chi-Man; Pun, Chi-Man; Wong, Pak-Kin; Ip, Weng-Fai
2017-12-01
In this paper, a novel imbalance learning method for binary classes is proposed, named as Post-Boosting of classification boundary for Imbalanced data (PBI), which can significantly improve the performance of any trained neural networks (NN) classification boundary. The procedure of PBI simply consists of two steps: an (imbalanced) NN learning method is first applied to produce a classification boundary, which is then adjusted by PBI under the geometric mean (G-mean). For data imbalance, the geometric mean of the accuracies of both minority and majority classes is considered, that is statistically more suitable than the common metric accuracy. PBI also has the following advantages over traditional imbalance methods: (i) PBI can significantly improve the classification accuracy on minority class while improving or keeping that on majority class as well; (ii) PBI is suitable for large data even with high imbalance ratio (up to 0.001). For evaluation of (i), a new metric called Majority loss/Minority advance ratio (MMR) is proposed that evaluates the loss ratio of majority class to minority class. Experiments have been conducted for PBI and several imbalance learning methods over benchmark datasets of different sizes, different imbalance ratios, and different dimensionalities. By analyzing the experimental results, PBI is shown to outperform other imbalance learning methods on almost all datasets. Copyright © 2017 Elsevier Ltd. All rights reserved.
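The reason G-mean is preferred to plain accuracy on imbalanced data can be shown with a small threshold sweep. The sketch below is a simplified stand-in and not the PBI procedure itself: it takes classifier scores as given and picks the decision threshold that maximizes G-mean. The labels and scores are hypothetical.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of the minority-class and majority-class accuracies."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return math.sqrt((tp / positives) * (tn / negatives))


def predict(scores, threshold):
    return [int(s >= threshold) for s in scores]


def best_threshold(y_true, scores):
    """Try each observed score as a threshold and keep the best G-mean."""
    return max(sorted(set(scores)),
               key=lambda t: g_mean(y_true, predict(scores, t)))


# Hypothetical data: 8 majority (0) and 2 minority (1) samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.45, 0.40, 0.48]

default_pred = predict(scores, 0.5)
accuracy_default = sum(t == p for t, p in zip(y_true, default_pred)) / len(y_true)
print(accuracy_default, g_mean(y_true, default_pred))

t_star = best_threshold(y_true, scores)
print(t_star, g_mean(y_true, predict(scores, t_star)))
```

At the default threshold every sample is assigned to the majority class, so accuracy looks acceptable while G-mean is zero. Moving the boundary recovers both minority samples at the cost of one majority error.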
Orhan, Umut; Erdogmus, Deniz; Roark, Brian; Purwar, Shalini; Hild, Kenneth E.; Oken, Barry; Nezamfar, Hooman; Fried-Oken, Melanie
2013-01-01
Event related potentials (ERP) corresponding to a stimulus in electroencephalography (EEG) can be used to detect the intent of a person for brain computer interfaces (BCI). This paradigm is widely utilized to build letter-by-letter text input systems using BCI. Nevertheless using a BCI-typewriter depending only on EEG responses will not be sufficiently accurate for single-trial operation in general, and existing systems utilize many-trial schemes to achieve accuracy at the cost of speed. Hence incorporation of a language model based prior or additional evidence is vital to improve accuracy and speed. In this paper, we study the effects of Bayesian fusion of an n-gram language model with a regularized discriminant analysis ERP detector for EEG-based BCIs. The letter classification accuracies are rigorously evaluated for varying language model orders as well as number of ERP-inducing trials. The results demonstrate that the language models contribute significantly to letter classification accuracy. Specifically, we find that a BCI-speller supported by a 4-gram language model may achieve the same performance using 3-trial ERP classification for the initial letters of the words and using single trial ERP classification for the subsequent ones. Overall, fusion of evidence from EEG and language models yields a significant opportunity to increase the word rate of a BCI based typing system. PMID:22255652
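The Bayesian fusion described here amounts to multiplying a language-model prior over candidate letters by the likelihood of the observed EEG evidence under each letter, then normalizing. The sketch below shows only that step. The three-letter alphabet and all probabilities are hypothetical, and the regularized discriminant analysis detector and n-gram model from the paper are replaced by fixed numbers.

```python
def fuse(prior, likelihood):
    """Posterior over letters: prior times likelihood, renormalized."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical n-gram prior for the next letter given the typed context.
lm_prior = {"e": 0.6, "a": 0.3, "q": 0.1}

# Hypothetical single-trial EEG likelihoods; on their own they are
# noisy and slightly favour the wrong letter.
eeg_likelihood = {"e": 0.3, "a": 0.4, "q": 0.3}

posterior = fuse(lm_prior, eeg_likelihood)
eeg_only_choice = max(eeg_likelihood, key=eeg_likelihood.get)
fused_choice = max(posterior, key=posterior.get)
print(eeg_only_choice, fused_choice)
print({c: round(p, 3) for c, p in posterior.items()})
```

In this toy case the weak EEG evidence alone would select a different letter than the fused posterior does, which illustrates how a language-model prior can reduce the number of trials needed for a confident decision.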
NASA Astrophysics Data System (ADS)
Hammann, Mark Gregory
The fusion of electro-optical (EO) multi-spectral satellite imagery with Synthetic Aperture Radar (SAR) data was explored with the working hypothesis that the addition of multi-band SAR will increase the land-cover (LC) classification accuracy compared to EO alone. Three satellite sources for SAR imagery were used: X-band from TerraSAR-X, C-band from RADARSAT-2, and L-band from PALSAR. Images from the RapidEye satellites were the source of the EO imagery. Imagery from the GeoEye-1 and WorldView-2 satellites aided the selection of ground truth. Three study areas were chosen: Wad Medani, Sudan; Campinas, Brazil; and Fresno-Kings Counties, USA. EO imagery were radiometrically calibrated, atmospherically compensated, orthorectified, co-registered, and clipped to a common area of interest (AOI). SAR imagery were radiometrically calibrated, and geometrically corrected for terrain and incidence angle by converting to ground range and Sigma Naught (σ⁰). The original SAR HH data were included in the fused image stack after despeckling with a 3x3 Enhanced Lee filter. The variance and Gray-Level Co-occurrence Matrix (GLCM) texture measures of contrast, entropy, and correlation were derived from the non-despeckled SAR HH bands. Data fusion was done with layer stacking and all data were resampled to a common spatial resolution. The Support Vector Machine (SVM) decision rule was used for the supervised classifications. Similar LC classes were identified and tested for each study area. For Wad Medani, nine classes were tested: low and medium intensity urban, sparse forest, water, barren ground, and four agriculture classes (fallow, bare agricultural ground, green crops, and orchards). For Campinas, Brazil, five generic classes were tested: urban, agriculture, forest, water, and barren ground. For the Fresno-Kings Counties location 11 classes were studied: three generic classes (urban, water, barren land), and eight specific crops. 
In all cases the addition of SAR to EO resulted in higher overall classification accuracies. In many cases using more than a single SAR band also improved the classification accuracy. There was no single best SAR band for all cases; for specific study areas or LC classes, different SAR bands were better. For Wad Medani, the overall accuracy increased nearly 25% over EO by using all three SAR bands and GLCM texture. For Campinas, the improvement over EO was 4.3%; the large areas of vegetation were classified by EO with good accuracy. At Fresno-Kings Counties, EO+SAR fusion improved the overall classification accuracy by 7%. For times or regions where EO is not available due to extended cloud cover, classification with SAR is often the only option; note that SAR alone typically results in lower classification accuracies than when using EO or EO-SAR fusion. Fusion of EO and SAR was especially important to improve the separability of orchards from other crops, and separating urban areas with buildings from bare soil; those classes are difficult to accurately separate with EO. The outcome of this dissertation contributes to the understanding of the benefits of combining data from EO imagery with different SAR bands and SAR derived texture data to identify different LC classes. In times of increased public and private budget constraints and industry consolidation, this dissertation provides insight as to which band packages could be most useful for increased accuracy in LC classification.
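The GLCM texture measures named in this abstract (contrast, entropy, correlation) can be illustrated with a minimal sketch. It assumes a small, already quantized grey-level patch and a single pixel offset; the window size, quantization, and offsets used in the dissertation are not stated here, and the function names `glcm` and `glcm_features` are hypothetical.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Return normalised co-occurrence probabilities {(i, j): p} for one offset.

    `image` is a list of rows of integer grey levels (assumed pre-quantized).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
                if symmetric:
                    counts[(image[r2][c2], image[r][c])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Contrast, entropy and correlation of a normalised GLCM."""
    contrast = sum(v * (i - j) ** 2 for (i, j), v in p.items())
    entropy = -sum(v * math.log(v) for v in p.values())
    mu_i = sum(v * i for (i, _), v in p.items())
    mu_j = sum(v * j for (_, j), v in p.items())
    var_i = sum(v * (i - mu_i) ** 2 for (i, _), v in p.items())
    var_j = sum(v * (j - mu_j) ** 2 for (_, j), v in p.items())
    cov = sum(v * (i - mu_i) * (j - mu_j) for (i, j), v in p.items())
    denom = math.sqrt(var_i * var_j)
    # Correlation is undefined for a perfectly uniform patch.
    correlation = cov / denom if denom > 0 else float("nan")
    return {"contrast": contrast, "entropy": entropy, "correlation": correlation}


# Hypothetical 4-level patch standing in for a quantized SAR HH window.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [2, 2, 3, 3],
         [2, 2, 3, 3]]
features = glcm_features(glcm(patch))
```

In a layer-stacking workflow, each such measure would be computed per moving window and stored as an extra band next to the optical bands before classification.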
An incremental knowledge assimilation system (IKAS) for mine detection
NASA Astrophysics Data System (ADS)
Porway, Jake; Raju, Chaitanya; Varadarajan, Karthik Mahesh; Nguyen, Hieu; Yadegar, Joseph
2010-04-01
In this paper we present an adaptive incremental learning system for underwater mine detection and classification that utilizes statistical models of seabed texture and an adaptive nearest-neighbor classifier to identify varied underwater targets in many different environments. The first stage of processing uses our Background Adaptive ANomaly detector (BAAN), which identifies statistically likely target regions using Gabor filter responses over the image. Using this information, BAAN classifies the background type and updates its detection using background-specific parameters. To perform classification, a Fully Adaptive Nearest Neighbor (FAAN) determines the best label for each detection. FAAN uses an extremely fast version of Nearest Neighbor to find the most likely label for the target. The classifier perpetually assimilates new and relevant information into its existing knowledge database in an incremental fashion, allowing improved classification accuracy and capturing concept drift in the target classes. Experiments show that the system achieves >90% classification accuracy on underwater mine detection tasks performed on synthesized datasets provided by the Office of Naval Research. We have also demonstrated that the system can incrementally improve its detection accuracy by constantly learning from new samples.
NASA Astrophysics Data System (ADS)
Wang, Bingjie; Pi, Shaohua; Sun, Qi; Jia, Bo
2015-05-01
An improved classification algorithm that considers multiscale wavelet packet Shannon entropy is proposed. Decomposition coefficients at all levels are obtained to build the initial Shannon entropy feature vector. After subtracting the Shannon entropy map of the background signal, components of the strongest discriminating power in the initial feature vector are picked out to rebuild the Shannon entropy feature vector, which is transferred to radial basis function (RBF) neural network for classification. Four types of man-made vibrational intrusion signals are recorded based on a modified Sagnac interferometer. The performance of the improved classification algorithm has been evaluated by the classification experiments via RBF neural network under different diffusion coefficients. An 85% classification accuracy rate is achieved, which is higher than the other common algorithms. The classification results show that this improved classification algorithm can be used to classify vibrational intrusion signals in an automatic real-time monitoring system.
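A rough sketch of a multiscale wavelet packet Shannon entropy feature vector follows. It is an illustration under stated assumptions, not the authors' implementation: a Haar filter is used, entropy is taken over the normalised coefficient energies of each node, and the background is removed by a simple element-wise subtraction. All function names are hypothetical.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) at half length."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def shannon_entropy(coeffs):
    """Shannon entropy of the normalised energy distribution of one node."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in energy if e > 0.0)


def wavelet_packet_entropy(signal, levels):
    """Entropy of every wavelet packet node, concatenated over all levels."""
    nodes = [list(signal)]
    features = []
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes += [approx, detail]
        nodes = next_nodes
        features += [shannon_entropy(n) for n in nodes]
    return features


def subtract_background(features, background_features):
    """Remove the background entropy map from a signal's entropy map."""
    return [f - b for f, b in zip(features, background_features)]
```

The resulting vector (optionally reduced to its most discriminating components) would then be passed to a classifier such as the RBF network described above.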
Estimation of different data compositions for early-season crop type classification.
Hao, Pengyu; Wu, Mingquan; Niu, Zheng; Wang, Li; Zhan, Yulin
2018-01-01
Timely and accurate crop type distribution maps are important inputs for crop yield estimation and production forecasting as multi-temporal images can observe phenological differences among crops. Therefore, time series remote sensing data are essential for crop type mapping, and image composition has commonly been used to improve the quality of the image time series. However, the optimal composition period is unclear as long composition periods (such as compositions lasting half a year) are less informative and short composition periods lead to information redundancy and missing pixels. In this study, we initially acquired daily 30 m Normalized Difference Vegetation Index (NDVI) time series by fusing MODIS, Landsat, Gaofen and Huanjing (HJ) NDVI, and then composited the NDVI time series using four strategies (daily, 8-day, 16-day, and 32-day). We used Random Forest to identify crop types and evaluated the classification performances of the NDVI time series generated from four composition strategies in two study regions from Xinjiang, China. Results indicated that crop classification performance improved as crop separabilities and classification accuracies increased, and classification uncertainties dropped in the green-up stage of the crops. When using daily NDVI time series, overall accuracies saturated at 113-day and 116-day in Bole and Luntai, and the saturated overall accuracies (OAs) were 86.13% and 91.89%, respectively. Cotton could be identified 40∼60 days and 35∼45 days earlier than the harvest in Bole and Luntai when using daily, 8-day and 16-day composition NDVI time series since both producer's accuracies (PAs) and user's accuracies (UAs) were higher than 85%. Among the four compositions, the daily NDVI time series generated the highest classification accuracies. 
Although the 8-day, 16-day and 32-day compositions had similar saturated overall accuracies (around 85% in Bole and 83% in Luntai), the 8-day and 16-day compositions achieved these accuracies around 155-day in Bole and 133-day in Luntai, which were earlier than the 32-day composition (170-day in both Bole and Luntai). Therefore, when the daily NDVI time series cannot be acquired, the 16-day composition is recommended in this study. PMID:29868265
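The effect of the composition period can be sketched in a few lines. The abstract does not state which compositing rule was applied, so the maximum-value rule below is an assumption, and the function name `composite` and the sample values are hypothetical.

```python
def composite(daily_ndvi, period):
    """Collapse a daily NDVI series into one value per `period`-day window.

    Missing observations are given as None. The maximum-value rule is an
    assumption; a window with no valid observation stays missing.
    """
    out = []
    for start in range(0, len(daily_ndvi), period):
        window = [v for v in daily_ndvi[start:start + period] if v is not None]
        out.append(max(window) if window else None)
    return out


# Hypothetical 8-day pixel history with gaps (for example, cloud cover).
daily = [0.1, None, 0.3, 0.2, None, None, 0.5, 0.4]
short_period = composite(daily, 2)   # keeps detail, but one window is empty
long_period = composite(daily, 4)    # no gaps, but fewer time steps
```

The trade-off described in the abstract shows up directly: shorter windows preserve temporal detail but can leave missing pixels, while longer windows fill gaps at the cost of fewer, less informative time steps.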
NASA Astrophysics Data System (ADS)
Kurniawan, Dian; Suparti; Sugito
2018-05-01
Population growth in Indonesia has increased every year. According to the population census conducted by the Central Bureau of Statistics (BPS) in 2010, the population of Indonesia had reached 237.6 million people. Therefore, to control the population growth rate, the government holds the Family Planning, or Keluarga Berencana (KB), program for couples of childbearing age. The purpose of this program is to improve the health of mothers and children in order to realize a prosperous society by controlling births while ensuring control of population growth. The data used in this study are the updated family data of Semarang city in 2016, collected by the National Family Planning Coordinating Board (BKKBN). From these data, classifiers are built with kernel discriminant analysis, and the classification accuracy of that method is obtained. The analysis showed that normal kernel discriminant analysis gives 71.05 % classification accuracy with 28.95 % classification error, whereas triweight kernel discriminant analysis gives 73.68 % classification accuracy with 26.32 % classification error. The triweight kernel discriminant can therefore be stated to be better than the normal kernel discriminant for the data on family planning participation of childbearing age couples in Semarang City in 2016.
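Kernel discriminant analysis assigns an observation to the class with the largest prior-weighted kernel density estimate. The sketch below is a one-dimensional illustration with the two kernels named in the abstract; the bandwidth, the class labels, and the sample values are hypothetical, and the study's actual predictors are multivariate.

```python
import math


def normal_kernel(u):
    """Standard normal (Gaussian) kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def triweight_kernel(u):
    """Triweight kernel, supported on [-1, 1]."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def kde(x, sample, h, kernel):
    """Kernel density estimate at x from a 1-D training sample."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def classify(x, classes, h, kernel):
    """Pick the label with the largest prior * density score.

    `classes` maps each label to its list of training values.
    """
    n = sum(len(s) for s in classes.values())
    scores = {label: len(s) / n * kde(x, s, h, kernel)
              for label, s in classes.items()}
    return max(scores, key=scores.get)


# Hypothetical training values for two hypothetical classes.
training = {"participant": [1.0, 1.2, 0.9], "non_participant": [3.0, 3.1, 2.8]}
label = classify(1.1, training, h=0.5, kernel=triweight_kernel)
```

Swapping `kernel=normal_kernel` for `kernel=triweight_kernel` is the only change needed to compare the two variants on the same data.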
Feature ranking and rank aggregation for automatic sleep stage classification: a comparative study.
Najdi, Shirin; Gharbali, Ali Abdollahi; Fonseca, José Manuel
2017-08-18
Nowadays, sleep quality is one of the most important measures of healthy life, especially considering the huge number of sleep-related disorders. Identifying sleep stages using polysomnographic (PSG) signals is the traditional way of assessing sleep quality. However, the manual process of sleep stage classification is time-consuming, subjective and costly. Therefore, in order to improve the accuracy and efficiency of the sleep stage classification, researchers have been trying to develop automatic classification algorithms. Automatic sleep stage classification mainly consists of three steps: pre-processing, feature extraction and classification. Since classification accuracy is deeply affected by the extracted features, a poor feature vector will adversely affect the classifier and eventually lead to low classification accuracy. Therefore, special attention should be given to the feature extraction and selection process. In this paper the performance of seven feature selection methods, as well as two feature rank aggregation methods, were compared. Pz-Oz EEG, horizontal EOG and submental chin EMG recordings of 22 healthy males and females were used. A comprehensive feature set including 49 features was extracted from these recordings. The extracted features are among the most common and effective features used in sleep stage classification from temporal, spectral, entropy-based and nonlinear categories. The feature selection methods were evaluated and compared using three criteria: classification accuracy, stability, and similarity. Simulation results show that MRMR-MID achieves the highest classification performance while Fisher method provides the most stable ranking. In our simulations, the performance of the aggregation methods was in the average level, although they are known to generate more stable results and better accuracy. The Borda and RRA rank aggregation methods could not outperform significantly the conventional feature ranking methods. 
Among conventional methods, some of them slightly performed better than others, although the choice of a suitable technique is dependent on the computational complexity and accuracy requirements of the user.
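As an illustration of rank aggregation, a basic Borda count over several feature rankings can be written as follows. This is the textbook scheme, not necessarily the exact variant evaluated in the study, and the feature names are hypothetical.

```python
def borda_aggregate(rankings):
    """Aggregate best-first rankings of the same features with a Borda count.

    Each ranking awards (n - 1) points to its top feature down to 0 for the
    last one. Ties in total score are broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - 1 - position)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Three hypothetical rankings produced by different feature selection methods.
rankings = [
    ["spectral_power", "entropy", "kurtosis"],
    ["entropy", "spectral_power", "kurtosis"],
    ["spectral_power", "kurtosis", "entropy"],
]
consensus = borda_aggregate(rankings)
```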
Porras-Alfaro, Andrea; Liu, Kuan-Liang; Kuske, Cheryl R; Xie, Gary
2014-02-01
We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and the 5′ section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but the ITS1 and ITS2 showed higher accuracy when smaller fragment sizes of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar accuracy classification for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets. PMID:24242255
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
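Of the three decision fusion methods listed, weighted majority voting is the simplest to sketch. How the weights were derived in the study is not stated here; the example assumes they come from something like each classifier's validation accuracy, and the activity labels are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest summed weight.

    `predictions[k]` is the label chosen by classifier k and `weights[k]` is
    its (assumed) reliability. Ties go to the alphabetically first label.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(sorted(totals), key=totals.get)


# Four hypothetical base classifiers: tree, k-NN, SVM, neural network.
votes = ["walk", "run", "run", "walk"]
reliabilities = [0.9, 0.6, 0.7, 0.5]
fused = weighted_majority_vote(votes, reliabilities)
```

With equal weights the same function reduces to a plain majority vote.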
Improved fibrosis staging by elastometry and blood test in chronic hepatitis C.
Calès, Paul; Boursier, Jérôme; Ducancelle, Alexandra; Oberti, Frédéric; Hubert, Isabelle; Hunault, Gilles; de Lédinghen, Victor; Zarski, Jean-Pierre; Salmon, Dominique; Lunel, Françoise
2014-07-01
Our main objective was to improve non-invasive fibrosis staging accuracy by resolving the limits of previous methods via new test combinations. Our secondary objectives were to improve staging precision, by developing a detailed fibrosis classification, and reliability (personalized accuracy) determination. All patients (729) included in the derivation population had chronic hepatitis C, liver biopsy, 6 blood tests and Fibroscan. Validation populations included 1584 patients. The most accurate combination was provided by using most markers of FibroMeter and Fibroscan results targeted for significant fibrosis, i.e. 'E-FibroMeter'. Its classification accuracy (91.7%) and precision (assessed by F difference with Metavir: 0.62 ± 0.57) were better than those of FibroMeter (84.1%, P < 0.001; 0.72 ± 0.57, P < 0.001), Fibroscan (88.2%, P = 0.011; 0.68 ± 0.57, P = 0.020), and a previous CSF-SF classification of FibroMeter + Fibroscan (86.7%, P < 0.001; 0.65 ± 0.57, P = 0.044). The accuracy for fibrosis absence (F0) was increased, e.g. from 16.0% with Fibroscan to 75.0% with E-FibroMeter (P < 0.001). Cirrhosis sensitivity was improved, e.g. E-FibroMeter: 92.7% vs. Fibroscan: 83.3%, P = 0.004. The combination improved reliability by deleting unreliable results (accuracy <50%) observed with a single test (1.2% of patients) and increasing optimal reliability (accuracy ≥85%) from 80.4% of patients with Fibroscan (accuracy: 90.9%) to 94.2% of patients with E-FibroMeter (accuracy: 92.9%), P < 0.001. The patient rate with 100% predictive values for cirrhosis by the best combination was twice (36.2%) that of the best single test (FibroMeter: 16.2%, P < 0.001). The new test combination increased: accuracy, globally and especially in patients without fibrosis, staging precision, cirrhosis prediction, and even reliability, thus offering improved fibrosis staging. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Shin, Jaeyoung; Kwon, Jinuk; Im, Chang-Hwan
2018-01-01
The performance of a brain-computer interface (BCI) can be enhanced by simultaneously using two or more modalities to record brain activity, which is generally referred to as a hybrid BCI. To date, many BCI researchers have tried to implement a hybrid BCI system by combining electroencephalography (EEG) and functional near-infrared spectroscopy (NIRS) to improve the overall accuracy of binary classification. However, since hybrid EEG-NIRS BCI, which will be denoted by hBCI in this paper, has not been applied to ternary classification problems, paradigms and classification strategies appropriate for ternary classification using hBCI are not well investigated. Here we propose the use of an hBCI for the classification of three brain activation patterns elicited by mental arithmetic, motor imagery, and idle state, with the aim to elevate the information transfer rate (ITR) of hBCI by increasing the number of classes while minimizing the loss of accuracy. EEG electrodes were placed over the prefrontal cortex and the central cortex, and NIRS optodes were placed only on the forehead. The ternary classification problem was decomposed into three binary classification problems using the "one-versus-one" (OVO) classification strategy to apply the filter-bank common spatial patterns filter to EEG data. A 10 × 10-fold cross validation was performed using shrinkage linear discriminant analysis (sLDA) to evaluate the average classification accuracies for EEG-BCI, NIRS-BCI, and hBCI when the meta-classification method was adopted to enhance classification accuracy. The ternary classification accuracies for EEG-BCI, NIRS-BCI, and hBCI were 76.1 ± 12.8, 64.1 ± 9.7, and 82.2 ± 10.2%, respectively. The classification accuracy of the proposed hBCI was thus significantly higher than those of the other BCIs ( p < 0.005). The average ITR for the proposed hBCI was calculated to be 4.70 ± 1.92 bits/minute, which was 34.3% higher than that reported for a previous binary hBCI study.
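The trade-off between the number of classes and accuracy can be made concrete with the widely used Wolpaw definition of the information transfer rate. Whether the authors used exactly this formula, and what trial duration they assumed, is not stated in the abstract, so both are assumptions in the sketch below.

```python
import math


def itr_bits_per_trial(n_classes, accuracy):
    """Wolpaw information transfer rate in bits per selection."""
    n, p = n_classes, accuracy
    if n < 2 or not 0.0 <= p <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


def itr_bits_per_minute(n_classes, accuracy, trial_seconds):
    """Scale bits per selection by the number of selections per minute."""
    return itr_bits_per_trial(n_classes, accuracy) * 60.0 / trial_seconds


# Ternary problem at the reported mean hBCI accuracy (82.2 %).
ternary_bits = itr_bits_per_trial(3, 0.822)
```

Note that an ITR computed from a mean accuracy generally differs from the mean of per-subject ITRs, so this value is not expected to reproduce the reported 4.70 bits/minute.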
Tissue classification using depth-dependent ultrasound time series analysis: in-vitro animal study
NASA Astrophysics Data System (ADS)
Imani, Farhad; Daoud, Mohammad; Moradi, Mehdi; Abolmaesumi, Purang; Mousavi, Parvin
2011-03-01
Time series analysis of ultrasound radio-frequency (RF) signals has been shown to be an effective tissue classification method. Previous studies of this method for tissue differentiation at high and clinical-frequencies have been reported. In this paper, analysis of RF time series is extended to improve tissue classification at the clinical frequencies by including novel features extracted from the time series spectrum. The primary feature examined is the Mean Central Frequency (MCF) computed for regions of interest (ROIs) in the tissue extending along the axial axis of the transducer. In addition, the intercept and slope of a line fitted to the MCF-values of the RF time series as a function of depth have been included. To evaluate the accuracy of the new features, an in vitro animal study is performed using three tissue types: bovine muscle, bovine liver, and chicken breast, where perfect two-way classification is achieved. The results show statistically significant improvements over the classification accuracies with previously reported features.
Wang, Xueyi; Davidson, Nicholas J.
2011-01-01
Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we establish several results about the prediction accuracies of ensemble methods for binary classification that have been missed or misinterpreted in the previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve > 0.5 prediction accuracy, while individual classifiers have < 0.5 prediction accuracies. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that it is hard to reach the upper- and lower-bound accuracies with random individual classifiers, and that better algorithms need to be developed. PMID:21853162
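The claim that a majority-vote ensemble can exceed 0.5 accuracy while every member is below 0.5 can be checked with a small constructed example. The correctness pattern below is hypothetical and chosen by hand; it is not taken from the paper's experiments.

```python
def accuracy(correct_flags):
    """Fraction of samples marked correct (1) in a list of 0/1 flags."""
    return sum(correct_flags) / len(correct_flags)


def majority_vote_correct(per_classifier_flags):
    """Per-sample correctness of a majority vote in binary classification.

    In the binary case every wrong classifier votes for the same (wrong)
    label, so the ensemble is right exactly when more than half are right.
    """
    n_classifiers = len(per_classifier_flags)
    return [sum(column) * 2 > n_classifiers
            for column in zip(*per_classifier_flags)]


# Rows: three classifiers; columns: five samples; 1 = correct prediction.
flags = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
individual = [accuracy(row) for row in flags]          # each is 0.4
ensemble = accuracy(majority_vote_correct(flags))      # 0.6
```

The construction works because the correct votes are concentrated on three samples where they form a bare majority, and none are spent on the two samples the ensemble gets wrong.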
Karan, Shivesh Kishore; Samadder, Sukha Ranjan
2016-08-01
One objective of the present study was to evaluate the performance of support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for a rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing the change in land use pattern accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. For the present study, Landsat 5 Thematic Mapper (TM) data of 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data of 2016 of a part of Jharia Coalfield, Dhanbad, India, were used. The SVM classification technique provided greater overall classification accuracy when compared to the MLC technique in classifying heterogeneous landscape with limited training dataset. SVM exceeded MLC in handling a difficult challenge of classifying features having near similar reflectance on the mean signature plot, an improvement of over 11 % was observed in classification of built-up area, and an improvement of 24 % was observed in classification of surface water using SVM; similarly, the SVM technique improved the overall land use classification accuracy by almost 6 and 3 % for Landsat 5 and Landsat 8 images, respectively. Results indicated that land degradation increased significantly from 2006 to 2016 in the study area. This study will help in quantifying the changes and can also serve as a basis for further decision support system studies aiding a variety of purposes such as planning and management of mines and environmental impact assessment.
A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification
Yu, Yunlong; Liu, Fuxian
2018-01-01
One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractor to learn deep features from the original aerial image and the processed aerial image through saliency detection, respectively. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use the extreme learning machine (ELM) classifier for final classification with the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: UC-Merced dataset with 21 scene categories, WHU-RS dataset with 19 scene categories, AID dataset with 30 scene categories, and NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that our architecture gets a significant classification accuracy improvement over all state-of-the-art references. PMID:29581722
NASA Astrophysics Data System (ADS)
Hänsch, Ronny; Hellwich, Olaf
2018-04-01
Random Forests have continuously proven to be one of the most accurate, robust, and efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focuses on improving classification accuracy, we aim to accelerate the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches, we mainly consider algorithmic changes in order to stay as independent as possible of platform and programming language. The final model achieves approximately 60 times faster training and 500 times faster prediction, while the accuracy is only marginally decreased by roughly 1 %.
Empirical evaluation of data normalization methods for molecular classification
Huang, Huei-Chung
2018-01-01
Background Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers—an increasingly important application of microarrays in the era of personalized medicine. Methods In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. Results In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated in an independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Conclusion Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy. PMID:29666754
Monti, S.; Cooper, G. F.
1998-01-01
We present a new Bayesian classifier for computer-aided diagnosis. The new classifier builds upon the naive-Bayes classifier, and models the dependencies among patient findings in an attempt to improve its performance, both in terms of classification accuracy and in terms of calibration of the estimated probabilities. This work finds motivation in the argument that highly calibrated probabilities are necessary for the clinician to be able to rely on the model's recommendations. Experimental results are presented, supporting the conclusion that modeling the dependencies among findings improves calibration. PMID:9929288
Zhang, Heng; Pan, Zhongming; Zhang, Wenna
2018-06-07
An acoustic-seismic mixed feature extraction method based on the wavelet coefficient energy ratio (WCER) of the target signal is proposed in this study for classifying vehicle targets in wireless sensor networks. The signal was decomposed into a set of wavelet coefficients using the à trous algorithm, which is a concise method used to implement the wavelet transform of a discrete signal sequence. After the wavelet coefficients of the target acoustic and seismic signals were obtained, the energy ratio of each layer coefficient was calculated as the feature vector of the target signals. Subsequently, the acoustic and seismic features were merged into an acoustic-seismic mixed feature to improve the target classification accuracy after the acoustic and seismic WCER features of the target signal were simplified using the hierarchical clustering method. We selected the support vector machine method for classification and utilized the data acquired from a real-world experiment to validate the proposed method. The calculated results show that the WCER feature extraction method can effectively extract the target features from target signals. Feature simplification can reduce the time consumption of feature extraction and classification, with no effect on the target classification accuracy. The use of acoustic-seismic mixed features effectively improved target classification accuracy by approximately 12% compared with either acoustic signal or seismic signal alone.
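The feature described above can be illustrated with a minimal sketch. It is not the authors' implementation: the B3-spline smoothing kernel, the mirrored boundary handling, the number of levels, and the choice to divide each layer's energy by the summed energy of all detail layers are assumptions made for illustration, and the function names are hypothetical.

```python
def a_trous_details(signal, levels=3):
    """Detail-coefficient layers from an a trous (undecimated) decomposition."""
    n = len(signal)
    if n < 2:
        raise ValueError("signal must contain at least two samples")
    kernel = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # B3-spline filter (assumed)
    approx = [float(v) for v in signal]
    details = []
    for level in range(levels):
        step = 2 ** level  # spacing between filter taps (the "holes")
        smoothed = []
        for i in range(n):
            acc = 0.0
            for k, weight in enumerate(kernel):
                j = i + (k - 2) * step
                # Mirror indices that fall outside the signal (assumed boundary rule).
                while j < 0 or j >= n:
                    j = -j if j < 0 else 2 * (n - 1) - j
                acc += weight * approx[j]
            smoothed.append(acc)
        # Detail layer = previous approximation minus its smoothed version.
        details.append([a - s for a, s in zip(approx, smoothed)])
        approx = smoothed
    return details


def wcer_features(signal, levels=3):
    """Energy of each detail layer divided by the total detail energy."""
    energies = [sum(c * c for c in layer) for layer in a_trous_details(signal, levels)]
    total = sum(energies)
    if total == 0:
        return [0.0] * levels
    return [e / total for e in energies]
```

Under the same assumptions, a mixed feature vector could be formed by concatenation, for example `wcer_features(acoustic) + wcer_features(seismic)`, before passing it to a classifier.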
SENTINEL-1 and SENTINEL-2 Data Fusion for Wetlands Mapping: Balikdami, Turkey
NASA Astrophysics Data System (ADS)
Kaplan, G.; Avdan, U.
2018-04-01
Wetlands provide a number of environmental and socio-economic benefits such as their ability to store floodwaters and improve water quality, providing habitats for wildlife and supporting biodiversity, as well as aesthetic values. Remote sensing technology has proven to be useful and is frequently applied in monitoring and mapping wetlands. Combining optical and microwave satellite data can help with mapping and monitoring the biophysical characteristics of wetlands and wetlands' vegetation. Also, fusing radar and optical remote sensing data can increase the wetland classification accuracy. In this paper, data from the fine spatial resolution optical satellite, Sentinel-2, and the Synthetic Aperture Radar satellite, Sentinel-1, were fused for mapping wetlands. Both Sentinel-1 and Sentinel-2 images were pre-processed. After the pre-processing, vegetation indices were calculated using the Sentinel-2 bands and the results were included in the fusion data set. For the classification of the fused data, three different classification approaches were used and compared. The results showed significant improvement in the wetland classification using both multispectral and microwave data. Also, the presence of the red edge bands and the vegetation indices used in the data set showed significant improvement in the discrimination between wetlands and other vegetated areas. The statistical results of the fusion of the optical and radar data showed high wetland mapping accuracy, with an overall classification accuracy of approximately 90% in the object-based classification method. For future research, we recommend multi-temporal image use, terrain data collection, as well as a comparison of the method used with traditional image fusion techniques.
Calès, P; Boursier, J; Lebigot, J; de Ledinghen, V; Aubé, C; Hubert, I; Oberti, F
2017-04-01
In chronic hepatitis C, the European Association for the Study of the Liver and the Asociacion Latinoamericana para el Estudio del Higado recommend performing transient elastography plus a blood test to diagnose significant fibrosis; test concordance confirms the diagnosis. To validate this rule and improve it by combining a blood test, FibroMeter (virus second generation, Echosens, Paris, France) and transient elastography (constitutive tests) into a single combined test, as suggested by the American Association for the Study of Liver Diseases and the Infectious Diseases Society of America. A total of 1199 patients were included in an exploratory set (HCV, n = 679) or in two validation sets (HCV ± HIV, HBV, n = 520). Accuracy was mainly evaluated by correct diagnosis rate for severe fibrosis (pathological Metavir F ≥ 3, primary outcome) by classical test scores or a fibrosis classification, reflecting Metavir staging, as a function of test concordance. Score accuracy: there were no significant differences between the blood test (75.7%), elastography (79.1%) and the combined test (79.4%) (P = 0.066); the score accuracy of each test was significantly (P < 0.001) decreased in discordant vs. concordant tests. Classification accuracy: combined test accuracy (91.7%) was significantly (P < 0.001) increased vs. the blood test (84.1%) and elastography (88.2%); accuracy of each constitutive test was significantly (P < 0.001) decreased in discordant vs. concordant tests but not with combined test: 89.0 vs. 92.7% (P = 0.118). Multivariate analysis for accuracy showed an interaction between concordance and fibrosis level: in the 1% of patients with full classification discordance and severe fibrosis, non-invasive tests were unreliable. The advantage of combined test classification was confirmed in the validation sets. The concordance recommendation is validated. 
A combined test, expressed in classification instead of score, improves this rule and validates the recommendation of a combined test, avoiding 99% of biopsies, and offering precise staging. © 2017 John Wiley & Sons Ltd.
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
NASA Astrophysics Data System (ADS)
Sun, D.; Zheng, J. H.; Ma, T.; Chen, J. J.; Li, X.
2018-04-01
Rodent infestation is one of the main biological disasters in the grasslands of northern Xinjiang. The rodents' eating and digging behaviors destroy ground vegetation, which seriously affects the development of animal husbandry and grassland ecological security. UAV low-altitude remote sensing, an emerging technique with high spatial resolution, can effectively recognize the burrows. However, selecting the appropriate spatial resolution for monitoring rodent damage is the first problem that needs attention. The purpose of this study is to explore the optimal spatial scale for burrow identification by evaluating the impact of different spatial resolutions on burrow identification accuracy. In this study, we photographed burrows from different flight heights to obtain visible images of different spatial resolutions. An object-oriented method was then used to identify the burrows, and the accuracy of the classification was evaluated. We found that the highest average classification accuracy for burrows exceeded 80%. The classification accuracy was highest at an altitude of 24 m and a spatial resolution of 1 cm. We have created a unique and effective way to identify burrows using UAV visible images. We draw the following conclusion: the best spatial resolution for burrow recognition is 1 cm using the DJI PHANTOM-3 UAV, and improving spatial resolution does not necessarily improve classification accuracy. This study lays the foundation for future research and can be extended to similar studies elsewhere.
NASA Astrophysics Data System (ADS)
Susanti, Yuliana; Zukhronah, Etik; Pratiwi, Hasih; Respatiwulan; Sri Sulistijowati, H.
2017-11-01
To achieve food resilience in Indonesia, food diversification through exploring the potential of local food is required. Corn is one of the alternative staple foods of Javanese society. For that reason, corn production needs to be improved by considering the influencing factors. CHAID and CRT are data mining methods that can be used to classify the influencing variables. The present study seeks information on the potential availability of corn as a local food in the regencies and cities of Java Island. CHAID analysis yields four classifications with an accuracy of 78.8%, while CRT analysis yields seven classifications with an accuracy of 79.6%.
Deep multi-scale convolutional neural network for hyperspectral image classification
NASA Astrophysics Data System (ADS)
Zhang, Feng-zhe; Yang, Xia
2018-04-01
In this paper, we propose a multi-scale convolutional neural network for the hyperspectral image classification task. First, in contrast to conventional convolution, we utilize multi-scale convolutions, which possess larger receptive fields, to extract spectral features of the hyperspectral image. We design a deep neural network with a multi-scale convolution layer that contains 3 different convolution kernel sizes. Second, to avoid overfitting of the deep neural network, dropout is utilized, which randomly deactivates neurons and contributes a small improvement in classification accuracy. In addition, newer deep learning techniques such as ReLU are utilized in this paper. We conduct experiments on the University of Pavia and Salinas datasets and obtain better classification accuracy compared with other methods.
Rey, Sergio J.; Stephens, Philip A.; Laura, Jason R.
2017-01-01
Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is performed using a single anisotropic magnetoresistive sensor, with the vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, the original vehicle magnetic dataset collected is imbalanced, which may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
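The oversampling idea behind SMOTE can be sketched as follows: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the paper's code; the function name, the default `k`, and the seeding are assumptions.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

The synthetic samples would be appended to the training set of the minority class only; the test set should stay untouched so that reported accuracy is not inflated.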
Evaluation of spatial filtering on the accuracy of wheat area estimate
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Moreira, M. A.; Chen, S. C.; Delima, A. M.
1982-01-01
A 3 x 3 pixel spatial filter for postclassification was used for wheat classification to evaluate the effects of this procedure on the accuracy of area estimation using LANDSAT digital data obtained from a single pass. Quantitative analyses were carried out in five test sites (approx 40 sq km each) and t tests showed that filtering with threshold values significantly decreased errors of commission and omission. In area estimation filtering improved the overestimate of 4.5% to 2.7% and the root-mean-square error decreased from 126.18 ha to 107.02 ha. Extrapolating the same procedure of automatic classification using spatial filtering for postclassification to the whole study area, the accuracy in area estimate was improved from the overestimate of 10.9% to 9.7%. It is concluded that when single pass LANDSAT data is used for crop identification and area estimation the postclassification procedure using a spatial filter provides a more accurate area estimate by reducing classification errors.
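A 3 x 3 postclassification filter of the kind described above can be sketched as a thresholded majority filter. The abstract does not say how the threshold was applied, so the rule used here (relabel a pixel only when the most frequent class in its window occurs at least `threshold` times) and the function name are assumptions.

```python
from collections import Counter


def majority_filter(labels, threshold=5):
    """Apply a 3 x 3 majority filter to a 2-D grid of class labels."""
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]  # copy so the input grid is left unchanged
    for r in range(rows):
        for c in range(cols):
            # Collect the window, clipped at the image border.
            window = [labels[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            label, count = Counter(window).most_common(1)[0]
            if count >= threshold:
                out[r][c] = label
    return out
```

With this rule, an isolated pixel surrounded by wheat is relabelled as wheat, which is the mechanism by which such filters can reduce commission and omission errors.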
Forest tree species discrimination in western Himalaya using EO-1 Hyperion
NASA Astrophysics Data System (ADS)
George, Rajee; Padalia, Hitendra; Kushwaha, S. P. S.
2014-05-01
The information acquired in the narrow bands of hyperspectral remote sensing data has the potential to capture plant species spectral variability, thereby improving forest tree species mapping. This study assessed the utility of spaceborne EO-1 Hyperion data in discrimination and classification of broadleaved evergreen and conifer forest tree species in western Himalaya. The pre-processing of 242 bands of Hyperion data resulted in 160 noise-free and vertical stripe corrected reflectance bands. Of these, 29 bands were selected through step-wise exclusion of bands (Wilks' lambda). Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) algorithms were applied to the selected bands to assess their effectiveness in classification. SVM was also applied to broadband data (Landsat TM) to compare the variation in classification accuracy. All six commonly occurring gregarious tree species in western Himalaya, viz., white oak, brown oak, chir pine, blue pine, cedar and fir, could be effectively discriminated. SVM produced a better species classification (overall accuracy 82.27%, kappa statistic 0.79) than SAM (overall accuracy 74.68%, kappa statistic 0.70). The classification accuracy achieved with Hyperion bands was significantly higher than with Landsat TM bands (overall accuracy 69.62%, kappa statistic 0.65). The study demonstrated the potential utility of narrow spectral bands of Hyperion data in discriminating tree species in hilly terrain.
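For readers unfamiliar with the Spectral Angle Mapper, the following sketch shows the standard decision rule: a pixel is assigned to the reference spectrum with which it forms the smallest angle, so the result is insensitive to overall brightness scaling. The function names and the reference spectra in the usage note are made up for illustration and are not taken from the study.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    if norm == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def sam_classify(pixel, references):
    """Return the name of the reference spectrum with the smallest spectral angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))
```

For example, with hypothetical three-band references `{"oak": [0.1, 0.5, 0.4], "pine": [0.3, 0.3, 0.3]}`, the pixel `[0.2, 1.0, 0.8]` is a scaled copy of the oak spectrum and is assigned to `"oak"`.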
Improving zero-training brain-computer interfaces by mixing model estimators
NASA Astrophysics Data System (ADS)
Verhoeven, T.; Hübner, D.; Tangermann, M.; Müller, K. R.; Dambre, J.; Kindermans, P. J.
2017-06-01
Objective. Brain-computer interfaces (BCI) based on event-related potentials (ERP) incorporate a decoder to classify recorded brain signals and subsequently select a control signal that drives a computer application. Standard supervised BCI decoders require a tedious calibration procedure prior to every session. Several unsupervised classification methods have been proposed that tune the decoder during actual use and as such omit this calibration. Each of these methods has its own strengths and weaknesses. Our aim is to improve overall accuracy of ERP-based BCIs without calibration. Approach. We consider two approaches for unsupervised classification of ERP signals. Learning from label proportions (LLP) was recently shown to be guaranteed to converge to a supervised decoder when enough data is available. In contrast, the formerly proposed expectation maximization (EM) based decoding for ERP-BCI does not have this guarantee. However, while this decoder has high variance due to random initialization of its parameters, it obtains a higher accuracy faster than LLP when the initialization is good. We introduce a method to optimally combine these two unsupervised decoding methods, letting one method’s strengths compensate for the weaknesses of the other and vice versa. The new method is compared to the aforementioned methods in a resimulation of an experiment with a visual speller. Main results. Analysis of the experimental results shows that the new method exceeds the performance of the previous unsupervised classification approaches in terms of ERP classification accuracy and symbol selection accuracy during the spelling experiment. Furthermore, the method shows less dependency on random initialization of model parameters and is consequently more reliable. Significance. Improving the accuracy and subsequent reliability of calibrationless BCIs makes these systems more appealing for frequent use.
Analysis of a multisensor image data set of south San Rafael Swell, Utah
NASA Technical Reports Server (NTRS)
Evans, D. L.
1982-01-01
A Shuttle Imaging Radar (SIR-A) image of the southern portion of the San Rafael Swell in Utah has been digitized and registered to coregistered Landsat, Seasat, and HCMM thermal inertia images. The addition of the SIR-A image to the registered data set improves rock type discrimination in both qualitative and quantitative analyses. Sedimentary units can be separated in a combined SIR-A/Seasat image that cannot be seen in either image alone. Discriminant Analyses show that the classification accuracy is improved with addition of the SIR-A image to Landsat images. Classification accuracy is further improved when texture information from the Seasat and SIR-A images is included.
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
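The ensembling scheme described above (bagging without replacement followed by voting) can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM base learner, and feature selection is omitted; both are deliberate simplifications, and all names and default values are assumptions rather than the authors' settings.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in base learner (NOT an SVM): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(features, model[label])))


def fit_ensemble(X, y, n_models=11, fraction=0.7, seed=0):
    """Train each base learner on a random subset drawn without replacement."""
    rng = random.Random(seed)
    size = max(1, int(fraction * len(X)))
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(len(X)), size)  # no index is repeated within a subset
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, features):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(m, features) for m in models)
    return votes.most_common(1)[0][0]
```

Replacing `fit_centroids` and `predict_centroid` with an SVM trained on a selected feature subset would bring the sketch closer to the method the abstract describes.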
NASA Technical Reports Server (NTRS)
Myint, Soe W.; Mesev, Victor; Quattrochi, Dale; Wentz, Elizabeth A.
2013-01-01
Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They range from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process. 
The primary difference though is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification steps. Within this chapter, each of the four approaches is described in terms of scale and accuracy classifying urban land use and urban land cover; and for its range of urban applications. We demonstrate the overview of four main classification groups in Figure 1 while Table 1 details the approaches with respect to classification requirements and procedures (e.g., reflectance conversion, steps before training sample selection, training samples, spatial approaches commonly used, classifiers, primary inputs for classification, output structures, number of output layers, and accuracy assessment). The chapter concludes with a brief summary of the methods reviewed and the challenges that remain in developing new classification methods for improving the efficiency and accuracy of mapping urban areas.
An Ant Colony Optimization Based Feature Selection for Web Page Classification
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
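The information gain baseline mentioned above scores a feature by how much knowing its value reduces the entropy of the class labels. The sketch below covers only that baseline, not the ACO search itself; the function names and the toy labels in the usage note are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

For instance, with labels `["course", "course", "faculty", "faculty"]`, a term that appears only on the first two pages has a gain of 1 bit, while a term that appears on one page of each class has a gain of 0.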
Improved classification accuracy by feature extraction using genetic algorithms
NASA Astrophysics Data System (ADS)
Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.
2003-05-01
A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence, and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators: crossover, mutation, and deletion / reactivation - the last of these effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with feature extractor derived features yielded lower error rates than using standard pulse sequences, and with features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error of pure tissues.
Analysis of spatial distribution of land cover maps accuracy
NASA Astrophysics Data System (ADS)
Khatami, R.; Mountrakis, G.; Stehman, S. V.
2017-12-01
Land cover maps have become one of the most important products of remote sensing science. However, classification errors will exist in any classified map and affect the reliability of subsequent map usage. Moreover, classification accuracy often varies over different regions of a classified map. These variations of accuracy will affect the reliability of subsequent analyses of different regions based on the classified maps. The traditional approach of map accuracy assessment based on an error matrix does not capture the spatial variation in classification accuracy. Here, per-pixel accuracy prediction methods are proposed based on interpolating accuracy values from a test sample to produce wall-to-wall accuracy maps. Different accuracy prediction methods were developed based on four factors: predictive domain (spatial versus spectral), interpolation function (constant, linear, Gaussian, and logistic), incorporation of class information (interpolating each class separately versus grouping them together), and sample size. Incorporation of spectral domain as explanatory feature spaces of classification accuracy interpolation was done for the first time in this research. Performance of the prediction methods was evaluated using 26 test blocks, with 10 km × 10 km dimensions, dispersed throughout the United States. The performance of the predictions was evaluated using the area under the curve (AUC) of the receiver operating characteristic. Relative to existing accuracy prediction methods, our proposed methods resulted in improvements of AUC of 0.15 or greater. 
Evaluation of the four factors comprising the accuracy prediction methods demonstrated that: i) interpolations should be done separately for each class instead of grouping all classes together; ii) if an all-classes approach is used, the spectral domain will result in substantially greater AUC than the spatial domain; iii) for the smaller sample size and per-class predictions, the spectral and spatial domain yielded similar AUC; iv) for the larger sample size (i.e., very dense spatial sample) and per-class predictions, the spatial domain yielded larger AUC; v) increasing the sample size improved accuracy predictions with a greater benefit accruing to the spatial domain; and vi) the function used for interpolation had the smallest effect on AUC.
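Two ingredients of the approach above can be sketched under stated assumptions: a Gaussian-weighted interpolation of correct/incorrect outcomes from a test sample to an arbitrary location, and the AUC used to score such predictions. The bandwidth, the fallback to the sample mean, and the function names are assumptions; the coordinates may be spatial or spectral, matching the two predictive domains discussed.

```python
import math


def gaussian_accuracy(point, samples, bandwidth=1.0):
    """Predict accuracy at `point` from (coords, correct) pairs, correct in {0, 1}."""
    num = den = 0.0
    for coords, correct in samples:
        d2 = sum((a - b) ** 2 for a, b in zip(point, coords))
        weight = math.exp(-d2 / (2 * bandwidth ** 2))
        num += weight * correct
        den += weight
    if den == 0:
        # All weights underflowed: fall back to the overall sample accuracy.
        return sum(correct for _, correct in samples) / len(samples)
    return num / den


def auc(scores, outcomes):
    """Probability that a correct case scores higher than an incorrect one (ties count 0.5)."""
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Interpolating each class separately, as the findings above recommend, would amount to calling `gaussian_accuracy` with only the test samples of the class mapped at that pixel.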
Marciano, Michael A; Adelman, Jonathan D
2017-03-01
The deconvolution of DNA mixtures remains one of the most critical challenges in the field of forensic DNA analysis. In addition, of all the data features required to perform such deconvolution, the number of contributors in the sample is widely considered the most important, and, if incorrectly chosen, the most likely to negatively influence the mixture interpretation of a DNA profile. Unfortunately, most current approaches to mixture deconvolution require the assumption that the number of contributors is known by the analyst, an assumption that can prove to be especially faulty when faced with increasingly complex mixtures of 3 or more contributors. In this study, we propose a probabilistic approach for estimating the number of contributors in a DNA mixture that leverages the strengths of machine learning. To assess this approach, we compare classification performances of six machine learning algorithms and evaluate the model from the top-performing algorithm against the current state of the art in the field of contributor number classification. Overall results show over 98% accuracy in identifying the number of contributors in a DNA mixture of up to 4 contributors. Comparative results showed 3-person mixtures had a classification accuracy improvement of over 6% compared to the current best-in-field methodology, and that 4-person mixtures had a classification accuracy improvement of over 20%. The Probabilistic Assessment for Contributor Estimation (PACE) also accomplishes classification of mixtures of up to 4 contributors in less than 1s using a standard laptop or desktop computer. Considering the high classification accuracy rates, as well as the significant time commitment required by the current state of the art model versus seconds required by a machine learning-derived model, the approach described herein provides a promising means of estimating the number of contributors and, subsequently, will lead to improved DNA mixture interpretation. 
Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Multisensor multiresolution data fusion for improvement in classification
NASA Astrophysics Data System (ADS)
Rubeena, V.; Tiwari, K. C.
2016-04-01
The rapid advancements in technology have facilitated easy availability of multisensor and multiresolution remote sensing data. Multisensor, multiresolution data contain complementary information, and fusion of such data may yield application-dependent significant information that may otherwise remain trapped within. The present work aims at improving classification by fusing features of coarse resolution hyperspectral (1 m) LWIR and fine resolution (20 cm) RGB data. The classification map comprises eight classes: Road, Trees, Red Roof, Grey Roof, Concrete Roof, Vegetation, Bare Soil and Unclassified. The processing methodology for the hyperspectral LWIR data comprises dimensionality reduction, resampling of the data by interpolation to register the two images at the same spatial resolution, and extraction of spatial features to improve classification accuracy. In the case of the fine resolution RGB data, the vegetation index is computed for classifying the vegetation class and the morphological building index is calculated for buildings. To extract textural features, occurrence and co-occurrence statistics are considered, and the features are extracted from all three bands of the RGB data. After feature extraction, a Support Vector Machine (SVM) is used for training and classification. To increase classification accuracy, post-processing removes spurious noise such as salt-and-pepper noise, followed by filtering by majority voting within objects for better object classification.
NASA Technical Reports Server (NTRS)
Sadowski, F. E.; Sarno, J. E.
1976-01-01
First, an analysis of forest feature signatures was used to help explain the large variation in classification accuracy that can occur among individual forest features for any one case of spatial resolution and the inconsistent changes in classification accuracy that were demonstrated among features as spatial resolution was degraded. Second, the classification rejection threshold was varied in an effort to reduce the large proportion of unclassified resolution elements that previously appeared in the processing of coarse resolution data when a constant rejection threshold was used for all cases of spatial resolution. For the signature analysis, two-channel ellipse plots showing the feature signature distributions for several cases of spatial resolution indicated that the capability of signatures to correctly identify their respective features is dependent on the amount of statistical overlap among signatures. Reductions in signature variance that occur in data of degraded spatial resolution may not necessarily decrease the amount of statistical overlap among signatures having large variance and small mean separations. Features classified by such signatures may thus continue to have similar amounts of misclassified elements in coarser resolution data, and thus, not necessarily improve in classification accuracy.
NASA Astrophysics Data System (ADS)
Fujita, Yusuke; Mitani, Yoshihiro; Hamamoto, Yoshihiko; Segawa, Makoto; Terai, Shuji; Sakaida, Isao
2017-03-01
Ultrasound imaging is a popular and non-invasive tool used in the diagnosis of liver disease. Cirrhosis is a chronic liver disease and it can advance to liver cancer. Early detection and appropriate treatment are crucial to prevent liver cancer. However, ultrasound image analysis is very challenging because of the low signal-to-noise ratio of ultrasound images. To achieve higher classification performance, the selection of training regions of interest (ROIs) is very important because it affects classification accuracy. The purpose of our study is cirrhosis detection with high accuracy using liver ultrasound images. In our previous works, training ROI selection by MILBoost and multiple-ROI classification based on the product rule were proposed to achieve high classification performance. In this article, we propose a self-training method to select training ROIs effectively. Experiments were performed to evaluate the effect of self-training, using manually selected ROIs and also automatically selected ROIs. Experimental results show that self-training for manually selected ROIs achieved higher classification performance than other approaches, including our conventional methods. Manual ROI definition and sample selection are important to improve classification accuracy in cirrhosis detection using ultrasound images.
Ensemble of classifiers for confidence-rated classification of NDE signal
NASA Astrophysics Data System (ADS)
Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish
2016-02-01
An ensemble of classifiers, in general, aims to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of ensembles of classifiers generate self-rated confidence scores, which estimate the reliability of each prediction, and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. In existing works, although ensembles of classifiers have been widely used in computational intelligence, the effect of all factors of unreliability on the confidence of classification is largely overlooked. With relevance to NDE, classification results are affected by inherent ambiguity of classification, non-discriminative features, inadequate training samples and noise due to measurement. In this paper, we extend the existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in classification performance of defect and non-defect indications.
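The weighted majority voting mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `weighted_majority_vote`, the example labels and weights, and the choice of the winner's normalized weight share as a confidence score are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Combine the labels predicted by several weak classifiers for one sample.

    predictions: list of labels, one per classifier.
    weights: list of non-negative weights, one per classifier.
    Returns (winning_label, share), where share is the winner's fraction of
    the total weight (an illustrative confidence score, not the paper's).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    total_weight = sum(totals.values())
    winner = max(totals, key=totals.get)
    share = totals[winner] / total_weight if total_weight > 0 else 0.0
    return winner, share


# Hypothetical votes from three weak classifiers on one inspection signal.
label, share = weighted_majority_vote(["defect", "ok", "ok"], [0.9, 0.3, 0.4])
print(label, round(share, 4))
```

In this toy call a single heavily weighted classifier outvotes two lightly weighted ones, which is the behaviour that distinguishes weighted voting from a simple head count.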
NASA Technical Reports Server (NTRS)
Fagan, Matthew E.; Defries, Ruth S.; Sesnie, Steven E.; Arroyo-Mora, J. Pablo; Soto, Carlomagno; Singh, Aditya; Townsend, Philip A.; Chazdon, Robin L.
2015-01-01
An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p less than 0.0001) of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; adding multitemporal spectral data increased accuracy only for two species with dense canopies. Non-native tree species had higher classification accuracy overall and made up the majority of tree plantations in this landscape. Our results indicate that combining occasionally acquired hyperspectral data with widely available multitemporal satellite imagery enhances mapping and monitoring of reforestation in tropical landscapes.
Sarker, Abeed; Gonzalez, Graciela
2015-02-01
Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks.
Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training
Gonzalez, Graciela
2014-01-01
Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. 
Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103
Multiple confidence estimates as indices of eyewitness memory.
Sauer, James D; Brewer, Neil; Weber, Nathan
2008-08-01
Eyewitness identification decisions are vulnerable to various influences on witnesses' decision criteria that contribute to false identifications of innocent suspects and failures to choose perpetrators. An alternative procedure using confidence estimates to assess the degree of match between novel and previously viewed faces was investigated. Classification algorithms were applied to participants' confidence data to determine when a confidence value or pattern of confidence values indicated a positive response. Experiment 1 compared confidence group classification accuracy with a binary decision control group's accuracy on a standard old-new face recognition task and found superior accuracy for the confidence group for target-absent trials but not for target-present trials. Experiment 2 used a face mini-lineup task and found reduced target-present accuracy offset by large gains in target-absent accuracy. Using a standard lineup paradigm, Experiments 3 and 4 also found improved classification accuracy for target-absent lineups and, with a more sophisticated algorithm, for target-present lineups. This demonstrates the accessibility of evidence for recognition memory decisions and points to a more sensitive index of memory quality than is afforded by binary decisions.
Zourmand, Alireza; Ting, Hua-Nong; Mirhassani, Seyed Mostafa
2013-03-01
Speech is one of the prevalent communication mediums for humans. Identifying the gender of a child speaker based on his/her speech is crucial in telecommunication and speech therapy. This article investigates the use of fundamental and formant frequencies from sustained vowel phonation to distinguish the gender of Malay children aged between 7 and 12 years. The Euclidean minimum distance and multilayer perceptron were used to classify the gender of 360 Malay children based on different combinations of fundamental and formant frequencies (F0, F1, F2, and F3). The Euclidean minimum distance with normalized frequency data achieved a classification accuracy of 79.44%, which was higher than that of the nonnormalized frequency data. Age-dependent modeling was used to improve the accuracy of gender classification. The Euclidean distance method obtained 84.17% based on the optimal classification accuracy for all age groups. The accuracy was further increased to 99.81% using multilayer perceptron based on mel-frequency cepstral coefficients. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
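A Euclidean minimum distance classifier of the kind named in the abstract above assigns a sample to the class whose mean feature vector is nearest. The sketch below is a generic illustration, assuming each class is represented by the centroid of its training samples; the function names and the toy two-dimensional feature vectors are hypothetical and are not taken from the study.

```python
import math


def fit_centroids(samples, labels):
    """Compute one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(features, centroids):
    """Return the label of the centroid closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Toy, made-up feature vectors (for example, two normalized frequencies).
train_x = [(1.0, 2.0), (1.2, 1.8), (3.0, 4.0), (3.2, 4.2)]
train_y = ["a", "a", "b", "b"]
centroids = fit_centroids(train_x, train_y)
print(classify((1.0, 1.5), centroids), classify((2.9, 4.0), centroids))
```

Because Euclidean distance is scale-sensitive, features measured on very different scales are usually normalized first, which is consistent with the abstract's report that normalized frequency data gave higher accuracy.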
NASA Astrophysics Data System (ADS)
Du, Peijun; Tan, Kun; Xing, Xiaoshi
2010-12-01
Combining Support Vector Machine (SVM) with wavelet analysis, we constructed a wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM faces the bottleneck of kernel parameter selection, which results in time-consuming computation and low classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions. Implications on semiparametric estimation are proposed in this paper. An Airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to test the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier obtains the highest accuracy when using the Coiflet kernel function in the wavelet transform. Compared with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and with an SVM classifier using the Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in Reproducing Kernel Hilbert Space clearly improves classification accuracy.
Koch, Stefan P.; Hägele, Claudia; Haynes, John-Dylan; Heinz, Andreas; Schlagenhauf, Florian; Sterzer, Philipp
2015-01-01
Functional neuroimaging has provided evidence for altered function of mesolimbic circuits implicated in reward processing, first and foremost the ventral striatum, in patients with schizophrenia. While such findings based on significant group differences in brain activations can provide important insights into the pathomechanisms of mental disorders, the use of neuroimaging results from standard univariate statistical analysis for individual diagnosis has proven difficult. In this proof of concept study, we tested whether the predictive accuracy for the diagnostic classification of schizophrenia patients vs. healthy controls could be improved using multivariate pattern analysis (MVPA) of regional functional magnetic resonance imaging (fMRI) activation patterns for the anticipation of monetary reward. With a searchlight MVPA approach using support vector machine classification, we found that the diagnostic category could be predicted from local activation patterns in frontal, temporal, occipital and midbrain regions, with a maximal cluster peak classification accuracy of 93% for the right pallidum. Region-of-interest based MVPA for the ventral striatum achieved a maximal cluster peak accuracy of 88%, whereas the classification accuracy on the basis of standard univariate analysis reached only 75%. Moreover, using support vector regression we could additionally predict the severity of negative symptoms from ventral striatal activation patterns. These results show that MVPA can be used to substantially increase the accuracy of diagnostic classification on the basis of task-related fMRI signal patterns in a regionally specific way. PMID:25799236
Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Jamaluddin; Siringoringo, Rimbun
2017-12-01
Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawback of FkNN is that its parameters are difficult to determine. These parameters are the number of neighbors (k) and the fuzzy strength (m). Both parameters are very sensitive, and no theory or guideline indicates what proper values of ‘m’ and ‘k’ should be, which makes FkNN difficult to control. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best values of ‘k’ and ‘m’. MPSO is based on the Constriction Factor Method, an improvement of PSO designed to avoid local optima. The model proposed in this study was tested on the German Credit Dataset, a standardized data set from the UCI Machine Learning Repository that is widely applied to classification problems. The application of MPSO to the determination of FkNN parameters is expected to increase classification performance. The experiments indicate that the proposed model yields better classification performance than the FkNN model alone: the proposed model has an accuracy of 81%, while the FkNN model has an accuracy of 70%. Finally, the proposed model is compared with two other classification models, Naive Bayes and Decision Tree. The proposed model performs better; Naive Bayes has an accuracy of 75% and the decision tree model 70%.
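To make the roles of ‘k’ and ‘m’ concrete, the sketch below implements the commonly cited fuzzy k-NN membership rule, in which each of the k nearest neighbours votes for its class with weight 1/d^(2/(m-1)). It assumes crisp training labels and Euclidean distance; whether the study above used exactly this variant is an assumption, and the function name, the `eps` guard and the toy data are hypothetical. The MPSO search over ‘k’ and ‘m’ is not shown.

```python
import math


def fuzzy_knn_memberships(x, train_x, train_y, k=3, m=2.0, eps=1e-12):
    """Return {label: membership} for sample x; memberships sum to 1.

    k: number of neighbours considered.
    m: fuzzy strength (> 1); larger m weights neighbours more evenly.
    eps: small constant guarding against division by zero at distance 0.
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    neighbours = sorted(
        ((math.dist(x, xi), yi) for xi, yi in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weighted = [(1.0 / (d ** exponent + eps), label) for d, label in neighbours]
    total = sum(weight for weight, _ in weighted)
    memberships = {}
    for weight, label in weighted:
        memberships[label] = memberships.get(label, 0.0) + weight / total
    return memberships


# Toy, made-up data: two "good" and two "bad" training points.
train_x = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
train_y = ["good", "good", "bad", "bad"]
result = fuzzy_knn_memberships((0.0, 0.5), train_x, train_y, k=3, m=2.0)
print(max(result, key=result.get), result)
```

The predicted class is the label with the largest membership. Since both k and m change the memberships, a search procedure such as the MPSO described above would evaluate candidate (k, m) pairs by the classification accuracy they produce.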
Devos, Olivier; Downey, Gerard; Duponchel, Ludovic
2014-04-01
Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean centre data, GENOPT-SVM) have been tested and statistically compared using McNemar's statistical test. For the two datasets, SVM with optimised pre-processing give models with higher accuracy than the one obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain SVM model with significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates. Copyright © 2013 Elsevier Ltd. All rights reserved.
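McNemar's test, used in the abstract above to compare classification models, looks only at the samples on which two classifiers disagree. The following sketch uses the chi-square form with a continuity correction and one degree of freedom; whether the study used the corrected or the exact version is not stated, so that choice, the function name and the toy labels are assumptions.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Return (statistic, p_value) for McNemar's test with continuity correction.

    b counts samples that classifier A gets right and B gets wrong;
    c counts samples that A gets wrong and B gets right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy, made-up labels: A is right on 8 of 10 samples, B on 4 of 10.
y_true = [1] * 10
pred_a = [1] * 8 + [0] * 2
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
print(mcnemar_test(y_true, pred_a, pred_b))
```

With so few discordant samples the difference in this toy example is not significant at the 5% level, which illustrates why a gap in raw accuracy alone does not establish that one model is better.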
Chen, Chien P; Braunstein, Steve; Mourad, Michelle; Hsu, I-Chow J; Haas-Kogan, Daphne; Roach, Mack; Fogh, Shannon E
2015-01-01
Accurate International Classification of Diseases (ICD) diagnosis coding is critical for patient care, billing purposes, and research endeavors. In this single-institution study, we evaluated our baseline ICD-9 (9th revision) diagnosis coding accuracy, identified the most common errors contributing to inaccurate coding, and implemented a multimodality strategy to improve radiation oncology coding. We prospectively studied ICD-9 coding accuracy in our radiation therapy--specific electronic medical record system. Baseline ICD-9 coding accuracy was obtained from chart review targeting ICD-9 coding accuracy of all patients treated at our institution between March and June of 2010. To improve performance an educational session highlighted common coding errors, and a user-friendly software tool, RadOnc ICD Search, version 1.0, for coding radiation oncology specific diagnoses was implemented. We then prospectively analyzed ICD-9 coding accuracy for all patients treated from July 2010 to June 2011, with the goal of maintaining 80% or higher coding accuracy. Data on coding accuracy were analyzed and fed back monthly to individual providers. Baseline coding accuracy for physicians was 463 of 661 (70%) cases. Only 46% of physicians had coding accuracy above 80%. The most common errors involved metastatic cases, whereby primary or secondary site ICD-9 codes were either incorrect or missing, and special procedures such as stereotactic radiosurgery cases. After implementing our project, overall coding accuracy rose to 92% (range, 86%-96%). The median accuracy for all physicians was 93% (range, 77%-100%) with only 1 attending having accuracy below 80%. Incorrect primary and secondary ICD-9 codes in metastatic cases showed the most significant improvement (10% vs 2% after intervention). Identifying common coding errors and implementing both education and systems changes led to significantly improved coding accuracy. 
This quality assurance project highlights the potential problem of ICD-9 coding accuracy by physicians and offers an approach to effectively address this shortcoming. Copyright © 2015. Published by Elsevier Inc.
Research on Remote Sensing Image Classification Based on Feature Level Fusion
NASA Astrophysics Data System (ADS)
Yuan, L.; Zhu, G.
2018-04-01
Remote sensing image classification, as an important direction of remote sensing image processing and application, has been widely studied. However, existing classification algorithms still suffer from misclassification and missing points, so the final classification accuracy is not high. In this paper, we select Sentinel-1A and Landsat8 OLI images as data sources and propose a classification method based on feature level fusion. We compare three kinds of feature level fusion algorithms (i.e., Gram-Schmidt spectral sharpening, Principal Component Analysis transform and Brovey transform), and then select the best fused image for the classification experiment. In the classification process, we choose four kinds of image classification algorithms (i.e., Minimum distance, Mahalanobis distance, Support Vector Machine and ISODATA) for a comparative experiment. We use overall classification precision and Kappa coefficient as the classification accuracy evaluation criteria, and the four classification results of the fused image are analysed. The experimental results show that the fusion effect of Gram-Schmidt spectral sharpening is better than that of the other methods. Among the four classification algorithms, the fused image is best suited to Support Vector Machine classification, with an overall classification precision of 94.01% and a Kappa coefficient of 0.91. The image fused from Sentinel-1A and Landsat8 OLI not only has more spatial information and spectral texture characteristics, but also enhances the distinguishing features of the images. The proposed method is beneficial for improving the accuracy and stability of remote sensing image classification.
NASA Astrophysics Data System (ADS)
Diesing, Markus; Green, Sophie L.; Stephens, David; Lark, R. Murray; Stewart, Heather A.; Dove, Dayton
2014-08-01
Marine spatial planning and conservation need underpinning with sufficiently detailed and accurate seabed substrate and habitat maps. Although multibeam echosounders enable us to map the seabed with high resolution and spatial accuracy, there is still a lack of fit-for-purpose seabed maps. This is due to the high costs involved in carrying out systematic seabed mapping programmes and the fact that the development of validated, repeatable, quantitative and objective methods of swath acoustic data interpretation is still in its infancy. We compared a wide spectrum of approaches including manual interpretation, geostatistics, object-based image analysis and machine-learning to gain further insights into the accuracy and comparability of acoustic data interpretation approaches based on multibeam echosounder data (bathymetry, backscatter and derivatives) and seabed samples with the aim to derive seabed substrate maps. Sample data were split into a training and validation data set to allow us to carry out an accuracy assessment. Overall thematic classification accuracy ranged from 67% to 76% and Cohen's kappa varied between 0.34 and 0.52. However, these differences were not statistically significant at the 5% level. Misclassifications were mainly associated with uncommon classes, which were rarely sampled. Map outputs were between 68% and 87% identical. To improve classification accuracy in seabed mapping, we suggest that more studies on the effects of factors affecting the classification performance as well as comparative studies testing the performance of different approaches need to be carried out with a view to developing guidelines for selecting an appropriate method for a given dataset. In the meantime, classification accuracy might be improved by combining different techniques to hybrid approaches and multi-method ensembles.
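Overall accuracy and Cohen's kappa, the two measures reported in the abstract above, can both be derived from a confusion matrix. The sketch below assumes a square matrix with reference classes in rows and predicted classes in columns; the function name and the toy counts are made up for illustration and do not come from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall_accuracy, cohens_kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    size = len(confusion)
    observed = sum(confusion[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        return observed, 0.0  # kappa is undefined here; 0.0 is a convention
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy two-class matrix with 50 validation samples.
print(accuracy_and_kappa([[20, 5], [10, 15]]))
```

Kappa discounts the agreement expected by chance, which is why a map can show a fairly high overall accuracy while its kappa remains moderate, as in the ranges quoted above.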
Multi-site evaluation of IKONOS data for classification of tropical coral reef environments
Andrefouet, S.; Kramer, Philip; Torres-Pulliza, D.; Joyce, K.E.; Hochberg, E.J.; Garza-Perez, R.; Mumby, P.J.; Riegl, Bernhard; Yamano, H.; White, W.H.; Zubia, M.; Brock, J.C.; Phinn, S.R.; Naseer, A.; Hatcher, B.G.; Muller-Karger, F. E.
2003-01-01
Ten IKONOS images of different coral reef sites distributed around the world were processed to assess the potential of 4-m resolution multispectral data for coral reef habitat mapping. Complexity of reef environments, established by field observation, ranged from 3 to 15 classes of benthic habitats containing various combinations of sediments, carbonate pavement, seagrass, algae, and corals in different geomorphologic zones (forereef, lagoon, patch reef, reef flats). Processing included corrections for sea surface roughness and bathymetry, unsupervised or supervised classification, and accuracy assessment based on ground-truth data. IKONOS classification results were compared with classified Landsat 7 imagery for simple to moderate complexity of reef habitats (5-11 classes). For both sensors, overall accuracies of the classifications show a general linear trend of decreasing accuracy with increasing habitat complexity. The IKONOS sensor performed better, with a 15-20% improvement in accuracy compared to Landsat. For IKONOS, overall accuracy was 77% for 4-5 classes, 71% for 7-8 classes, 65% in 9-11 classes, and 53% for more than 13 classes. The Landsat classification accuracy was systematically lower, with an average of 56% for 5-10 classes. Within this general trend, inter-site comparisons and specificities demonstrate the benefits of different approaches. Pre-segmentation of the different geomorphologic zones and depth correction provided different advantages in different environments. Our results help guide scientists and managers in applying IKONOS-class data for coral reef mapping applications. © 2003 Elsevier Inc. All rights reserved.
A review of supervised object-based land-cover image classification
NASA Astrophysics Data System (ADS)
Ma, Lei; Li, Manchun; Ma, Xiaoxue; Cheng, Liang; Du, Peijun; Liu, Yongxue
2017-08-01
Object-based image classification for land-cover mapping purposes using remote-sensing imagery has attracted significant attention in recent years. Numerous studies conducted over the past decade have investigated a broad array of sensors, feature selection, classifiers, and other factors of interest. However, these research results have not yet been synthesized to provide coherent guidance on the effect of different supervised object-based land-cover classification processes. In this study, we first construct a database with 28 fields using qualitative and quantitative information extracted from 254 experimental cases described in 173 scientific papers. Second, the results of the meta-analysis are reported, including general characteristics of the studies (e.g., the geographic range of relevant institutes, preferred journals) and the relationships between factors of interest (e.g., spatial resolution and study area or optimal segmentation scale, accuracy and number of targeted classes), especially with respect to the classification accuracy of different sensors, segmentation scale, training set size, supervised classifiers, and land-cover types. Third, useful data on supervised object-based image classification are determined from the meta-analysis. For example, we find that supervised object-based classification is currently experiencing rapid advances, while development of the fuzzy technique is limited in the object-based framework. Furthermore, spatial resolution correlates with the optimal segmentation scale and study area, and Random Forest (RF) shows the best performance in object-based classification. The area-based accuracy assessment method can obtain stable classification performance, and indicates a strong correlation between accuracy and training set size, while the accuracy of the point-based method is likely to be unstable due to mixed objects. 
In addition, the overall accuracy benefits from higher spatial resolution images (e.g., unmanned aerial vehicle) or agricultural sites where it also correlates with the number of targeted classes. More than 95.6% of studies involve an area less than 300 ha, and the spatial resolution of images is predominantly between 0 and 2 m. Furthermore, we identify some methods that may advance supervised object-based image classification. For example, deep learning and type-2 fuzzy techniques may further improve classification accuracy. Lastly, scientists are strongly encouraged to report results of uncertainty studies to further explore the effects of varied factors on supervised object-based image classification.
Mapping Winter Wheat with Multi-Temporal SAR and Optical Images in an Urban Agricultural Region
Zhou, Tao; Pan, Jianjun; Zhang, Peiyu; Wei, Shanbao; Han, Tao
2017-01-01
Winter wheat is the second largest food crop in China. It is important to obtain reliable winter wheat acreage to guarantee the food security for the most populous country in the world. This paper focuses on assessing the feasibility of in-season winter wheat mapping and investigating potential classification improvement by using SAR (Synthetic Aperture Radar) images, optical images, and the integration of both types of data in urban agricultural regions with complex planting structures in Southern China. Both SAR (Sentinel-1A) and optical (Landsat-8) data were acquired, and classification using different combinations of Sentinel-1A-derived information and optical images was performed using a support vector machine (SVM) and a random forest (RF) method. The interference coherence and texture images were obtained and used to assess the effect of adding them to the backscatter intensity images on the classification accuracy. The results showed that the use of four Sentinel-1A images acquired before the jointing period of winter wheat can provide satisfactory winter wheat classification accuracy, with an F1 measure of 87.89%. The combination of SAR and optical images for winter wheat mapping achieved the best F1 measure–up to 98.06%. The SVM was superior to RF in terms of the overall accuracy and the kappa coefficient, and was faster than RF, while the RF classifier was slightly better than SVM in terms of the F1 measure. In addition, the classification accuracy can be effectively improved by adding the texture and coherence images to the backscatter intensity data. PMID:28587066
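The F1 measure quoted in the abstract above is the harmonic mean of precision and recall for the class of interest. A minimal sketch follows, with a hypothetical function name and made-up counts of true positives, false positives and false negatives.

```python
def f1_measure(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for a single class.

    Each value is 0.0 when its denominator would be zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for one class, for example pixels labelled as winter wheat.
print(f1_measure(true_pos=90, false_pos=10, false_neg=30))
```

Unlike overall accuracy, F1 ignores true negatives, so it can rank two classifiers differently from overall accuracy or kappa, as the SVM and RF comparison in the abstract suggests.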
New Framework for Cross-Domain Document Classification
2011-03-01
classification. The following paragraphs will introduce these related works in more detail. Wang et al. attempted to improve the accuracy of text document...of using Wikipedia to develop a thesaurus [20]. Gabrilovich et al. had an approach that is more elaborate in its use of Wikipedia text [21]. The...did show a modest improvement when it is performed using the Wikipedia information. Wang et al. improved on the results of co-clustering algorithm [24
NASA Technical Reports Server (NTRS)
Jung, Jinha; Pasolli, Edoardo; Prasad, Saurabh; Tilton, James C.; Crawford, Melba M.
2014-01-01
Acquiring current, accurate land-use information is critical for monitoring and understanding the impact of anthropogenic activities on natural environments. Remote sensing technologies are of increasing importance because of their capability to acquire information for large areas in a timely manner, enabling decision makers to be more effective in complex environments. Although optical imagery has been demonstrated to be successful for land cover classification, active sensors, such as light detection and ranging (LiDAR), have distinct capabilities that can be exploited to improve classification results. However, utilization of LiDAR data for land cover classification has not been fully exploited. Moreover, spatial-spectral classification has recently gained significant attention since classification accuracy can be improved by extracting additional information from the neighboring pixels. Although spatial information has been widely used for spectral data, less attention has been given to LiDAR data. In this work, a new framework for land cover classification using discrete return LiDAR data is proposed. Pseudo-waveforms are generated from the LiDAR data and processed by hierarchical segmentation. Spatial features are extracted in a region-based way using a new unsupervised strategy for multiple pruning of the segmentation hierarchy. The proposed framework is validated experimentally on a real dataset acquired in an urban area. Better classification results are exhibited by the proposed framework compared to the cases in which basic LiDAR products such as digital surface model and intensity image are used. Moreover, the proposed region-based feature extraction strategy results in improved classification accuracies in comparison with a more traditional window-based approach.
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Papageorgiou, Eirini; Nieuwenhuys, Angela; Desloovere, Kaat
2017-01-01
Background: This study aimed to improve the automatic probabilistic classification of joint motion gait patterns in children with cerebral palsy by using the expert knowledge available via a recently developed Delphi-consensus study. To this end, this study applied both Naïve Bayes and Logistic Regression classification with varying degrees of usage of the expert knowledge (expert-defined and discretized features). A database of 356 patients and 1719 gait trials was used to validate the classification performance of eleven joint motions. Hypotheses: Two main hypotheses stated that: (1) Joint motion patterns in children with CP, obtained through a Delphi-consensus study, can be automatically classified following a probabilistic approach, with an accuracy similar to clinical expert classification, and (2) The inclusion of clinical expert knowledge in the selection of relevant gait features and the discretization of continuous features increases the performance of automatic probabilistic joint motion classification. Findings: This study provided objective evidence supporting the first hypothesis. Automatic probabilistic gait classification using the expert knowledge available from the Delphi-consensus study resulted in accuracy (91%) similar to that obtained with two expert raters (90%), and higher accuracy than that obtained with non-expert raters (78%). Regarding the second hypothesis, this study demonstrated that the use of more advanced machine learning techniques such as automatic feature selection and discretization instead of expert-defined and discretized features can result in slightly higher joint motion classification performance. However, the increase in performance is limited and does not outweigh the additional computational cost and the higher risk of loss of clinical interpretability, which threatens the clinical acceptance and applicability. PMID:28570616
Vetter, Jeffrey S.
2005-02-01
The method and system described herein present a technique for performance analysis that helps users understand the communication behavior of their message passing applications. The method and system described herein may automatically classify individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system described herein trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system described herein adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system described herein may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.
NASA Astrophysics Data System (ADS)
Dash, Jatindra K.; Kale, Mandar; Mukhopadhyay, Sudipta; Khandelwal, Niranjan; Prabhakar, Nidhi; Garg, Mandeep; Kalra, Naveen
2017-03-01
In this paper, we investigate the effect of the error criterion used during the training phase of an artificial neural network (ANN) on the accuracy of the classifier for classification of lung tissues affected with Interstitial Lung Diseases (ILD). The mean square error (MSE) and cross-entropy (CE) criteria are chosen because they are the most popular choices in state-of-the-art implementations. The classification experiment is performed on six interstitial lung disease (ILD) patterns, viz. Consolidation, Emphysema, Ground Glass Opacity, Micronodules, Fibrosis and Healthy, from the MedGIFT database. The texture features from an arbitrary region of interest (AROI) are extracted using a Gabor filter. Two different neural networks are trained with the scaled conjugate gradient back propagation algorithm, using the MSE and CE error criteria respectively for weight updates. Performance is evaluated in terms of the average accuracy of these classifiers using 4-fold cross-validation. Each network is trained five times for each fold with randomly initialized weight vectors, and accuracies are computed. A significant improvement in classification accuracy is observed when the ANN is trained using CE (67.27%) as the error function compared to MSE (63.60%). Moreover, the standard deviation of the classification accuracy for the network trained with the CE criterion (6.69) is lower than that for the network trained with the MSE criterion (10.32).
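To make the two error criteria compared above concrete, the sketch below evaluates mean square error and cross-entropy for a single six-class prediction with a one-hot target. The function names and the example output vector are assumptions for illustration; they are not values from the study, and the training algorithm itself is not reproduced.

```python
import math


def mean_square_error(target, output):
    """Average squared difference between target and network output."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def cross_entropy(target, output, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(o, eps)) for t, o in zip(target, output))


# Hypothetical example: six ILD pattern classes, true class is the third one.
target = [0, 0, 1, 0, 0, 0]
output = [0.1, 0.1, 0.5, 0.1, 0.1, 0.1]

print("MSE:", mean_square_error(target, output))
print("CE: ", cross_entropy(target, output))
```

Cross-entropy depends only on the probability assigned to the true class, so it penalizes confident mistakes more sharply than MSE does.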
NASA Astrophysics Data System (ADS)
Qu, Haicheng; Liang, Xuejian; Liang, Shichao; Liu, Wanjun
2018-01-01
Many methods of hyperspectral image classification have been proposed recently, and the convolutional neural network (CNN) achieves outstanding performance. However, spectral-spatial classification with a CNN requires an excessively large model, tremendous computation, and a complex network, and a CNN is generally unable to use the noisy bands caused by water-vapor absorption. A dimensionality-varied CNN (DV-CNN) is proposed to address these issues. There are four stages in DV-CNN, and the dimensionalities of the spectral-spatial feature maps vary with the stages. DV-CNN can reduce the computation and simplify the structure of the network. All feature maps are processed by more kernels in higher stages to extract more precise features. DV-CNN also improves the classification accuracy and enhances the robustness to water-vapor absorption bands. The experiments are performed on the Indian Pines and Pavia University scene data sets. The classification performance of DV-CNN is compared with state-of-the-art methods, which include CNN variants, traditional methods, and other deep learning methods. A performance analysis of DV-CNN itself is also carried out. The experimental results demonstrate that DV-CNN outperforms state-of-the-art methods for spectral-spatial classification and that it is also robust to water-vapor absorption bands. Moreover, reasonable parameter selection is effective in improving classification accuracy.
Convolutional Neural Network for Histopathological Analysis of Osteosarcoma.
Mishra, Rashika; Daescu, Ovidiu; Leavey, Patrick; Rakheja, Dinesh; Sengupta, Anita
2018-03-01
Pathologists often deal with high complexity and sometimes disagreement over osteosarcoma tumor classification due to cellular heterogeneity in the dataset. Segmentation and classification of histology tissue in H&E stained tumor image datasets is a challenging task because of intra-class variations, inter-class similarity, crowded context, and noisy data. In recent years, deep learning approaches have led to encouraging results in breast cancer and prostate cancer analysis. In this article, we propose a convolutional neural network (CNN) as a tool to improve the efficiency and accuracy of osteosarcoma tumor classification into tumor classes (viable tumor, necrosis) versus nontumor. The proposed CNN architecture contains eight learned layers: three sets of two stacked convolutional layers interspersed with max pooling layers for feature extraction, and two fully connected layers, with data augmentation strategies to boost performance. The use of a neural network results in a higher average classification accuracy of 92%. We compare the proposed architecture with three existing and proven CNN architectures for image classification: AlexNet, LeNet, and VGGNet. We also provide a pipeline to calculate percentage necrosis in a given whole slide image. We conclude that the use of neural networks can assure both high accuracy and efficiency in osteosarcoma classification.
Lu, Huijuan; Wei, Shasha; Zhou, Zili; Miao, Yanzi; Lu, Yi
2015-01-01
The main purpose of traditional classification algorithms in bioinformatics applications is to achieve better classification accuracy. However, these algorithms cannot meet the requirement of minimising the average misclassification cost. In this paper, a new cost-sensitive regularised extreme learning machine (CS-RELM) algorithm is proposed that uses probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy for a small-sample group that has a higher misclassification cost, the new CS-RELM can minimise the classification cost. The 'rejection cost' was integrated into the CS-RELM algorithm to further reduce the average misclassification cost. Using the Colon Tumour dataset and the SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other algorithms such as the extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine, and cost-sensitive support vector machine (SVM). The experimental results show that CS-RELM with embedded rejection cost could reduce the average cost of misclassification and made more credible classification decisions than the others.
NASA Astrophysics Data System (ADS)
Sung, Changhyuck; Lim, Seokjae; Kim, Hyungjun; Kim, Taesu; Moon, Kibong; Song, Jeonghwan; Kim, Jae-Joon; Hwang, Hyunsang
2018-03-01
To improve the classification accuracy of an image data set (CIFAR-10) by using analog input voltage, synapse devices with excellent conductance linearity (CL) and multi-level cell (MLC) characteristics are required. We analyze the CL and MLC characteristics of TaOx-based filamentary resistive random access memory (RRAM) to implement the synapse device in neural network hardware. Our findings show that the number of oxygen vacancies in the filament constriction region of the RRAM directly controls the CL and MLC characteristics. By adopting a Ta electrode (instead of Ti) and the hot-forming step, we could form a dense conductive filament. As a result, a wide range of conductance levels with CL is achieved and significantly improved image classification accuracy is confirmed.
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.
Subasi, Abdulhamit
2013-06-01
Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridizes particle swarm optimization (PSO) and SVM to improve EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments classified EMG signals as normal, neurogenic, or myopathic. In the proposed method, the EMG signals were decomposed into frequency sub-bands using the discrete wavelet transform (DWT), and a set of statistical features was extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
O'Neil, Gina L.; Goodall, Jonathan L.; Watson, Layne T.
2018-04-01
Wetlands are important ecosystems that provide many ecological benefits, and their quality and presence are protected by federal regulations. These regulations require wetland delineations, which can be costly and time-consuming to perform. Computer models can assist in this process, but lack the accuracy necessary for environmental planning-scale wetland identification. In this study, the potential for improvement of wetland identification models through modification of digital elevation model (DEM) derivatives, derived from high-resolution and increasingly available light detection and ranging (LiDAR) data, at a scale necessary for small-scale wetland delineations is evaluated. A novel approach of flow convergence modelling is presented where Topographic Wetness Index (TWI), curvature, and Cartographic Depth-to-Water index (DTW), are modified to better distinguish wetland from upland areas, combined with ancillary soil data, and used in a Random Forest classification. This approach is applied to four study sites in Virginia, implemented as an ArcGIS model. The model resulted in significant improvement in average wetland accuracy compared to the commonly used National Wetland Inventory (84.9% vs. 32.1%), at the expense of a moderately lower average non-wetland accuracy (85.6% vs. 98.0%) and average overall accuracy (85.6% vs. 92.0%). From this, we concluded that modifying TWI, curvature, and DTW provides more robust wetland and non-wetland signatures to the models by improving accuracy rates compared to classifications using the original indices. The resulting ArcGIS model is a general tool able to modify these local LiDAR DEM derivatives based on site characteristics to identify wetlands at a high resolution.
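For orientation, the sketch below computes the standard, unmodified Topographic Wetness Index, TWI = ln(a / tan β), where a is the specific catchment area and β is the local slope. The site-specific modifications described in the abstract are not reproduced here; the function name, the minimum-slope guard, and the example values are assumptions for illustration.

```python
import math


def topographic_wetness_index(specific_catchment_area: float,
                              slope_radians: float,
                              min_slope_tangent: float = 1e-6) -> float:
    """Standard TWI = ln(a / tan(beta)) for a single DEM cell."""
    # Flat cells have tan(beta) = 0, so a small floor (an assumption of this
    # sketch) keeps the ratio finite.
    slope_tangent = max(math.tan(slope_radians), min_slope_tangent)
    return math.log(specific_catchment_area / slope_tangent)


# Hypothetical cell: 100 m^2 of upslope area per unit contour width, 10% slope.
twi = topographic_wetness_index(100.0, math.atan(0.1))
print(f"TWI = {twi:.3f}")
```

Higher values indicate cells that collect more upslope flow on gentler slopes, which is why the index is a common input for wetland identification.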
NASA Technical Reports Server (NTRS)
Sekhon, R.
1981-01-01
Digital SEASAT-1 synthetic aperture radar (SAR) data were used to enhance linear features to extract geologically significant lineaments in the Appalachian region. Comparison of lineaments thus mapped with an existing lineament map based on LANDSAT MSS images shows that appropriately processed SEASAT-1 SAR data can significantly improve the detection of lineaments. Merged MSS and SAR data sets were more useful for lineament detection and landcover classification than LANDSAT or SEASAT data alone. About 20 percent of the lineaments plotted from the SEASAT SAR image did not appear on the LANDSAT image. About 6 percent of minor lineaments or parts of lineaments present in the LANDSAT map were missing from the SEASAT map. Improvement in the landcover classification (acreage and spatial estimation accuracy) was attained by using MSS-SAR merged data. The areal estimation of residential/built-up and forest categories was improved. Accuracy in estimating the agricultural and water categories was slightly reduced.
Classification of Tree Species in Overstorey Canopy of Subtropical Forest Using QuickBird Images.
Lin, Chinsu; Popescu, Sorin C; Thomson, Gavin; Tsogt, Khongor; Chang, Chein-I
2015-01-01
This paper proposes a supervised classification scheme to identify 40 tree species (2 coniferous, 38 broadleaf) belonging to 22 families and 36 genera in high spatial resolution QuickBird multispectral images (HMS). Overall kappa coefficient (OKC) and species conditional kappa coefficients (SCKC) were used to evaluate classification performance in training samples and estimate accuracy and uncertainty in test samples. Baseline classification performance using HMS images and vegetation index (VI) images were evaluated with an OKC value of 0.58 and 0.48 respectively, but performance improved significantly (up to 0.99) when used in combination with an HMS spectral-spatial texture image (SpecTex). One of the 40 species had very high conditional kappa coefficient performance (SCKC ≥ 0.95) using 4-band HMS and 5-band VIs images, but, only five species had lower performance (0.68 ≤ SCKC ≤ 0.94) using the SpecTex images. When SpecTex images were combined with a Visible Atmospherically Resistant Index (VARI), there was a significant improvement in performance in the training samples. The same level of improvement could not be replicated in the test samples indicating that a high degree of uncertainty exists in species classification accuracy which may be due to individual tree crown density, leaf greenness (inter-canopy gaps), and noise in the background environment (intra-canopy gaps). These factors increase uncertainty in the spectral texture features and therefore represent potential problems when using pixel-based classification techniques for multi-species classification.
Determining successional stage of temperate coniferous forests with Landsat satellite data
NASA Technical Reports Server (NTRS)
Fiorella, Maria; Ripple, William J.
1995-01-01
Thematic Mapper (TM) digital imagery was used to map forest successional stages and to evaluate spectral differences between old-growth and mature forests in the central Cascade Range of Oregon. Relative sun incidence values were incorporated into the successional stage classification to compensate for topographically induced variation. Relative sun incidence improved the classification accuracy of young successional stages, but did not improve the classification accuracy of older, closed canopy forest classes or overall accuracy. TM bands 1, 2, and 4; the normalized difference vegetation index (NDVI); and TM 4/3, 4/5, and 4/7 band ratio values for old-growth forests were found to be significantly lower than the values of mature forests (P ≤ 0.010). Wetness and the TM 4/5 and 4/7 band ratios all had low correlations to relative sun incidence (r² ≤ 0.16). The TM 4/5 band ratio was named the 'structural index' (SI) because of its ability to distinguish between mature and old-growth forests and its simplicity.
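The two spectral measures named above are simple per-pixel band combinations. A minimal sketch follows, assuming reflectance inputs and using TM band 3 as red and TM band 4 as near infrared for NDVI; the function names and sample values are hypothetical.

```python
def ndvi(tm3_red: float, tm4_nir: float) -> float:
    """Normalized difference vegetation index from TM bands 3 and 4."""
    return (tm4_nir - tm3_red) / (tm4_nir + tm3_red)


def structural_index(tm4_nir: float, tm5_swir: float) -> float:
    """TM 4/5 band ratio, called the 'structural index' (SI) in the abstract."""
    return tm4_nir / tm5_swir


# Hypothetical reflectance values for a single forest pixel.
print("NDVI:", ndvi(tm3_red=0.1, tm4_nir=0.5))
print("SI:  ", structural_index(tm4_nir=0.5, tm5_swir=0.25))
```

Both measures are ratios of bands, which helps explain the low correlation with relative sun incidence reported in the abstract: illumination differences that scale all bands similarly largely cancel out.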
Mikhno, Arthur; Nuevo, Pablo Martinez; Devanand, Davangere P.; Parsey, Ramin V.; Laine, Andrew F.
2013-01-01
Multimodality classification of Alzheimer’s disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI), is of interest to the medical community. We improve on prior classification frameworks by incorporating multiple features from MRI and PET data obtained with multiple radioligands, fluorodeoxyglucose (FDG) and Pittsburgh compound B (PIB). We also introduce a new MRI feature, invariant shape descriptors based on 3D Zernike moments applied to the hippocampus region. Classification performance is evaluated on data from 17 healthy controls (CTR), 22 MCI, and 17 AD subjects. Zernike significantly outperforms volume, accuracy (Zernike to volume): CTR/AD (90.7% to 71.6%), CTR/MCI (76.2% to 60.0%), MCI/AD (84.3% to 65.5%). Zernike also provides comparable and complementary performance to PET. Optimal accuracy is achieved when Zernike and PET features are combined (accuracy, specificity, sensitivity), CTR/AD (98.8%, 99.5%, 98.1%), CTR/MCI (84.3%, 82.9%, 85.9%) and MCI/AD (93.3%, 93.6%, 93.3%). PMID:24576927
Cognitive-motivational deficits in ADHD: development of a classification system.
Gupta, Rashmi; Kar, Bhoomika R; Srinivasan, Narayanan
2011-01-01
The classification systems developed so far to detect attention deficit/hyperactivity disorder (ADHD) do not have high sensitivity and specificity. We have developed a classification system based on several neuropsychological tests that measure cognitive-motivational functions that are specifically impaired in ADHD children. A total of 240 (120 ADHD children and 120 healthy controls) children in the age range of 6-9 years and 32 Oppositional Defiant Disorder (ODD) children (aged 9 years) participated in the study. Stop-Signal, Task-Switching, Attentional Network, and Choice Delay tests were administered to all the participants. Receiver operating characteristic (ROC) analysis indicated that percentage choice of long-delay reward best distinguished the ADHD children from healthy controls. Single parameters were not helpful in making a differential classification of ADHD with ODD. Multinomial logistic regression (MLR) was performed with multiple parameters (data fusion), which produced improved overall classification accuracy. A combination of stop-signal reaction time, post-error slowing, mean delay, switch cost, and percentage choice of long-delay reward produced an overall classification accuracy of 97.8%; with internal validation, the overall accuracy was 92.2%. Combining parameters from different tests of control functions not only enabled us to accurately distinguish ADHD children from healthy controls but also helped in making a differential classification with ODD. These results have implications for the theories of ADHD.
Mexican Hat Wavelet Kernel ELM for Multiclass Classification.
Wang, Jie; Song, Yi-Fan; Ma, Tian-Lei
2017-01-01
Kernel extreme learning machine (KELM) is a novel feedforward neural network, which is widely used in classification problems. To some extent, it solves the existing problems of the invalid nodes and the large computational complexity in ELM. However, the traditional KELM classifier usually has a low test accuracy when it faces multiclass classification problems. In order to solve the above problem, a new classifier, Mexican Hat wavelet KELM classifier, is proposed in this paper. The proposed classifier successfully improves the training accuracy and reduces the training time in the multiclass classification problems. Moreover, the validity of the Mexican Hat wavelet as a kernel function of ELM is rigorously proved. Experimental results on different data sets show that the performance of the proposed classifier is significantly superior to the compared classifiers.
Decimated Input Ensembles for Improved Generalization
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)
1999-01-01
Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on the ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with a finite number of patterns, this causes a reduction in the training patterns each classifier sees, often resulting in considerably worsened generalization performance (particularly for high dimensional data domains) for each individual classifier. Generally, this drop in the accuracy of the individual classifiers more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.
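The output-averaging ensemble mentioned at the start of the abstract can be sketched as follows. This shows only the generic averaging step, not the input decimation method the authors introduce; the function name and the probability vectors are hypothetical.

```python
def average_ensemble(probabilities):
    """Average per-class outputs of several classifiers and pick the top class.

    probabilities: one list of class probabilities per classifier.
    Returns the averaged probabilities and the index of the winning class.
    """
    n_classifiers = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(p[c] for p in probabilities) / n_classifiers
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted


# Hypothetical outputs of three classifiers for one two-class pattern.
averaged, predicted = average_ensemble([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
print(averaged, predicted)
```

Averaging helps most when the individual classifiers make different errors, which is why the abstract stresses reducing correlation among them.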
Document image improvement for OCR as a classification problem
NASA Astrophysics Data System (ADS)
Summers, Kristen M.
2003-01-01
In support of the goal of automatically selecting methods of enhancing an image to improve the accuracy of OCR on that image, we consider the problem of determining whether to apply each of a set of methods as a supervised classification problem for machine learning. We characterize each image according to a combination of two sets of measures: a set that are intended to reflect the degree of particular types of noise present in documents in a single font of Roman or similar script and a more general set based on connected component statistics. We consider several potential methods of image improvement, each of which constitutes its own 2-class classification problem, according to whether transforming the image with this method improves the accuracy of OCR. In our experiments, the results varied for the different image transformation methods, but the system made the correct choice in 77% of the cases in which the decision affected the OCR score (in the range [0,1]) by at least .01, and it made the correct choice 64% of the time overall.
de Souza, Juliana Martins; Veríssimo, Maria De La Ó Ramallo
2013-02-01
Identify and analyze the NANDA-I diagnoses and the focus terms of the International Classification for Nursing Practice (ICNP) related to child development. Literature, reflections about clinical experience, and a model case. Data synthesis: The current diagnoses proposed by NANDA-I and the ICNP focus terms do not encompass the extent of the child development phenomenon. It is necessary to study the child development concept to improve the definition of the ICNP focus terms and the accuracy of NANDA-I diagnoses. Discussing the nursing classifications can improve their understanding and use. © 2012, The Authors. International Journal of Nursing Knowledge © 2012, NANDA International.
Myint, Soe W.; Yuan, May; Cerveny, Randall S.; Giri, Chandra P.
2008-01-01
Remote sensing techniques have been shown to be effective for large-scale damage surveys after a hazardous event in both near real-time and post-event analyses. The paper aims to compare the accuracy of common image processing techniques to detect tornado damage tracks from Landsat TM data. We employed the direct change detection approach using two sets of images acquired before and after the tornado event to produce principal component composite images and a set of image difference bands. Techniques in the comparison include supervised classification, unsupervised classification, and an object-oriented classification approach with a nearest neighbor classifier. Accuracy assessment is based on the Kappa coefficient calculated from error matrices, which cross-tabulate correctly identified cells on the TM image and commission and omission errors in the result. Overall, the object-oriented approach exhibits the highest degree of accuracy in tornado damage detection. PCA and image differencing methods show comparable outcomes. While selected PCs can improve detection accuracy by 5 to 10%, the object-oriented approach performs significantly better, with 15-20% higher accuracy than the other two techniques. © 2008 by MDPI. PMID:27879757
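As an illustration of the accuracy assessment described above, the sketch below computes the Kappa coefficient from an error (confusion) matrix. The function name and the matrix values are hypothetical and are not results from the paper.

```python
def kappa_coefficient(error_matrix):
    """Cohen's Kappa from a square error matrix.

    error_matrix[i][j] is the number of cells with reference class i
    that were assigned to class j.
    """
    n_classes = len(error_matrix)
    total = sum(sum(row) for row in error_matrix)
    # Observed agreement: share of cells on the diagonal.
    observed = sum(error_matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(error_matrix[i]) * sum(row[i] for row in error_matrix)
        for i in range(n_classes)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class matrix (damage vs. no damage), 100 validation cells.
matrix = [[40, 10],
          [5, 45]]
print(f"Kappa = {kappa_coefficient(matrix):.2f}")
```

Off-diagonal entries correspond to the omission and commission errors mentioned in the abstract; Kappa discounts the agreement that would be expected by chance from the class marginals alone.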
Jeff Jenness; J. Judson Wynne
2005-01-01
In the field of spatially explicit modeling, well-developed accuracy assessment methodologies are often poorly applied. Deriving model accuracy metrics has been possible for decades, but these calculations were made by hand or with the use of a spreadsheet application. Accuracy assessments may be useful for: (1) ascertaining the quality of a model; (2) improving model...
An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices
ERIC Educational Resources Information Center
Wyse, Adam E.; Hao, Shiqi
2012-01-01
This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…
Hartling, Lisa; Bond, Kenneth; Santaguida, P Lina; Viswanathan, Meera; Dryden, Donna M
2011-08-01
To develop and test a study design classification tool. We contacted relevant organizations and individuals to identify tools used to classify study designs and ranked these using predefined criteria. The highest ranked tool was a design algorithm developed, but no longer advocated, by the Cochrane Non-Randomized Studies Methods Group; this was modified to include additional study designs and decision points. We developed a reference classification for 30 studies; 6 testers applied the tool to these studies. Interrater reliability (Fleiss' κ) and accuracy against the reference classification were assessed. The tool was further revised and retested. Initial reliability was fair among the testers (κ=0.26) and the reference standard raters (κ=0.33). Testing after revisions showed improved reliability (κ=0.45, moderate agreement) with improved, but still low, accuracy. The most common disagreements were whether the study design was experimental (5 of 15 studies), and whether there was a comparison of any kind (4 of 15 studies). Agreement was higher among testers who had completed graduate level training versus those who had not. The moderate reliability and low accuracy may be because of lack of clarity and comprehensiveness of the tool, inadequate reporting of the studies, and variability in tester characteristics. The results may not be generalizable to all published studies, as the test studies were selected because they had posed challenges for previous reviewers with respect to their design classification. Application of such a tool should be accompanied by training, pilot testing, and context-specific decision rules. Copyright © 2011 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhu, Zhe; Gallant, Alisa L.; Woodcock, Curtis E.; Pengra, Bruce; Olofsson, Pontus; Loveland, Thomas R.; Jin, Suming; Dahal, Devendra; Yang, Limin; Auch, Roger F.
2016-12-01
The U.S. Geological Survey's Land Change Monitoring, Assessment, and Projection (LCMAP) initiative is a new end-to-end capability to continuously track and characterize changes in land cover, use, and condition to better support research and applications relevant to resource management and environmental change. Among the LCMAP product suite are annual land cover maps that will be available to the public. This paper describes an approach to optimize the selection of training and auxiliary data for deriving the thematic land cover maps based on all available clear observations from Landsats 4-8. Training data were selected from map products of the U.S. Geological Survey's Land Cover Trends project. The Random Forest classifier was applied for different classification scenarios based on the Continuous Change Detection and Classification (CCDC) algorithm. We found that extracting training data proportionally to the occurrence of land cover classes was superior to an equal distribution of training data per class, and suggest using a total of 20,000 training pixels to classify an area about the size of a Landsat scene. The problem of unbalanced training data was alleviated by extracting a minimum of 600 training pixels and a maximum of 8000 training pixels per class. We additionally explored removing outliers contained within the training data based on their spectral and spatial criteria, but observed no significant improvement in classification results. We also tested the importance of different types of auxiliary data that were available for the conterminous United States, including: (a) five variables used by the National Land Cover Database, (b) three variables from the cloud screening "Function of mask" (Fmask) statistics, and (c) two variables from the change detection results of CCDC. 
We found that auxiliary variables such as a Digital Elevation Model and its derivatives (aspect, position index, and slope), potential wetland index, water probability, snow probability, and cloud probability improved the accuracy of land cover classification. Compared to the original strategy of the CCDC algorithm (500 pixels per class), the use of the optimal strategy improved the classification accuracies substantially (15-percentage point increase in overall accuracy and 4-percentage point increase in minimum accuracy).
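The training-data strategy described in the abstract above (proportional allocation of about 20,000 pixels, with a floor of 600 and a ceiling of 8000 per class) can be sketched as follows. The function `allocate_training_pixels`, the class names, and the class proportions are hypothetical, and the clamping rule is one plausible reading of the abstract rather than the authors' implementation.

```python
def allocate_training_pixels(class_proportions, total=20000,
                             minimum=600, maximum=8000):
    """Allocate training pixels per class in proportion to class occurrence,
    then clamp each class to the [minimum, maximum] range."""
    allocation = {}
    for name, proportion in class_proportions.items():
        proportional = round(total * proportion)
        allocation[name] = max(minimum, min(maximum, proportional))
    return allocation


# Hypothetical class proportions for one Landsat-scene-sized area.
proportions = {"forest": 0.50, "cropland": 0.30, "water": 0.19, "barren": 0.01}
allocation = allocate_training_pixels(proportions)
print(allocation)
```

Because of the clamping, the allocated counts need not sum exactly to the nominal total; in this example the dominant class is capped and the rare class is raised to the floor.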
Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field
Luo, Gang; Min, Wanli
2007-01-01
Sleep staging is the pattern recognition task of classifying sleep recordings into sleep stages. This task is one of the most important steps in sleep analysis. It is crucial for the diagnosis and treatment of various sleep disorders, and also relates closely to brain-machine interfaces. We report an automatic, online sleep stager using electroencephalogram (EEG) signals based on a recently-developed statistical pattern recognition method, the conditional random field (CRF), and novel potential functions that have explicit physical meanings. Using sleep recordings from human subjects, we show that the average classification accuracy of our sleep stager almost approaches the theoretical limit and is about 8% higher than that of existing systems. Moreover, for a new subject s_new with limited training data D_new, we perform subject adaptation to improve classification accuracy. Our idea is to use the knowledge learned from old subjects to obtain from D_new a regulated estimate of the CRF's parameters. Using sleep recordings from human subjects, we show that even without any D_new, our sleep stager can achieve an average classification accuracy of 70% on s_new. This accuracy increases with the size of D_new and eventually becomes close to the theoretical limit. PMID:18693884
NASA Astrophysics Data System (ADS)
Zhang, Zhiming; de Wulf, Robert R.; van Coillie, Frieke M. B.; Verbeke, Lieven P. C.; de Clercq, Eva M.; Ou, Xiaokun
2011-01-01
Mapping of vegetation using remote sensing in mountainous areas is considerably hampered by topographic effects on the spectral response pattern. A variety of topographic normalization techniques have been proposed to correct these illumination effects due to topography. The purpose of this study was to compare six different topographic normalization methods (Cosine correction, Minnaert correction, C-correction, Sun-canopy-sensor correction, two-stage topographic normalization, and slope matching technique) for their effectiveness in enhancing vegetation classification in mountainous environments. Since most of the vegetation classes in the rugged terrain of the Lancang Watershed (China) did not feature a normal distribution, artificial neural networks (ANNs) were employed as a classifier. Comparing the ANN classifications, none of the topographic correction methods could significantly improve ETM+ image classification overall accuracy. Nevertheless, at the class level, the accuracy of pine forest could be increased by using topographically corrected images. On the contrary, oak forest and mixed forest accuracies were significantly decreased by using corrected images. The results also showed that none of the topographic normalization strategies was satisfactorily able to correct for the topographic effects in severely shadowed areas.
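Two of the normalization methods named in the abstract above, the cosine correction and the C-correction, have compact textbook forms that can be sketched in a few lines. The function names and angle values below are illustrative assumptions; the parameter `c` is normally the ratio of intercept to slope from a regression of observed radiance on the cosine of the incidence angle, and is simply passed in here.

```python
import math


def cos_incidence(sun_zenith, sun_azimuth, slope, aspect):
    """Cosine of the local solar incidence angle (all angles in degrees)."""
    sz, sa, s, a = map(math.radians, (sun_zenith, sun_azimuth, slope, aspect))
    return (math.cos(sz) * math.cos(s)
            + math.sin(sz) * math.sin(s) * math.cos(sa - a))


def cosine_correction(radiance, cos_i, sun_zenith):
    """Cosine correction: scale radiance by cos(zenith) / cos(incidence)."""
    return radiance * math.cos(math.radians(sun_zenith)) / cos_i


def c_correction(radiance, cos_i, sun_zenith, c):
    """C-correction: as above, with the empirical constant c added to both terms."""
    cos_z = math.cos(math.radians(sun_zenith))
    return radiance * (cos_z + c) / (cos_i + c)


# Hypothetical pixel on a 20-degree slope facing directly away from the sun.
cos_i = cos_incidence(sun_zenith=30.0, sun_azimuth=150.0, slope=20.0, aspect=330.0)
print(cosine_correction(50.0, cos_i, 30.0), c_correction(50.0, cos_i, 30.0, c=0.4))
```

The constant `c` damps the correction where the incidence cosine is small, which is where the plain cosine correction tends to overcorrect.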
Support vector machine and principal component analysis for microarray data classification
NASA Astrophysics Data System (ADS)
Astuti, Widi; Adiwijaya
2018-03-01
Cancer is a leading cause of death worldwide, although a significant proportion of cases can be cured if detected early. In recent decades, microarray technology has taken on an important role in the diagnosis of cancer. Using data mining techniques, microarray data classification can be performed to improve the accuracy of cancer diagnosis compared to traditional techniques. Microarray data are characterized by small sample sizes but very high dimensionality. Researchers therefore face the challenge of providing solutions for microarray data classification with high performance in both accuracy and running time. This research proposed the use of Principal Component Analysis (PCA) as a dimension reduction method along with a Support Vector Machine (SVM) optimized by kernel functions as a classifier for microarray data classification. The proposed scheme was applied to seven data sets using 5-fold cross validation, and evaluation and analysis were then conducted in terms of both accuracy and running time. The results showed that the scheme can obtain 100% accuracy for the Ovarian and Lung Cancer data when Linear and Cubic kernel functions are used. In terms of running time, PCA greatly reduced the running time for every data set.
Pediatric Surgeon-Directed Wound Classification Improves Accuracy
Zens, Tiffany J.; Rusy, Deborah A.; Gosain, Ankush
2015-01-01
Background Surgical wound classification (SWC) communicates the degree of contamination in the surgical field and is used to stratify risk of surgical site infection and compare outcomes amongst centers. We hypothesized that changing from nurse-directed to surgeon-directed SWC during a structured operative debrief would improve accuracy of documentation. Methods An IRB-approved retrospective chart review was performed. Two time periods were defined: initially, SWC was determined and recorded by the circulating nurse (Pre-Debrief 6/2012-5/2013); then, after allowing six months for adoption and education, we implemented a structured operative debriefing including surgeon-directed SWC (Post-Debrief 1/2014-8/2014). Accuracy of SWC was determined for four commonly performed Pediatric General Surgery operations: inguinal hernia repair (clean), gastrostomy +/− Nissen fundoplication (clean-contaminated), appendectomy without perforation (contaminated), and appendectomy with perforation (dirty). Results 183 cases Pre-Debrief and 142 cases Post-Debrief met inclusion criteria. No differences between time periods were noted with regard to patient demographics, ASA class, or case mix. Accuracy of wound classification improved Post-Debrief (42% vs. 58.5%, p=0.003). Pre-Debrief, 26.8% of cases were overestimated or underestimated by more than one wound class, vs. 3.5% of cases Post-Debrief (p<0.001). Interestingly, the majority of Post-Debrief contaminated cases were incorrectly classified as clean-contaminated. Conclusions Implementation of a structured operative debrief including surgeon-directed SWC improves the percentage of correctly classified wounds and decreases the degree of inaccuracy in incorrectly classified cases. However, following implementation of the debriefing, we still observed a 41.5% rate of incorrect documentation, most notably in contaminated cases, indicating further education and process improvement is needed. PMID:27020829
NASA Astrophysics Data System (ADS)
Xie, W.-J.; Zhang, L.; Chen, H.-P.; Zhou, J.; Mao, W.-J.
2018-04-01
The purpose of carrying out national geographic conditions monitoring is to obtain information of surface changes caused by human social and economic activities, so that the geographic information can be used to offer better services for the government, enterprise and public. Land cover data contains detailed geographic conditions information, thus has been listed as one of the important achievements in the national geographic conditions monitoring project. At present, the main issue of the production of the land cover data is about how to improve the classification accuracy. For the land cover data quality inspection and acceptance, classification accuracy is also an important check point. So far, the classification accuracy inspection is mainly based on human-computer interaction or manual inspection in the project, which are time consuming and laborious. By harnessing the automatic high-resolution remote sensing image change detection technology based on the ERDAS IMAGINE platform, this paper carried out the classification accuracy inspection test of land cover data in the project, and presented a corresponding technical route, which includes data pre-processing, change detection, result output and information extraction. The result of the quality inspection test shows the effectiveness of the technical route, which can meet the inspection needs for the two typical errors, that is, missing and incorrect update error, and effectively reduces the work intensity of human-computer interaction inspection for quality inspectors, and also provides a technical reference for the data production and quality control of the land cover data.
NASA Technical Reports Server (NTRS)
Rignot, Eric; Williams, Cynthia; Way, Jobea; Viereck, Leslie
1993-01-01
A maximum a posteriori Bayesian classifier for multifrequency polarimetric SAR data is used to perform a supervised classification of forest types in the floodplains of Alaska. The image classes include white spruce, balsam poplar, black spruce, alder, non-forests, and open water. The authors investigate the effect on classification accuracy of changing environmental conditions, and of frequency and polarization of the signal. The highest classification accuracy (86 percent correctly classified forest pixels, and 91 percent overall) is obtained combining L- and C-band frequencies fully polarimetric on a date when the forest is just recovering from flooding. The forest map compares favorably with a vegetation map assembled from digitized aerial photos which took five years for completion, and addresses the state of the forest in 1978, ignoring subsequent fires, changes in the course of the river, clear-cutting of trees, and tree growth. HV-polarization is the most useful polarization at L- and C-band for classification. C-band VV (ERS-1 mode) and L-band HH (J-ERS-1 mode) alone or combined yield unsatisfactory classification accuracies. Additional data acquired in the winter season during thawed and frozen days yield classification accuracies respectively 20 percent and 30 percent lower due to a greater confusion between conifers and deciduous trees. Data acquired at the peak of flooding in May 1991 also yield classification accuracies 10 percent lower because of dominant trunk-ground interactions which mask out finer differences in radar backscatter between tree species. Combination of several of these dates does not improve classification accuracy. For comparison, panchromatic optical data acquired by SPOT in the summer season of 1991 are used to classify the same area. 
The classification accuracy (78 percent for the forest types and 90 percent if open water is included) is lower than that obtained with AIRSAR although conifers and deciduous trees are better separated due to the presence of leaves on the deciduous trees. Optical data do not separate black spruce and white spruce as well as SAR data, cannot separate alder from balsam poplar, and are of course limited by the frequent cloud cover in the polar regions. Yet, combining SPOT and AIRSAR offers better chances to identify vegetation types independent of ground truth information using a combination of NDVI indexes from SPOT, biomass numbers from AIRSAR, and a segmentation map from either one.
The decision tree approach to classification
NASA Technical Reports Server (NTRS)
Wu, C.; Landgrebe, D. A.; Swain, P. H.
1975-01-01
A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Bricher, Phillippa K.; Lucieer, Arko; Shaw, Justine; Terauds, Aleks; Bergstrom, Dana M.
2013-01-01
Monitoring changes in the distribution and density of plant species often requires accurate and high-resolution baseline maps of those species. Detecting such change at the landscape scale is often problematic, particularly in remote areas. We examine a new technique to improve accuracy and objectivity in mapping vegetation, combining species distribution modelling and satellite image classification on a remote sub-Antarctic island. In this study, we combine spectral data from very high resolution WorldView-2 satellite imagery and terrain variables from a high resolution digital elevation model to improve mapping accuracy, in both pixel- and object-based classifications. Random forest classification was used to explore the effectiveness of these approaches on mapping the distribution of the critically endangered cushion plant Azorella macquariensis Orchard (Apiaceae) on sub-Antarctic Macquarie Island. Both pixel- and object-based classifications of the distribution of Azorella achieved very high overall validation accuracies (91.6–96.3%, κ = 0.849–0.924). Both two-class and three-class classifications were able to accurately and consistently identify the areas where Azorella was absent, indicating that these maps provide a suitable baseline for monitoring expected change in the distribution of the cushion plants. Detecting such change is critical given the threats this species is currently facing under altering environmental conditions. The method presented here has applications to monitoring a range of species, particularly in remote and isolated environments. PMID:23940805
NASA Technical Reports Server (NTRS)
Mehta, N. C.
1984-01-01
The utility of radar scatterometers for discrimination and characterization of natural vegetation was investigated. Backscatter measurements were acquired with airborne multi-frequency, multi-polarization, multi-angle radar scatterometers over a test site in a southern temperate forest. Separability between ground cover classes was studied using a two-class separability measure. Very good separability is achieved between most classes. Longer wavelength is useful in separating trees from non-tree classes, while shorter wavelength and cross polarization are helpful for discrimination among tree classes. Using the maximum likelihood classifier, 50% overall classification accuracy is achieved using a single, short-wavelength scatterometer channel. Addition of multiple incidence angles and another radar band improves classification accuracy by 20% and 50%, respectively, over the single channel accuracy. Incorporation of a third radar band seems redundant for vegetation classification. Vertical transmit polarization is critically important for all classes.
Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma
2012-10-01
The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2, in which we replace the K-nearest neighbors algorithm with the fuzzy K-nearest neighbors algorithm to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work was retrieved from the UCI machine learning repository. The performances of AIRS2 and MAIRS2 are evaluated in terms of classification accuracy, sensitivity and specificity values. The highest classification accuracies obtained when applying AIRS2 and MAIRS2 using 10-fold cross-validation were 82.69% and 89.10%, respectively.
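The fuzzy K-nearest neighbors step that MAIRS2 substitutes for plain K-nearest neighbors can be sketched as below. This is a generic inverse-distance-weighted fuzzy KNN with crisp training labels, not the AIRS2 memory-cell machinery and not the authors' code; the function `fuzzy_knn`, the fuzzifier `m`, and the toy points are assumptions for illustration.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, query, k=3, m=2.0, eps=1e-12):
    """Return class memberships for `query` from its k nearest neighbours.

    train: list of (feature_tuple, label) pairs with crisp labels.
    Each neighbour votes with weight 1 / d**(2 / (m - 1)); memberships sum to 1.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = defaultdict(float)
    for features, label in neighbours:
        distance = math.dist(features, query)
        weights[label] += 1.0 / (distance ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Hypothetical two-feature training points.
train = [((0.0, 0.0), "negative"), ((0.0, 1.0), "negative"),
         ((3.0, 3.0), "positive"), ((3.0, 4.0), "positive")]
memberships = fuzzy_knn(train, (0.5, 0.5), k=3)
print(memberships)
```

Unlike a majority vote, the result is a graded membership per class, so a distant neighbour of the minority class contributes only a small share.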
Real-time and simultaneous control of artificial limbs based on pattern recognition algorithms.
Ortiz-Catalan, Max; Håkansson, Bo; Brånemark, Rickard
2014-07-01
The prediction of simultaneous limb motions is a highly desirable feature for the control of artificial limbs. In this work, we investigate different classification strategies for individual and simultaneous movements based on pattern recognition of myoelectric signals. Our results suggest that any classifier can be potentially employed in the prediction of simultaneous movements if arranged in a distributed topology. On the other hand, classifiers inherently capable of simultaneous predictions, such as the multi-layer perceptron (MLP), were found to be more cost effective, as they can be successfully employed in their simplest form. In the prediction of individual movements, the one-vs-one (OVO) topology was found to improve classification accuracy across different classifiers and it was therefore used to benchmark the benefits of simultaneous control. As opposed to previous work reporting only offline accuracy, the classification performance and the resulting controllability are evaluated in real time using the motion test and target achievement control (TAC) test, respectively. We propose a simultaneous classification strategy based on MLP that outperformed a top classifier for individual movements (LDA-OVO), thus improving the state-of-the-art classification approach. Furthermore, all the presented classification strategies and data collected in this study are freely available in BioPatRec, an open source platform for the development of advanced prosthetic control strategies.
Jane, Nancy Yesudhas; Nehemiah, Khanna Harichandran; Arputharaj, Kannan
2016-01-01
Clinical time-series data acquired from electronic health records (EHR) are liable to temporal complexities such as irregular observations, missing values and time constrained attributes that make the knowledge discovery process challenging. This paper presents a temporal rough set induced neuro-fuzzy (TRiNF) mining framework that handles these complexities and builds an effective clinical decision-making system. TRiNF provides two functionalities, namely temporal data acquisition (TDA) and temporal classification. In TDA, a time-series forecasting model is constructed by adopting an improved double exponential smoothing method. The forecasting model is used in missing value imputation and temporal pattern extraction. The relevant attributes are selected using a temporal pattern based rough set approach. In temporal classification, a classification model is built with the selected attributes using a temporal pattern induced neuro-fuzzy classifier. For experimentation, this work uses two clinical time-series datasets of hepatitis and thrombosis patients. The experimental results show that the proposed TRiNF framework significantly reduces the error rate, yielding an average classification accuracy of 92.59% for the hepatitis dataset and 91.69% for the thrombosis dataset. The obtained classification results demonstrate the efficiency of the proposed framework in terms of its improved classification accuracy.
Landcover Classification Using Deep Fully Convolutional Neural Networks
NASA Astrophysics Data System (ADS)
Wang, J.; Li, X.; Zhou, S.; Tang, J.
2017-12-01
Land cover classification has always been an essential application in remote sensing. Certain image features are needed for land cover classification, whether it is based on pixel-based or object-based methods. Unlike other machine learning methods, a deep learning model not only extracts useful information from multiple bands/attributes, but also learns spatial characteristics. In recent years, deep learning methods have developed rapidly and have been widely applied in image recognition, semantic understanding, and other application domains. However, there are limited studies applying deep learning methods to land cover classification. In this research, we used fully convolutional networks (FCN) as the deep learning model to classify land covers. The National Land Cover Database (NLCD) within the state of Kansas was used as the training dataset, and Landsat images were classified using the trained FCN model. We also applied an image segmentation method to improve the original results from the FCN model. In addition, the pros and cons of deep learning and several machine learning methods were compared and explored. Our research indicates: (1) FCN is an effective classification model with an overall accuracy of 75%; (2) image segmentation improves the classification results with a better match of spatial patterns; (3) FCN has an excellent learning ability and can attain higher accuracy and better spatial patterns than several machine learning methods.
High-accuracy user identification using EEG biometrics.
Koike-Akino, Toshiaki; Mahajan, Ruhi; Marks, Tim K; Ye Wang; Watanabe, Shinji; Tuzel, Oncel; Orlik, Philip
2016-08-01
We analyze brain waves acquired through a consumer-grade EEG device to investigate its capabilities for user identification and authentication. First, we show the statistical significance of the P300 component in event-related potential (ERP) data from 14-channel EEGs across 25 subjects. We then apply a variety of machine learning techniques, comparing the user identification performance of various different combinations of a dimensionality reduction technique followed by a classification algorithm. Experimental results show that an identification accuracy of 72% can be achieved using only a single 800 ms ERP epoch. In addition, we demonstrate that the user identification accuracy can be significantly improved to more than 96.7% by joint classification of multiple epochs.
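The abstract above does not say how multiple epochs are classified jointly. One common way to do it, shown here purely as an assumed sketch, is to treat epochs as independent and sum per-user log-probabilities. The function `joint_identify`, the user labels, and the probabilities are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math


def joint_identify(epoch_probabilities):
    """Pick the user with the highest summed log-probability across epochs.

    epoch_probabilities: list of dicts mapping user -> P(user | epoch).
    Assumes epochs are conditionally independent and probabilities are > 0.
    """
    users = epoch_probabilities[0].keys()
    scores = {
        user: sum(math.log(probs[user]) for probs in epoch_probabilities)
        for user in users
    }
    return max(scores, key=scores.get)


# Hypothetical per-epoch classifier outputs for two enrolled users.
epochs = [{"A": 0.4, "B": 0.6},
          {"A": 0.7, "B": 0.3},
          {"A": 0.6, "B": 0.4}]
print(joint_identify(epochs[:1]), joint_identify(epochs))
```

In this toy case the first epoch alone favours user B, while the evidence pooled over three epochs favours user A, which is the kind of effect that makes multi-epoch decisions more reliable than single-epoch ones.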
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve classification and regression problems, both linear and, through kernels, nonlinear. However, SVM also has a weakness: its optimal parameter values are difficult to determine. SVM calculates the best linear separator in the input feature space according to the training data. To classify data that are not linearly separable, SVM uses the kernel trick to transform the data into linearly separable data in a higher-dimensional feature space. The kernel trick uses various kinds of kernel functions, such as linear, polynomial, radial basis function (RBF), and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve this problem, genetic algorithms are proposed as the search algorithm for optimal parameter values, thus increasing the classification accuracy of SVM. The data were taken from the UCI machine learning repository: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms have been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracies for the data were improved from: linear kernel, 85.12%; polynomial, 81.76%; RBF, 77.22%; sigmoid, 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
NASA Astrophysics Data System (ADS)
Wu, M. F.; Sun, Z. C.; Yang, B.; Yu, S. S.
2016-11-01
In order to reduce the “salt and pepper” effect in pixel-based urban land cover classification and expand the application of multi-source data fusion in the field of urban remote sensing, WorldView-2 imagery and airborne Light Detection and Ranging (LiDAR) data were used to improve the classification of urban land cover. An object-oriented hierarchical classification approach was proposed in our study. The proposed method consisted of two hierarchies. (1) In the first hierarchy, the LiDAR Normalized Digital Surface Model (nDSM) image was segmented into objects. NDVI, Coastal Blue and nDSM thresholds were set for extracting building objects. (2) In the second hierarchy, after removing building objects, WorldView-2 fused imagery was obtained by Haze-ratio-based (HR) fusion and was segmented. An SVM classifier was applied to generate road/parking lot, vegetation and bare soil objects. (3) Trees and grasslands were split based on an nDSM threshold (2.4 m). The results showed that, compared with pixel-based and non-hierarchical object-oriented approaches, the proposed method provided better urban land cover classification performance; the overall accuracy (OA) and overall kappa (OK) improved to 92.75% and 0.90, respectively. Furthermore, the proposed method reduced the “salt and pepper” effect of pixel-based classification, improved the extraction accuracy of buildings based on LiDAR nDSM image segmentation, and reduced the confusion between trees and grasslands by setting an nDSM threshold.
Hyperspectral analysis of seagrass in Redfish Bay, Texas
NASA Astrophysics Data System (ADS)
Wood, John S.
Remote sensing using multi- and hyperspectral imaging and analysis has been used in resource management for quite some time, and for a variety of purposes. In the studies to follow, hyperspectral imagery of Redfish Bay is used to discriminate between species of seagrasses found below the water surface. Water attenuates and reflects light and energy from the electromagnetic spectrum, and as a result, subsurface analysis can be more complex than that performed in the terrestrial world. In the following studies, an iterative process is developed, using ENVI image processing software and ArcGIS software. Band selection was based on recommendations developed empirically in conjunction with ongoing research into depth corrections, which were applied to the imagery bands (a default depth of 65 cm was used). Polygons generated, classified and aggregated within ENVI are reclassified in ArcGIS using field site data that was randomly selected for that purpose. After the first iteration, polygons that remain classified as 'Mixed' are subjected to another iteration of classification in ENVI, then brought into ArcGIS and reclassified. Finally, when that classification scheme is exhausted, a supervised classification is performed, using a 'Maximum Likelihood' classification technique, which assigned the remaining polygons to the classification that was most like the training polygons, by digital number value. Producer's Accuracy by classification ranged from 23.33% for the 'MixedMono' class to 66.67% for the 'Bare' class; User's Accuracy by classification ranged from 22.58% for the 'MixedMono' class to 69.57% for the 'Bare' classification. An overall accuracy of 37.93% was achieved. Producer's and User's Accuracies for Halodule were 29% and 39%, respectively; for Thalassia, they were 46% and 40%. Cohen's Kappa Coefficient was calculated at 0.2988. 
We then returned to the field and collected spectral signatures of monotypic stands of seagrass at varying depths and at three sensor levels: above the water surface, just below the air/water interface, and at the canopy position, when it differed from the subsurface position. Analysis of plots of these spectral curves, after applying depth corrections and Multiplicative Scatter Correction, indicates that there are detectable spectral differences between Halodule and Thalassia species at all three positions. Further analysis indicated that only above-surface spectral signals could reliably be used to discriminate between species, because there was an overlap of the standard deviations in the other two positions. A recommendation for wavelengths that would produce increased accuracy in hyperspectral image analysis was made, based on areas where there is a significant amount of difference between the mean spectral signatures, and no overlap of the standard deviations in our samples. The original hyperspectral imagery was reprocessed, using the bands recommended from the research above (approximately 535, 600, 620, 638, and 656 nm). A depth raster was developed from various available sources, which was resampled and reclassified to reflect values for water absorption and water scattering, which were then applied to each band using the depth correction algorithm. Processing followed the iterative classification methods described above. Accuracy for this round of processing improved; overall accuracy increased from 38% to 57%. Improvements were noted in Producer's Accuracy, with the 'Bare' classification increasing from 67% to 73%, Halodule increasing from 29% to 63%, Thalassia increasing slightly, from 46% to 50%, and 'MixedMono' improving from 23% to 42%. User's Accuracy also improved, with the 'Bare' class increasing from 69% to 70%, Halodule increasing from 39% to 67%, Thalassia increasing from 40% to 7%, and 'MixedMono' increasing from 22.5% to 35%. 
A very recent report shows the mean percent cover of seagrasses in Redfish Bay and Corpus Christi Bay combined for all species at 68.6%, and individually by species: Halodule 39.8%, Thalassia 23.7%, Syringodium 4%, Ruppia 1% and Halophila 0.1%. Our study classifies 15% as 'Bare', 23% Halodule, 18% Thalassia, and 2% Ruppia. In addition, we classify 5% as 'Mixed', 22% as 'MixedMono', 12% as 'Bare/Halodule Mix', and 3% 'Bare/Thalassia Mix'. Aggregating the 'Bare' and 'Bare/species' classes would equate to approximately 30%, very close to what this new study produces. Other classes are quite similar, when considering that their study includes no 'Mixed' classifications. This series of research studies illustrates the application and utility of hyperspectral imagery and associated processing to mapping shallow benthic habitats. It also demonstrates that the technology is rapidly changing and adapting, which will lead to even further increases in accuracy. Future studies with hyperspectral imaging should include extensive spectral field collection, and the application of a depth correction.
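The accuracy figures quoted in this abstract (overall accuracy, Producer's Accuracy, User's Accuracy, Cohen's Kappa) can all be derived from a single confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' workflow; the function name `accuracy_report`, the row/column convention, and the two-class example counts are assumptions made for illustration and are not data from the study.

```python
def accuracy_report(matrix, labels):
    """Summarise a square confusion matrix.

    Assumed convention: matrix[i][j] is the number of reference (field)
    samples of class i that the map assigned to class j. Every class is
    assumed to have at least one reference and one mapped sample.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]                       # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # mapped totals
    diagonal = [matrix[i][i] for i in range(n)]

    overall = sum(diagonal) / total
    # Producer's accuracy: share of reference samples of a class that were mapped correctly.
    producers = {labels[i]: diagonal[i] / row_totals[i] for i in range(n)}
    # User's accuracy: share of samples mapped to a class that really belong to it.
    users = {labels[i]: diagonal[i] / col_totals[i] for i in range(n)}
    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa


# Hypothetical counts, for illustration only.
labels = ["Bare", "Halodule"]
matrix = [[20, 10],
          [5, 15]]
overall, producers, users, kappa = accuracy_report(matrix, labels)
print(round(overall, 2), round(kappa, 2))
```

Producer's and User's Accuracy answer different questions (omission versus commission error), which is why the abstract reports both per class alongside the overall figure.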
NASA Astrophysics Data System (ADS)
Adjorlolo, Clement; Mutanga, Onisimo; Cho, Moses A.; Ismail, Riyad
2013-04-01
In this paper, a user-defined inter-band correlation filter function was used to resample hyperspectral data and thereby mitigate the problem of multicollinearity in classification analysis. The proposed resampling technique convolves the spectral dependence information between a chosen band-centre and its shorter and longer wavelength neighbours. Weighting threshold of inter-band correlation (WTC, Pearson's r) was calculated, whereby r = 1 at the band-centre. Various WTC (r = 0.99, r = 0.95 and r = 0.90) were assessed, and bands with coefficients beyond a chosen threshold were assigned r = 0. The resultant data were used in the random forest analysis to classify in situ C3 and C4 grass canopy reflectance. The respective WTC datasets yielded improved classification accuracies (kappa = 0.82, 0.79 and 0.76) with less correlated wavebands when compared to resampled Hyperion bands (kappa = 0.76). Overall, the results obtained from this study suggested that resampling of hyperspectral data should account for the spectral dependence information to improve overall classification accuracy as well as reducing the problem of multicollinearity.
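The thresholding step described above can be sketched as follows. This is an illustrative reading of the abstract only: each band's Pearson correlation with the band-centre is used as its weight, weights below the chosen threshold are set to 0, and the resampled value is taken as a weighted mean. The weighted-mean normalisation, the function names, and the example spectra are assumptions, not the published filter function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_weights(spectra, centre, threshold):
    """Weight every band by its correlation with the band-centre.

    spectra: list of samples, each a list of band reflectances.
    Bands whose correlation falls below `threshold` receive weight 0.
    """
    centre_values = [sample[centre] for sample in spectra]
    weights = []
    for band in range(len(spectra[0])):
        r = pearson_r(centre_values, [sample[band] for sample in spectra])
        weights.append(r if r >= threshold else 0.0)
    return weights


def resample_band(sample, weights):
    """Weighted mean of one sample's bands (the normalisation is an assumption)."""
    return sum(w * v for w, v in zip(weights, sample)) / sum(weights)


# Hypothetical three-band spectra: band 1 tracks band 0, band 2 runs opposite to it.
spectra = [[1.0, 2.0, 5.0],
           [2.0, 4.0, 3.0],
           [3.0, 6.0, 1.0]]
weights = correlation_weights(spectra, centre=0, threshold=0.90)
resampled = resample_band(spectra[0], weights)
```

Lowering the threshold admits more neighbouring bands into each resampled value, which is the trade-off the three WTC settings in the abstract explore.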
Wen, Tingxi; Zhang, Zhongnan
2017-01-01
In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789
[Object-oriented aquatic vegetation extracting approach based on visible vegetation indices].
Jing, Ran; Deng, Lei; Zhao, Wen Ji; Gong, Zhao Ning
2016-05-01
Using the estimation of scale parameters (ESP) image segmentation tool to determine the ideal image segmentation scale, the optimal segmented image was created by the multi-scale segmentation method. Based on the visible vegetation indices derived from mini-UAV imaging data, we chose a set of optimal vegetation indices from a series of visible vegetation indices, and built up a decision tree rule. A membership function was used to automatically classify the study area and an aquatic vegetation map was generated. The results showed the overall accuracy of image classification using the supervised classification was 53.7%, and the overall accuracy of object-oriented image analysis (OBIA) was 91.7%. Compared with pixel-based supervised classification method, the OBIA method improved significantly the image classification result and further increased the accuracy of extracting the aquatic vegetation. The Kappa value of supervised classification was 0.4, and the Kappa value based OBIA was 0.9. The experimental results demonstrated that using visible vegetation indices derived from the mini-UAV data and OBIA method extracting the aquatic vegetation developed in this study was feasible and could be applied in other physically similar areas.
Pettersson-Yeo, William; Benetti, Stefania; Marquand, Andre F.; Joules, Richard; Catani, Marco; Williams, Steve C. R.; Allen, Paul; McGuire, Philip; Mechelli, Andrea
2014-01-01
In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological illness are increasingly using analyses, such as support vector machine, that allow inference at the single-subject level. Recent studies employing single-modality data, however, suggest that classification accuracies must be improved for such utility to be realized. One possible solution is to integrate different data types to provide a single combined output classification; either by generating a single decision function based on an integrated kernel matrix, or, by creating an ensemble of multiple single modality classifiers and integrating their predictions. Here, we describe four integrative approaches: (1) an un-weighted sum of kernels, (2) multi-kernel learning, (3) prediction averaging, and (4) majority voting, and compare their ability to enhance classification accuracy relative to the best single-modality classification accuracy. We achieve this by integrating structural, functional, and diffusion tensor magnetic resonance imaging data, in order to compare ultra-high risk (n = 19), first episode psychosis (n = 19) and healthy control subjects (n = 23). Our results show that (i) whilst integration can enhance classification accuracy by up to 13%, the frequency of such instances may be limited, (ii) where classification can be enhanced, simple methods may yield greater increases relative to more computationally complex alternatives, and, (iii) the potential for classification enhancement is highly influenced by the specific diagnostic comparison under consideration. In conclusion, our findings suggest that for moderately sized clinical neuroimaging datasets, combining different imaging modalities in a data-driven manner is no “magic bullet” for increasing classification accuracy. 
However, it remains possible that this conclusion is dependent on the use of neuroimaging modalities that had little, or no, complementary information to offer one another, and that the integration of more diverse types of data would have produced greater classification enhancement. We suggest that future studies ideally examine a greater variety of data types (e.g., genetic, cognitive, and neuroimaging) in order to identify the data types and combinations optimally suited to the classification of early stage psychosis. PMID:25076868
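Three of the four integrative approaches named in this abstract are simple enough to sketch directly: an un-weighted sum of kernels, prediction averaging, and majority voting (multi-kernel learning needs an optimiser and is omitted). The code below is a minimal illustration under stated assumptions, not the authors' pipeline; the function names, the class labels "UHR" and "HC", and all numbers are hypothetical.

```python
from collections import Counter


def sum_kernels(kernels):
    """Un-weighted sum of equally sized kernel (similarity) matrices, element by element."""
    size = len(kernels[0])
    return [[sum(k[i][j] for k in kernels) for j in range(size)] for i in range(size)]


def average_predictions(scores):
    """Average per-class scores from several single-modality classifiers.

    scores: list of dicts mapping class label -> probability-like score.
    Returns the winning label and the averaged scores.
    """
    labels = list(scores[0])
    mean = {lab: sum(s[lab] for s in scores) / len(scores) for lab in labels}
    return max(mean, key=mean.get), mean


def majority_vote(predictions):
    """Most frequent predicted label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three single-modality classifiers for one subject.
scores = [{"UHR": 0.6, "HC": 0.4},
          {"UHR": 0.3, "HC": 0.7},
          {"UHR": 0.4, "HC": 0.6}]
averaged_label, averaged_scores = average_predictions(scores)
voted_label = majority_vote(["UHR", "HC", "UHR"])
combined_kernel = sum_kernels([[[1.0, 0.2], [0.2, 1.0]],
                               [[1.0, 0.5], [0.5, 1.0]]])
```

Averaging uses the confidence of each classifier while voting uses only its hard decision, so the two can disagree on the same subject, as they do for the example inputs here.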
A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Bhaduri, Budhendra L
2011-01-01
Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of a large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of the statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.
NASA Astrophysics Data System (ADS)
Liu, Yansong; Monteiro, Sildomar T.; Saber, Eli
2015-10-01
Changes in vegetation cover, building construction, road network and traffic conditions caused by urban expansion affect the human habitat as well as the natural environment in rapidly developing cities. It is crucial to assess these changes and respond accordingly by identifying man-made and natural structures with accurate classification algorithms. With the increase in use of multi-sensor remote sensing systems, researchers are able to obtain a more complete description of the scene of interest. By utilizing multi-sensor data, the accuracy of classification algorithms can be improved. In this paper, we propose a method for combining 3D LiDAR point clouds and high-resolution color images to classify urban areas using Gaussian processes (GP). GP classification is a powerful non-parametric classification method that yields probabilistic classification results. It makes predictions in a way that addresses the uncertainty of the real world. In this paper, we attempt to identify man-made and natural objects in urban areas including buildings, roads, trees, grass, water and vehicles. LiDAR features are derived from the 3D point clouds, and the spatial and color features are extracted from RGB images. For classification, we use the Laplacian approximation for GP binary classification on the new combined feature space. The multiclass classification has been implemented by using a one-vs-all binary classification strategy. The results of applying support vector machine (SVM) and logistic regression (LR) classifiers are also provided for comparison. Our experiments show a clear improvement in classification results when the two sensors are combined instead of each sensor being used separately. We also found that the GP approach has the advantage of handling the uncertainty in the classification result without compromising accuracy compared to SVM, which is considered the state-of-the-art classification method.
NASA Technical Reports Server (NTRS)
Kiang, Richard K.
1992-01-01
Neural networks have been applied to classifications of remotely sensed data with some success. To improve the performance of this approach, an examination was made of how neural networks are applied to the optical character recognition (OCR) of handwritten digits and letters. A three-layer, feedforward network, along with techniques adopted from OCR, was used to classify Landsat-4 Thematic Mapper data. Good results were obtained. To overcome the difficulties that are characteristic of remote sensing applications and to attain significant improvements in classification accuracy, a special network architecture may be required.
Shi, Rong; Schraedley-Desmond, Pamela; Napel, Sandy; Olcott, Eric W; Jeffrey, R Brooke; Yee, Judy; Zalis, Michael E; Margolis, Daniel; Paik, David S; Sherbondy, Anthony J; Sundaram, Padmavathi; Beaulieu, Christopher F
2006-06-01
To retrospectively determine if three-dimensional (3D) viewing improves radiologists' accuracy in classifying true-positive (TP) and false-positive (FP) polyp candidates identified with computer-aided detection (CAD) and to determine candidate polyp features that are associated with classification accuracy, with known polyps serving as the reference standard. Institutional review board approval and informed consent were obtained; this study was HIPAA compliant. Forty-seven computed tomographic (CT) colonography data sets were obtained in 26 men and 10 women (age range, 42-76 years). Four radiologists classified 705 polyp candidates (53 TP candidates, 652 FP candidates) identified with CAD; initially, only two-dimensional images were used, but these were later supplemented with 3D rendering. Another radiologist unblinded to colonoscopy findings characterized the features of each candidate, assessed colon distention and preparation, and defined the true nature of FP candidates. Receiver operating characteristic curves were used to compare readers' performance, and repeated-measures analysis of variance was used to test features that affect interpretation. Use of 3D viewing improved classification accuracy for three readers and increased the area under the receiver operating characteristic curve to 0.96-0.97 (P<.001). For TP candidates, maximum polyp width (P=.038), polyp height (P=.019), and preparation (P=.004) significantly affected accuracy. For FP candidates, colonic segment (P=.007), attenuation (P<.001), surface smoothness (P<.001), distention (P=.034), preparation (P<.001), and true nature of candidate lesions (P<.001) significantly affected accuracy. Use of 3D viewing increases reader accuracy in the classification of polyp candidates identified with CAD. Polyp size and examination quality are significantly associated with accuracy. Copyright (c) RSNA, 2006.
Kwon, Yea-Hoon; Shin, Sae-Byuk; Kim, Shin-Dug
2018-04-30
The purpose of this study is to improve human emotional classification accuracy using a convolutional neural network (CNN) model and to suggest an overall method to classify emotion based on multimodal data. We improved classification performance by combining electroencephalogram (EEG) and galvanic skin response (GSR) signals. GSR signals are preprocessed using the zero-crossing rate. Sufficient EEG feature extraction can be obtained through a CNN. Therefore, we propose a suitable CNN model for feature extraction by tuning hyperparameters in convolution filters. The EEG signal is preprocessed prior to convolution by a wavelet transform while considering time and frequency simultaneously. We use a database for emotion analysis using the physiological signals open dataset to verify the proposed process, achieving 73.4% accuracy, showing significant performance improvement over the current best practice models.
VO2 estimation using 6-axis motion sensor with sports activity classification.
Nagata, Takashi; Nakamura, Naoteru; Miyatake, Masato; Yuuki, Akira; Yomo, Hiroyuki; Kawabata, Takashi; Hara, Shinsuke
2016-08-01
In this paper, we focus on oxygen consumption (VO2) estimation using 6-axis motion sensor (3-axis accelerometer and 3-axis gyroscope) for people playing sports with diverse intensities. The VO2 estimated with a small motion sensor can be used to calculate the energy expenditure, however, its accuracy depends on the intensities of various types of activities. In order to achieve high accuracy over a wide range of intensities, we employ an estimation framework that first classifies activities with a simple machine-learning based classification algorithm. We prepare different coefficients of linear regression model for different types of activities, which are determined with training data obtained by experiments. The best-suited model is used for each type of activity when VO2 is estimated. The accuracy of the employed framework depends on the trade-off between the degradation due to classification errors and improvement brought by applying separate, optimum model to VO2 estimation. Taking this trade-off into account, we evaluate the accuracy of the employed estimation framework by using a set of experimental data consisting of VO2 and motion data of people with a wide range of intensities of exercises, which were measured by a VO2 meter and motion sensor, respectively. Our numerical results show that the employed framework can improve the estimation accuracy in comparison to a reference method that uses a common regression model for all types of activities.
AUCTSP: an improved biomarker gene pair class predictor.
Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin T
2018-06-26
The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large b-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. 
In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
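To make the contrast in this abstract concrete, the sketch below computes the original TSP score for one gene pair (the absolute difference, between the two classes, in the fraction of subjects where gene i is expressed below gene j) and a generic rank-based AUC estimate. The abstract does not give the exact AUCTSP estimator, so the `auc` helper is only the standard Mann-Whitney form of the AUC, not the authors' formula; all expression values are hypothetical.

```python
def tsp_score(class_a, class_b, i, j):
    """Original TSP score for genes i and j.

    class_a, class_b: lists of expression profiles (one list of values per subject).
    The comparison is made within each subject, then summarised per class.
    """
    p_a = sum(1 for profile in class_a if profile[i] < profile[j]) / len(class_a)
    p_b = sum(1 for profile in class_b if profile[i] < profile[j]) / len(class_b)
    return abs(p_a - p_b)


def auc(positives, negatives):
    """Mann-Whitney estimate of the AUC: P(pos > neg) + 0.5 * P(pos == neg).

    Unlike the within-subject comparison above, this compares values
    across all pairs of subjects from the two groups.
    """
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical two-gene expression profiles for two conditions.
class_a = [[1.0, 2.0], [1.0, 3.0], [2.0, 5.0]]
class_b = [[4.0, 1.0], [3.0, 2.0], [1.0, 2.0]]
score = tsp_score(class_a, class_b, 0, 1)
separation = auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

The TSP score only counts how often the ordering flips, whereas an AUC-style measure reflects how well two whole distributions separate, which is the extra information the abstract says the original formulation ignores.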
Multi-phenology WorldView-2 imagery improves remote sensing of savannah tree species
NASA Astrophysics Data System (ADS)
Madonsela, Sabelo; Cho, Moses Azong; Mathieu, Renaud; Mutanga, Onisimo; Ramoelo, Abel; Kaszta, Żaneta; Kerchove, Ruben Van De; Wolff, Eléonore
2017-06-01
Biodiversity mapping in African savannah is important for monitoring changes and ensuring sustainable use of ecosystem resources. Biodiversity mapping can benefit from multi-spectral instruments such as WorldView-2 with very high spatial resolution and a spectral configuration encompassing important spectral regions not previously available for vegetation mapping. This study investigated i) the benefits of the eight-band WorldView-2 (WV-2) spectral configuration for discriminating tree species in Southern African savannah and ii) whether multiple images acquired at key points of the typical phenological development of savannahs (peak productivity, transition to senescence) improve on tree species classifications. We first assessed the discriminatory power of WV-2 bands using interspecies-Spectral Angle Mapper (SAM) via Band Add-On procedure and tested the spectral capability of WorldView-2 against simulated IKONOS for tree species classification. The results from the interspecies-SAM procedure identified the yellow and red bands as the most statistically significant bands (p = 0.000251 and p = 0.000039 respectively) in the discriminatory power of WV-2 during the transition from wet to dry season (April). Using a Random Forest classifier, the classification scenarios investigated showed that i) the 8 bands of the WV-2 sensor achieved higher classification accuracy for the April date (transition from wet to dry season, senescence) compared to the March date (peak productivity season), ii) the WV-2 spectral configuration systematically outperformed the IKONOS sensor spectral configuration and iii) the multi-temporal approach (March and April combined) improved the discrimination of tree species and produced the highest overall accuracy results at 80.4%. Consistent with the interspecies-SAM procedure, the yellow (605 nm) band also showed a statistically significant contribution in the improved classification accuracy from WV-2.
These results highlight the mapping opportunities presented by WV-2 data for monitoring the distribution status of e.g. species often harvested by local communities (e.g. Sclerocarya birrea), encroaching species, or species-specific tree losses induced by elephants.
SNR-adaptive stream weighting for audio-MES ASR.
Lee, Ki-Seung
2008-08-01
Myoelectric signals (MESs) from the speaker's mouth region have been successfully shown to improve the noise robustness of automatic speech recognizers (ASRs), thus promising to extend their usability in implementing noise-robust ASR. In the recognition system presented herein, extracted audio and facial MES features were integrated by a decision fusion method, where the likelihood score of the audio-MES observation vector was given by a linear combination of class-conditional observation log-likelihoods of two classifiers, using appropriate weights. We developed a weighting process adaptive to SNRs. The main objective of the paper involves determining the optimal SNR classification boundaries and constructing a set of optimum stream weights for each SNR class. These two parameters were determined by a method based on a maximum mutual information criterion. Acoustic and facial MES data were collected from five subjects, using a 60-word vocabulary. Four types of acoustic noise including babble, car, aircraft, and white noise were acoustically added to clean speech signals with SNR ranging from -14 to 31 dB. The classification accuracy of the audio ASR was as low as 25.5%, whereas the classification accuracy of the MES ASR was 85.2%. The classification accuracy could be further improved by employing the proposed audio-MES weighting method, reaching as high as 89.4% in the case of babble noise. A similar result was also found for the other types of noise.
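The decision fusion described in this abstract, a weighted linear combination of the two classifiers' log-likelihoods with the weight chosen according to the SNR class, can be sketched as below. The SNR boundaries, the weight values, the constraint that the two stream weights sum to 1, and the vocabulary are all hypothetical; in the paper the boundaries and weights are estimated with a maximum mutual information criterion, which is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical SNR class boundaries (dB) and one audio-stream weight per SNR class.
SNR_BOUNDARIES_DB = [0.0, 15.0]
AUDIO_WEIGHTS = [0.1, 0.5, 0.8]   # low, medium, high SNR


def audio_weight(snr_db):
    """Pick the audio-stream weight for the SNR class containing snr_db."""
    return AUDIO_WEIGHTS[bisect_right(SNR_BOUNDARIES_DB, snr_db)]


def fuse(audio_loglik, mes_loglik, snr_db):
    """Return the word with the highest fused score.

    audio_loglik, mes_loglik: dicts mapping word -> log-likelihood.
    Assumption: the MES weight is 1 minus the audio weight.
    """
    w = audio_weight(snr_db)
    fused = {word: w * audio_loglik[word] + (1.0 - w) * mes_loglik[word]
             for word in audio_loglik}
    return max(fused, key=fused.get)


# Hypothetical log-likelihoods for a two-word vocabulary.
audio = {"yes": -2.0, "no": -1.0}
mes = {"yes": -1.0, "no": -3.0}
noisy_decision = fuse(audio, mes, snr_db=-5.0)   # MES stream dominates
clean_decision = fuse(audio, mes, snr_db=20.0)   # audio stream dominates
```

With these example numbers the fused decision follows the MES classifier in heavy noise and the audio classifier in clean conditions, which is the behaviour an SNR-adaptive weighting is meant to produce.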
NASA Technical Reports Server (NTRS)
Wu, S. T.
1983-01-01
Data acquired by synthetic aperture radar (SAR) and LANDSAT multispectral scanner (MSS) were processed and analyzed to derive forest-related resources inventory information. The SAR data were acquired by using the NASA aircraft X-band SAR with linear (HH, VV) and cross (HV, VH) polarizations and the SEASAT L-band SAR. After data processing and data quality examination, the three polarization (HH, HV, and VV) data from the aircraft X-band SAR were used in conjunction with LANDSAT MSS for multisensor data classification. The results of accuracy evaluation for the SAR, MSS and SAR/MSS data using supervised classification show that the SAR-only data set contains low classification accuracy for several land cover classes. However, the SAR/MSS data show that significant improvement in classification accuracy is obtained for all eight land cover classes. These results suggest the usefulness of using combined SAR/MSS data for forest-related cover mapping. The SAR data also detect several small special surface features that are not detectable by MSS data.
NASA Astrophysics Data System (ADS)
Nitze, Ingmar; Barrett, Brian; Cawkwell, Fiona
2015-02-01
The analysis and classification of land cover is one of the principal applications in terrestrial remote sensing. Due to the seasonal variability of different vegetation types and land surface characteristics, the ability to discriminate land cover types changes over time. Multi-temporal classification can help to improve the classification accuracies, but different constraints, such as financial restrictions or atmospheric conditions, may impede their application. The optimisation of image acquisition timing and frequencies can help to increase the effectiveness of the classification process. For this purpose, the Feature Importance (FI) measure of the state-of-the art machine learning method Random Forest was used to determine the optimal image acquisition periods for a general (Grassland, Forest, Water, Settlement, Peatland) and Grassland specific (Improved Grassland, Semi-Improved Grassland) land cover classification in central Ireland based on a 9-year time-series of MODIS Terra 16 day composite data (MOD13Q1). Feature Importances for each acquisition period of the Enhanced Vegetation Index (EVI) and Normalised Difference Vegetation Index (NDVI) were calculated for both classification scenarios. In the general land cover classification, the months December and January showed the highest, and July and August the lowest separability for both VIs over the entire nine-year period. This temporal separability was reflected in the classification accuracies, where the optimal choice of image dates outperformed the worst image date by 13% using NDVI and 5% using EVI on a mono-temporal analysis. With the addition of the next best image periods to the data input the classification accuracies converged quickly to their limit at around 8-10 images. The binary classification schemes, using two classes only, showed a stronger seasonal dependency with a higher intra-annual, but lower inter-annual variation. 
Nonetheless anomalous weather conditions, such as the cold winter of 2009/2010 can alter the temporal separability pattern significantly. Due to the extensive use of the NDVI for land cover discrimination, the findings of this study should be transferrable to data from other optical sensors with a higher spatial resolution. However, the high impact of outliers from the general climatic pattern highlights the limitation of spatial transferability to locations with different climatic and land cover conditions. The use of high-temporal, moderate resolution data such as MODIS in conjunction with machine-learning techniques proved to be a good base for the prediction of image acquisition timing for optimal land cover classification results.
Effect of the atmosphere on the classification of LANDSAT data. [Identifying sugar canes in Brazil
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Morimoto, T.; Kumar, R.; Molion, L. C. B.
1979-01-01
The author has identified the following significant results. In conjunction with Turner's model for the correction of satellite data for atmospheric interference, the LOWTRAN-3 computer program was used to calculate the atmospheric interference. Use of the program improved the contrast between different natural targets in the MSS LANDSAT data of Brasilia, Brazil. The classification accuracy of sugar canes was improved by about 9% in the multispectral data of Ribeirao Preto, Sao Paulo.
Wishart Deep Stacking Network for Fast POLSAR Image Classification.
Jiao, Licheng; Liu, Fang
2016-05-11
Inspired by the popular deep learning architecture - Deep Stacking Network (DSN), a specific deep model for polarimetric synthetic aperture radar (POLSAR) image classification is proposed in this paper, which is named as Wishart Deep Stacking Network (W-DSN). First of all, a fast implementation of Wishart distance is achieved by a special linear transformation, which speeds up the classification of POLSAR image and makes it possible to use this polarimetric information in the following Neural Network (NN). Then a single-hidden-layer neural network based on the fast Wishart distance is defined for POLSAR image classification, which is named as Wishart Network (WN) and improves the classification accuracy. Finally, a multi-layer neural network is formed by stacking WNs, which is in fact the proposed deep learning architecture W-DSN for POLSAR image classification and improves the classification accuracy further. In addition, the structure of WN can be expanded in a straightforward way by adding hidden units if necessary, as well as the structure of the W-DSN. As a preliminary exploration on formulating specific deep learning architecture for POLSAR image classification, the proposed methods may establish a simple but clever connection between POLSAR image interpretation and deep learning. The experiment results tested on real POLSAR image show that the fast implementation of Wishart distance is very efficient (a POLSAR image with 768000 pixels can be classified in 0.53s), and both the single-hidden-layer architecture WN and the deep learning architecture W-DSN for POLSAR image classification perform well and work efficiently.
Parsons, Helen M; Ludwig, Christian; Günther, Ulrich L; Viant, Mark R
2007-01-01
Background Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) 1H, projections of 2D 1H, 1H J-resolved (pJRES), and intact 2D J-resolved (JRES). Results Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra. 
Conclusion We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra. PMID:17605789
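The three scaling approaches compared in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the glog form ln(x + sqrt(x^2 + lambda)) is the commonly cited definition, the value of `lam` and the sample intensities are invented, and the estimation of lambda from technical replicates is not reproduced here.

```python
import math
import statistics

def autoscale(values):
    """Mean-centre and divide by the standard deviation (unit variance)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pareto_scale(values):
    """Mean-centre and divide by the square root of the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]

def glog(values, lam):
    """Generalised log, ln(x + sqrt(x^2 + lam)); lam is a hypothetical tuning value."""
    return [math.log(v + math.sqrt(v * v + lam)) for v in values]

# Hypothetical peak intensities for one spectral bin across five samples.
intensities = [0.5, 2.0, 8.0, 32.0, 128.0]
scaled_auto = autoscale(intensities)
scaled_pareto = pareto_scale(intensities)
scaled_glog = glog(intensities, lam=1.0)
```

Autoscaling gives every variable unit variance, Pareto scaling shrinks large variances less aggressively, and the glog compresses large intensities while staying finite near zero, which is the property that makes it attractive for noisy low-intensity regions.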
Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing
Wen, Tailai; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi
2018-01-01
The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detecting applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not quite satisfying because the raw features extracted from sensors’ responses were regarded as the input of a classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose’s classification accuracy, in this paper, a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, was presented to reprocess the original feature matrix. In addition, we have also compared the proposed method with quite a few previously existing ones including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results proved that QWKFDA is an effective feature extraction method for E-nose in predicting the types of wound infection and inflammable gases, which shared much higher classification accuracy than those of the contrast methods. PMID:29382146
Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing.
Wen, Tailai; Yan, Jia; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi
2018-01-29
The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detecting applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not quite satisfying because the raw features extracted from sensors' responses were regarded as the input of a classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose's classification accuracy, in this paper, a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, was presented to reprocess the original feature matrix. In addition, we have also compared the proposed method with quite a few previously existing ones including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results proved that QWKFDA is an effective feature extraction method for E-nose in predicting the types of wound infection and inflammable gases, which shared much higher classification accuracy than those of the contrast methods.
Significance of perceptually relevant image decolorization for scene classification
NASA Astrophysics Data System (ADS)
Viswanathan, Sowmya; Divakaran, Govind; Soman, Kutti Padanyl
2017-11-01
Color images contain luminance and chrominance components representing the intensity and color information, respectively. The objective of this paper is to show the significance of incorporating chrominance information to the task of scene classification. An improved color-to-grayscale image conversion algorithm that effectively incorporates chrominance information is proposed using the color-to-gray structure similarity index and singular value decomposition to improve the perceptual quality of the converted grayscale images. The experimental results based on an image quality assessment for image decolorization and its success rate (using the Cadik and COLOR250 datasets) show that the proposed image decolorization technique performs better than eight existing benchmark algorithms for image decolorization. In the second part of the paper, the effectiveness of incorporating the chrominance component for scene classification tasks is demonstrated using a deep belief network-based image classification system developed using dense scale-invariant feature transforms. The amount of chrominance information incorporated into the proposed image decolorization technique is confirmed with the improvement to the overall scene classification accuracy. Moreover, the overall scene classification performance improved by combining the models obtained using the proposed method and conventional decolorization methods.
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Vidyarthi, Ankit; Mittal, Namita
2016-12-01
In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. 
An unsupervised classification technique for multispectral remote sensing data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Cummings, R. E.
1973-01-01
Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.
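The second stage of the composite technique described above, refining a set of initial clusters by an iterative K-means scheme, might look like the sketch below. It is plain K-means rather than the authors' "generalized" variant, whose details the abstract does not give; the initial centres stand in for the output of the sequential clustering stage, and the sample points are invented.

```python
def kmeans(points, initial_centres, iterations=10):
    """Refine initial cluster centres by repeated assign-and-average passes."""
    centres = [tuple(c) for c in initial_centres]
    groups = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(
                range(len(centres)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centres[i])),
            )
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centres = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres, groups

# Hypothetical two-band pixel values and initial centres from stage (a).
pixels = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centres, groups = kmeans(pixels, initial_centres=[(0.0, 0.0), (10.0, 10.0)])
```

Seeding K-means with data-derived clusters instead of random centres is what lets the two stages work as one unsupervised pipeline.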
Unsupervised classification of earth resources data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.
1972-01-01
A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by the existing supervised maximum likelihood classification technique.
Incorporating spatial context into statistical classification of multidimensional image data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.
1981-01-01
Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextual classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextual classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextual uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.
NASA Technical Reports Server (NTRS)
Huck, F. O.; Davis, R. E.; Fales, C. L.; Aherron, R. M.
1982-01-01
A computational model of the deterministic and stochastic processes involved in remote sensing is used to study spectral feature identification techniques for real-time onboard processing of data acquired with advanced earth-resources sensors. Preliminary results indicate that: Narrow spectral responses are advantageous; signal normalization improves mean-square distance (MSD) classification accuracy but tends to degrade maximum-likelihood (MLH) classification accuracy; and MSD classification of normalized signals performs better than the computationally more complex MLH classification when imaging conditions change appreciably from those conditions during which reference data were acquired. The results also indicate that autonomous categorization of TM signals into vegetation, bare land, water, snow and clouds can be accomplished with adequate reliability for many applications over a reasonably wide range of imaging conditions. However, further analysis is required to develop computationally efficient boundary approximation algorithms for such categorization.
Electromyogram whitening for improved classification accuracy in upper limb prosthesis control.
Liu, Lukai; Liu, Pu; Clancy, Edward A; Scheme, Erik; Englehart
2013-09-01
Time and frequency domain features of the surface electromyogram (EMG) signal acquired from multiple channels have frequently been investigated for use in controlling upper-limb prostheses. A common control method is EMG-based motion classification. We propose the use of EMG signal whitening as a preprocessing step in EMG-based motion classification. Whitening decorrelates the EMG signal and has been shown to be advantageous in other EMG applications including EMG amplitude estimation and EMG-force processing. In a study of ten intact subjects and five amputees with up to 11 motion classes and ten electrode channels, we found that the coefficient of variation of time domain features (mean absolute value, average signal length and normalized zero crossing rate) was significantly reduced due to whitening. When using these features along with autoregressive power spectrum coefficients, whitening added approximately five percentage points to classification accuracy when small window lengths were considered.
Telephone-quality pathological speech classification using empirical mode decomposition.
Kaleem, M F; Ghoraani, B; Guergachi, A; Krishnan, S
2011-01-01
This paper presents a computationally simple and effective methodology based on empirical mode decomposition (EMD) for classification of telephone quality normal and pathological speech signals. EMD is used to decompose continuous normal and pathological speech signals into intrinsic mode functions, which are analyzed to extract physically meaningful and unique temporal and spectral features. Using continuous speech samples from a database of 51 normal and 161 pathological speakers, which has been modified to simulate telephone quality speech under different levels of noise, a linear classifier is used with the feature vector thus obtained to obtain a high classification accuracy, thereby demonstrating the effectiveness of the methodology. The classification accuracy reported in this paper (89.7% for signal-to-noise ratio 30 dB) is a significant improvement over previously reported results for the same task, and demonstrates the utility of our methodology for cost-effective remote voice pathology assessment over telephone channels.
Zhang, Jianhua; Li, Sunan; Wang, Rubin
2017-01-01
In this paper, we deal with the Mental Workload (MWL) classification problem based on the measured physiological data. First we discussed the optimal depth (i.e., the number of hidden layers) and parameter optimization algorithms for the Convolutional Neural Networks (CNN). The base CNNs designed were tested according to five classification performance indices, namely Accuracy, Precision, F-measure, G-mean, and required training time. Then we developed an Ensemble Convolutional Neural Network (ECNN) to enhance the accuracy and robustness of the individual CNN model. For the ECNN design, three model aggregation approaches (weighted averaging, majority voting and stacking) were examined and a resampling strategy was used to enhance the diversity of individual CNN models. The results of MWL classification performance comparison indicated that the proposed ECNN framework can effectively improve MWL classification performance and is featured by entirely automatic feature extraction and MWL classification, when compared with traditional machine learning methods.
A Visual mining based framework for classification accuracy estimation
NASA Astrophysics Data System (ADS)
Arun, Pattathal Vijayakumar
2013-12-01
Classification techniques have been widely used in different remote sensing applications, and correct classification of mixed pixels is a tedious task. Traditional approaches adopt various statistical parameters but do not facilitate effective visualisation. Data mining tools are proving very helpful in the classification process. We propose a visual mining based framework for accuracy assessment of classification techniques using open source tools such as WEKA and PREFUSE. Used together, these tools can provide an efficient approach for getting information about improvements in the classification accuracy and help in refining the training data set. We have illustrated the framework by investigating the effects of various resampling methods on classification accuracy and found that bilinear (BL) resampling is best suited for preserving radiometric characteristics. We have also investigated the optimal number of folds required for effective analysis of LISS-IV images. Classification techniques are widely used in various remote sensing applications, in which correct classification of pixels is a serious challenge. The traditional approach, which uses various statistical parameters, does not provide effective visualisation. Applying data mining tools to classification appears very promising. The article proposes an approach based on visual exploratory analysis that uses open source tools such as WEKA and PREFUSE. These tools make it easier to correct training fields and effectively support improvement of classification accuracy. The method was tested by examining the effect of different resampling methods on the preservation of radiometric accuracy, with the best results obtained for the bilinear (BL) method.
Hyperspectral analysis of Columbia spotted frog habitat
Shive, J.P.; Pilliod, D.S.; Peterson, C.R.
2010-01-01
Wildlife managers increasingly are using remotely sensed imagery to improve habitat delineations and sampling strategies. Advances in remote sensing technology, such as hyperspectral imagery, provide more information than previously was available with multispectral sensors. We evaluated accuracy of high-resolution hyperspectral image classifications to identify wetlands and wetland habitat features important for Columbia spotted frogs (Rana luteiventris) and compared the results to multispectral image classification and United States Geological Survey topographic maps. The study area spanned 3 lake basins in the Salmon River Mountains, Idaho, USA. Hyperspectral data were collected with an airborne sensor on 30 June 2002 and on 8 July 2006. A 12-year comprehensive ground survey of the study area for Columbia spotted frog reproduction served as validation for image classifications. Hyperspectral image classification accuracy of wetlands was high, with a producer's accuracy of 96% (44 wetlands) correctly classified with the 2002 data and 89% (41 wetlands) correctly classified with the 2006 data. We applied habitat-based rules to delineate breeding habitat from other wetlands, and successfully predicted 74% (14 wetlands) of known breeding wetlands for the Columbia spotted frog. Emergent sedge microhabitat classification showed promise for directly predicting Columbia spotted frog egg mass locations within a wetland by correctly identifying 72% (23 of 32) of known locations. Our study indicates hyperspectral imagery can be an effective tool for mapping spotted frog breeding habitat in the selected mountain basins. We conclude that this technique has potential for improving site selection for inventory and monitoring programs conducted across similar wetland habitat and can be a useful tool for delineating wildlife habitats.
Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.
Sriraam, N; Raghu, S
2017-09-02
Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients having pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from the Bern Barcelona database. From the dataset, five different classification tasks were formed. A total of 26 features were extracted from focal and non focal EEG. Significant features were selected using the Wilcoxon rank sum test with thresholds on the p-value (p < 0.05) and z-score (z < -1.96 or z > 1.96) at the 95% confidence level. It was hypothesized that removing outliers improves the classification accuracy. Tukey's range test was adopted for pruning outliers from the feature set. Finally, 21 features were classified using an optimized support vector machine (SVM) classifier with 10-fold cross validation. A Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15%, respectively, were achieved and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensure the suitability of the proposed multi-features with the optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
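The abstract above does not spell out how outliers were pruned from the feature set. One common reading of a Tukey-style rule is the interquartile-range fence, sketched below under that assumption; the multiplier `k = 1.5` and the sample feature values are illustrative only and are not taken from the paper.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return the (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def prune_outliers(values, k=1.5):
    """Keep only the values that lie inside the Tukey fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if low <= v <= high]

# Hypothetical values of one EEG feature across seven segments.
feature = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 9.5]
cleaned = prune_outliers(feature)
```

In a multi-feature setting the same rule would be applied per feature, and a segment would typically be dropped if any of its features falls outside the fences; that policy is also an assumption here.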
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based datasets can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices and the notation of a joint-feature-clustering matrix to find redundancies of image-features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times and improve the classification accuracy.
Efficient alignment-free DNA barcode analytics.
Kuksa, Pavel; Pavlovic, Vladimir
2009-11-10
In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens possibility for accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.) On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding.
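The fixed-length spectrum representation mentioned above can be illustrated with a short sketch: each sequence is reduced to counts of its overlapping length-k fragments, and two sequences are compared through those counts without any alignment. The cosine similarity, the nearest-reference rule, the choice `k = 3`, and the toy sequences are all assumptions made for illustration; the abstract does not state which similarity measure or classifier the authors used.

```python
from collections import Counter
import math

def spectrum(sequence, k=3):
    """Count every overlapping length-k fragment (k-mer) in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two k-mer count vectors."""
    dot = sum(count * spec_b[kmer] for kmer, count in spec_a.items())
    norm_a = math.sqrt(sum(c * c for c in spec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in spec_b.values()))
    return dot / (norm_a * norm_b)

def classify(query, references, k=3):
    """Return the label of the reference whose spectrum is most similar."""
    query_spec = spectrum(query, k)
    return max(
        references,
        key=lambda label: cosine_similarity(query_spec, spectrum(references[label], k)),
    )

# Hypothetical reference barcodes, far shorter than real ones.
references = {"species_A": "ACGTACGTACGT", "species_B": "TTTTGGGGCCCC"}
label = classify("ACGTACGAACGT", references)
```

Because every sequence maps to a vector of the same length, sequences of different lengths can be compared directly, which is where the speed advantage over alignment-based methods comes from.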
Land use/cover classification in the Brazilian Amazon using satellite images.
Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant'anna, Sidnei João Siqueira
2012-09-01
Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews experiments related to land use/cover classification in the Brazilian Amazon for a decade. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based methods are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, have the potential to provide better results. However, they often require more time to achieve parametric optimization. Proper use of hierarchical-based methods is fundamental for developing accurate land use/cover classification, mainly from historical remotely sensed data.
Land use/cover classification in the Brazilian Amazon using satellite images
Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant’Anna, Sidnei João Siqueira
2013-01-01
Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews experiments related to land use/cover classification in the Brazilian Amazon for a decade. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based methods are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, have the potential to provide better results. However, they often require more time to achieve parametric optimization. Proper use of hierarchical-based methods is fundamental for developing accurate land use/cover classification, mainly from historical remotely sensed data. PMID:24353353
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
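The stability criterion named above, the Tanimoto distance between selected feature sets, can be computed as one minus the ratio of shared features to all features chosen in either run. The sketch below uses that standard set-based definition; the gene names are hypothetical, and how the authors aggregated distances across repeated runs is not stated in the abstract.

```python
def tanimoto_distance(features_a, features_b):
    """1 - |A intersect B| / |A union B|; 0 means identical sets, 1 means disjoint."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # two empty selections are treated as identical
    return 1.0 - len(a & b) / len(union)

# Hypothetical feature sets selected in two runs of the same algorithm.
run_1 = {"GENE1", "GENE2", "GENE3"}
run_2 = {"GENE2", "GENE3", "GENE4"}
distance = tanimoto_distance(run_1, run_2)
```

A feature selector is considered stable when this distance stays small across resampled versions of the same dataset.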
A novel artificial immune clonal selection classification and rule mining with swarm learning model
NASA Astrophysics Data System (ADS)
Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.
2013-06-01
Metaheuristic optimisation algorithms have become a popular choice for solving complex problems. By integrating the artificial immune clonal selection algorithm (CSA) and the particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel computation merit of Clonal Selection and the speed and self-organisation merits of Particle Swarm by sharing information between the clonal selection population and the particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm required less training time and fewer memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a multiobjective optimisation problem with predictive accuracy. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared the classification accuracy of our proposed algorithm, CS2, with five commonly used CSAs, namely: AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA, using eight benchmark datasets. We also compared the classification accuracy of CS2 with five other methods, namely: Naïve Bayes, SVM, MLP, CART, and RFB. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation of CSA and PSO can exploit the respective merits of each, compensate for the defects of the other, and improve both search quality and speed.
Armutlu, Pelin; Ozdemir, Muhittin E; Uney-Yuksektepe, Fadime; Kavakli, I Halil; Turkay, Metin
2008-10-03
A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem and, we aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with mixed-integer programming based hyper-boxes classification method, where drug molecules are classified as low active or high active regarding their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. We first apply our approach by analyzing the activities of widely known inhibitor datasets including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR), Cyclooxygenase-2 (COX-2) with known IC50 values. The results at this stage proved that our approach consistently gives better classification accuracies compared to 63 other reported classification methods such as SVM, Naïve Bayes, where we were able to predict the experimentally determined IC50 values with a worst case accuracy of 96%. To further test applicability of this approach we first created dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance drug discovery process, but also save time and resources committed.
The use of the modified Cholesky decomposition in divergence and classification calculations
NASA Technical Reports Server (NTRS)
Vanroony, D. L.; Lynn, M. S.; Snyder, C. H.
1973-01-01
The use of the Cholesky decomposition technique is analyzed as applied to the feature selection and classification algorithms used in the analysis of remote sensing data (e.g., as in LARSYS). This technique is approximately 30% faster in classification and a factor of 2-3 faster in divergence, as compared with LARSYS. Also, numerical stability and accuracy are slightly improved. Other methods necessary to deal with numerical stability problems are briefly discussed.
The use of the modified Cholesky decomposition in divergence and classification calculations
NASA Technical Reports Server (NTRS)
Van Rooy, D. L.; Lynn, M. S.; Snyder, C. H.
1973-01-01
This report analyzes the use of the modified Cholesky decomposition technique as applied to the feature selection and classification algorithms used in the analysis of remote sensing data (e.g., as in LARSYS). This technique is approximately 30% faster in classification and a factor of 2-3 faster in divergence, as compared with LARSYS. Also numerical stability and accuracy are slightly improved. Other methods necessary to deal with numerical stability problems are briefly discussed.
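To illustrate why a Cholesky-type factorization helps in Gaussian classification, the sketch below evaluates the quadratic form and log-determinant of a class covariance matrix without forming its inverse. It assumes that "modified Cholesky" refers to the square-root-free LDL^T variant, which the abstract does not state; the function names and the 2x2 example are hypothetical and are not taken from the report or from LARSYS.

```python
import math


def ldl_decompose(a):
    """Square-root-free Cholesky: return (L, d) with A = L * diag(d) * L^T.

    A must be symmetric positive definite; L is unit lower triangular.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def gaussian_discriminant(x, mean, cov):
    """Return (x - m)^T C^{-1} (x - m) + ln|C| using the LDL^T factors."""
    L, d = ldl_decompose(cov)
    r = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution: solve L z = r.
    z = []
    for i in range(len(r)):
        z.append(r[i] - sum(L[i][k] * z[k] for k in range(i)))
    quad = sum(zi * zi / di for zi, di in zip(z, d))
    log_det = sum(math.log(di) for di in d)
    return quad + log_det


# Hypothetical two-band class statistics and pixel vector.
cov = [[4.0, 2.0], [2.0, 3.0]]
mean = [0.0, 0.0]
pixel = [2.0, 1.0]
print(gaussian_discriminant(pixel, mean, cov))
```

With equal class priors, the class with the smallest value of this discriminant is the maximum likelihood choice. The factorization is computed once per class and reused for every pixel.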
LDA boost classification: boosting by topics
NASA Astrophysics Data System (ADS)
Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li
2012-12-01
AdaBoost is an efficacious classification algorithm, especially in text categorization (TC) tasks. The methodology of setting up a classifier committee and voting on the documents for classification can achieve high categorization precision. However, the traditional Vector Space Model can easily lead to the curse of dimensionality and feature sparsity problems, which seriously affect classification performance. This article proposes a novel classification algorithm called LDABoost, based on the boosting approach, which uses Latent Dirichlet Allocation (LDA) to model the feature space. Instead of using words or phrases, LDABoost uses latent topics as the features. In this way, the feature dimension is significantly reduced. An improved Naïve Bayes (NB) is designed as the weak classifier, which keeps the efficiency advantage of the classic NB algorithm and has higher precision. Moreover, a two-stage iterative weighted method called Cute Integration is proposed for improving the accuracy by integrating weak classifiers into a strong classifier in a more rational way. Mutual Information is used as the metric for weight allocation. The voting information and the categorization decisions made by the base classifiers are fully utilized for generating the strong classifier. Experimental results reveal that LDABoost, which performs categorization in a low-dimensional space, has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime consumption is lower than that of different versions of AdaBoost and of TC algorithms based on support vector machines and neural networks.
Classification of Clouds in Satellite Imagery Using Adaptive Fuzzy Sparse Representation.
Jin, Wei; Gong, Fei; Zeng, Xingbin; Fu, Randi
2016-12-16
Automatic cloud detection and classification using satellite cloud imagery have various meteorological applications such as weather forecasting and climate monitoring. Cloud pattern analysis is one of the research hotspots recently. Since satellites sense the clouds remotely from space, and different cloud types often overlap and convert into each other, there must be some fuzziness and uncertainty in satellite cloud imagery. Satellite observation is susceptible to noises, while traditional cloud classification methods are sensitive to noises and outliers; it is hard for traditional cloud classification methods to achieve reliable results. To deal with these problems, a satellite cloud classification method using adaptive fuzzy sparse representation-based classification (AFSRC) is proposed. Firstly, by defining adaptive parameters related to attenuation rate and critical membership, an improved fuzzy membership is introduced to accommodate the fuzziness and uncertainty of satellite cloud imagery; secondly, by effective combination of the improved fuzzy membership function and sparse representation-based classification (SRC), atoms in training dictionary are optimized; finally, an adaptive fuzzy sparse representation classifier for cloud classification is proposed. Experiment results on FY-2G satellite cloud image show that, the proposed method not only improves the accuracy of cloud classification, but also has strong stability and adaptability with high computational efficiency.
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
NASA Astrophysics Data System (ADS)
Yao, C.; Zhang, Y.; Zhang, Y.; Liu, H.
2017-09-01
With the rapid development of Precision Agriculture (PA) promoted by high-resolution remote sensing, crop classification of high-resolution remote sensing images has become important for agricultural management and estimation. Because features and their surroundings are complex and fragmented at high resolution, the accuracy of traditional classification methods has not been able to meet the requirements of agricultural problems. This paper therefore proposes a classification method for high-resolution agricultural remote sensing images based on convolutional neural networks (CNN). For training, a large number of training samples were produced from panchromatic images of China's GF-1 high-resolution satellite. In the experiment, the CNN was trained and tested with a MATLAB deep learning toolbox, and the crop classification reached a correct rate of 99.66% after gradual optimization of the parameters during training. By improving the accuracy of image classification and image recognition, the application of CNN provides a reference for remote sensing in PA.
Towards automated spectroscopic tissue classification in thyroid and parathyroid surgery.
Schols, Rutger M; Alic, Lejla; Wieringa, Fokko P; Bouvy, Nicole D; Stassen, Laurents P S
2017-03-01
In (para-)thyroid surgery iatrogenic parathyroid injury should be prevented. To aid the surgeons' eye, a camera system enabling parathyroid-specific image enhancement would be useful. Hyperspectral camera technology might work, provided that the spectral signature of parathyroid tissue offers enough specific features to be reliably and automatically distinguished from surrounding tissues. As a first step to investigate this, we examined the feasibility of wide band diffuse reflectance spectroscopy (DRS) for automated spectroscopic tissue classification, using silicon (Si) and indium-gallium-arsenide (InGaAs) sensors. DRS (350-1830 nm) was performed during (para-)thyroid resections. From the acquired spectra 36 features at predefined wavelengths were extracted. The best features for classification of parathyroid from adipose or thyroid were assessed by binary logistic regression for Si- and InGaAs-sensor ranges. Classification performance was evaluated by leave-one-out cross-validation. In 19 patients 299 spectra were recorded (62 tissue sites: thyroid = 23, parathyroid = 21, adipose = 18). Classification accuracy of parathyroid-adipose was, respectively, 79% (Si), 82% (InGaAs) and 97% (Si/InGaAs combined). Parathyroid-thyroid classification accuracies were 80% (Si), 75% (InGaAs), 82% (Si/InGaAs combined). Si and InGaAs sensors are fairly accurate for automated spectroscopic classification of parathyroid, adipose and thyroid tissues. Combination of both sensor technologies improves accuracy. Follow-up research, aimed towards hyperspectral imaging seems justified. Copyright © 2016 John Wiley & Sons, Ltd.
Detection of artificially ripened mango using spectrometric analysis
NASA Astrophysics Data System (ADS)
Mithun, B. S.; Mondal, Milton; Vishwakarma, Harsh; Shinde, Sujit; Kimbahune, Sanjay
2017-05-01
Hyperspectral sensing has been proven to be useful to determine the quality of food in general. It has also been used to distinguish naturally and artificially ripened mangoes by analyzing the spectral signature. However, the focus has been on improving the accuracy of classification after performing dimensionality reduction, optimum feature selection and using a suitable learning algorithm on the complete visible and NIR spectrum range data, namely 350 nm to 1050 nm. In this paper we focus on (i) the use of a low wavelength resolution and low cost multispectral sensor to reliably identify artificially ripened mango by selectively using the spectral information, so that classification accuracy is not hampered by the low resolution of the spectral data, and (ii) the use of visible spectrum data, i.e. 390 nm to 700 nm, to accurately discriminate artificially ripened mangoes. Our results show that on low resolution spectral data, the use of logistic regression produces an accuracy of 98.83% and significantly outperforms other methods such as classification tree and random forest. This is achieved by analyzing only 36 spectral reflectance data points instead of the complete 216 data points available in the visual and NIR range. Another interesting experimental observation is that we are able to achieve more than 98% classification accuracy by selecting only 15 irradiance values in the visible spectrum. The amount of data that needs to be collected using a hyper-spectral or multi-spectral sensor can even be reduced by a factor of 24 for classification with a high degree of confidence.
Mapping Mangrove Density from Rapideye Data in Central America
NASA Astrophysics Data System (ADS)
Son, Nguyen-Thanh; Chen, Chi-Farn; Chen, Cheng-Ru
2017-06-01
Mangrove forests provide a wide range of socioeconomic and ecological services for coastal communities. Extensive aquaculture development of mangrove waters in many developing countries has constantly ignored services of mangrove ecosystems, leading to unintended environmental consequences. Monitoring the current status and distribution of mangrove forests is deemed important for evaluating forest management strategies. This study aims to delineate the density distribution of mangrove forests in the Gulf of Fonseca, Central America with Rapideye data using the support vector machines (SVM). The data collected in 2012 for density classification of mangrove forests were processed based on four different band combination schemes: scheme-1 (bands 1-3, 5 excluding the red-edge band 4), scheme-2 (bands 1-5), scheme-3 (bands 1-3, 5 incorporating with the normalized difference vegetation index, NDVI), and scheme-4 (bands 1-3, 5 incorporating with the normalized difference red-edge index, NDRI). We also hypothesized if the obvious contribution of Rapideye red-edge band could improve the classification results. Three main steps of data processing were employed: (1), data pre-processing, (2) image classification, and (3) accuracy assessment to evaluate the contribution of red-edge band in terms of the accuracy of classification results across these four schemes. The classification maps compared with the ground reference data indicated the slightly higher accuracy level observed for schemes 2 and 4. The overall accuracies and Kappa coefficients were 97% and 0.95 for scheme-2 and 96.9% and 0.95 for scheme-4, respectively.
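The overall accuracy and Kappa coefficient reported above are both derived from a confusion matrix of classified versus reference samples. A minimal sketch of the standard (Cohen's) kappa calculation follows; the two-class matrix is hypothetical and is not the study's data.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical matrix: 100 reference samples, two density classes.
example = [[45, 5],
           [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(accuracy, kappa)
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy for the same matrix.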
Islam, Md Rabiul; Tanaka, Toshihisa; Molla, Md Khademul Islam
2018-05-08
When designing multiclass motor imagery-based brain-computer interface (MI-BCI), a so-called tangent space mapping (TSM) method utilizing the geometric structure of covariance matrices is an effective technique. This paper aims to introduce a method using TSM for finding accurate operational frequency bands related to brain activities associated with MI tasks. A multichannel electroencephalogram (EEG) signal is decomposed into multiple subbands, and tangent features are then estimated on each subband. A mutual information analysis-based effective algorithm is implemented to select subbands containing features capable of improving motor imagery classification accuracy. The features thus obtained from the selected subbands are combined to form the feature space. A principal component analysis-based approach is employed to reduce the feature dimension, and the classification is then accomplished by a support vector machine (SVM). Offline analysis demonstrates that the proposed multiband tangent space mapping with subband selection (MTSMS) approach outperforms state-of-the-art methods. It achieves the highest average classification accuracy for all datasets (BCI competition dataset 2a, IIIa, IIIb, and dataset JK-HH1). The increased classification accuracy of MI tasks with the proposed MTSMS approach can yield effective implementation of BCI. The mutual information-based subband selection method is implemented to tune operation frequency bands to represent actual motor imagery tasks.
NASA Astrophysics Data System (ADS)
Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan
2016-03-01
Interstitial lung disease (ILD) is a complicated group of pulmonary disorders. High Resolution Computed Tomography (HRCT) is considered to be the best imaging technique for analysis of different pulmonary disorders. HRCT findings can be categorised into several patterns, viz. Consolidation, Emphysema, Ground Glass Opacity, Nodular, Normal, etc., based on their texture-like appearance. Clinicians often find it difficult to diagnose these patterns because of their complex nature. In such a scenario, a computer-aided diagnosis system could help clinicians identify patterns. Several approaches have been proposed for classification of ILD patterns. These include computation of textural features and training/testing of classifiers such as artificial neural networks (ANN), support vector machines (SVM), etc. In this paper, wavelet features are calculated from two different ILD databases, the publicly available MedGIFT ILD database and a private ILD database, followed by performance evaluation of ANN and SVM classifiers in terms of average accuracy. It is found that the average classification accuracy of SVM is greater than that of ANN when both are trained and tested on the same database. The investigation was continued to test the variation in classifier accuracy when training and testing are performed with alternate databases, and when the classifier is trained and tested with a database formed by merging samples of the same class from the two individual databases. The average classification accuracy drops when two independent databases are used for training and testing, respectively. There is a significant improvement in average accuracy when classifiers are trained and tested with the merged database. This implies that classification accuracy depends on the training data. It is observed that SVM outperforms ANN when the same database is used for training and testing.
New tools for evaluating LQAS survey designs
2014-01-01
Lot Quality Assurance Sampling (LQAS) surveys have become increasingly popular in global health care applications. Incorporating Bayesian ideas into LQAS survey design, such as using reasonable prior beliefs about the distribution of an indicator, can improve the selection of design parameters and decision rules. In this paper, a joint frequentist and Bayesian framework is proposed for evaluating LQAS classification accuracy and informing survey design parameters. Simple software tools are provided for calculating the positive and negative predictive value of a design with respect to an underlying coverage distribution and the selected design parameters. These tools are illustrated using a data example from two consecutive LQAS surveys measuring Oral Rehydration Solution (ORS) preparation. Using the survey tools, the dependence of classification accuracy on benchmark selection and the width of the ‘grey region’ are clarified in the context of ORS preparation across seven supervision areas. Following the completion of an LQAS survey, estimation of the distribution of coverage across areas facilitates quantifying classification accuracy and can help guide intervention decisions. PMID:24528928
New tools for evaluating LQAS survey designs.
Hund, Lauren
2014-02-15
Lot Quality Assurance Sampling (LQAS) surveys have become increasingly popular in global health care applications. Incorporating Bayesian ideas into LQAS survey design, such as using reasonable prior beliefs about the distribution of an indicator, can improve the selection of design parameters and decision rules. In this paper, a joint frequentist and Bayesian framework is proposed for evaluating LQAS classification accuracy and informing survey design parameters. Simple software tools are provided for calculating the positive and negative predictive value of a design with respect to an underlying coverage distribution and the selected design parameters. These tools are illustrated using a data example from two consecutive LQAS surveys measuring Oral Rehydration Solution (ORS) preparation. Using the survey tools, the dependence of classification accuracy on benchmark selection and the width of the 'grey region' are clarified in the context of ORS preparation across seven supervision areas. Following the completion of an LQAS survey, estimation of the distribution of coverage across areas facilitates quantifying classification accuracy and can help guide intervention decisions.
Analysis of urban area land cover using SEASAT Synthetic Aperture Radar data
NASA Technical Reports Server (NTRS)
Henderson, F. M. (Principal Investigator)
1980-01-01
Digitally processed SEASAT synthetic aperture radar (SAR) imagery of the Denver, Colorado urban area was examined to explore the potential of SAR data for mapping urban land cover and the compatibility of SAR derived land cover classes with the United States Geological Survey classification system. The imagery is examined at three different scales to determine the effect of image enlargement on accuracy and level of detail extractable. At each scale the value of employing a simplistic preprocessing smoothing algorithm to improve image interpretation is addressed. A visual interpretation approach and an automated machine/visual approach are employed to evaluate the feasibility of producing a semiautomated land cover classification from SAR data. Confusion matrices of omission and commission errors are employed to define classification accuracies for each interpretation approach and image scale.
NASA Astrophysics Data System (ADS)
Schudlo, Larissa C.; Chau, Tom
2015-12-01
Objective. The majority of near-infrared spectroscopy (NIRS) brain-computer interface (BCI) studies have investigated binary classification problems. Limited work has considered differentiation of more than two mental states, or multi-class differentiation of higher-level cognitive tasks using measurements outside of the anterior prefrontal cortex. Improvements in accuracies are needed to deliver effective communication with a multi-class NIRS system. We investigated the feasibility of a ternary NIRS-BCI that supports mental states corresponding to verbal fluency task (VFT) performance, Stroop task performance, and unconstrained rest using prefrontal and parietal measurements. Approach. Prefrontal and parietal NIRS signals were acquired from 11 able-bodied adults during rest and performance of the VFT or Stroop task. Classification was performed offline using bagging with a linear discriminant base classifier trained on a 10 dimensional feature set. Main results. VFT, Stroop task and rest were classified at an average accuracy of 71.7% ± 7.9%. The ternary classification system provided a statistically significant improvement in information transfer rate relative to a binary system controlled by either mental task (0.87 ± 0.35 bits/min versus 0.73 ± 0.24 bits/min). Significance. These results suggest that effective communication can be achieved with a ternary NIRS-BCI that supports VFT, Stroop task and rest via measurements from the frontal and parietal cortices. Further development of such a system is warranted. Accurate ternary classification can enhance communication rates offered by NIRS-BCIs, improving the practicality of this technology.
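The abstract above compares systems by information transfer rate in bits/min. The sketch below uses the Wolpaw definition, which is common in BCI work; the abstract does not say which definition the authors used, so this is an assumption, and the trial duration in the example is hypothetical.

```python
import math


def wolpaw_bits_per_trial(n_classes, accuracy):
    """Bits conveyed per trial under the Wolpaw ITR definition.

    Assumes equally likely classes and errors spread evenly
    over the remaining n_classes - 1 classes.
    """
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def bits_per_minute(n_classes, accuracy, trial_seconds):
    """Convert bits per trial to bits per minute for a given trial length."""
    return wolpaw_bits_per_trial(n_classes, accuracy) * 60.0 / trial_seconds


# Ternary system at the reported average accuracy; the 20 s trial
# duration is a placeholder, not a value from the study.
print(bits_per_minute(3, 0.717, 20.0))
```

Under this definition a ternary system can carry more bits per trial than a binary one at the same trial length, provided its accuracy does not fall too far.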
2012-01-01
Background While progress has been made to develop automatic segmentation techniques for mitochondria, there remains a need for more accurate and robust techniques to delineate mitochondria in serial blockface scanning electron microscopic data. Previously developed texture based methods are limited for solving this problem because texture alone is often not sufficient to identify mitochondria. This paper presents a new three-step method, the Cytoseg process, for automated segmentation of mitochondria contained in 3D electron microscopic volumes generated through serial block face scanning electron microscopic imaging. The method consists of three steps. The first is a random forest patch classification step operating directly on 2D image patches. The second step consists of contour-pair classification. At the final step, we introduce a method to automatically seed a level set operation with output from previous steps. Results We report accuracy of the Cytoseg process on three types of tissue and compare it to a previous method based on Radon-Like Features. At step 1, we show that the patch classifier identifies mitochondria texture but creates many false positive pixels. At step 2, our contour processing step produces contours and then filters them with a second classification step, helping to improve overall accuracy. We show that our final level set operation, which is automatically seeded with output from previous steps, helps to smooth the results. Overall, our results show that use of contour pair classification and level set operations improve segmentation accuracy beyond patch classification alone. We show that the Cytoseg process performs well compared to another modern technique based on Radon-Like Features. Conclusions We demonstrated that texture based methods for mitochondria segmentation can be enhanced with multiple steps that form an image processing pipeline. 
While we used a random-forest based patch classifier to recognize texture, it would be possible to replace this with other texture identifiers, and we plan to explore this in future work. PMID:22321695
Characterization and classification of lupus patients based on plasma thermograms
Chaires, Jonathan B.; Mekmaysy, Chongkham S.; DeLeeuw, Lynn; Sivils, Kathy L.; Harley, John B.; Rovin, Brad H.; Kulasekera, K. B.; Jarjour, Wael N.
2017-01-01
Objective Plasma thermograms (thermal stability profiles of blood plasma) are being utilized as a new diagnostic approach for clinical assessment. In this study, we investigated the ability of plasma thermograms to classify systemic lupus erythematosus (SLE) patients versus non SLE controls using a sample of 300 SLE and 300 control subjects from the Lupus Family Registry and Repository. Additionally, we evaluated the heterogeneity of thermograms along age, sex, ethnicity, concurrent health conditions and SLE diagnostic criteria. Methods Thermograms were visualized graphically for important differences between covariates and summarized using various measures. A modified linear discriminant analysis was used to segregate SLE versus control subjects on the basis of the thermograms. Classification accuracy was measured based on multiple training/test splits of the data and compared to classification based on SLE serological markers. Results Median sensitivity, specificity, and overall accuracy based on classification using plasma thermograms was 86%, 83%, and 84% compared to 78%, 95%, and 86% based on a combination of five antibody tests. Combining thermogram and serology information together improved sensitivity from 78% to 86% and overall accuracy from 86% to 89% relative to serology alone. Predictive accuracy of thermograms for distinguishing SLE and osteoarthritis / rheumatoid arthritis patients was comparable. Both gender and anemia significantly interacted with disease status for plasma thermograms (p<0.001), with greater separation between SLE and control thermograms for females relative to males and for patients with anemia relative to patients without anemia. Conclusion Plasma thermograms constitute an additional biomarker which may help improve diagnosis of SLE patients, particularly when coupled with standard diagnostic testing. 
Differences in thermograms according to patient sex, ethnicity, clinical and environmental factors are important considerations for application of thermograms in a clinical setting. PMID:29149219
ERIC Educational Resources Information Center
Duffrin, Christopher; Eakin, Angela; Bertrand, Brenda; Barber-Heidel, Kimberly; Carraway-Stage, Virginia
2011-01-01
The American College Health Association estimated that 31% of college students are overweight or obese. It is important that students have a correct perception of body weight status as extra weight has potential adverse health effects. This study assessed accuracy of perceived weight status versus medical classification among 102 college students.…
Efficient use of unlabeled data for protein sequence classification: a comparative study.
Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir
2009-04-29
Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags-the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably.
Detection of epileptic seizure in EEG signals using linear least squares preprocessing.
Roshan Zamir, Z
2016-09-01
An epileptic seizure is a transient event of abnormal excessive neuronal discharge in the brain. This unwanted event can be obstructed by detection of electrical changes in the brain that happen before the seizure takes place. The automatic detection of seizures is necessary since the visual screening of EEG recordings is a time-consuming task and requires experts to improve the diagnosis. Much of the prior research in detection of seizures has been based on artificial neural networks, genetic programming, and wavelet transforms. Although the highest achieved accuracy for classification is 100%, there are drawbacks, such as the existence of unbalanced datasets and the lack of investigation into performance consistency. To address these, four linear least squares-based preprocessing models are proposed to extract key features of an EEG signal in order to detect seizures. The first two models are newly developed. The original signal (EEG) is approximated by a sinusoidal curve. Its amplitude is formed by a polynomial function and compared with the predeveloped spline function. Different statistical measures, namely classification accuracy, true positive and negative rates, false positive and negative rates and precision, are utilised to assess the performance of the proposed models. These metrics are derived from confusion matrices obtained from classifiers. Different classifiers are used over the original dataset and the set of extracted features. The proposed models significantly reduce the dimension of the classification problem and the computational time while the classification accuracy is improved in most cases. The first and third models are promising feature extraction methods with the classification accuracy of 100%. Logistic, LazyIB1, LazyIB5, and J48 are the best classifiers. Their true positive and negative rates are 1 while false positive and negative rates are 0 and the corresponding precision values are 1.
Numerical results suggest that these models are robust and efficient for detecting epileptic seizure. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
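The performance measures listed in this abstract all follow from the four cells of a binary confusion matrix. The sketch below shows one way to derive them; the function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive the measures named in the abstract from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: a classifier that makes no mistakes reproduces the
# pattern reported above (rates of 1 and 0, precision of 1).
perfect = binary_metrics(tp=50, fp=0, tn=50, fn=0)

# Hypothetical counts for an imperfect classifier.
imperfect = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```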
Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi
2017-03-15
Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve the prediction accuracy has not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset that consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolution information. http://yanglab.nankai.edu.cn/TA-fold/. yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Adaptive sleep-wake discrimination for wearable devices.
Karlen, Walter; Floreano, Dario
2011-04-01
Sleep/wake classification systems that rely on physiological signals suffer from intersubject differences that make accurate classification with a single, subject-independent model difficult. To overcome the limitations of intersubject variability, we suggest a novel online adaptation technique that updates the sleep/wake classifier in real time. The objective of the present study was to evaluate the performance of a newly developed adaptive classification algorithm that was embedded on a wearable sleep/wake classification system called SleePic. The algorithm processed ECG and respiratory effort signals for the classification task and applied behavioral measurements (obtained from accelerometer and press-button data) for the automatic adaptation task. When trained as a subject-independent classifier algorithm, the SleePic device was only able to correctly classify 74.94 ± 6.76% of the human-rated sleep/wake data. By using the suggested automatic adaptation method, the mean classification accuracy could be significantly improved to 92.98 ± 3.19%. A subject-independent classifier based on activity data only showed a comparable accuracy of 90.44 ± 3.57%. We demonstrated that subject-independent models used for online sleep-wake classification can successfully be adapted to previously unseen subjects without the intervention of human experts or off-line calibration.
Vidić, Igor; Egnell, Liv; Jerome, Neil P; Teruel, Jose R; Sjøbakk, Torill E; Østlie, Agnes; Fjøsne, Hans E; Bathen, Tone F; Goa, Pål Erik
2018-05-01
Diffusion-weighted MRI (DWI) is currently one of the fastest developing MRI-based techniques in oncology. Histogram properties from model fitting of DWI are useful features for differentiation of lesions, and classification can potentially be improved by machine learning. To evaluate classification of malignant and benign tumors and breast cancer subtypes using support vector machine (SVM). Prospective. Fifty-one patients with benign (n = 23) and malignant (n = 28) breast tumors (26 ER+, whereof six were HER2+). Patients were imaged with DW-MRI (3T) using twice refocused spin-echo echo-planar imaging with repetition time / echo time (TR/TE) = 9000/86 msec, 90 × 90 matrix size, 2 × 2 mm in-plane resolution, 2.5 mm slice thickness, and 13 b-values. Apparent diffusion coefficient (ADC), relative enhanced diffusivity (RED), and the intravoxel incoherent motion (IVIM) parameters diffusivity (D), pseudo-diffusivity (D*), and perfusion fraction (f) were calculated. The histogram properties (median, mean, standard deviation, skewness, kurtosis) were used as features in SVM (10-fold cross-validation) for differentiation of lesions and subtyping. Accuracies of the SVM classifications were calculated to find the combination of features with highest prediction accuracy. Mann-Whitney tests were performed for univariate comparisons. For benign versus malignant tumors, univariate analysis found 11 histogram properties to be significant differentiators. Using SVM, the highest accuracy (0.96) was achieved from a single feature (mean of RED), or from three feature combinations of IVIM or ADC. Combining features from all models gave perfect classification. No single feature predicted HER2 status of ER+ tumors (univariate or SVM), although high accuracy (0.90) was achieved with SVM combining several features. Importantly, these features had to include higher-order statistics (kurtosis and skewness), indicating the importance of accounting for heterogeneity.
Our findings suggest that SVM, using features from a combination of diffusion models, improves prediction accuracy for differentiation of benign versus malignant breast tumors, and may further assist in subtyping of breast cancer. Level of Evidence: 3. Technical Efficacy: Stage 3. J. Magn. Reson. Imaging 2018;47:1205-1216. © 2017 International Society for Magnetic Resonance in Medicine.
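The five histogram properties used as classifier features in this abstract can be computed from the voxel values of a lesion as sketched below. This is an illustration under stated assumptions: it uses population (biased) moments and non-excess kurtosis, which may not match the estimators used in the study, and the function name and sample values are hypothetical.

```python
import statistics


def histogram_features(values):
    """Median, mean, standard deviation, skewness and kurtosis of a sample.

    Assumption: population moments and non-excess kurtosis (normal = 3).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Third and fourth central moments.
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "median": statistics.median(values),
        "mean": mean,
        "std": sd,
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4,
    }


# Hypothetical parameter values for the voxels of one lesion.
features = histogram_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

A feature vector of this kind, one per diffusion parameter, is what would be passed to a classifier such as an SVM.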
NASA Astrophysics Data System (ADS)
Dondurur, Mehmet
The primary objective of this study was to determine the degree to which modern SAR systems can be used to obtain information about the Earth's vegetative resources. Information obtainable from microwave synthetic aperture radar (SAR) data was compared with that obtainable from LANDSAT-TM and SPOT data. Three hypotheses were tested: (a) Classification of land cover/use from SAR data can be accomplished on a pixel-by-pixel basis with the same overall accuracy as from LANDSAT-TM and SPOT data. (b) Classification accuracy for individual land cover/use classes will differ between sensors. (c) Combining information derived from optical and SAR data into an integrated monitoring system will improve overall and individual land cover/use class accuracies. The study was conducted with three data sets for the Sleeping Bear Dunes test site in the northwestern part of Michigan's lower peninsula, including an October 1982 LANDSAT-TM scene, a June 1989 SPOT scene and C-, L- and P-Band radar data from the Jet Propulsion Laboratory AIRSAR. Reference data were derived from the Michigan Resource Information System (MIRIS) and available color infrared aerial photos. Classification and rectification of data sets were done using ERDAS Image Processing Programs. Classification algorithms included Maximum Likelihood, Mahalanobis Distance, Minimum Spectral Distance, ISODATA, Parallelepiped, and Sequential Cluster Analysis. Classified images were rectified as necessary so that all were at the same scale and oriented north-up. Results were analyzed with contingency tables and percent correctly classified (PCC) and Cohen's Kappa (CK) as accuracy indices using CSLANT and ImagePro programs developed for this study. Accuracy analyses were based upon a 1.4 by 6.5 km area with its long axis east-west. 
Reference data for this subscene total 55,770 15 by 15 m pixels with sixteen cover types, including seven level III forest classes, three level III urban classes, two level II range classes, two water classes, one wetland class and one agriculture class. An initial analysis was made without correcting the 1978 MIRIS reference data to the different dates of the TM, SPOT and SAR data sets. In this analysis, highest overall classification accuracy (PCC) was 87% with the TM data set, with both SPOT and C-Band SAR at 85%, a difference statistically significant at the 0.05 level. When the reference data were corrected for land cover change between 1978 and 1991, classification accuracy with the C-Band SAR data increased to 87%. Classification accuracy differed from sensor to sensor for individual land cover classes. Combining sensors into hypothetical multi-sensor systems resulted in higher accuracies than for any single sensor. Combining LANDSAT-TM and C-Band SAR yielded an overall classification accuracy (PCC) of 92%. The results of this study indicate that C-Band SAR data provide an acceptable substitute for LANDSAT-TM or SPOT data when land cover information is desired of areas where cloud cover obscures the terrain. Even better results can be obtained by integrating TM and C-Band SAR data into a multi-sensor system.
Variance approximations for assessments of classification accuracy
R. L. Czaplewski
1994-01-01
Variance approximations are derived for the weighted and unweighted kappa statistics, the conditional kappa statistic, and conditional probabilities. These statistics are useful to assess classification accuracy, such as accuracy of remotely sensed classifications in thematic maps when compared to a sample of reference classifications made in the field. Published...
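For readers unfamiliar with the statistics named here, the sketch below computes the unweighted and weighted kappa point estimates from a square confusion (error) matrix. It does not reproduce the variance approximations derived in the paper, which the truncated abstract does not state. The function name, the agreement-weight convention (1 on the diagonal) and the example matrix are assumptions for illustration.

```python
def cohen_kappa(matrix, weights=None):
    """Kappa from a square confusion matrix of counts.

    weights[i][j] is the credit given when class i is mapped as class j;
    the default (identity weights) yields the unweighted kappa.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_share = [sum(matrix[i]) / n for i in range(k)]
    col_share = [sum(matrix[i][j] for i in range(k)) / n for j in range(k)]
    if weights is None:
        weights = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    # Observed and chance-expected (weighted) agreement.
    observed = sum(weights[i][j] * matrix[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(weights[i][j] * row_share[i] * col_share[j]
                   for i in range(k) for j in range(k))
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class error matrix (rows: reference, columns: map).
kappa = cohen_kappa([[45, 5], [10, 40]])
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is 0.35 / 0.50 = 0.7.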
Implementing Legacy-C Algorithms in FPGA Co-Processors for Performance Accelerated Smart Payloads
NASA Technical Reports Server (NTRS)
Pingree, Paula J.; Scharenbroich, Lucas J.; Werne, Thomas A.; Hartzell, Christine
2008-01-01
Accurate, on-board classification of instrument data is used to increase science return by autonomously identifying regions of interest for priority transmission or generating summary products to conserve transmission bandwidth. Due to on-board processing constraints, such classification has been limited to using the simplest functions on a small subset of the full instrument data. FPGA co-processor designs for SVM classifiers will lead to significant improvement in on-board classification capability and accuracy.
NASA Astrophysics Data System (ADS)
Starkey, Andrew; Usman Ahmad, Aliyu; Hamdoun, Hassan
2017-10-01
This paper investigates the application of a novel method for classification called Feature Weighted Self Organizing Map (FWSOM) that analyses the topology information of a converged standard Self Organizing Map (SOM) to automatically guide the selection of important inputs during training for improved classification of data with redundant inputs, examined against two traditional approaches namely neural networks and Support Vector Machines (SVM) for the classification of EEG data as presented in previous work. In particular, the novel method looks to identify the features that are important for classification automatically, and in this way the important features can be used to improve the diagnostic ability of any of the above methods. The paper presents the results and shows how the automated identification of the important features successfully identified the important features in the dataset and how this results in an improvement of the classification results for all methods apart from linear discriminatory methods which cannot separate the underlying nonlinear relationship in the data. The FWSOM in addition to achieving higher classification accuracy has given insights into what features are important in the classification of each class (left and right-hand movements), and these are corroborated by already published work in this area.
Liu, Aiming; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi
2017-01-01
Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems. PMID:29117100
Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi
2017-11-08
Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.
A robust data scaling algorithm to improve classification accuracies in biomedical data.
Cao, Xi Hang; Stojkovic, Ivan; Obradovic, Zoran
2016-09-09
Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks with different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operation characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithm, which are the most commonly used data scaling algorithms. The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show models learned from data scaled by the GL algorithm have higher accuracy compared to the commonly used data scaling algorithms.
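The sketch below illustrates the two baseline scalers named in this abstract (Min-max and Z-score) and the empirical cumulative distribution function that the GL algorithm is described as fitting. It does not implement the GL fit itself, whose parameterization is not given here. The function names and the sample with one outlier are hypothetical.

```python
import statistics


def min_max(xs):
    """Linear rescaling to [0, 1]; a single outlier compresses the rest."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Centre on the mean and divide by the population standard deviation."""
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def ecdf(xs):
    """Empirical CDF evaluated at each sample: share of values <= x."""
    n = len(xs)
    return [sum(1 for y in xs if y <= x) / n for x in xs]


# Hypothetical feature column with one extreme value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled_linear = min_max(data)   # the first four values are squeezed near 0
scaled_rank = ecdf(data)        # evenly spread, regardless of the outlier
```

The contrast between `scaled_linear` and `scaled_rank` is one way to see why a scaler that follows the cumulative distribution is less sensitive to outliers than a linear one.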
Using spectrotemporal indices to improve the fruit-tree crop classification accuracy
NASA Astrophysics Data System (ADS)
Peña, M. A.; Liao, R.; Brenning, A.
2017-06-01
This study assesses the potential of spectrotemporal indices derived from satellite image time series (SITS) to improve the classification accuracy of fruit-tree crops. Six major fruit-tree crop types in the Aconcagua Valley, Chile, were classified by applying various linear discriminant analysis (LDA) techniques on a Landsat-8 time series of nine images corresponding to the 2014-15 growing season. As features we not only used the complete spectral resolution of the SITS, but also all possible normalized difference indices (NDIs) that can be constructed from any two bands of the time series, a novel approach to derive features from SITS. Due to the high dimensionality of this "enhanced" feature set we used the lasso and ridge penalized variants of LDA (PLDA). Although classification accuracies yielded by the standard LDA applied on the full-band SITS were good (misclassification error rate, MER = 0.13), they were further improved by 23% (MER = 0.10) with ridge PLDA using the enhanced feature set. The most important bands to discriminate the crops of interest were mainly concentrated on the first two image dates of the time series, corresponding to the crops' greenup stage. Despite the high predictor weights provided by the red and near infrared bands, typically used to construct greenness spectral indices, other spectral regions were also found important for the discrimination, such as the shortwave infrared band at 2.11-2.19 μm, sensitive to foliar water changes. These findings support the usefulness of spectrotemporal indices in the context of SITS-based crop type classifications, which until now have been mainly constructed by the arithmetic combination of two bands of the same image date in order to derive greenness temporal profiles like those from the normalized difference vegetation index.
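As a rough illustration of the feature construction described above, the sketch below forms a normalized difference index from every pair of band values in a time series, including pairs from different image dates, and checks the reported relative improvement in misclassification error rate. The band names and reflectance values are hypothetical; only the two MER values come from the abstract.

```python
from itertools import combinations


def normalized_difference(a, b):
    """Normalized difference index of two band values."""
    return (a - b) / (a + b)


def all_ndis(features):
    """NDI for every unordered pair of features (n * (n - 1) / 2 pairs)."""
    return {
        (p, q): normalized_difference(features[p], features[q])
        for p, q in combinations(sorted(features), 2)
    }


# Hypothetical reflectances of one pixel: two bands on date 1, one on date 2.
pixel = {"t1_red": 0.05, "t1_nir": 0.45, "t2_swir": 0.20}
ndis = all_ndis(pixel)

# Relative reduction in MER reported in the abstract: 0.13 -> 0.10.
relative_gain = (0.13 - 0.10) / 0.13   # about 0.23, i.e. 23%
```

Because the number of pairs grows quadratically with the number of bands and dates, a penalized classifier of the kind mentioned in the abstract is a natural fit for such a feature set.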
NASA Astrophysics Data System (ADS)
Mohammadimanesh, F.; Salehi, B.; Mahdianpari, M.; Homayouni, S.
2016-06-01
Polarimetric Synthetic Aperture Radar (PolSAR) imagery is a complex multi-dimensional dataset, which is an important source of information for various natural resources and environmental classification and monitoring applications. PolSAR imagery produces valuable information by observing scattering mechanisms from different natural and man-made objects. Land cover mapping using PolSAR data classification is one of the most important applications of SAR remote sensing earth observations, which have gained increasing attention in recent years. However, one of the most challenging aspects of classification is selecting features with maximum discrimination capability. To address this challenge, a statistical approach based on the Fisher Linear Discriminant Analysis (FLDA) and the incorporation of physical interpretation of PolSAR data into classification is proposed in this paper. After pre-processing of PolSAR data, including the speckle reduction, the H/α classification is used in order to classify the basic scattering mechanisms. Then, a new method for feature weighting, based on the fusion of FLDA and physical interpretation, is implemented. This method is shown to increase both the classification accuracy and the between-class discrimination in the final Wishart classification. The proposed method was applied to a full polarimetric C-band RADARSAT-2 data set from the Avalon area, Newfoundland and Labrador, Canada. This imagery was acquired in June 2015 and covers various types of wetlands including bogs, fens, marshes and shallow water. The results were compared with the standard Wishart classification, and an improvement of about 20% was achieved in the overall accuracy. This method provides an opportunity for operational wetland classification in northern latitudes with high accuracy using only SAR polarimetric data.
NASA Astrophysics Data System (ADS)
Yang, He; Ma, Ben; Du, Qian; Yang, Chenghai
2010-08-01
In this paper, we propose approaches to improve the pixel-based support vector machine (SVM) classification for urban land use and land cover (LULC) mapping from airborne hyperspectral imagery with high spatial resolution. Class spatial neighborhood relationship is used to correct the misclassified class pairs, such as roof and trail, road and roof. These classes may be difficult to separate because they may have similar spectral signatures and their spatial features are not distinct enough to help their discrimination. In addition, misclassification incurred from within-class trivial spectral variation can be corrected by using pixel connectivity information in a local window so that spectrally homogeneous regions can be well preserved. Our experimental results demonstrate the efficiency of the proposed approaches in classification accuracy improvement. The overall performance is competitive with the object-based SVM classification.
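The abstract does not spell out how the local-window correction works. As a loose illustration only, the sketch below uses a plain majority filter as a stand-in: an isolated label is replaced when a strict majority of its window carries another class. The function name, window radius and toy label map are hypothetical and should not be read as the authors' procedure.

```python
from collections import Counter


def majority_filter(labels, radius=1):
    """Relabel each pixel with the strict-majority class of its local window.

    Stand-in for the (unspecified) connectivity-based correction.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            label, count = Counter(window).most_common(1)[0]
            # Only overwrite when one class clearly dominates the window.
            if count > len(window) / 2:
                out[r][c] = label
    return out


# Hypothetical pixel-wise result: one "road" pixel inside a roof.
noisy = [["roof", "roof", "roof"],
         ["roof", "road", "roof"],
         ["roof", "roof", "roof"]]
smoothed = majority_filter(noisy)
```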
Iacucci, Marietta; Trovato, Cristina; Daperno, Marco; Akinola, Oluseyi; Greenwald, David; Gross, Seth A; Hoffman, Arthur; Lee, Jeffrey; Lethebe, Brendan C; Lowerison, Mark; Nayor, Jennifer; Neumann, Helmut; Rath, Timo; Sanduleanu, Silvia; Sharma, Prateek; Kiesslich, Ralf; Ghosh, Subrata; Saltzman, John R
2018-03-23
Prediction of histology of small polyps facilitates colonoscopic treatment. The aims of this study were: 1) to develop a simplified polyp classification, 2) to evaluate its performance in predicting polyp histology, and 3) to evaluate the reproducibility of the classification by trainees using multiplatform endoscopic systems. In phase 1, a new simplified endoscopic classification for polyps - Simplified Identification Method for Polyp Labeling during Endoscopy (SIMPLE) - was created, using the new I-SCAN OE system (Pentax, Tokyo, Japan), by eight international experts. In phase 2, the accuracy, level of confidence, and interobserver agreement to predict polyp histology before and after training, and univariable/multivariable analysis of the endoscopic features, were performed. In phase 3, the reproducibility of SIMPLE by trainees using different endoscopy platforms was evaluated. Using the SIMPLE classification, the accuracy of experts in predicting polyps was 83% (95% confidence interval [CI] 77%-88%) before and 94% (95% CI 89%-97%) after training (P = 0.002). The sensitivity, specificity, positive predictive value, and negative predictive value after training were 97%, 88%, 95%, and 91%. The interobserver agreement of polyp diagnosis improved from 0.46 (95% CI 0.30-0.64) before to 0.66 (95% CI 0.48-0.82) after training. The trainees demonstrated that the SIMPLE classification is applicable across endoscopy platforms, with similar post-training accuracies for narrow-band imaging (NBI) classification (0.69; 95% CI 0.64-0.73) and SIMPLE (0.71; 95% CI 0.67-0.75). Using the I-SCAN OE system, the new SIMPLE classification demonstrated a high degree of accuracy for adenoma diagnosis, meeting the ASGE PIVI recommendations. We demonstrated that SIMPLE may be used with either I-SCAN OE or NBI. © Georg Thieme Verlag KG Stuttgart · New York.
Approximated mutual information training for speech recognition using myoelectric signals.
Guo, Hua J; Chan, A D C
2006-01-01
A new training algorithm called the approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs that are trained using the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared with those of ML training, increasing the accuracy by approximately 3% on average.
Renjith, Arokia; Manjula, P; Mohan Kumar, P
2015-01-01
Brain tumour is one of the main causes for an increase in transience among children and adults. This paper proposes an improved method based on Magnetic Resonance Imaging (MRI) brain image classification and image segmentation approach. Automated classification is motivated by the need for high accuracy when dealing with a human life. The detection of the brain tumour is a challenging problem, due to high diversity in tumour appearance and ambiguous tumour boundaries. MRI images are chosen for detection of brain tumours, as they are used in soft tissue determinations. First of all, image pre-processing is used to enhance the image quality. Second, dual-tree complex wavelet transform multi-scale decomposition is used to analyse the texture of an image. Feature extraction extracts features from an image using the gray-level co-occurrence matrix (GLCM). Then, the Neuro-Fuzzy technique is used to classify the stages of brain tumour as benign, malignant or normal based on texture features. Finally, tumour location is detected using Otsu thresholding. The classifier performance is evaluated based on classification accuracies. The simulated results show that the proposed classifier provides better accuracy than the previous method.
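The final step of the pipeline above, Otsu thresholding, picks the grey level that maximizes the between-class variance of the resulting foreground and background. The sketch below is a generic textbook version of that method on a flat list of integer intensities; the function name and the toy pixel values are hypothetical, and the pre-processing, wavelet and GLCM steps are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t maximizing between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break               # upper class is empty from here on
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical image: dark background (10) and bright region (200).
image = [10] * 50 + [200] * 50
threshold = otsu_threshold(image)
bright = [p for p in image if p > threshold]
```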
Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms
Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan
2017-01-01
Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909
NASA Astrophysics Data System (ADS)
de Oliveira Silveira, Eduarda Martiniano; de Menezes, Michele Duarte; Acerbi Júnior, Fausto Weimar; Castro Nunes Santos Terra, Marcela; de Mello, José Márcio
2017-07-01
Accurate mapping and monitoring of savanna and semiarid woodland biomes are needed to support the selection of areas of conservation, to provide sustainable land use, and to improve the understanding of vegetation. The potential of geostatistical features, derived from medium spatial resolution satellite imagery, to characterize contrasted landscape vegetation cover and improve object-based image classification is studied. The study site in Brazil includes cerrado sensu stricto, deciduous forest, and palm swamp vegetation cover. Sentinel 2 and Landsat 8 images were acquired and divided into objects, for each of which a semivariogram was calculated using near-infrared (NIR) and normalized difference vegetation index (NDVI) to extract the set of geostatistical features. The features selected by principal component analysis were used as input data to train a random forest algorithm. Tests were conducted, combining spectral and geostatistical features. Change detection evaluation was performed using a confusion matrix and its accuracies. The semivariogram curves were efficient to characterize spatial heterogeneity, with similar results using NIR and NDVI from Sentinel 2 and Landsat 8. Accuracy was significantly greater when combining geostatistical features with spectral data, suggesting that this method can improve image classification results.
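The geostatistical features in this abstract are derived from semivariograms. As a minimal sketch, assuming a one-dimensional transect of equally spaced pixel values rather than the two-dimensional image objects used in the study, the empirical semivariance at lag h is half the mean squared difference between values h steps apart. The function name and the sample values are hypothetical.

```python
def semivariogram(values, max_lag):
    """Empirical semivariance for lags 1..max_lag on a 1-D transect.

    gamma(h) = sum((z[i] - z[i + h]) ** 2) / (2 * N(h)),
    where N(h) is the number of pairs separated by lag h.
    """
    result = {}
    for lag in range(1, max_lag + 1):
        squared = [(values[i] - values[i + lag]) ** 2
                   for i in range(len(values) - lag)]
        result[lag] = sum(squared) / (2 * len(squared))
    return result


# Hypothetical NDVI-like values along a transect (scaled for readability).
gamma = semivariogram([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)

# A spatially uniform object has zero semivariance at every lag.
flat = semivariogram([3.0, 3.0, 3.0, 3.0], max_lag=2)
```

Summary quantities read off such a curve (for example its level at short and long lags) are the sort of per-object features that could be combined with spectral data.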
Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.
Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki
2016-07-01
We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.
ERIC Educational Resources Information Center
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang
2015-01-01
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Best Merge Region Growing with Integrated Probabilistic Classification for Hyperspectral Imagery
NASA Technical Reports Server (NTRS)
Tarabalka, Yuliya; Tilton, James C.
2011-01-01
A new method for spectral-spatial classification of hyperspectral images is proposed. The method is based on the integration of probabilistic classification within the hierarchical best merge region growing algorithm. For this purpose, a preliminary probabilistic support vector machine classification is performed. Then, a hierarchical step-wise optimization algorithm is applied, iteratively merging the regions with the smallest Dissimilarity Criterion (DC). The main novelty of this method consists in defining a DC between regions as a function of region statistical and geometrical features along with classification probabilities. Experimental results are presented on a 200-band AVIRIS image of the Northwestern Indiana's vegetation area and compared with those obtained by recently proposed spectral-spatial classification techniques. The proposed method improves classification accuracies when compared to other classification approaches.
Hripcsak, George; Knirsch, Charles; Zhou, Li; Wilcox, Adam; Melton, Genevieve B
2007-03-01
Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia--in which the ICD9-CM coding was found to be very poor--the discordance measure was statistically significantly correlated with classification correctness (.45; 95% CI .15-.62).
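The discordance idea can be sketched as follows. This is an illustrative reading of the definition above, not the authors' metric: it assumes cases have already been clustered, and it scores each case by the fraction of its cluster-mates whose query answer differs from its own. The names `discordance`, `cluster_ids` and `answers` are hypothetical.

```python
from collections import defaultdict


def discordance(cluster_ids, answers):
    """Per-case discordance: share of same-cluster cases with a different answer.

    cluster_ids[i] is the cluster of case i; answers[i] is the query result
    for case i. Singleton clusters get a score of 0.0 (an assumption).
    """
    by_cluster = defaultdict(list)
    for cid, ans in zip(cluster_ids, answers):
        by_cluster[cid].append(ans)

    scores = []
    for cid, ans in zip(cluster_ids, answers):
        members = by_cluster[cid]
        others = len(members) - 1
        if others == 0:
            scores.append(0.0)
            continue
        # The case itself never differs from its own answer, so this counts
        # only the other members of the cluster.
        differing = sum(1 for a in members if a != ans)
        scores.append(differing / others)
    return scores
```

Cases with the highest scores would be the first candidates for manual review.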
Xu, Xiayu; Ding, Wenxiang; Abràmoff, Michael D; Cao, Ruofan
2017-04-01
Retinal artery and vein classification is an important task for the automatic computer-aided diagnosis of various eye diseases and systemic diseases. This paper presents an improved supervised artery and vein classification method for retinal images. Intra-image regularization and inter-subject normalization are applied to reduce the differences in feature space. Novel features, including first-order and second-order texture features, are utilized to capture the discriminating characteristics of arteries and veins. The proposed method was tested on the DRIVE dataset and achieved an overall accuracy of 0.923. This retinal artery and vein classification algorithm serves as a potentially important tool for the early diagnosis of various diseases, including diabetic retinopathy and cardiovascular diseases. Copyright © 2017 Elsevier B.V. All rights reserved.
Xia, Wenjun; Mita, Yoshio; Shibata, Tadashi
2016-05-01
Aiming at efficient data condensation and improving accuracy, this paper presents a hardware-friendly template reduction (TR) method for nearest neighbor (NN) classifiers by introducing the concept of critical boundary vectors. A hardware system is also implemented to demonstrate the feasibility of using a field-programmable gate array (FPGA) to accelerate the proposed method. Initially, k-means centers are used as substitutes for the entire template set. Then, to enhance the classification performance, critical boundary vectors are selected by a novel learning algorithm, which is completed within a single iteration. Moreover, to remove noisy boundary vectors that can mislead the classification in a generalized manner, a global categorization scheme has been explored and applied to the algorithm. The global characterization automatically categorizes each classification problem and rapidly selects the boundary vectors according to the nature of the problem. Finally, only critical boundary vectors and k-means centers are used as the new template set for classification. Experimental results for 24 data sets show that the proposed algorithm can effectively reduce the number of template vectors for classification with a high learning speed. At the same time, it improves the accuracy by an average of 2.17% compared with the traditional NN classifiers and also shows greater accuracy than seven other TR methods. We have shown the feasibility of using a proof-of-concept FPGA system of 256 64-D vectors to accelerate the proposed method on hardware. At a 50-MHz clock frequency, the proposed system achieves a 3.86 times higher learning speed than on a 3.4-GHz PC, while consuming only 1% of the power of that used by the PC.
Do pre-trained deep learning models improve computer-aided classification of digital mammograms?
NASA Astrophysics Data System (ADS)
Aboutalib, Sarah S.; Mohamed, Aly A.; Zuley, Margarita L.; Berg, Wendie A.; Luo, Yahong; Wu, Shandong
2018-02-01
Digital mammography screening is an important exam for the early detection of breast cancer and reduction in mortality. False positives leading to high recall rates, however, result in unnecessary negative consequences to patients and health care systems. In order to better aid radiologists, computer-aided tools can be utilized to improve distinction between image classifications and thus potentially reduce false recalls. The emergence of deep learning has shown promising results in the area of biomedical imaging data analysis. This study aimed to investigate deep learning and transfer learning methods that can improve digital mammography classification performance. In particular, we evaluated the effect of pre-training deep learning models with other imaging datasets in order to boost classification performance on a digital mammography dataset. Two types of datasets were used for pre-training: (1) a digitized film mammography dataset, and (2) a very large non-medical imaging dataset. By using either of these datasets to pre-train the network initially, and then fine-tuning with the digital mammography dataset, we found an increase in overall classification performance in comparison to a model without pre-training, with the very large non-medical dataset performing the best in improving the classification accuracy.
Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K
2012-04-01
The objective of this paper is to provide an improved technique that can assist oncopathologists in the correct screening of oral precancerous conditions, especially oral submucous fibrosis (OSF), with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibre segmentation, textural feature extraction and selection, screening performance enhancement under Gaussian transformation, and finally classification. In this study, collagen fibres are segmented on the R, G, B color channels using a back-propagation neural network from 60 normal and 59 OSF histological images, followed by histogram specification for reducing the stain intensity variation. Textural features of the collagen area are then extracted using fractal approaches, viz., differential box counting and Brownian motion curve. Feature selection is done using the Kullback-Leibler (KL) divergence criterion, and the screening performance is evaluated based on various statistical tests of conformity to a Gaussian distribution. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using a hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers, viz., Bayesian classification and support vector machines (SVM), to classify normal and OSF. It is observed that SVM with a linear kernel function provides better classification accuracy (91.64%) than the Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves the Bayesian classifier's performance from 80.69% to 90.75%. The results are studied and discussed.
Accurate label-free 3-part leukocyte recognition with single cell lens-free imaging flow cytometry.
Li, Yuqian; Cornelis, Bruno; Dusa, Alexandra; Vanmeerbeeck, Geert; Vercruysse, Dries; Sohn, Erik; Blaszkiewicz, Kamil; Prodanov, Dimiter; Schelkens, Peter; Lagae, Liesbet
2018-05-01
Three-part white blood cell differentials, which are key to routine blood workups, are typically performed in centralized laboratories on conventional hematology analyzers operated by highly trained staff. With the growing trend of developing miniaturized blood analysis tools for point-of-need use, in order to accelerate turnaround times and move routine blood testing away from centralized facilities, our group has developed a highly miniaturized holographic imaging system for generating lens-free images of white blood cells in suspension. Analysis and classification of its output data constitute the final crucial step ensuring appropriate accuracy of the system. In this work, we implement reference holographic images of single white blood cells in suspension, in order to establish an accurate ground truth to increase classification accuracy. We also automate the entire workflow for analyzing the output and demonstrate clear improvement in the accuracy of the 3-part classification. High-dimensional optical and morphological features are extracted from reconstructed digital holograms of single cells using the ground-truth images, and advanced machine learning algorithms are investigated and implemented to obtain 99% classification accuracy. Representative features of the three white blood cell subtypes are selected and give comparable results, with a focus on rapid cell recognition and decreased computational cost. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Samsudin, Sarah Hanim; Shafri, Helmi Z. M.; Hamedianfar, Alireza
2016-04-01
Status observations of roofing material degradation are constantly evolving due to urban feature heterogeneities. Although advanced classification techniques have been introduced to improve within-class impervious surface classifications, these techniques involve complex processing and high computation times. This study integrates field spectroscopy and satellite multispectral remote sensing data to generate degradation status maps of concrete and metal roofing materials. Field spectroscopy data were used as bases for selecting suitable bands for spectral index development because of the limited number of multispectral bands. Mapping methods for roof degradation status were established for metal and concrete roofing materials by developing the normalized difference concrete condition index (NDCCI) and the normalized difference metal condition index (NDMCI). Results indicate that the accuracies achieved using the spectral indices are higher than those obtained using supervised pixel-based classification. The NDCCI generated an accuracy of 84.44%, whereas the support vector machine (SVM) approach yielded an accuracy of 73.06%. The NDMCI obtained an accuracy of 94.17% compared with 62.5% for the SVM approach. These findings support the suitability of the developed spectral index methods for determining roof degradation statuses from satellite observations in heterogeneous urban environments.
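The abstract does not state which bands enter the NDCCI or the NDMCI, so the sketch below shows only the generic normalized difference form, (a - b) / (a + b), that indices of this family share. The function name and the choice of inputs are assumptions for illustration.

```python
def normalized_difference(band_a, band_b):
    """Pixel-wise (a - b) / (a + b) for two equally long reflectance lists.

    Which two bands to pass in is left to the caller; the band pairing used
    by the NDCCI and NDMCI is not given in the text above.
    """
    result = []
    for a, b in zip(band_a, band_b):
        denom = a + b
        # Guard against division by zero for dark or masked pixels.
        result.append((a - b) / denom if denom != 0 else 0.0)
    return result
```

A degradation map could then be produced by thresholding the index values, with thresholds calibrated against field spectroscopy samples.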
Classification of Clouds in Satellite Imagery Using Adaptive Fuzzy Sparse Representation
Jin, Wei; Gong, Fei; Zeng, Xingbin; Fu, Randi
2016-01-01
Automatic cloud detection and classification using satellite cloud imagery have various meteorological applications such as weather forecasting and climate monitoring. Cloud pattern analysis has recently become a research hotspot. Since satellites sense the clouds remotely from space, and different cloud types often overlap and convert into each other, there must be some fuzziness and uncertainty in satellite cloud imagery. Satellite observation is susceptible to noise, while traditional cloud classification methods are sensitive to noise and outliers; it is hard for traditional cloud classification methods to achieve reliable results. To deal with these problems, a satellite cloud classification method using adaptive fuzzy sparse representation-based classification (AFSRC) is proposed. Firstly, by defining adaptive parameters related to attenuation rate and critical membership, an improved fuzzy membership is introduced to accommodate the fuzziness and uncertainty of satellite cloud imagery; secondly, by effective combination of the improved fuzzy membership function and sparse representation-based classification (SRC), atoms in the training dictionary are optimized; finally, an adaptive fuzzy sparse representation classifier for cloud classification is proposed. Experimental results on FY-2G satellite cloud images show that the proposed method not only improves the accuracy of cloud classification, but also has strong stability and adaptability with high computational efficiency. PMID:27999261
NASA Astrophysics Data System (ADS)
Wang, Z.; Wu, J.; Wang, Y.; Kong, X.; Bao, H.; Ni, Y.; Ma, L.; Jin, J.
2018-05-01
Mapping tree species is essential for sustainable planning as well as for improving our understanding of the role of different trees in providing different ecological services. However, automatic crown-level tree species classification is a challenging task due to the spectral similarity among diversified tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer a great opportunity to derive crown spectral, structural and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method utilized LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were included: 1) A weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) The marker-controlled watershed segmentation algorithm was employed to delineate the tree-level canopy from the CHM image, and then individual tree height and tree crown were calculated according to the delineated crown; 3) Spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) The shape characteristics related to crown diameters and heights were established, and different crown-level tree species were classified using the combination of spectral and shape characteristics.
Analysis of the results suggests that the classification strategy developed in this paper (OA = 85.12 %, Kc = 0.90) performed better than the LiDAR-metrics method (OA = 79.86 %, Kc = 0.81) and the spectral-metrics method (OA = 71.26 %, Kc = 0.69) in terms of classification accuracy, which indicates that advanced data processing and sensitive feature selection are critical for improving the accuracy of crown-level tree species classification.
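Overall accuracy (OA) and a kappa coefficient recur throughout these records. As a reference point, this is a minimal sketch of how both are commonly derived from a confusion matrix, assuming rows are reference classes and columns are predicted classes. It uses the standard Cohen's kappa formula, which may differ from the exact statistic a given study reports as Kc.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i predicted as class j.
    Assumes the matrix is non-empty and chance agreement is below 1.
    """
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

For a two-class matrix `[[45, 5], [10, 40]]` this gives an overall accuracy of 0.85 and a kappa of about 0.70.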
Efficient alignment-free DNA barcode analytics
Kuksa, Pavel; Pavlovic, Vladimir
2009-01-01
Background: In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. Results: New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Conclusion: Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding. PMID:19900305
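A fixed-length spectrum representation can be illustrated with k-mer counts. The sketch below is a simplified stand-in rather than the authors' method: it counts exact k-mers and compares two sequences by the inner product of their count vectors. The function names and the default k = 3 are assumptions.

```python
from collections import Counter


def kmer_spectrum(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_similarity(seq_a, seq_b, k=3):
    """Inner product of the two k-mer count vectors (no alignment needed)."""
    spec_a = kmer_spectrum(seq_a, k)
    spec_b = kmer_spectrum(seq_b, k)
    # Counter returns 0 for k-mers absent from spec_b.
    return sum(count * spec_b[kmer] for kmer, count in spec_a.items())
```

Because each sequence is reduced to counts over a fixed k-mer vocabulary, such a similarity can feed a standard kernel classifier or a clustering routine directly.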
Assessments of SENTINEL-2 Vegetation Red-Edge Spectral Bands for Improving Land Cover Classification
NASA Astrophysics Data System (ADS)
Qiu, S.; He, B.; Yin, C.; Liao, Z.
2017-09-01
The Multi Spectral Instrument (MSI) onboard Sentinel-2 can record the information in Vegetation Red-Edge (VRE) spectral domains. In this study, the performance of the VRE bands on improving land cover classification was evaluated based on a Sentinel-2A MSI image in East Texas, USA. Two classification scenarios were designed by excluding and including the VRE bands. A Random Forest (RF) classifier was used to generate land cover maps and evaluate the contributions of different spectral bands. The combination of VRE bands increased the overall classification accuracy by 1.40 %, which was statistically significant. Both confusion matrices and land cover maps indicated that the most beneficial increase was from vegetation-related land cover types, especially agriculture. Comparison of the relative importance of each band showed that the most beneficial VRE bands were Band 5 and Band 6. These results demonstrated the value of VRE bands for land cover classification.
Improving Generalization Based on l1-Norm Regularization for EEG-Based Motor Imagery Classification
Zhao, Yuwei; Han, Jiuqi; Chen, Yushu; Sun, Hongji; Chen, Jiayun; Ke, Ang; Han, Yao; Zhang, Peng; Zhang, Yi; Zhou, Jin; Wang, Changyong
2018-01-01
Multichannel electroencephalography (EEG) is widely used in typical brain-computer interface (BCI) systems. In general, a number of parameters are essential for an EEG classification algorithm due to redundant features involved in EEG signals. However, the generalization of the EEG method is often adversely affected by the model complexity, which is closely tied to its number of undetermined parameters, further leading to heavy overfitting. To decrease the complexity and improve the generalization of the EEG method, we present a novel l1-norm-based approach to combine the decision value obtained from each EEG channel directly. By extracting the information from different channels on independent frequency bands (FB) with l1-norm regularization, the proposed method fits the training data with far fewer parameters than common spatial pattern (CSP) methods in order to reduce overfitting. Moreover, an effective and efficient solution to minimize the optimization objective is proposed. The experimental results on dataset IVa of BCI competition III and dataset I of BCI competition IV show that the proposed method contributes to high classification accuracy and increases generalization performance for the classification of MI EEG. As the training set ratio decreases from 80 to 20%, the average classification accuracy on the two datasets changes from 85.86 and 86.13% to 84.81 and 76.59%, respectively. The classification performance and generalization of the proposed method contribute to the practical application of MI-based BCI systems. PMID:29867307
Liu, Yu; Xia, Jun; Shi, Chun-Xiang; Hong, Yang
2009-01-01
The crowning objective of this research was to identify a better cloud classification method to upgrade the current window-based clustering algorithm used operationally for China’s first operational geostationary meteorological satellite FengYun-2C (FY-2C) data. First, the capabilities of six widely-used Artificial Neural Network (ANN) methods are analyzed, together with the comparison of two other methods: Principal Component Analysis (PCA) and a Support Vector Machine (SVM), using 2864 cloud samples manually collected by meteorologists in June, July, and August in 2007 from three FY-2C channel (IR1, 10.3–11.3 μm; IR2, 11.5–12.5 μm and WV 6.3–7.6 μm) imagery. The result shows that: (1) ANN approaches, in general, outperformed the PCA and the SVM given sufficient training samples and (2) among the six ANN networks, higher cloud classification accuracy was obtained with the Self-Organizing Map (SOM) and Probabilistic Neural Network (PNN). Second, to compare the ANN methods to the present FY-2C operational algorithm, this study implemented SOM, one of the best ANN networks identified in this study, as an automated cloud classification system for the FY-2C multi-channel data. It shows that the SOM method has improved the results greatly, not only in pixel-level accuracy but also in cloud patch-level classification, by more accurately identifying cloud types such as cumulonimbus, cirrus and clouds in high latitudes. Findings of this study suggest that the ANN-based classifiers, in particular the SOM, can potentially be used as an improved Automated Cloud Classification Algorithm to upgrade the current window-based clustering method for the FY-2C operational products. PMID:22346714
Large-area settlement pattern recognition from Landsat-8 data
NASA Astrophysics Data System (ADS)
Wieland, Marc; Pittore, Massimiliano
2016-09-01
The study presents an image processing and analysis pipeline that combines object-based image analysis with a Support Vector Machine to derive a multi-layered settlement product from Landsat-8 data over large areas. 43 image scenes are processed over large parts of Central Asia (Southern Kazakhstan, Kyrgyzstan, Tajikistan and Eastern Uzbekistan). The main tasks tackled by this work include built-up area identification, settlement type classification and urban structure types pattern recognition. Besides commonly used accuracy assessments of the resulting map products, thorough performance evaluations are carried out under varying conditions to tune algorithm parameters and assess their applicability for the given tasks. As part of this, several research questions are being addressed. In particular the influence of the improved spatial and spectral resolution of Landsat-8 on the SVM performance to identify built-up areas and urban structure types are evaluated. Also the influence of an extended feature space including digital elevation model features is tested for mountainous regions. Moreover, the spatial distribution of classification uncertainties is analyzed and compared to the heterogeneity of the building stock within the computational unit of the segments. The study concludes that the information content of Landsat-8 images is sufficient for the tested classification tasks and even detailed urban structures could be extracted with satisfying accuracy. Freely available ancillary settlement point location data could further improve the built-up area classification. Digital elevation features and pan-sharpening could, however, not significantly improve the classification results. The study highlights the importance of dynamically tuned classifier parameters, and underlines the use of Shannon entropy computed from the soft answers of the SVM as a valid measure of the spatial distribution of classification uncertainties.
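The uncertainty measure mentioned at the end of this abstract, Shannon entropy over a classifier's soft outputs, can be sketched as below. The sketch assumes the SVM's per-class probabilities for one segment are already available as a list that sums to 1; how those probabilities are obtained is outside its scope, and the function name is illustrative.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of one segment's class-probability vector.

    0 means the classifier is certain; log2(number of classes) is the
    maximum and means all classes are equally likely.
    """
    # Terms with p == 0 contribute nothing (limit of p * log p as p -> 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

Computing this value per segment and mapping it yields a spatial picture of where the classification is least certain.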
Towards Cooperative Predictive Data Mining in Competitive Environments
NASA Astrophysics Data System (ADS)
Lisý, Viliam; Jakob, Michal; Benda, Petr; Urban, Štěpán; Pěchouček, Michal
We study the problem of predictive data mining in a competitive multi-agent setting, in which each agent is assumed to have some partial knowledge required for correctly classifying a set of unlabelled examples. The agents are self-interested and therefore need to reason about the trade-offs between increasing their classification accuracy by collaborating with other agents and disclosing their private classification knowledge to other agents through such collaboration. We analyze the problem and propose a set of components which can enable cooperation in this otherwise competitive task. These components include measures for quantifying private knowledge disclosure, data-mining models suitable for multi-agent predictive data mining, and a set of strategies by which agents can improve their classification accuracy through collaboration. The overall framework and its individual components are validated on a synthetic experimental domain.
Ntranos, Achilles; Lublin, Fred
2016-10-01
Multiple sclerosis (MS) is one of the most diverse human diseases. Since its first description by Charcot in the nineteenth century, the diagnostic criteria, clinical course classification, and treatment goals for MS have been constantly revised and updated to improve diagnostic accuracy, physician communication, and clinical trial design. These changes have improved the clinical outcomes and quality of life for patients with the disease. Recent technological and research breakthroughs will almost certainly further change how we diagnose, classify, and treat MS in the future. In this review, we summarize the key events in the history of MS, explain the reasoning behind the current criteria for MS diagnosis, classification, and treatment, and provide suggestions for further improvements that will keep enhancing the clinical practice of MS.
Branch classification: A new mechanism for improving branch predictor performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, P.Y.; Hao, E.; Patt, Y.
There is wide agreement that one of the most significant impediments to the performance of current and future pipelined superscalar processors is the presence of conditional branches in the instruction stream. Speculative execution is one solution to the branch problem, but speculative work is discarded if a branch is mispredicted. For it to be effective, speculative execution requires a very accurate branch predictor; 95% accuracy is not good enough. This paper proposes branch classification, a methodology for building more accurate branch predictors. Branch classification allows an individual branch instruction to be associated with the branch predictor best suited to predict its direction. Using this approach, a hybrid branch predictor can be constructed such that each component branch predictor predicts those branches for which it is best suited. To demonstrate the usefulness of branch classification, an example classification scheme is given and a new hybrid predictor is built based on this scheme which achieves a higher prediction accuracy than any branch predictor previously reported in the literature.
Classification Based on Pruning and Double Covered Rule Sets for the Internet of Things Applications
Li, Shasha; Zhou, Zhongmei; Wang, Weiping
2014-01-01
The Internet of things (IOT) has been a hot issue in recent years. IOT users generate large amounts of data, which makes mining useful knowledge from IOT a great challenge. Classification is an effective strategy that can predict the needs of users in IOT. However, many traditional rule-based classifiers cannot guarantee that every instance is covered by at least two classification rules. Thus, these algorithms cannot achieve high accuracy on some datasets. In this paper, we propose a new rule-based classifier, CDCR-P (Classification based on the Pruning and Double Covered Rule sets). CDCR-P induces two different rule sets, A and B. Every instance in the training set is covered by at least one rule in rule set A and by at least one rule in rule set B. To improve the quality of rule set B, we prune the length of its rules. Our experimental results indicate that CDCR-P is not only feasible but also achieves high accuracy. PMID:24511304
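The double-cover property described in this abstract can be checked with a few lines of code. The sketch below is only an illustration: representing a rule as a dictionary of attribute/value conditions, and the function names `covers` and `double_covered`, are assumptions and do not come from the paper.

```python
def covers(rule, instance):
    """A rule covers an instance when every condition in the rule matches.

    rule and instance are dicts mapping attribute name -> value
    (a hypothetical representation chosen for this sketch).
    """
    return all(instance.get(attr) == value for attr, value in rule.items())


def double_covered(instances, rules_a, rules_b):
    """True when each instance is covered by a rule in A and by a rule in B."""
    return all(
        any(covers(rule, inst) for rule in rules_a)
        and any(covers(rule, inst) for rule in rules_b)
        for inst in instances
    )
```

A training set that fails this check contains at least one instance that only a single rule set can classify.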
Arrhythmia Classification Based on Multi-Domain Feature Extraction for an ECG Recognition System.
Li, Hongqiang; Yuan, Danyang; Wang, Youxi; Cui, Dianyin; Cao, Lu
2016-10-20
Automatic recognition of arrhythmias is particularly important in the diagnosis of heart diseases. This study presents an electrocardiogram (ECG) recognition system based on multi-domain feature extraction to classify ECG beats. An improved wavelet threshold method for ECG signal pre-processing is applied to remove noise interference. A novel multi-domain feature extraction method is proposed; this method employs kernel-independent component analysis in nonlinear feature extraction and uses discrete wavelet transform to extract frequency domain features. The proposed system utilises a support vector machine classifier optimized with a genetic algorithm to recognize different types of heartbeats. An ECG acquisition experimental platform, in which ECG beats are collected as ECG data for classification, is constructed to demonstrate the effectiveness of the system in ECG beat classification. The presented system, when applied to the MIT-BIH arrhythmia database, achieves a high classification accuracy of 98.8%. Experimental results based on the ECG acquisition experimental platform show that the system obtains a satisfactory classification accuracy of 97.3% and is able to classify ECG beats efficiently for the automatic identification of cardiac arrhythmias. PMID:27775596
Balanced VS Imbalanced Training Data: Classifying Rapideye Data with Support Vector Machines
NASA Astrophysics Data System (ADS)
Ustuner, M.; Sanli, F. B.; Abdikan, S.
2016-06-01
The accuracy of supervised image classification is highly dependent upon several factors such as the design of the training set (sample selection, composition, purity and size), the resolution of the input imagery and landscape heterogeneity. The design of the training set is still a challenging issue since the sensitivity of a classifier algorithm at the learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. To evaluate the influence of balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two balanced datasets, with 70 and 100 pixels for each class of interest, and one imbalanced dataset, in which each class has a different number of pixels, were used in the classification stage. Results demonstrate that ML and ANN classifications are affected by imbalanced training data, resulting in a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for ANN), while SVM is not affected significantly and even improves slightly (from 94.38% to 94.69%). Our results highlight that SVM is a very robust, consistent and effective classifier, as it performs very well with both balanced and imbalanced training data. Furthermore, the training stage should be precisely and carefully designed for the needs of the adopted classifier.
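A balanced training set of the kind described here (a fixed number of pixels per class) can be drawn from labeled samples as sketched below. The function name, the fixed random seed, and the decision to raise an error when a class is too small are assumptions for illustration; the abstract does not describe how the authors selected their pixels.

```python
import random
from collections import defaultdict


def balanced_sample(labels, per_class, seed=0):
    """Return sorted sample indices with exactly `per_class` picks per class.

    labels: sequence of class labels, one per candidate training pixel.
    Raises ValueError when a class has fewer than `per_class` samples.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    chosen = []
    for label, indices in by_class.items():
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} samples")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)
```

Calling it with `per_class=70` and again with `per_class=100` would mirror the two balanced set sizes mentioned in the abstract, provided every class has enough labeled pixels.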
Heredia-Juesas, Juan; Thatcher, Jeffrey E; Lu, Yang; Squiers, John J; King, Darlene; Fan, Wensheng; DiMaio, J Michael; Martinez-Lorenzo, Jose A
2018-04-01
The process of burn debridement is a challenging technique requiring significant skills to identify the regions that need excision and their appropriate excision depths. In order to assist surgeons, a machine learning tool is being developed to provide a quantitative assessment of burn-injured tissue. This paper presents three non-invasive optical imaging techniques capable of distinguishing four kinds of tissue (healthy skin, viable wound bed, shallow burn, and deep burn) during serial burn debridement in a porcine model. All combinations of these three techniques have been studied through a k-fold cross-validation method. In terms of global performance, the combination of all three techniques significantly improves the classification accuracy with respect to just one technique, from 0.42 up to more than 0.76. Furthermore, a non-linear spatial filtering based on the mode of a small neighborhood has been applied as a post-processing technique, in order to improve the performance of the classification. Using this technique, the global accuracy reaches a value close to 0.78 and, for some particular tissues and combination of techniques, the accuracy improves by 13%.
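The mode-based post-processing mentioned in this abstract can be sketched as a majority filter over a classified label map. The window radius, the border handling, and the tie-breaking rule below are assumptions made for the sketch; the abstract only says that the mode of a small neighborhood is used.

```python
from collections import Counter


def mode_filter(label_map, radius=1):
    """Replace each pixel label by the most frequent label in its window.

    label_map: list of equal-length lists of hashable class labels.
    Border pixels use the part of the window that lies inside the image.
    A pixel keeps its label when that label is among the most frequent.
    """
    rows, cols = len(label_map), len(label_map[0])
    out = [row[:] for row in label_map]  # do not modify the input
    for r in range(rows):
        for c in range(cols):
            window = [
                label_map[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            if counts[label_map[r][c]] < best:
                out[r][c] = counts.most_common(1)[0][0]
    return out
```

With `radius=1` the window is 3 x 3, so an isolated misclassified pixel surrounded by a single class is relabeled to that class.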
Sugimoto, Katsutoshi; Shiraishi, Junji; Moriyasu, Fuminori; Doi, Kunio
2009-04-01
To develop a computer-aided diagnostic (CAD) scheme for classifying focal liver lesions (FLLs) by use of physicians' subjective classification of echogenic patterns of FLLs on baseline and contrast-enhanced ultrasonography (US). A total of 137 hepatic lesions in 137 patients were evaluated with B-mode and NC100100 (Sonazoid)-enhanced pulse-inversion US; lesions included 74 hepatocellular carcinomas (HCCs) (23: well-differentiated, 36: moderately differentiated, 15: poorly differentiated HCCs), 33 liver metastases, and 30 liver hemangiomas. Three physicians evaluated single images at B-mode and arterial phases with a cine mode. Physicians were asked to classify each lesion into one of eight B-mode and one of eight enhancement patterns, but did not make a diagnosis. To classify five types of FLLs, we employed a decision tree model with four decision nodes and four artificial neural networks (ANNs). The results of the physicians' pattern classifications were used successively for four different ANNs in making decisions at each of the decision nodes in the decision tree model. The classification accuracies for the 137 FLLs were 84.8% for metastasis, 93.3% for hemangioma, and 98.6% for all HCCs. In addition, the classification accuracies for histological differentiation types of HCCs were 65.2% for well-differentiated HCC, 41.7% for moderately differentiated HCC, and 80.0% for poorly differentiated HCC. This CAD scheme has the potential to improve the diagnostic accuracy of liver lesions. However, the accuracy in the histologic differential diagnosis of HCC based on baseline and contrast-enhanced US is still limited.
NASA Technical Reports Server (NTRS)
Justice, C.; Townshend, J. (Principal Investigator)
1981-01-01
Two unsupervised classification procedures were applied to ratioed and unratioed LANDSAT multispectral scanner data of an area of spatially complex vegetation and terrain. An objective accuracy assessment was undertaken on each classification and comparison was made of the classification accuracies. The two unsupervised procedures use the same clustering algorithm. By one procedure the entire area is clustered and by the other a representative sample of the area is clustered and the resulting statistics are extrapolated to the remaining area using a maximum likelihood classifier. Explanation is given of the major steps in the classification procedures including image preprocessing; classification; interpretation of cluster classes; and accuracy assessment. Of the four classifications undertaken, the monocluster block approach on the unratioed data gave the highest accuracy of 80% for five coarse cover classes. This accuracy was increased to 84% by applying a 3 x 3 contextual filter to the classified image. A detailed description and partial explanation is provided for the major misclassification. The classification of the unratioed data produced higher percentage accuracies than for the ratioed data and the monocluster block approach gave higher accuracies than clustering the entire area. The monocluster block approach was additionally the most economical in terms of computing time.
Nelson, G.; Ramsey, Elijah W.; Rangoonwala, A.
2005-01-01
Landsat Thematic Mapper images and collateral data sources were used to classify the land cover of the Mermentau River Basin within the chenier coastal plain and the adjacent uplands of Louisiana, USA. Landcover classes followed that of the National Oceanic and Atmospheric Administration's Coastal Change Analysis Program; however, classification methods needed to be developed to meet these national standards. Our first classification was limited to the Mermentau River Basin (MRB) in southcentral Louisiana, and the years of 1990, 1993, and 1996. To overcome problems due to spectrally inseparable classes, spatial and spectral continuums, mixed landcovers, and abnormal transitions, we separated the coastal area into regions of commonality and applied masks to specific land mixtures. Over the three years and 14 landcover classes (aggregating the cultivated land and grassland, and water and floating vegetation classes), overall accuracies ranged from 82% to 90%. To enhance landcover change interpretation, three indicators were introduced: Location Stability, Residence Stability, and Turnover. Implementing methods substantiated in the multiple date MRB classification, we spatially extended the classification to the entire Louisiana coast and temporally extended the original 1990, 1993, 1996 classifications to 1999 (Figure 1). We also advanced the operational functionality of the classification and increased the credibility of change detection results. Increased operational functionality that resulted in diminished user input was for the most part gained by implementing a classification logic based on forbidden transitions. The logic detected and corrected misclassifications and mostly alleviated the necessity of subregion separation prior to the classification. The new methods provided an improved ability for more timely detection and response to landcover impact. © 2005 IEEE.
NASA Astrophysics Data System (ADS)
Yu, Xin; Wen, Zongyong; Zhu, Zhaorong; Xia, Qiang; Shun, Lan
2016-06-01
Image classification still has a long way to go, although it has been studied for almost half a century. Researchers have achieved a great deal in the image classification domain, but there is still a long distance between theory and practice. However, some new methods from the artificial intelligence domain can be absorbed into the image classification domain so that each draws on the strengths of the other to offset its weaknesses, which will open up new prospects. Networks usually play the role of a high-level language, as is seen in artificial intelligence and statistics, because networks are used to build complex models from simple components. In recent years, Bayesian networks, one kind of probabilistic network, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of high-resolution remote sensing images and put forward a new method to construct the network topology structure in terms of training accuracy based on the training samples. In 2013, the Chinese government started the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper tries to apply Bayesian networks to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we choose some remote sensing images in Beijing. Experimental results demonstrate that TAN outperforms the Naive Bayesian Classifier (NBC) and the Maximum Likelihood Classification Method (MLC) in overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.
Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System.
de Moura, Karina de O A; Balbinot, Alexandre
2018-05-01
A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time and, frequently, such systems are not natural and intuitive. These are some of the several challenges for myoelectric prostheses for everyday use. The concept of the virtual sensor, whose fundamental objective is to estimate unavailable measures based on other available measures, is being used in other fields of research. The virtual sensor technique applied to surface electromyography can help to minimize these problems, typically related to the degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system to maintain the classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Results of movement classification are presented comparing the usual classification techniques with the method of degraded signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. The recovery of mean classification accuracy achieved by the proposed system without classifier retraining was 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification considering all signal contaminants and channel combinations evaluated was the classification using the retraining method, replacing the degraded channel by the virtual sensor TVARMA model. This method recovered the classification accuracy after the degradations, reaching an average of 5.7% below the classification of the clean signal, that is, the signal without the contaminants (the original signal). Moreover, the proposed intelligent technique minimizes the impact on motion classification of signal contamination related to degrading events over time. The virtual sensor model and the algorithm optimization need further development to increase the clinical application of myoelectric prostheses, but the approach already presents robust results that enable research with virtual sensors on biological signals with stochastic behavior. PMID:29723994
Active relearning for robust supervised classification of pulmonary emphysema
NASA Astrophysics Data System (ADS)
Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.
2012-03-01
Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, the inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Towards optimizing Emphysema classification, we introduce a physician-in-the-loop feedback approach in order to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pair-wise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, the ensemble-expert label conflicts were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded 15% increase in classification accuracy and 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics. The resultant average accuracy improved to 21%. The co-operative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.
NASA Technical Reports Server (NTRS)
Kettig, R. L.
1975-01-01
A method of classification of digitized multispectral images is developed and experimentally evaluated on actual earth resources data collected by aircraft and satellite. The method is designed to exploit the characteristic dependence between adjacent states of nature that is neglected by the more conventional simple-symmetric decision rule. Thus contextual information is incorporated into the classification scheme. The principal reason for doing this is to improve the accuracy of the classification. For general types of dependence this would generally require more computation per resolution element than the simple-symmetric classifier. But when the dependence occurs in the form of redundancy, the elements can be classified collectively, in groups, thereby reducing the number of classifications required.
An Active Learning Framework for Hyperspectral Image Classification Using Hierarchical Segmentation
NASA Technical Reports Server (NTRS)
Zhang, Zhou; Pasolli, Edoardo; Crawford, Melba M.; Tilton, James C.
2015-01-01
Augmenting spectral data with spatial information for image classification has recently gained significant attention, as classification accuracy can often be improved by extracting spatial information from neighboring pixels. In this paper, we propose a new framework in which active learning (AL) and hierarchical segmentation (HSeg) are combined for spectral-spatial classification of hyperspectral images. The spatial information is extracted from a best segmentation obtained by pruning the HSeg tree using a new supervised strategy. The best segmentation is updated at each iteration of the AL process, thus taking advantage of informative labeled samples provided by the user. The proposed strategy incorporates spatial information in two ways: 1) concatenating the extracted spatial features and the original spectral features into a stacked vector and 2) extending the training set using a self-learning-based semi-supervised learning (SSL) approach. Finally, the two strategies are combined within an AL framework. The proposed framework is validated with two benchmark hyperspectral datasets. Higher classification accuracies are obtained by the proposed framework with respect to five other state-of-the-art spectral-spatial classification approaches. Moreover, the effectiveness of the proposed pruning strategy is also demonstrated relative to the approaches based on a fixed segmentation.
Couple Graph Based Label Propagation Method for Hyperspectral Remote Sensing Data Classification
NASA Astrophysics Data System (ADS)
Wang, X. P.; Hu, Y.; Chen, J.
2018-04-01
Graph-based semi-supervised classification methods are widely used for hyperspectral image classification. We present a couple graph based label propagation method, which contains both the adjacency graph and the similar graph. We propose to construct the similar graph by using the similar probability, which exploits the probable label similarity among examples. The adjacency graph, as used in common manifold learning methods, has effectively improved the classification accuracy of hyperspectral data. The experiments indicate that the couple graph Laplacian, which unites both the adjacency graph and the similar graph, produces better classification results than other manifold learning based and sparse representation based graph Laplacians in the label propagation framework.
NASA Astrophysics Data System (ADS)
Dementev, A. O.; Dmitriev, E. V.; Kozoderov, V. V.; Egorov, V. D.
2017-10-01
Hyperspectral imaging is an up-to-date, promising technology widely applied for accurate thematic mapping. The presence of a large number of narrow survey channels allows us to use subtle differences in spectral characteristics of objects and to make a more detailed classification than in the case of using standard multispectral data. The difficulties encountered in the processing of hyperspectral images are usually associated with the redundancy of spectral information, which leads to the problem of the curse of dimensionality. Methods currently used for recognizing objects on multispectral and hyperspectral images are usually based on standard base supervised classification algorithms of various complexity. The accuracy of these algorithms can differ significantly depending on the classification task considered. In this paper we study the performance of ensemble classification methods for the problem of classification of forest vegetation. Error correcting output codes and boosting are tested on artificial data and real hyperspectral images. It is demonstrated that boosting gives a more significant improvement when used with simple base classifiers. The accuracy in this case is comparable to that of the error correcting output code (ECOC) classifier with a Gaussian kernel SVM base algorithm. However, the necessity of boosting ECOC with Gaussian kernel SVM is questionable. It is demonstrated that the selected ensemble classifiers allow us to recognize forest species with sufficiently high accuracy, comparable with ground-based forest inventory data.
Can segmentation evaluation metric be used as an indicator of land cover classification accuracy?
NASA Astrophysics Data System (ADS)
Švab Lenarčič, Andreja; Đurić, Nataša; Čotar, Klemen; Ritlop, Klemen; Oštir, Krištof
2016-10-01
It is a broadly established belief that the segmentation result significantly affects subsequent image classification accuracy. However, the actual correlation between the two has never been evaluated. Such an evaluation would be of considerable importance for any attempts to automate the object-based classification process, as it would reduce the amount of user intervention required to fine-tune the segmentation parameters. We conducted an assessment of segmentation and classification by analyzing 100 different segmentation parameter combinations, 3 classifiers, 5 land cover classes, 20 segmentation evaluation metrics, and 7 classification accuracy measures. The reliability of segmentation evaluation metrics as indicators of land cover classification accuracy was defined by the linear correlation between the two. All unsupervised metrics that are not based on the number of segments have a very strong correlation with all classification measures and are therefore reliable as indicators of land cover classification accuracy. On the other hand, the correlation for supervised metrics depends on so many factors that they cannot be trusted as reliable classification quality indicators. The land cover classification algorithms studied in this paper are widely used; therefore, the presented results are applicable to a wider area.
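The linear correlation used here as the reliability criterion is presumably the Pearson coefficient; that reading is an assumption, since the abstract does not name the statistic. A minimal sketch, with hypothetical inputs standing for one segmentation metric and one accuracy measure evaluated over the same parameter combinations:

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)
```

A coefficient near +1 or -1 across the tested parameter combinations would mark the metric as a usable proxy for classification accuracy in the sense of the abstract.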
Iliyasu, Abdullah M; Fatichah, Chastine
2017-12-19
A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO) method with the intuitionistic rationality of the traditional fuzzy k-nearest neighbours (Fuzzy k-NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smeared (CS) images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset of features (i.e., global best particles), which represents a pruned-down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features approach (i.e., classification without prior feature selection) and another hybrid technique combining the standard PSO algorithm with the Fuzzy k-NN technique (P-Fuzzy approach). In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regard to the feature selection in experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k-NN in the proposed Q-Fuzzy approach improves classification accuracy, as manifested in the reduction in the number of cell features, which is crucial for effective cervical cancer detection and diagnosis.
Classification of Aerosol Retrievals from Spaceborne Polarimetry Using a Multiparameter Algorithm
NASA Technical Reports Server (NTRS)
Russell, Philip B.; Kacenelenbogen, Meloe; Livingston, John M.; Hasekamp, Otto P.; Burton, Sharon P.; Schuster, Gregory L.; Johnson, Matthew S.; Knobelspiesse, Kirk D.; Redemann, Jens; Ramachandran, S.;
2013-01-01
In this presentation, we demonstrate application of a new aerosol classification algorithm to retrievals from the POLDER-3 polarimeter on the PARASOL spacecraft. Motivation and method: Since the development of global aerosol measurements by satellites and AERONET, classification of observed aerosols into several types (e.g., urban-industrial, biomass burning, mineral dust, maritime, and various subtypes or mixtures of these) has proven useful for understanding aerosol sources, transformations, effects, and feedback mechanisms; improving accuracy of satellite retrievals; and quantifying assessments of aerosol radiative impacts on climate.
Multispectral LiDAR Data for Land Cover Classification of Urban Areas
Morsy, Salem; Shaker, Ahmed; El-Rabbany, Ahmed
2017-01-01
Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy. PMID:28445432
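The point-based route described above (height from ground filtering plus a normalized difference index) can be sketched as a simple rule. This is only an illustration under stated assumptions: the abstract does not say which LiDAR channels enter each NDVI or which thresholds were used, so the channel pairing (`nir`, `visible`), the two thresholds, and the rule itself are hypothetical.

```python
def normalized_difference(nir, visible):
    """Normalized difference of two intensity channels, in [-1, 1]."""
    total = nir + visible
    return 0.0 if total == 0 else (nir - visible) / total

def label_point(nir, visible, height_above_ground, ndvi_min=0.3, height_min=2.0):
    """Assign one of the four classes from vegetation index and height.

    Assumption: `ndvi_min` and `height_min` are illustrative thresholds,
    not values reported in the abstract.
    """
    vegetated = normalized_difference(nir, visible) >= ndvi_min
    elevated = height_above_ground >= height_min
    if elevated:
        return "trees" if vegetated else "buildings"
    return "grass" if vegetated else "roads"

# Illustrative intensities and heights (metres above the filtered ground).
print(label_point(80, 20, 10.0))
print(label_point(20, 80, 0.0))
```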
Wavelet-based multicomponent denoising on GPU to improve the classification of hyperspectral images
NASA Astrophysics Data System (ADS)
Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco; Mouriño, J. C.
2017-10-01
Supervised classification allows handling a wide range of remote sensing hyperspectral applications. Enhancing the spatial organization of the pixels over the image has proven to be beneficial for the interpretation of the image content, thus increasing the classification accuracy. Denoising in the spatial domain of the image has been shown to enhance the structures in the image. This paper proposes a multi-component denoising approach in order to increase the classification accuracy when a classification method is applied. It is computed on multicore CPUs and NVIDIA GPUs. The method combines feature extraction based on a 1D discrete wavelet transform (DWT) applied in the spectral dimension, followed by an Extended Morphological Profile (EMP) and a classifier (SVM or ELM). The multi-component noise reduction is applied to the EMP just before the classification. The denoising recursively applies a separable 2D DWT, after which the number of wavelet coefficients is reduced by using a threshold. Finally, inverse 2D DWT filters are applied to reconstruct the noise-free original component. The computational cost of the classifiers, as well as the cost of the whole classification chain, is high, but it is reduced through computation on NVIDIA multi-GPU platforms, achieving real-time behavior for some applications.
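The transform, threshold, and inverse-transform sequence described above can be sketched in one dimension. This is a simplified illustration, not the paper's GPU implementation: it assumes a Haar wavelet, soft thresholding, and a 1D signal instead of the separable 2D DWT, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One level of the orthonormal Haar DWT (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def soft_threshold(coefficient, threshold):
    """Shrink a coefficient towards zero; small ones become zero."""
    return math.copysign(max(abs(coefficient) - threshold, 0.0), coefficient)

def denoise(signal, threshold, levels=2):
    """Recursively decompose, threshold the detail coefficients, reconstruct."""
    if levels == 0 or len(signal) < 2 or len(signal) % 2:
        return list(signal)
    approx, detail = haar_forward(signal)
    approx = denoise(approx, threshold, levels - 1)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)

print(denoise([1.0, 3.0, 1.0, 3.0], threshold=10.0, levels=1))
```

With a threshold of zero the signal is reconstructed unchanged, which is a quick sanity check that the forward and inverse steps match.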
Efficient use of unlabeled data for protein sequence classification: a comparative study
Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir
2009-01-01
Background Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags–the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450
Shanthi, C; Pappa, N
2017-05-01
Flow pattern recognition is necessary to select design equations for finding operating details of the process and to perform computational simulations. Visual image processing can be used to automate the interpretation of patterns in two-phase flow. In this paper, an attempt has been made to improve the classification accuracy of the flow pattern of gas/liquid two-phase flow using fuzzy logic and Support Vector Machine (SVM) with Principal Component Analysis (PCA). Videos of six different types of flow patterns, namely annular flow, bubble flow, churn flow, plug flow, slug flow, and stratified flow, are recorded for a period and converted to 2D images for processing. The textural and shape features extracted using image processing are applied as inputs to various classification schemes, namely fuzzy logic, SVM, and SVM with PCA, in order to identify the type of flow pattern. The results obtained are compared, and it is observed that SVM with features reduced using PCA gives better classification accuracy and is computationally less intensive than the other two schemes. The results of this study cover industrial application needs, including oil and gas and any other gas-liquid two-phase flows. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Luo, Qiu; Xin, Wu; Qiming, Xiong
2017-06-01
Existing approaches to extracting vegetation information from remote sensing data often neglect phenological features and suffer from the low performance of the analysis algorithms. To address this, a method for extracting remote sensing vegetation information is proposed, based on EVI time-series and decision-tree classification with multi-source branch similarity. First, to improve the stability of recognition accuracy across the time-series, the seasonal features of vegetation are extracted based on the fitting span range of the time-series. Second, decision-tree similarity is distinguished by the adaptively selected path or the probability parameter of component prediction. This similarity serves as an index to evaluate the degree of task association, to decide whether to perform migration of the multi-source decision tree, and to ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases in Dalbergia hainanensis commercial forest reaches 87% to 98%, which is significantly better than the MODIS coverage accuracy of 80% to 96% in this area. These results support the validity of the proposed method.
Besga, Ariadna; Gonzalez, Itxaso; Echeburua, Enrique; Savio, Alexandre; Ayerdi, Borja; Chyzhyk, Darya; Madrigal, Jose L M; Leza, Juan C; Graña, Manuel; Gonzalez-Pinto, Ana Maria
2015-01-01
Late onset bipolar disorder (LOBD) is often difficult to distinguish from degenerative dementias, such as Alzheimer disease (AD), due to comorbidities and common cognitive symptoms. Moreover, LOBD prevalence in the elderly population is not negligible and it is increasing. Both pathologies share pathophysiological neuroinflammation features. Improvements in differential diagnosis of LOBD and AD will help to select the best personalized treatment. The aim of this study is to assess the relative significance of clinical observations, neuropsychological tests, and specific blood plasma biomarkers (inflammatory and neurotrophic), separately and combined, in the differential diagnosis of LOBD versus AD. The assessment was carried out by evaluating the accuracy achieved by classification-based computer-aided diagnosis (CAD) systems based on these variables. A sample of healthy controls (HC) (n = 26), AD patients (n = 37), and LOBD patients (n = 32) was recruited at the Alava University Hospital. Clinical observations, neuropsychological tests, and plasma biomarkers were measured at recruitment time. We applied multivariate machine learning classification methods to discriminate subjects from HC, AD, and LOBD populations in the study. We analyzed, for each classification contrast, feature sets combining clinical observations, neuropsychological measures, and biological markers, including inflammation biomarkers. Furthermore, we analyzed reduced feature sets containing variables with significant differences determined by a Welch's t-test. In addition, a battery of classifier architectures was applied, encompassing linear and non-linear Support Vector Machines (SVM), Random Forests (RF), and Classification and Regression Trees (CART), and their performance was evaluated in a leave-one-out (LOO) cross-validation scheme. Post hoc analysis of the Gini index in CART classifiers provided a measure of each variable's importance.
Welch's t-test found one biomarker (Malondialdehyde) with significant differences (p < 0.001) in the LOBD vs. AD contrast. Classification results with the best features are as follows: discrimination of HC vs. AD patients reaches an accuracy of 97.21% and an AUC of 98.17%; discrimination of LOBD vs. AD patients reaches an accuracy of 90.26% and an AUC of 89.57%; discrimination of HC vs. LOBD patients achieves an accuracy of 95.76% and an AUC of 88.46%. It is feasible to build CAD systems for differential diagnosis of LOBD and AD on the basis of a reduced set of clinical variables. Clinical observations provide the greatest discrimination. Neuropsychological tests are improved by the addition of biomarkers, and both contribute significantly to improving the overall predictive performance.
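The Welch's t-test used above to screen variables can be written out directly. The sketch below computes the statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; the function name and the sample values are hypothetical and are not data from the study. Turning the statistic into a p-value would additionally require the t distribution, which is not shown.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two samples with unequal variances."""
    # Squared standard error contributed by each group.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Made-up biomarker readings for two groups (illustrative only).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(dof, 2))
```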
Zhao, Dehua; Jiang, Hao; Yang, Tangwu; Cai, Ying; Xu, Delin; An, Shuqing
2012-03-01
Classification trees (CT) have been used successfully in the past to classify aquatic vegetation from spectral indices (SI) obtained from remotely-sensed images. However, applying CT models developed for certain image dates to other time periods within the same year or among different years can reduce the classification accuracy. In this study, we developed CT models with modified thresholds using extreme SI values (CT(m)) to improve the stability of the models when applying them to different time periods. A total of 903 ground-truth samples were obtained in September of 2009 and 2010 and classified as emergent, floating-leaf, or submerged vegetation or other cover types. Classification trees were developed for 2009 (Model-09) and 2010 (Model-10) using field samples and a combination of two images from winter and summer. Overall accuracies of these models were 92.8% and 94.9%, respectively, which confirmed the ability of CT analysis to map aquatic vegetation in Taihu Lake. However, Model-10 had only 58.9-71.6% classification accuracy and 31.1-58.3% agreement (i.e., pixels classified the same in the two maps) for aquatic vegetation when it was applied to image pairs from both a different time period in 2010 and a similar time period in 2009. We developed a method to estimate the effects of extrinsic (EF) and intrinsic (IF) factors on model uncertainty using Modis images. Results indicated that 71.1% of the instability in classification between time periods was due to EF, which might include changes in atmospheric conditions, sun-view angle and water quality. The remainder was due to IF, such as phenological and growth status differences between time periods. The modified version of Model-10 (i.e. CT(m)) performed better than traditional CT with different image dates. 
When applied to 2009 images, the CT(m) version of Model-10 had very similar thresholds and performance as Model-09, with overall accuracies of 92.8% and 90.5% for Model-09 and the CT(m) version of Model-10, respectively. CT(m) decreased the variability related to EF and IF and thereby improved the applicability of the models to different time periods. In both practice and theory, our results suggested that CT(m) was more stable than traditional CT models and could be used to map aquatic vegetation in time periods other than the one for which the model was developed. Copyright © 2011 Elsevier Ltd. All rights reserved.
The Southwest Regional Gap Analysis Project (SW ReGAP) improves upon previous GAP projects conducted in Arizona, Colorado, Nevada, New Mexico, and Utah to provide a consistent, seamless vegetation map for this large and ecologically diverse geographic region. Nevada's compone...
Sentiment analysis of feature ranking methods for classification accuracy
NASA Astrophysics Data System (ADS)
Joseph, Shashank; Mugauri, Calvin; Sumathy, S.
2017-11-01
Text pre-processing and feature selection are important and critical steps in text mining. Text pre-processing of large volumes of data is a difficult task, as unstructured raw data is converted into a structured format. Traditional methods of processing and weighting took much time and were less accurate. To overcome this challenge, feature ranking techniques have been devised. A feature set from text pre-processing is fed as input for feature selection. Feature selection helps improve text classification accuracy. Of the three feature selection categories available, the filter category will be the focus. Five feature ranking methods, namely document frequency, standard deviation, information gain, chi-square, and weighted log-likelihood ratio, are analyzed.
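Two of the filter-style ranking scores named above, document frequency and chi-square, can be sketched as follows. The sketch assumes tokenised documents represented as sets and the standard 2x2 contingency form of the chi-square statistic; the function names and the tiny corpus are hypothetical.

```python
def document_frequency(term, docs):
    """Number of documents that contain the term."""
    return sum(term in tokens for tokens in docs)

def chi_square(term, docs, labels, target):
    """Chi-square score of a term for one class from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == target:
            a += 1          # term present, target class
        elif present:
            b += 1          # term present, other class
        elif label == target:
            c += 1          # term absent, target class
        else:
            d += 1          # term absent, other class
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denominator == 0 else n * (a * d - c * b) ** 2 / denominator

# Tiny made-up corpus (illustrative only).
corpus = [{"good", "movie"}, {"good", "plot"}, {"bad", "movie"}, {"bad", "plot"}]
sentiment = ["pos", "pos", "neg", "neg"]
vocabulary = sorted(set().union(*corpus))
ranked = sorted(vocabulary, key=lambda t: chi_square(t, corpus, sentiment, "pos"), reverse=True)
print(ranked)
```

A filter method of this kind scores each term independently of any classifier, so the ranking can be computed once and reused across models.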
Malay sentiment analysis based on combined classification approaches and Senti-lexicon algorithm.
Al-Saffar, Ahmed; Awang, Suryanti; Tao, Hai; Omar, Nazlia; Al-Saiagh, Wafaa; Al-Bared, Mohammed
2018-01-01
Sentiment analysis techniques are increasingly exploited to categorize the opinion text to one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. In this paper, a Malay sentiment analysis classification model is proposed to improve classification performances based on the semantic orientation and machine learning approaches. First, a total of 2,478 Malay sentiment-lexicon phrases and words are assigned with a synonym and stored with the help of more than one Malay native speaker, and the polarity is manually allotted with a score. In addition, the supervised machine learning approaches and lexicon knowledge method are combined for Malay sentiment classification with evaluating thirteen features. Finally, three individual classifiers and a combined classifier are used to evaluate the classification accuracy. In experimental results, a wide-range of comparative experiments is conducted on a Malay Reviews Corpus (MRC), and it demonstrates that the feature extraction improves the performance of Malay sentiment analysis based on the combined classification. However, the results depend on three factors, the features, the number of features and the classification approach.
A Stimulus-Independent Hybrid BCI Based on Motor Imagery and Somatosensory Attentional Orientation.
Yao, Lin; Sheng, Xinjun; Zhang, Dingguo; Jiang, Ning; Mrachacz-Kersting, Natalie; Zhu, Xiangyang; Farina, Dario
2017-09-01
Distinctive EEG signals from the motor and somatosensory cortex are generated during mental tasks of motor imagery (MI) and somatosensory attentional orientation (SAO). In this paper, we hypothesize that a combination of these two signal modalities provides improvements in brain-computer interface (BCI) performance with respect to using the two methods separately, and generates novel types of multi-class BCI systems. Thirty-two subjects were randomly divided into a Control-Group and a Hybrid-Group. In the Control-Group, the subjects performed left and right hand motor imagery (i.e., L-MI and R-MI). In the Hybrid-Group, the subjects performed the four mental tasks (i.e., L-MI, R-MI, L-SAO, and R-SAO). The results indicate that combining two of the tasks in a hybrid manner (such as L-SAO and R-MI) resulted in a significantly greater classification accuracy than when using two MI tasks. The hybrid modality reached 86.1% classification accuracy on average, an increase of 7.70% with respect to MI alone and 7.21% with respect to SAO alone. Moreover, all 16 subjects in the hybrid modality reached at least 70% accuracy, which is considered the threshold for BCI illiteracy. In addition to the two-class results, the classification accuracy was 68.1% and 54.1% for the three-class and four-class hybrid BCI, respectively. By combining the induced brain signals from the motor and somatosensory cortex, the proposed stimulus-independent hybrid BCI has shown improved performance with respect to individual modalities, reduced the portion of BCI-illiterate subjects, and provided novel types of multi-class BCIs.
Urban Change Detection of Pingtan City based on Bi-temporal Remote Sensing Images
NASA Astrophysics Data System (ADS)
Degang, JIANG; Jinyan, XU; Yikang, GAO
2017-02-01
In this paper, a pair of SPOT 5-6 images with a resolution of 0.5 m is selected. An object-oriented classification method is applied to the two images, and five classes of ground features are identified: man-made objects, farmland, forest, waterbody, and unutilized land. An auxiliary ASTER GDEM is used to improve the classification accuracy. Change detection is then performed on the classification results, followed by an accuracy assessment, and satisfactory results are obtained. The results show that great changes in Pingtan city have been detected, namely the expansion of the city area and the increased intensity of man-made buildings, roads, and other infrastructure following the establishment of the Pingtan comprehensive experimental zone. A wide range of open sea area along the island's coastal zones has been reclaimed for port and CBD construction.
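Change detection based on two classification results, as described above, is commonly done by comparing the class labels of the same location at the two dates. The following is a minimal sketch of that post-classification comparison, assuming co-registered label lists; the function names and the sample labels are hypothetical and do not come from the study.

```python
from collections import Counter

def change_matrix(labels_before, labels_after):
    """Count every (class at date 1, class at date 2) transition."""
    return Counter(zip(labels_before, labels_after))

def changed_fraction(labels_before, labels_after):
    """Fraction of locations whose class differs between the two dates."""
    changed = sum(b != a for b, a in zip(labels_before, labels_after))
    return changed / len(labels_before)

# Illustrative labels for four co-registered image objects.
date1 = ["farmland", "farmland", "waterbody", "forest"]
date2 = ["man-made objects", "farmland", "man-made objects", "forest"]
for (before, after), count in sorted(change_matrix(date1, date2).items()):
    print(before, "->", after, count)
print(changed_fraction(date1, date2))
```

Because the comparison inherits the errors of both classifications, the accuracy of each map limits the reliability of the detected changes.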
Graph-Based Semi-Supervised Hyperspectral Image Classification Using Spatial Information
NASA Astrophysics Data System (ADS)
Jamshidpour, N.; Homayouni, S.; Safari, A.
2017-09-01
Hyperspectral image classification has been one of the most popular research areas in the remote sensing community in the past decades. However, there are still some problems that need specific attention. For example, the lack of sufficient labeled samples and the high dimensionality problem are the two most important issues, and they degrade the performance of supervised classification dramatically. The main idea of semi-supervised learning (SSL) is to overcome these issues through the contribution of unlabeled samples, which are available in enormous amounts. In this paper, we propose a graph-based semi-supervised classification method, which uses both spectral and spatial information for hyperspectral image classification. More specifically, two graphs were designed and constructed in order to exploit the relationships among pixels in the spectral and spatial spaces, respectively. Then, the Laplacians of both graphs were merged to form a weighted joint graph. The experiments were carried out on two different benchmark hyperspectral data sets. The proposed method performed significantly better than well-known supervised classification methods, such as SVM. The assessments consisted of both accuracy and homogeneity analyses of the produced classification maps. The proposed spectral-spatial SSL method considerably increased the classification accuracy when the labeled training data set was very scarce. When there were only five labeled samples for each class, the performance improved by 5.92% and 10.76% compared to spatial graph-based SSL for the AVIRIS Indian Pine and Pavia University data sets, respectively.
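The idea of merging two graph Laplacians into a weighted joint graph and then spreading a few labels over it can be sketched as follows. This is an illustration under stated assumptions, not the authors' solver: it uses unnormalised Laplacians, a single mixing weight `alpha`, and a simple iterative harmonic-averaging scheme for the unlabeled nodes. All names and the four-pixel example are hypothetical.

```python
def laplacian(weights):
    """Unnormalised graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0.0) - weights[i][j]
             for j in range(n)] for i in range(n)]

def merge_laplacians(l_spectral, l_spatial, alpha):
    """Weighted joint Laplacian: alpha * L_spectral + (1 - alpha) * L_spatial."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(l_spectral, l_spatial)]

def propagate_labels(lap, seeds, n_classes, sweeps=200):
    """Spread seed labels by repeatedly averaging each node's neighbours."""
    n = len(lap)
    scores = [[0.0] * n_classes for _ in range(n)]
    for node, cls in seeds.items():
        scores[node][cls] = 1.0
    for _ in range(sweeps):
        for i in range(n):
            if i in seeds or lap[i][i] == 0:
                continue  # labeled nodes stay fixed; isolated nodes are skipped
            # Off-diagonal entries of L are the negated edge weights.
            scores[i] = [
                sum(-lap[i][j] * scores[j][c] for j in range(n) if j != i) / lap[i][i]
                for c in range(n_classes)
            ]
    return [max(range(n_classes), key=row.__getitem__) for row in scores]

# Four pixels: spectral similarity links 0-1 and 2-3; spatial adjacency is a chain.
w_spectral = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
w_spatial = [[0, 1, 0, 0], [1, 0, 0.2, 0], [0, 0.2, 0, 1], [0, 0, 1, 0]]
joint = merge_laplacians(laplacian(w_spectral), laplacian(w_spatial), alpha=0.5)
print(propagate_labels(joint, seeds={0: 0, 3: 1}, n_classes=2))
```

Only two of the four pixels carry labels here, which mirrors the scarce-label setting the abstract targets.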
Abdolali, Fatemeh; Zoroofi, Reza Aghaeizadeh; Otake, Yoshito; Sato, Yoshinobu
2017-02-01
Accurate detection of maxillofacial cysts is an essential step for diagnosis, monitoring and planning therapeutic intervention. Cysts can be of various sizes and shapes and existing detection methods lead to poor results. Customizing automatic detection systems to gain sufficient accuracy in clinical practice is highly challenging. For this purpose, integrating the engineering knowledge in efficient feature extraction is essential. This paper presents a novel framework for maxillofacial cysts detection. A hybrid methodology based on surface and texture information is introduced. The proposed approach consists of three main steps as follows: At first, each cystic lesion is segmented with high accuracy. Then, in the second and third steps, feature extraction and classification are performed. Contourlet and SPHARM coefficients are utilized as texture and shape features which are fed into the classifier. Two different classifiers are used in this study, i.e. support vector machine and sparse discriminant analysis. Generally SPHARM coefficients are estimated by the iterative residual fitting (IRF) algorithm which is based on stepwise regression method. In order to improve the accuracy of IRF estimation, a method based on extra orthogonalization is employed to reduce linear dependency. We have utilized a ground-truth dataset consisting of cone beam CT images of 96 patients, belonging to three maxillofacial cyst categories: radicular cyst, dentigerous cyst and keratocystic odontogenic tumor. Using orthogonalized SPHARM, residual sum of squares is decreased which leads to a more accurate estimation. Analysis of the results based on statistical measures such as specificity, sensitivity, positive predictive value and negative predictive value is reported. The classification rate of 96.48% is achieved using sparse discriminant analysis and orthogonalized SPHARM features. Classification accuracy at least improved by 8.94% with respect to conventional features. 
This study demonstrated that our proposed methodology can improve the computer assisted diagnosis (CAD) performance by incorporating more discriminative features. Using orthogonalized SPHARM is promising in computerized cyst detection and may have a significant impact in future CAD systems. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
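The statistical measures named in this abstract (sensitivity, specificity, positive predictive value, and negative predictive value) follow directly from confusion counts. The sketch below treats one cyst category as the positive class against the rest, which is an assumption about how the three-class problem might be scored; the function name and the sample labels are hypothetical.

```python
def diagnostic_metrics(y_true, y_pred, positive):
    """One-vs-rest sensitivity, specificity, PPV and NPV for one class."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Made-up reference and predicted labels (illustrative only).
truth = ["radicular", "radicular", "dentigerous", "keratocystic", "radicular", "keratocystic"]
guess = ["radicular", "dentigerous", "dentigerous", "keratocystic", "radicular", "radicular"]
print(diagnostic_metrics(truth, guess, positive="radicular"))
```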
Classification of right-hand grasp movement based on EMOTIV Epoc+
NASA Astrophysics Data System (ADS)
Tobing, T. A. M. L.; Prawito, Wijaya, S. K.
2017-07-01
Combinations of BCT elements for right-hand grasp movement have been obtained, together with the average value of their classification accuracy. The aim of this study is to find a suitable combination for the best classification accuracy of right-hand grasp movement based on an EEG headset, the EMOTIV Epoc+. There are three movement classes: grasping hand, relax, and opening hand. These classifications take advantage of the Event-Related Desynchronization (ERD) phenomenon, which makes it possible to distinguish relaxation, imagery, and movement states from each other. The combined elements are Independent Component Analysis (ICA), spectrum analysis by Fast Fourier Transform (FFT), maximum mu and beta power with their frequencies as features, and the Probabilistic Neural Network (PNN) and Radial Basis Function (RBF) classifiers. The average classification accuracies are approximately 83% for training and 57% for testing. To give a better understanding of the signal quality recorded by the EMOTIV Epoc+, the classification accuracy for left- or right-hand grasping movement EEG signals (provided by Physionet) is also given, i.e., approximately 85% for training and 70% for testing. The accuracy values for each combination, experimental condition, and the external EEG data are compared in order to analyze the classification accuracy.
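The feature described above, maximum mu and beta power together with the frequency at which each occurs, can be sketched with a plain discrete Fourier transform. The band limits (8 to 12 Hz for mu, 13 to 30 Hz for beta), the 128 Hz sampling rate, and the synthetic signal are assumptions for illustration; a real pipeline would use an FFT library and recorded EEG.

```python
import cmath
import math

def power_spectrum(signal, sampling_rate):
    """Return (frequency, power) pairs for the non-negative frequency bins."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT; O(n^2) but dependency-free for short windows.
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        spectrum.append((k * sampling_rate / n, abs(coefficient) ** 2 / n))
    return spectrum

def peak_in_band(spectrum, low_hz, high_hz):
    """Return the (frequency, power) pair with the largest power in a band."""
    in_band = [pair for pair in spectrum if low_hz <= pair[0] <= high_hz]
    return max(in_band, key=lambda pair: pair[1])

# One second of a synthetic signal: a 10 Hz and a weaker 20 Hz component.
fs = 128
samples = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.5 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(fs)
]
spectrum = power_spectrum(samples, fs)
mu_peak = peak_in_band(spectrum, 8, 12)     # assumed mu band
beta_peak = peak_in_band(spectrum, 13, 30)  # assumed beta band
print(mu_peak[0], beta_peak[0])
```

The four numbers (two peak powers and two peak frequencies) would then form the feature vector passed to a classifier.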
NASA Astrophysics Data System (ADS)
Porto, C. D. N.; Costa Filho, C. F. F.; Macedo, M. M. G.; Gutierrez, M. A.; Costa, M. G. F.
2017-03-01
Studies in intravascular optical coherence tomography (IV-OCT) have demonstrated the importance of coronary bifurcation regions in intravascular medical imaging analysis, as plaques are more likely to accumulate in this region leading to coronary disease. A typical IV-OCT pullback acquires hundreds of frames, thus developing an automated tool to classify the OCT frames as bifurcation or non-bifurcation can be an important step to speed up OCT pullbacks analysis and assist automated methods for atherosclerotic plaque quantification. In this work, we evaluate the performance of two state-of-the-art classifiers, SVM and Neural Networks in the bifurcation classification task. The study included IV-OCT frames from 9 patients. In order to improve classification performance, we trained and tested the SVM with different parameters by means of a grid search and different stop criteria were applied to the Neural Network classifier: mean square error, early stop and regularization. Different sets of features were tested, using feature selection techniques: PCA, LDA and scalar feature selection with correlation. Training and test were performed in sets with a maximum of 1460 OCT frames. We quantified our results in terms of false positive rate, true positive rate, accuracy, specificity, precision, false alarm, f-measure and area under ROC curve. Neural networks obtained the best classification accuracy, 98.83%, overcoming the results found in literature. Our methods appear to offer a robust and reliable automated classification of OCT frames that might assist physicians indicating potential frames to analyze. Methods for improving neural networks generalization have increased the classification performance.
Object-oriented crop mapping and monitoring using multi-temporal polarimetric RADARSAT-2 data
NASA Astrophysics Data System (ADS)
Jiao, Xianfeng; Kovacs, John M.; Shang, Jiali; McNairn, Heather; Walters, Dan; Ma, Baoluo; Geng, Xiaoyuan
2014-10-01
The aim of this paper is to assess the accuracy of an object-oriented classification of polarimetric Synthetic Aperture Radar (PolSAR) data to map and monitor crops using 19 RADARSAT-2 fine beam polarimetric (FQ) images of an agricultural area in North-eastern Ontario, Canada. Polarimetric images and field data were acquired during the 2011 and 2012 growing seasons. The classification and field data collection focused on the main crop types grown in the region, which include: wheat, oat, soybean, canola and forage. The polarimetric parameters were extracted with PolSAR analysis using both the Cloude-Pottier and Freeman-Durden decompositions. The object-oriented classification, with a single date of PolSAR data, was able to classify all five crop types with an accuracy of 95% and Kappa of 0.93; a 6% improvement in comparison with linear-polarization only classification. However, the time of acquisition is crucial. The larger biomass crops of canola and soybean were most accurately mapped, whereas the identification of oat and wheat were more variable. The multi-temporal data using the Cloude-Pottier decomposition parameters provided the best classification accuracy compared to the linear polarizations and the Freeman-Durden decomposition parameters. In general, the object-oriented classifications were able to accurately map crop types by reducing the noise inherent in the SAR data. Furthermore, using the crop classification maps we were able to monitor crop growth stage based on a trend analysis of the radar response. Based on field data from canola crops, there was a strong relationship between the phenological growth stage based on the BBCH scale, and the HV backscatter and entropy.
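The two summary figures quoted above, overall accuracy and Kappa, are both derived from a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the function name and the 2x2 matrix are hypothetical and are not the study's data.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) from a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    column_totals = [sum(column) for column in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / (total * total)
    return observed, (observed - expected) / (1.0 - expected)

# Made-up two-class confusion matrix (illustrative only).
accuracy, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(round(accuracy, 2), round(kappa, 2))
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the raw accuracy for the same matrix.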
Burlina, Philippe; Billings, Seth; Joshi, Neil
2017-01-01
Objective To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Methods Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and “engineered” features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. Results The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). Conclusions This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification. PMID:28854220
Burlina, Philippe; Billings, Seth; Joshi, Neil; Albayda, Jemima
2017-01-01
To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.
Computer-aided diagnosis system: a Bayesian hybrid classification method.
Calle-Alonso, F; Pérez, C J; Arias-Nicolás, J P; Martín, J
2013-10-01
A novel method to classify multi-class biomedical objects is presented. The method is based on a hybrid approach which combines pairwise comparison, Bayesian regression and the k-nearest neighbor technique. It can be applied in a fully automatic way or in a relevance feedback framework. In the latter case, the information obtained from both an expert and the automatic classification is used iteratively to improve the results until a certain accuracy level is achieved; the learning process then ends and new classifications can be performed automatically. The method has been applied in two biomedical contexts by following the same cross-validation schemes as in the original studies. The first one refers to cancer diagnosis, leading to an accuracy of 77.35% versus 66.37%, originally obtained. The second one considers the diagnosis of pathologies of the vertebral column. The original method achieves accuracies ranging from 76.5% to 96.7%, and from 82.3% to 97.1% in two different cross-validation schemes. Even with no supervision, the proposed method reaches 96.71% and 97.32% in these two cases. By using a supervised framework the achieved accuracy is 97.74%. Furthermore, all abnormal cases were correctly classified. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Domínguez, Rocio Berenice; Moreno-Barón, Laura; Muñoz, Roberto; Gutiérrez, Juan Manuel
2014-01-01
This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices and altitude of crops) were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure. PMID:25254303
Domínguez, Rocio Berenice; Moreno-Barón, Laura; Muñoz, Roberto; Gutiérrez, Juan Manuel
2014-09-24
This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices and altitude of crops) were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure.
Low-back electromyography (EMG) data-driven load classification for dynamic lifting tasks.
Totah, Deema; Ojeda, Lauro; Johnson, Daniel D; Gates, Deanna; Mower Provost, Emily; Barton, Kira
2018-01-01
Numerous devices have been designed to support the back during lifting tasks. To improve the utility of such devices, this research explores the use of preparatory muscle activity to classify muscle loading and initiate appropriate device activation. The goal of this study was to determine the earliest time window that enabled accurate load classification during a dynamic lifting task. Nine subjects performed thirty symmetrical lifts, split evenly across three weight conditions (no-weight, 10-lbs and 24-lbs), while low-back muscle activity data was collected. Seven descriptive statistics features were extracted from 100 ms windows of data. A multinomial logistic regression (MLR) classifier was trained and tested, employing leave-one subject out cross-validation, to classify lifted load values. Dimensionality reduction was achieved through feature cross-correlation analysis and greedy feedforward selection. The time of full load support by the subject was defined as load-onset. Regions of highest average classification accuracy started at 200 ms before until 200 ms after load-onset with average accuracies ranging from 80% (±10%) to 81% (±7%). The average recall for each class ranged from 69-92%. These inter-subject classification results indicate that preparatory muscle activity can be leveraged to identify the intent to lift a weight up to 100 ms prior to load-onset. The high accuracies shown indicate the potential to utilize intent classification for assistive device applications. Active assistive devices, e.g. exoskeletons, could prevent back injury by off-loading low-back muscles. Early intent classification allows more time for actuators to respond and integrate seamlessly with the user.
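The windowing and leave-one-subject-out evaluation described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, only four summary statistics are computed (the study used seven, which the abstract does not list), and the window length is given in samples rather than milliseconds.

```python
from statistics import mean, pstdev


def window_features(signal, window_len):
    """Split a 1-D signal into non-overlapping windows and summarise each one.

    Assumption: four illustrative statistics per window (mean, population
    standard deviation, minimum, maximum).
    """
    feats = []
    for start in range(0, len(signal) - window_len + 1, window_len):
        w = signal[start:start + window_len]
        feats.append((mean(w), pstdev(w), min(w), max(w)))
    return feats


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

Each split trains on every subject except one and tests on the held-out subject, so the reported accuracy reflects performance on people the classifier has never seen.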
NASA Astrophysics Data System (ADS)
Liu, F.; Chen, T.; He, J.; Wen, Q.; Yu, F.; Gu, X.; Wang, Z.
2018-04-01
In recent years, rapid improvements in SAR sensors have made them a useful complement to traditional optical remote sensing in theory, technology and data. In this paper, Sentinel-1A SAR data and GF-1 optical data were selected for image fusion, with an emphasis on dryland crop classification under a complex crop planting structure, taking corn and cotton as the research objects. To account for the differences among data fusion methods, the principal component analysis (PCA), Gram-Schmidt (GS), Brovey and wavelet transform (WT) methods were compared, and the GS and Brovey methods proved more applicable in the study area. Classification was then conducted using an object-oriented workflow. For the GS and Brovey fusion images and the GF-1 optical image, the nearest neighbour algorithm was used for supervised classification with the same training samples. Accuracy was then assessed using the sample plots in the study area. The overall accuracy and kappa coefficient of the fusion images were all higher than those of the GF-1 optical image, and the GS method performed better than the Brovey method. In particular, the overall accuracy of the GS fusion image was 79.8%, and the kappa coefficient was 0.644. Thus, the results showed that GS and Brovey fusion images were superior to optical images for dryland crop classification. This study suggests that the fusion of SAR and optical images is reliable for dryland crop classification under a complex crop planting structure.
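Overall accuracy and the kappa coefficient, the two figures reported above, are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa calculation; the function name and the example counts are hypothetical and are not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were labelled as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for three classes (e.g. corn, cotton, other).
example = [[40, 5, 5], [6, 30, 4], [4, 6, 20]]
accuracy, kappa = overall_accuracy_and_kappa(example)
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy for the same map.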
Kos, Gregor; Sieger, Markus; McMullin, David; Zahradnik, Celine; Sulyok, Michael; Öner, Tuba; Mizaikoff, Boris; Krska, Rudolf
2016-10-01
The rapid identification of mycotoxins such as deoxynivalenol and aflatoxin B1 in agricultural commodities is an ongoing concern for food importers and processors. While sophisticated chromatography-based methods are well established for regulatory testing by food safety authorities, few techniques exist to provide a rapid assessment for traders. This study advances the development of a mid-infrared spectroscopic method, recording spectra with little sample preparation. Spectral data were classified using a bootstrap-aggregated (bagged) decision tree method, evaluating the protein and carbohydrate absorption regions of the spectrum. The method was able to classify 79% of 110 maize samples at the European Union regulatory limit for deoxynivalenol of 1750 µg kg⁻¹ and, for the first time, 77% of 92 peanut samples at 8 µg kg⁻¹ of aflatoxin B1. A subset model revealed a dependency on variety and type of fungal infection. The employed CRC and SBL maize varieties could be pooled in the model with a reduction of classification accuracy from 90% to 79%. Samples infected with Fusarium verticillioides were removed, leaving samples infected with F. graminearum and F. culmorum in the dataset improving classification accuracy from 73% to 79%. A 500 µg kg⁻¹ classification threshold for deoxynivalenol in maize performed even better with 85% accuracy. This is assumed to be due to a larger number of samples around the threshold increasing representativity. Comparison with established principal component analysis classification, which consistently showed overlapping clusters, confirmed the superior performance of bagged decision tree classification.
Automated detection and recognition of wildlife using thermal cameras.
Christiansen, Peter; Steen, Kim Arild; Jørgensen, Rasmus Nyholm; Karstoft, Henrik
2014-07-30
In agricultural mowing operations, thousands of animals are injured or killed each year, due to the increased working widths and speeds of agricultural machinery. Detection and recognition of wildlife within the agricultural fields is important to reduce wildlife mortality and, thereby, promote wildlife-friendly farming. The work presented in this paper contributes to the automated detection and classification of animals in thermal imaging. The methods and results are based on top-view images taken manually from a lift to motivate work towards unmanned aerial vehicle-based detection and recognition. Hot objects are detected based on a threshold dynamically adjusted to each frame. For the classification of animals, we propose a novel thermal feature extraction algorithm. For each detected object, a thermal signature is calculated using morphological operations. The thermal signature describes heat characteristics of objects and is partly invariant to translation, rotation, scale and posture. The discrete cosine transform (DCT) is used to parameterize the thermal signature and, thereby, calculate a feature vector, which is used for subsequent classification. Using a k-nearest-neighbor (kNN) classifier, animals are discriminated from non-animals with a balanced classification accuracy of 84.7% in an altitude range of 3-10 m and an accuracy of 75.2% for an altitude range of 10-20 m. To incorporate temporal information in the classification, a tracking algorithm is proposed. Using temporal information improves the balanced classification accuracy to 93.3% in an altitude range of 3-10 m and 77.7% in an altitude range of 10-20 m.
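Two steps in the abstract above lend themselves to a short sketch: parameterizing a signature with the DCT, and scoring with balanced accuracy. The code below is an illustration under stated assumptions, not the authors' algorithm: the thermal signature is assumed to be a plain 1-D list of numbers (the morphological step that produces it is not shown), the DCT-II is left unnormalised, and all function names are hypothetical.

```python
import math


def dct2(signal):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def dct_features(signature, n_coeffs):
    """Keep the first n_coeffs DCT coefficients as a compact feature vector."""
    return dct2(signature)[:n_coeffs]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so rare classes count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

Truncating the DCT keeps the coarse shape of the signature in a fixed-length vector, which is convenient as input to a distance-based classifier such as kNN.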
NASA Astrophysics Data System (ADS)
Amit, Guy; Ben-Ari, Rami; Hadad, Omer; Monovich, Einat; Granot, Noa; Hashoul, Sharbell
2017-03-01
Diagnostic interpretation of breast MRI studies requires meticulous work and a high level of expertise. Computerized algorithms can assist radiologists by automatically characterizing the detected lesions. Deep learning approaches have shown promising results in natural image classification, but their applicability to medical imaging is limited by the shortage of large annotated training sets. In this work, we address automatic classification of breast MRI lesions using two different deep learning approaches. We propose a novel image representation for dynamic contrast enhanced (DCE) breast MRI lesions, which combines the morphological and kinetics information in a single multi-channel image. We compare two classification approaches for discriminating between benign and malignant lesions: training a designated convolutional neural network and using a pre-trained deep network to extract features for a shallow classifier. The domain-specific trained network provided higher classification accuracy, compared to the pre-trained model, with an area under the ROC curve of 0.91 versus 0.81, and an accuracy of 0.83 versus 0.71. Similar accuracy was achieved in classifying benign lesions, malignant lesions, and normal tissue images. The trained network was able to improve accuracy by using the multi-channel image representation, and was more robust to reductions in the size of the training set. A small-size convolutional neural network can learn to accurately classify findings in medical images using only a few hundred images from a few dozen patients. With sufficient data augmentation, such a network can be trained to outperform a pre-trained out-of-domain classifier. Developing domain-specific deep-learning models for medical imaging can facilitate technological advancements in computer-aided diagnosis.
Identification of an Efficient Gene Expression Panel for Glioblastoma Classification
Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni
2016-01-01
We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170
The research on medical image classification algorithm based on PLSA-BOW model.
Cao, C H; Cao, H L
2016-04-29
With the rapid development of modern medical imaging technology, medical image classification has become more important for medical diagnosis and treatment. To address the problem of polysemous words and synonyms, this study combines the bag-of-words model with PLSA (Probabilistic Latent Semantic Analysis) and proposes the PLSA-BOW (Probabilistic Latent Semantic Analysis-Bag of Words) model. In this paper we carry the bag-of-words model over from the text domain to the image domain and build a visual bag-of-words model. The method further improves the accuracy of bag-of-words-based classification. The experimental results show that the PLSA-BOW model for medical image classification can lead to a more accurate classification.
Spectral-Spatial Classification of Hyperspectral Images Using Hierarchical Optimization
NASA Technical Reports Server (NTRS)
Tarabalka, Yuliya; Tilton, James C.
2011-01-01
A new spectral-spatial method for hyperspectral data classification is proposed. For a given hyperspectral image, probabilistic pixelwise classification is first applied. Then, a hierarchical step-wise optimization algorithm is performed, iteratively merging neighboring regions with the smallest Dissimilarity Criterion (DC) and recomputing class labels for the new regions. The DC is computed by comparing region mean vectors, class labels and the number of pixels in the two regions under consideration. The algorithm converges when all the pixels have been involved in the region merging procedure. Experimental results are presented on two remote sensing hyperspectral images acquired by the AVIRIS and ROSIS sensors. The proposed approach improves classification accuracies and provides maps with more homogeneous regions, when compared to previously proposed classification techniques.
Multiple Spectral-Spatial Classification Approach for Hyperspectral Data
NASA Technical Reports Server (NTRS)
Tarabalka, Yuliya; Benediktsson, Jon Atli; Chanussot, Jocelyn; Tilton, James C.
2010-01-01
A new multiple classifier approach for spectral-spatial classification of hyperspectral images is proposed. Several classifiers are used independently to classify an image. For every pixel, if all the classifiers have assigned this pixel to the same class, the pixel is kept as a marker, i.e., a seed of the spatial region, with the corresponding class label. We propose to use spectral-spatial classifiers at the preliminary step of the marker selection procedure, each of them combining the results of a pixel-wise classification and a segmentation map. Different segmentation methods based on dissimilar principles lead to different classification results. Furthermore, a minimum spanning forest is built, where each tree is rooted on a classification-driven marker and forms a region in the spectral-spatial classification map. Experimental results are presented for two hyperspectral airborne images. The proposed method significantly improves classification accuracies, when compared to previously proposed classification techniques.
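The marker selection rule described above (keep a pixel only where every classifier agrees) can be sketched as follows. This is a minimal illustration with a hypothetical function name; label maps are assumed to be equally sized lists of lists, and the later minimum spanning forest step is not shown.

```python
def select_markers(label_maps):
    """Return a map holding the agreed class label, or None where classifiers disagree.

    label_maps: one 2-D label map (list of rows) per classifier, all the same shape.
    """
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    markers = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = {m[r][c] for m in label_maps}
            if len(votes) == 1:  # unanimous agreement: keep the pixel as a marker
                markers[r][c] = votes.pop()
    return markers
```

Pixels left as `None` are the uncertain ones; in the approach described above they would be assigned later by growing regions from the markers.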
A Learning-Based Approach for IP Geolocation
NASA Astrophysics Data System (ADS)
Eriksson, Brian; Barford, Paul; Sommers, Joel; Nowak, Robert
The ability to pinpoint the geographic location of IP hosts is compelling for applications such as on-line advertising and network attack diagnosis. While prior methods can accurately identify the location of hosts in some regions of the Internet, they produce erroneous results when the delay or topology measurement on which they are based is limited. The hypothesis of our work is that the accuracy of IP geolocation can be improved through the creation of a flexible analytic framework that accommodates different types of geolocation information. In this paper, we describe a new framework for IP geolocation that reduces to a machine-learning classification problem. Our methodology considers a set of lightweight measurements from a set of known monitors to a target, and then classifies the location of that target based on the most probable geographic region given probability densities learned from a training set. For this study, we employ a Naive Bayes framework that has low computational complexity and enables additional environmental information to be easily added to enhance the classification process. To demonstrate the feasibility and accuracy of our approach, we test IP geolocation on over 16,000 routers given ping measurements from 78 monitors with known geographic placement. Our results show that the simple application of our method improves geolocation accuracy for over 96% of the nodes identified in our data set, with estimates on average 70 miles closer to the true geographic location than prior constraint-based geolocation. These results highlight the promise of our method and indicate how future expansion of the classifier can lead to further improvements in geolocation accuracy.
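The idea of classifying a target into its most probable region from per-monitor measurements can be sketched with a small Naive Bayes classifier. This is an illustration only: it assumes Gaussian densities for each monitor's delay (the abstract says only that densities are learned from a training set), and the function names, region labels and delay values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Learn, per region, a prior and a per-monitor mean and variance of the delays."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for region, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6  # small floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[region] = (n / len(samples), means, variances)
    return model


def predict_region(model, x):
    """Return the region with the highest log-posterior for measurement vector x."""
    def log_posterior(region):
        prior, means, variances = model[region]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical ping delays (ms) from two monitors to hosts in two regions.
train = [[10, 80], [12, 78], [11, 82], [75, 15], [80, 12], [78, 14]]
regions = ["east", "east", "east", "west", "west", "west"]
model = fit_gaussian_nb(train, regions)
```

Because each monitor contributes an independent term to the score, extra information sources can be added as further terms without changing the rest of the classifier.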
NASA Astrophysics Data System (ADS)
Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2016-03-01
The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decision-making regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracies of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
Classification Accuracy Increase Using Multisensor Data Fusion
NASA Astrophysics Data System (ADS)
Makarau, A.; Palubinskas, G.; Reinartz, P.
2011-09-01
The practical use of very high resolution visible and near-infrared (VNIR) data is still growing (IKONOS, Quickbird, GeoEye-1, etc.) but for classification purposes the number of bands is limited in comparison to full spectral imaging. These limitations may lead to the confusion of materials such as different roofs, pavements, roads, etc. and therefore may lead to wrong interpretation and use of classification products. Employment of hyperspectral data is another solution, but their low spatial resolution (compared to multispectral data) restricts their usage for many applications. Another improvement can be achieved by fusing multisensor data, since this may increase the quality of scene classification. Integration of Synthetic Aperture Radar (SAR) and optical data is widely performed for automatic classification, interpretation, and change detection. In this paper we present an approach for very high resolution SAR and multispectral data fusion for automatic classification in urban areas. Single polarization TerraSAR-X (SpotLight mode) and multispectral data are integrated using the INFOFUSE framework, consisting of feature extraction (information fission), unsupervised clustering (data representation on a finite domain and dimensionality reduction), and data aggregation (Bayesian or neural network). This framework allows a relevant way of multisource data combination following consensus theory. The classification is not influenced by the limitations of dimensionality, and the calculation complexity primarily depends on the step of dimensionality reduction. Fusion of single polarization TerraSAR-X, WorldView-2 (VNIR or full set), and Digital Surface Model (DSM) data allows different types of urban objects to be classified into predefined classes of interest with increased accuracy. A comparison with classification results from WorldView-2 multispectral data (8 spectral bands) is provided, and the numerical evaluation of the method against other established methods illustrates the advantage in classification accuracy for many classes such as buildings, low vegetation, sport objects, forest, roads, rail roads, etc.
Li, Yachun; Charalampaki, Patra; Liu, Yong; Yang, Guang-Zhong; Giannarou, Stamatia
2018-06-13
Probe-based confocal laser endomicroscopy (pCLE) enables in vivo, in situ tissue characterisation without changes in the surgical setting and simplifies the oncological surgical workflow. The potential of this technique in identifying residual cancer tissue and improving resection rates of brain tumours has been recently verified in pilot studies. The interpretation of endomicroscopic information is challenging, particularly for surgeons who do not themselves routinely review histopathology. Also, the diagnosis can be examiner-dependent, leading to considerable inter-observer variability. Therefore, automatic tissue characterisation with pCLE would support the surgeon in establishing diagnosis as well as guide robot-assisted intervention procedures. The aim of this work is to propose a deep learning-based framework for brain tissue characterisation for context aware diagnosis support in neurosurgical oncology. An efficient representation of the context information of pCLE data is presented by exploring state-of-the-art CNN models with different tuning configurations. A novel video classification framework based on the combination of convolutional layers with long-range temporal recursion has been proposed to estimate the probability of each tumour class. The video classification accuracy is compared for different network architectures and data representation and video segmentation methods. We demonstrate the application of the proposed deep learning framework to classify Glioblastoma and Meningioma brain tumours based on endomicroscopic data. Results show significant improvement of our proposed image classification framework over state-of-the-art feature-based methods. The use of video data further improves the classification performance, achieving accuracy equal to 99.49%. This work demonstrates that deep learning can provide an efficient representation of pCLE data and accurately classify Glioblastoma and Meningioma tumours. The performance evaluation analysis shows the potential clinical value of the technique.
Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment
ERIC Educational Resources Information Center
Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua
2012-01-01
This article introduces procedures for the computation and asymptotic statistical inference for classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…
Tabu search and binary particle swarm optimization for feature selection using microarray data.
Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong
2009-12-01
Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In the classification of cancer type research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied and compared to the 11 classification problems taken from the literature. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features compared to other feature selection methods.
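One of the evaluators named above, a nearest-neighbour classifier scored by leave-one-out cross-validation, can serve as the fitness of a candidate feature subset. The sketch below is a minimal illustration with k fixed at 1 and a hypothetical function name; the tabu search and BPSO that propose the binary masks are not shown, and the toy data are invented for the example.

```python
def loocv_knn_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked features.

    mask: one 0/1 entry per feature; only features with a 1 are used.
    """
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, x in enumerate(samples):
        best_j, best_d = None, float("inf")
        for j, z in enumerate(samples):
            if j == i:
                continue  # hold out the sample being classified
            d = sum((x[k] - z[k]) ** 2 for k in selected)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(samples)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
toy_samples = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
toy_labels = [0, 0, 1, 1]
```

On the toy data, selecting only the informative feature gives a higher score than selecting the noisy one, which is the signal a search procedure would use to prefer smaller, more relevant subsets.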
NASA Technical Reports Server (NTRS)
Rignot, E.; Chellappa, R.
1993-01-01
We present a maximum a posteriori (MAP) classifier for classifying multifrequency, multilook, single polarization SAR intensity data into regions or ensembles of pixels of homogeneous and similar radar backscatter characteristics. A model for the prior joint distribution of the multifrequency SAR intensity data is combined with a Markov random field for representing the interactions between region labels to obtain an expression for the posterior distribution of the region labels given the multifrequency SAR observations. The maximization of the posterior distribution yields Bayes's optimum region labeling or classification of the SAR data or its MAP estimate. The performance of the MAP classifier is evaluated by using computer-simulated multilook SAR intensity data as a function of the parameters in the classification process. Multilook SAR intensity data are shown to yield higher classification accuracies than one-look SAR complex amplitude data. The MAP classifier is extended to the case in which the radar backscatter from the remotely sensed surface varies within the SAR image because of incidence angle effects. The results obtained illustrate the practicality of the method for combining SAR intensity observations acquired at two different frequencies and for improving classification accuracy of SAR data.
Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan
2015-10-21
The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method, t-distributed stochastic neighbor embedding (t-SNE), provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization, and thus, should be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with the high-dimensional data that are collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion. Then the high-dimensional data are visualized in 2-dimension space. According to the UCI dataset test, FSS-t-SNE can effectively improve the classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification. Multi-sensor signals were collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine.
Moncada-Torres, A; Leuenberger, K; Gonzenbach, R; Luft, A; Gassert, R
2014-07-01
Miniature, wearable sensor modules are a promising technology to monitor activities of daily living (ADL) over extended periods of time. To assure both user compliance and meaningful results, the selection and placement site of sensors requires careful consideration. We investigated these aspects for the classification of 16 ADL in 6 healthy subjects under laboratory conditions using ReSense, our custom-made inertial measurement unit enhanced with a barometric pressure sensor used to capture activity-related altitude changes. Subjects wore a module on each wrist and ankle, and one on the trunk. Activities comprised whole body movements as well as gross and dextrous upper-limb activities. Wrist-module data outperformed the other locations for the three activity groups. Specifically, overall classification accuracy rates of almost 93% and more than 95% were achieved for the repeated holdout and user-specific validation methods, respectively, for all 16 activities. Including the altitude profile resulted in a considerable improvement of up to 20% in the classification accuracy for stair ascent and descent. The gyroscopes provided no useful information for activity classification under this scheme. The proposed sensor setting could allow for robust long-term activity monitoring with high compliance in different patient populations.
NASA Astrophysics Data System (ADS)
Raziff, Abdul Rafiez Abdul; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran
2017-10-01
Gait recognition is widely used in many applications. In gait-based identification of people, the number of classes (people) is large and may exceed 20. With so many classes, a single classification mapping (direct classification) may not be suitable, as most existing algorithms are designed for binary classification. Furthermore, having many classes in a dataset may result in a high degree of overlap between class boundaries. This paper discusses the application of multiclass classifier mappings such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The results are then compared with a single J48 decision tree as a benchmark. The results show that multiclass classification mapping partially improved the overall accuracy, especially for OvO and for RCC with a width factor greater than 4. For OvA, the accuracy is worse than that of a single J48 because of the high number of classes.
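To make the OvA and OvO mappings concrete, the following minimal sketch enumerates the binary tasks each mapping creates for 20 classes. The function names and the class labels are hypothetical, and no classifier is trained; RCC is not covered.

```python
from itertools import combinations

def ova_tasks(classes):
    """One binary task per class: that class versus all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

def ovo_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

people = [f"person_{i}" for i in range(20)]
n_ova = len(ova_tasks(people))  # one task per person
n_ovo = len(ovo_tasks(people))  # one task per pair of people
```

In each OvA task a single person is set against the other 19, so the positive class becomes a small minority as the number of people grows. This may be one reason OvA degrades with many classes, although the abstract does not state the mechanism.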
Learning discriminative features from RGB-D images for gender and ethnicity identification
NASA Astrophysics Data System (ADS)
Azzakhnini, Safaa; Ballihi, Lahoucine; Aboutajdine, Driss
2016-11-01
The development of sophisticated sensor technologies gave rise to an interesting variety of data. With the appearance of affordable devices, such as the Microsoft Kinect, depth-maps and three-dimensional data became easily accessible. This attracted many computer vision researchers seeking to exploit this information in classification and recognition tasks. In this work, the problem of face classification in the context of RGB images and depth information (RGB-D images) is addressed. The purpose of this paper is to study and compare some popular techniques for gender recognition and ethnicity classification to understand how much depth data can improve the quality of recognition. Furthermore, we investigate which combination of face descriptors, feature selection methods, and learning techniques is best suited to better exploit RGB-D images. The experimental results show that depth data improve the recognition accuracy for gender and ethnicity classification applications in many use cases.
Walton, Emily; Casey, Christy; Mitsch, Jurgen; Vázquez-Diosdado, Jorge A; Yan, Juan; Dottorini, Tania; Ellis, Keith A; Winterlich, Anthony; Kaler, Jasmeet
2018-02-01
Automated behavioural classification and identification through sensors has the potential to improve health and welfare of the animals. Position of a sensor, sampling frequency and window size of segmented signal data have a major impact on classification accuracy in activity recognition and on the energy needs of the sensor, yet there are no studies in precision livestock farming that have evaluated the effect of all these factors simultaneously. The aim of this study was to evaluate the effects of position (ear and collar), sampling frequency (8, 16 and 32 Hz) of a triaxial accelerometer and gyroscope sensor and window size (3, 5 and 7 s) on the classification of important behaviours in sheep such as lying, standing and walking. Behaviours were classified using a random forest approach with 44 feature characteristics. The best performance for walking, standing and lying classification in sheep (accuracy 95%, F-score 91%-97%) was obtained using combinations of 32 Hz, 7 s and 32 Hz, 5 s for both ear and collar sensors, although results obtained with 16 Hz and a 7 s window were comparable, with accuracy of 91%-93% and F-score 88%-95%. Energy efficiency was best at a 7 s window. This suggests that sampling at 16 Hz with a 7 s window will offer benefits in a real-time behavioural monitoring system for sheep due to reduced energy needs.
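The interaction between sampling frequency and window size can be sketched as follows. The example assumes non-overlapping windows, which the abstract does not specify, and the function names are hypothetical.

```python
def samples_per_window(sampling_hz, window_s):
    """Number of samples that fall inside one window."""
    return int(round(sampling_hz * window_s))

def segment(signal, sampling_hz, window_s):
    """Split a signal into consecutive, non-overlapping windows.

    Trailing samples that do not fill a whole window are discarded.
    """
    n = samples_per_window(sampling_hz, window_s)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

# One minute of a single (dummy) sensor channel sampled at 16 Hz.
one_minute_16hz = list(range(16 * 60))
windows = segment(one_minute_16hz, sampling_hz=16, window_s=7)
```

Halving the sampling frequency from 32 Hz to 16 Hz halves the samples per 7 s window (224 to 112), which is consistent with the reduced energy needs the abstract reports.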
Texture classification of vegetation cover in high altitude wetlands zone
NASA Astrophysics Data System (ADS)
Wentao, Zou; Bingfang, Wu; Hongbo, Ju; Hua, Liu
2014-03-01
The aim of this study was to investigate the utility of datasets composed of texture measures and other features for the classification of vegetation cover, specifically wetlands. The QUEST decision tree classifier was applied to a SPOT-5 image sub-scene covering a typical wetlands area in the Three River Sources region in Qinghai province, China. The dataset used for the classification comprised: (1) spectral data and the components of principal component analysis; (2) texture measures derived on a per-pixel basis; (3) DEM and other ancillary data covering the research area. Image texture is an important characteristic of remote sensing images; it can represent spatial variations of spectral brightness in digital numbers. When the spectral information is not enough to separate the different land covers, the texture information can be used to increase the classification accuracy. The texture measures used in this study were calculated from the GLCM (Gray Level Co-occurrence Matrix); eight frequently used measures were chosen to conduct the classification procedure. The results showed that variance, mean and entropy calculated by GLCM with a 9×9 window were effective in distinguishing different vegetation types in the wetlands zone. The overall accuracy of this method was 84.19% and the Kappa coefficient was 0.8261. The results indicated that the introduction of texture measures can improve the overall accuracy by 12.05% and the overall kappa coefficient by 0.1407 compared with the result using spectral and ancillary data.
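A minimal sketch of the three GLCM measures named above (mean, variance and entropy) is given below for a single image patch and a single pixel offset. It uses the common textbook definitions; the exact formulas, offsets and quantisation used in the study are not stated in the abstract, so those are assumptions, as is the small 4×4 test patch. In practice the patch would be the 9×9 window around each pixel.

```python
import math

def glcm(patch, levels, dx=1, dy=0):
    """Symmetric, normalised gray-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = patch[r][c], patch[r2][c2]
                counts[i][j] += 1  # pair (i, j)
                counts[j][i] += 1  # and its mirror, to make the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_measures(p):
    """GLCM mean, variance and entropy (natural logarithm)."""
    n = len(p)
    mean = sum(i * p[i][j] for i in range(n) for j in range(n))
    variance = sum((i - mean) ** 2 * p[i][j] for i in range(n) for j in range(n))
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    return mean, variance, entropy

patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(patch, levels=4)
mean, variance, entropy = texture_measures(p)
```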
Joint deconvolution and classification with applications to passive acoustic underwater multipath.
Anderson, Hyrum S; Gupta, Maya R
2008-11-01
This paper addresses the problem of classifying signals that have been corrupted by noise and unknown linear time-invariant (LTI) filtering such as multipath, given labeled uncorrupted training signals. A maximum a posteriori approach to the deconvolution and classification is considered, which produces estimates of the desired signal, the unknown channel, and the class label. For cases in which only a class label is needed, the classification accuracy can be improved by not committing to an estimate of the channel or signal. A variant of the quadratic discriminant analysis (QDA) classifier is proposed that probabilistically accounts for the unknown LTI filtering, and which avoids deconvolution. The proposed QDA classifier can work either directly on the signal or on features whose transformation by LTI filtering can be analyzed; as an example a classifier for subband-power features is derived. Results on simulated data and real Bowhead whale vocalizations show that jointly considering deconvolution with classification can dramatically improve classification performance over traditional methods over a range of signal-to-noise ratios.
Gender classification of running subjects using full-body kinematics
NASA Astrophysics Data System (ADS)
Williams, Christina M.; Flora, Jeffrey B.; Iftekharuddin, Khan M.
2016-05-01
This paper proposes novel automated gender classification of subjects while engaged in running activity. The machine learning techniques include preprocessing steps using principal component analysis followed by classification with linear discriminant analysis, and nonlinear support vector machines, and decision-stump with AdaBoost. The dataset consists of 49 subjects (25 males, 24 females, 2 trials each) all equipped with approximately 80 retroreflective markers. The trials are reflective of the subject's entire body moving unrestrained through a capture volume at a self-selected running speed, thus producing highly realistic data. The classification accuracy using leave-one-out cross validation for the 49 subjects is improved from 66.33% using linear discriminant analysis to 86.74% using the nonlinear support vector machine. Results are further improved to 87.76% by means of implementing a nonlinear decision stump with AdaBoost classifier. The experimental findings suggest that the linear classification approaches are inadequate in classifying gender for a large dataset with subjects running in a moderately uninhibited environment.
Quirós, Elia; Felicísimo, Angel M; Cuartero, Aurora
2009-01-01
This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test.
Computational approaches for the classification of seed storage proteins.
Radhika, V; Rao, V Sree Hari
2015-07-01
Seed storage proteins comprise a major part of the protein content of the seed and have an important role in the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs, and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on a nearest neighbor approach for classification of seed storage proteins and compared its performance with the decision tree J48, a multilayer perceptron (MLP) neural network and the support vector machine (SVM) libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
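The idea of combining a lowest common ancestor with weighted database hits can be illustrated with the simplified sketch below. It is not the BLCA implementation: the weights are hypothetical stand-ins for the Bayesian posterior probabilities, only four ranks are used, and the bootstrap confidence step is omitted.

```python
from collections import defaultdict

def lowest_common_ancestor(lineages):
    """Longest shared prefix of lineages ordered from phylum down to species."""
    shared = []
    for taxa in zip(*lineages):
        if len(set(taxa)) != 1:
            break
        shared.append(taxa[0])
    return shared

def weighted_support(lineages, weights):
    """For each rank, the taxon holding the largest share of the total weight."""
    total = sum(weights)
    result = []
    for taxa in zip(*lineages):
        share = defaultdict(float)
        for taxon, w in zip(taxa, weights):
            share[taxon] += w / total
        best = max(share, key=share.get)
        result.append((best, share[best]))
    return result

# Hypothetical database hits for one query (phylum, class, genus, species).
hits = [("Firmicutes", "Bacilli", "Lactobacillus", "L. casei"),
        ("Firmicutes", "Bacilli", "Lactobacillus", "L. paracasei"),
        ("Firmicutes", "Bacilli", "Streptococcus", "S. mitis")]
weights = [6.0, 3.0, 1.0]  # assumed similarity-derived weights
lca = lowest_common_ancestor(hits)
support = weighted_support(hits, weights)
```

An unweighted lowest common ancestor stops at the class level here, whereas the weighted view still assigns most of the support to one genus and one species.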
Deep Multi-Task Learning for Tree Genera Classification
NASA Astrophysics Data System (ADS)
Ko, C.; Kang, J.; Sohn, G.
2018-05-01
The goal of our paper is to classify tree genera using airborne Light Detection and Ranging (LiDAR) data with a Convolutional Neural Network (CNN) - Multi-task Network (MTN) implementation. Unlike a Single-task Network (STN), where only one task is assigned to the learning outcome, an MTN is a deep learning architecture for learning a main task (classification of tree genera) simultaneously with other tasks (in our study, classification of coniferous and deciduous), with shared classification features. The main contribution of this paper is to improve classification accuracy from CNN-STN to CNN-MTN. This is achieved by introducing a concurrence loss (Lcd) to the designed MTN. This term regulates the overall network performance by minimizing the inconsistencies between the two tasks. Results show that we can increase the classification accuracy from 88.7 % to 91.0 % (from STN to MTN). The second goal of this paper is to solve the problem of small training sample size by multiple-view data generation. The motivation for this goal is to address one of the most common problems in implementing deep learning architectures, the insufficient amount of training data. We address this problem by simulating the training dataset with a multiple-view approach. The promising results from this paper provide a basis for classifying larger datasets and a larger number of classes in the future.
Myakalwar, Ashwin Kumar; Sreedhar, S.; Barman, Ishan; Dingari, Narahara Chari; Rao, S. Venugopal; Kiran, P. Prem; Tewari, Surya P.; Kumar, G. Manoj
2012-01-01
We report the effectiveness of laser-induced breakdown spectroscopy (LIBS) in probing the content of pharmaceutical tablets and also investigate its feasibility for routine classification. This method is particularly beneficial in applications where its exquisite chemical specificity and suitability for remote and on site characterization significantly improves the speed and accuracy of quality control and assurance process. Our experiments reveal that in addition to the presence of carbon, hydrogen, nitrogen and oxygen, which can be primarily attributed to the active pharmaceutical ingredients, specific inorganic atoms were also present in all the tablets. Initial attempts at classification by a ratiometric approach using oxygen to nitrogen compositional values yielded an optimal value (at 746.83 nm) with the least relative standard deviation but nevertheless failed to provide an acceptable classification. To overcome this bottleneck in the detection process, two chemometric algorithms, i.e. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA), were implemented to exploit the multivariate nature of the LIBS data demonstrating that LIBS has the potential to differentiate and discriminate among pharmaceutical tablets. We report excellent prospective classification accuracy using supervised classification via the SIMCA algorithm, demonstrating its potential for future applications in process analytical technology, especially for fast on-line process control monitoring applications in the pharmaceutical industry. PMID:22099648
Multi-level discriminative dictionary learning with application to large scale image classification.
Shen, Li; Sun, Gang; Huang, Qingming; Wang, Shuhui; Lin, Zhouchen; Wu, Enhua
2015-10-01
The sparse coding technique has shown flexibility and capability in image representation and analysis. It is a powerful tool in many visual applications. Some recent work has shown that incorporating the properties of task (such as discrimination for classification task) into dictionary learning is effective for improving the accuracy. However, the traditional supervised dictionary learning methods suffer from high computation complexity when dealing with large number of categories, making them less satisfactory in large scale applications. In this paper, we propose a novel multi-level discriminative dictionary learning method and apply it to large scale image classification. Our method takes advantage of hierarchical category correlation to encode multi-level discriminative information. Each internal node of the category hierarchy is associated with a discriminative dictionary and a classification model. The dictionaries at different layers are learnt to capture the information of different scales. Moreover, each node at lower layers also inherits the dictionary of its parent, so that the categories at lower layers can be described with multi-scale information. The learning of dictionaries and associated classification models is jointly conducted by minimizing an overall tree loss. The experimental results on challenging data sets demonstrate that our approach achieves excellent accuracy and competitive computation cost compared with other sparse coding methods for large scale image classification.
Weighted statistical parameters for irregularly sampled time series
NASA Astrophysics Data System (ADS)
Rimoldini, Lorenzo
2014-01-01
Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
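One simple way to weight by sampling density is sketched below: each measurement is given half the time span between its neighbours, so the weighted mean equals the time average of the linearly interpolated signal. This is only an illustration of the general idea; the paper's scheme also adapts to the noise level, which this sketch does not, and the function names are hypothetical.

```python
def interval_weights(times):
    """Half the distance between neighbouring epochs (trapezoidal weights)."""
    n = len(times)
    weights = []
    for i in range(n):
        left = times[i] - times[i - 1] if i > 0 else 0.0
        right = times[i + 1] - times[i] if i < n - 1 else 0.0
        weights.append((left + right) / 2.0)
    return weights

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# A clump of four measurements followed by a gap and one isolated point.
times = [0.0, 1.0, 2.0, 3.0, 10.0]
values = [1.0, 1.0, 1.0, 1.0, 5.0]
plain = sum(values) / len(values)
weighted = weighted_mean(values, interval_weights(times))
```

The clump of early measurements dominates the unweighted mean, while the weighted mean gives the isolated late point the larger share of the time line it represents.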
Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A
2017-09-15
Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. 
Code available at http://insilico.utulsa.edu/software/privateEC .
NASA Astrophysics Data System (ADS)
Gao, Yan; Marpu, Prashanth; Morales Manila, Luis M.
2014-11-01
This paper assesses the suitability of 8-band Worldview-2 (WV2) satellite data and an object-based random forest algorithm for the classification of avocado growth stages in Mexico. We tested both pixel-based classification with minimum distance (MD) and maximum likelihood (MLC) and object-based classification with the Random Forest (RF) algorithm for this task. Training samples and verification data were selected by visually interpreting the WV2 images for seven thematic classes: fully grown, middle stage, and early stage of avocado crops, bare land, two types of natural forests, and water body. To examine the contribution of the four new spectral bands of the WV2 sensor, all the tested classifications were carried out with and without the four new spectral bands. Classification accuracy assessment results show that object-based classification with the RF algorithm obtained higher overall accuracy (93.06%) than the pixel-based MD (69.37%) and MLC (64.03%) methods. For both pixel-based and object-based methods, the classifications with the four new spectral bands obtained higher overall accuracy than those without (object-based RF: 93.06% vs 83.59%; pixel-based MD: 69.37% vs 67.2%; pixel-based MLC: 64.03% vs 36.05%), suggesting that the four new spectral bands of the WV2 sensor contributed to the increase in classification accuracy.
Molecular cancer classification using a meta-sample-based regularized robust coding method.
Wang, Shu-Lin; Sun, Liuchao; Fang, Jianwen
2014-01-01
Previous studies have demonstrated that machine learning based molecular cancer classification using gene expression profiling (GEP) data is promising for the clinical diagnosis and treatment of cancer. Novel classification methods with high efficiency and prediction accuracy are still needed to deal with the high dimensionality and small sample size of typical GEP data. Recently the sparse representation (SR) method has been successfully applied to cancer classification. Nevertheless, its efficiency needs to be improved when analyzing large-scale GEP data. In this paper we present meta-sample-based regularized robust coding classification (MRRCC), a novel and effective cancer classification technique that combines the idea of the meta-sample-based cluster method with the regularized robust coding (RRC) method. It assumes that the coding residual and the coding coefficient are respectively independent and identically distributed. Similar to meta-sample-based SR classification (MSRC), MRRCC extracts a set of meta-samples from the training samples, and then encodes a testing sample as a sparse linear combination of these meta-samples. The representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Extensive experiments on publicly available GEP datasets demonstrate that the proposed method is more efficient, while its prediction accuracy is equivalent to that of existing MSRC-based methods and better than that of other state-of-the-art dimension reduction based methods.
NASA Astrophysics Data System (ADS)
Miao, Minmin; Zeng, Hong; Wang, Aimin; Zhao, Fengkui; Liu, Feixiang
2017-09-01
Electroencephalogram (EEG)-based motor imagery (MI) brain-computer interface (BCI) has shown its effectiveness for the control of rehabilitation devices designed for large body parts of the patients with neurologic impairments. In order to validate the feasibility of using EEG to decode the MI of a single index finger and constructing a BCI-enhanced finger rehabilitation system, we collected EEG data during right hand index finger MI and rest state for five healthy subjects and proposed a pattern recognition approach for classifying these two mental states. First, Fisher's linear discriminant criteria and power spectral density analysis were used to analyze the event-related desynchronization patterns. Second, both band power and approximate entropy were extracted as features. Third, aiming to eliminate the abnormal samples in the dictionary and improve the classification performance of the conventional sparse representation-based classification (SRC) method, we proposed a novel dictionary cleaned sparse representation-based classification (DCSRC) method for final classification. The experimental results show that the proposed DCSRC method gives better classification accuracies than SRC and an average classification accuracy of 81.32% is obtained for five subjects. Thus, it is demonstrated that single right hand index finger MI can be decoded from the sensorimotor rhythms, and the feature patterns of index finger MI and rest state can be well recognized for robotic exoskeleton initiation.
Synergistic Use of WorldView-2 Imagery and Airborne LiDAR Data for Urban Land Cover Classification
NASA Astrophysics Data System (ADS)
Wu, M. F.; Sun, Z. C.; Yang, B.; Yu, S. S.
2017-02-01
Deriving urban land cover types from high resolution optical imagery is challenging because of the spectral similarity of different objects, mixed pixels, shadows of buildings and large tree crowns. To reduce these uncertainties, classification of urban land cover from multi-source sensors has recently become a trend in the field of urban remote sensing. In this study, a hierarchical support vector machine (SVM) classification method was applied to urban land cover mapping, using WorldView-2 imagery and airborne Light Detection and Ranging (LiDAR) data. The results showed that: (1) The overall accuracy (OA) and overall kappa (OK) were 72.92% and 0.66 for WorldView-2 imagery alone, while the OA and OK improved to 89.44% and 0.87 for the synergistic use of the two types of data source. (2) Buildings and road/parking lots extracted from the fused data were more precise and better shaped. The two classes from the fused data were classified with higher producer’s accuracy and user’s accuracy than with WorldView-2 imagery alone. The trees were also easily separated from the grasslands when the airborne LiDAR data was added. (3) The fused data could reduce the phenomenon of different spectral character of the complex and detailed objects. It was also helpful in addressing the problem of shadows from high-rise buildings. The results from this study indicate that the synergistic use of high resolution optical imagery and airborne LiDAR data can be an efficient approach to improving the classification of urban land cover.
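Overall accuracy and the kappa coefficient are both derived from a confusion matrix, as the following sketch shows. The matrix values are invented for illustration and are not taken from the study; rows are taken to be reference classes and columns predicted classes.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix (100 validation samples).
confusion = [[50, 10],
             [5, 35]]
oa, kappa = overall_accuracy_and_kappa(confusion)
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.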
Zhang, Chuncheng; Song, Sutao; Wen, Xiaotong; Yao, Li; Long, Zhiying
2015-04-30
Feature selection plays an important role in improving the classification accuracy of multivariate classification techniques in the context of fMRI-based decoding due to the "few samples and large features" nature of functional magnetic resonance imaging (fMRI) data. Recently, several sparse representation methods have been applied to the voxel selection of fMRI data. Despite the low computational efficiency of the sparse representation methods, they still displayed promise for applications that select features from fMRI data. In this study, we proposed the Laplacian smoothed L0 norm (LSL0) approach for feature selection of fMRI data. Based on the fast sparse decomposition using smoothed L0 norm (SL0) (Mohimani, 2007), the LSL0 method used the Laplacian function to approximate the L0 norm of sources. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of LSL0 for the sparse source estimation and feature selection. Simulated results indicated that LSL0 produced more accurate source estimation than SL0 at high noise levels. The classification accuracy using voxels that were selected by LSL0 was higher than that by SL0 in both simulated and real fMRI experiment. Moreover, both LSL0 and SL0 showed higher classification accuracy and required less time than ICA and t-test for the fMRI decoding. LSL0 outperformed SL0 in sparse source estimation at high noise level and in feature selection. Moreover, LSL0 and SL0 showed better performance than ICA and t-test for feature selection. Copyright © 2015 Elsevier B.V. All rights reserved.
Lambron, Julien; Rakotonjanahary, Josué; Loisel, Didier; Frampas, Eric; De Carli, Emilie; Delion, Matthieu; Rialland, Xavier; Toulgoat, Frédérique
2016-02-01
Magnetic resonance (MR) images from children with optic pathway glioma (OPG) are complex. We initiated this study to evaluate the accuracy of MR imaging (MRI) interpretation and to propose a simple and reproducible imaging classification for MRI. We randomly selected 140 MRIs from among 510 MRIs performed on 104 children diagnosed with OPG in France from 1990 to 2004. These images were reviewed independently by three radiologists (F.T., 15 years of experience in neuroradiology; D.L., 25 years of experience in pediatric radiology; and J.L., 3 years of experience in radiology) using a classification derived from the Dodge and modified Dodge classifications. Intra- and interobserver reliabilities were assessed using the Bland-Altman method and the kappa coefficient. These reviews allowed the definition of reliable criteria for MRI interpretation. The reviews showed intraobserver variability and large discrepancies among the three radiologists (kappa coefficient varying from 0.11 to 1). These variabilities were too large for the interpretation to be considered reproducible over time or among observers. A consensual analysis, taking into account all observed variabilities, allowed the development of a definitive interpretation protocol. Using this revised protocol, we observed consistent intra- and interobserver results (kappa coefficient varying from 0.56 to 1). The mean interobserver difference for the solid portion of the tumor with contrast enhancement was 0.8 cm³ (limits of agreement = -16 to 17). We propose simple and precise rules for improving the accuracy and reliability of MRI interpretation for children with OPG. Further studies will be necessary to investigate the possible prognostic value of this approach.
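The mean difference and limits of agreement reported above are the two quantities a Bland-Altman analysis produces. A minimal sketch follows, assuming the usual 95% limits (mean difference ± 1.96 standard deviations of the paired differences); the function name and the reader measurements are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(measure_a, measure_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)                 # mean difference between observers
    spread = 1.96 * stdev(diffs)       # 95% limits, sample standard deviation
    return bias, bias - spread, bias + spread


# Hypothetical tumor volumes (cm^3) measured by two readers.
reader_1 = [10.0, 12.0, 15.0, 20.0, 25.0]
reader_2 = [11.0, 11.0, 16.0, 18.0, 26.0]
bias, lower, upper = bland_altman_limits(reader_1, reader_2)
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```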
NASA Astrophysics Data System (ADS)
Meerdink, S.; Roberts, D. A.; Roth, K. L.
2015-12-01
Accurate knowledge of the spatial distribution of plant species is required for many research and management agendas that track ecosystem health. Because of this, there is continuous development of research focused on remotely-sensed species classifications for many diverse ecosystems. While plant species have been mapped using airborne imaging spectroscopy, the geographic extent has been limited due to data availability and spectrally similar species continue to be difficult to separate. The proposed Hyperspectral Infrared Imager (HyspIRI) space-borne mission, which includes a visible near infrared/shortwave infrared (VSWIR) imaging spectrometer and thermal infrared (TIR) multi-spectral imager, would present an opportunity to improve species discrimination over a much broader scale. Here we evaluate: 1) the capability of VSWIR and/or TIR spectra to discriminate plant species; 2) the accuracy of species classifications within an ecosystem; and 3) the potential for discriminating among species across a range of ecosystems. Simulated HyspIRI imagery was acquired in spring/summer of 2013 spanning from Santa Barbara to Bakersfield, CA with the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the MODIS/ASTER Airborne Simulator (MASTER) instruments. Three spectral libraries were created from these images: AVIRIS (224 bands from 0.4 - 2.5 µm), MASTER (8 bands from 7.5 - 12 µm), and AVIRIS + MASTER. We used canonical discriminant analysis (CDA) as a dimension reduction technique and then classified plant species using linear discriminant analysis (LDA). Our results show the inclusion of TIR spectra improved species discrimination, but only for plant species with emissivities departing from that of a gray body. Ecosystems with species that have high spectral contrast had higher classification accuracies. 
Mapping plant species across all ecosystems resulted in a classification with lower accuracies than a single ecosystem due to the complex nature of incorporating more plant species.
McRoy, Susan; Jones, Sean; Kurmally, Adam
2016-09-01
This article examines methods for automated question classification applied to cancer-related questions that people have asked on the web. This work is part of a broader effort to provide automated question answering for health education. We created a new corpus of consumer-health questions related to cancer and a new taxonomy for those questions. We then compared the effectiveness of different statistical methods for developing classifiers, including weighted classification and resampling. Basic methods for building classifiers were limited by the high variability in the natural distribution of questions and typical refinement approaches of feature selection and merging categories achieved only small improvements to classifier accuracy. Best performance was achieved using weighted classification and resampling methods, the latter yielding an accuracy of F1 = 0.963. Thus, it would appear that statistical classifiers can be trained on natural data, but only if natural distributions of classes are smoothed. Such classifiers would be useful for automated question answering, for enriching web-based content, or assisting clinical professionals to answer questions. © The Author(s) 2015.
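The abstract above reports that weighted classification and resampling were needed to cope with a skewed natural distribution of question classes. The sketch below illustrates two common forms of those ideas, random oversampling and inverse-frequency class weights. Both are assumptions chosen for illustration, since the abstract does not state which variants the authors used, and the questions and labels are invented.

```python
import random
from collections import Counter


def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical corpus: six "treatment" questions and two "cost" questions.
questions = [f"question {i}" for i in range(8)]
classes = ["treatment"] * 6 + ["cost"] * 2

balanced_questions, balanced_classes = oversample_to_balance(questions, classes)
weights = inverse_frequency_weights(classes)
print(Counter(balanced_classes), weights)
```

Oversampling changes the training set itself, whereas class weights leave the data untouched and rescale each class's contribution to the training loss.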
Classifier dependent feature preprocessing methods
NASA Astrophysics Data System (ADS)
Rodriguez, Benjamin M., II; Peterson, Gilbert L.
2008-04-01
In mobile applications, computational complexity is an issue that limits sophisticated algorithms from being implemented on these devices. This paper provides an initial solution to applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to properly apply feature preprocessing tools prior to training classification models in an attempt to reduce computational complexity and improve the overall classification accuracy. The feature preprocessing tools extended for the mobile environment are feature ranking, feature extraction, data preparation and outlier removal. Most desktop systems today are capable of processing a majority of the available classification algorithms without concern of processing while the same is not true on mobile platforms. As an application of pattern recognition for mobile devices, the recognition system targets the problem of steganalysis, determining if an image contains hidden information. The measure of performance shows that feature preprocessing increases the overall steganalysis classification accuracy by an average of 22%. The methods in this paper are tested on a workstation and a Nokia 6620 (Symbian operating system) camera phone with similar results.
Clemans, Katherine H; Musci, Rashelle J; Leoutsakos, Jeannie-Marie S; Ialongo, Nicholas S
2014-04-01
This study compared the ability of teacher, parent, and peer reports of aggressive behavior in early childhood to accurately classify cases of maladaptive outcomes in late adolescence and early adulthood. Weighted kappa analyses determined optimal cut points and relative classification accuracy among teacher, parent, and peer reports of aggression assessed for 691 students (54% male; 84% African American and 13% White) in the fall of first grade. Outcomes included antisocial personality, substance use, incarceration history, risky sexual behavior, and failure to graduate from high school on time. Peer reports were the most accurate classifier of all outcomes in the full sample. For most outcomes, the addition of teacher or parent reports did not improve overall classification accuracy once peer reports were accounted for. Additional gender-specific and adjusted kappa analyses supported the superior classification utility of the peer report measure. The results suggest that peer reports provided the most useful classification information of the 3 aggression measures. Implications for targeted intervention efforts in which screening measures are used to identify at-risk children are discussed.
NASA Astrophysics Data System (ADS)
Li, Hong; Ding, Xue
2017-03-01
This paper combines wavelet analysis and wavelet transform theory with an artificial neural network, preprocessing the point feature attributes before intrusion detection to make them suitable for an improved wavelet neural network. The resulting intrusion classification model has better adaptability and self-learning ability, greatly strengthens the wavelet neural network for intrusion detection, reduces storage space, helps improve the performance of the constructed neural network, and reduces the training time. Finally, simulation experiments on the KDDCup99 data set show that this method not only reduces the complexity of constructing the wavelet neural network but also preserves the accuracy of intrusion classification.
Zhou, Zhen; Wang, Jian-Bao; Zang, Yu-Feng; Pan, Gang
2018-01-01
Classification approaches have been increasingly applied to differentiate patients and normal controls using resting-state functional magnetic resonance imaging data (RS-fMRI). Although most previous classification studies have reported promising accuracy within individual datasets, achieving high levels of accuracy with multiple datasets remains challenging for two main reasons: high dimensionality, and high variability across subjects. We used two independent RS-fMRI datasets (n = 31, 46, respectively) both with eyes closed (EC) and eyes open (EO) conditions. For each dataset, we first reduced the number of features to a small number of brain regions with paired t-tests, using the amplitude of low frequency fluctuation (ALFF) as a metric. Second, we employed a new method for feature extraction, named the PAIR method, examining EC and EO as paired conditions rather than independent conditions. Specifically, for each dataset, we obtained EC minus EO (EC—EO) maps of ALFF from half of subjects (n = 15 for dataset-1, n = 23 for dataset-2) and obtained EO—EC maps from the other half (n = 16 for dataset-1, n = 23 for dataset-2). A support vector machine (SVM) method was used for classification of EC RS-fMRI mapping and EO mapping. The mean classification accuracy of the PAIR method was 91.40% for dataset-1, and 92.75% for dataset-2 in the conventional frequency band of 0.01–0.08 Hz. For cross-dataset validation, we applied the classifier from dataset-1 directly to dataset-2, and vice versa. The mean accuracy of cross-dataset validation was 94.93% for dataset-1 to dataset-2 and 90.32% for dataset-2 to dataset-1 in the 0.01–0.08 Hz range. For the UNPAIR method, classification accuracy was substantially lower (mean 69.89% for dataset-1 and 82.97% for dataset-2), and was much lower for cross-dataset validation (64.69% for dataset-1 to dataset-2 and 64.98% for dataset-2 to dataset-1) in the 0.01–0.08 Hz range. 
In conclusion, for within-group design studies (e.g., paired conditions or follow-up studies), we recommend the PAIR method for feature extraction. In addition, dimensionality reduction with strong prior knowledge of specific brain regions should also be considered for feature selection in neuroimaging studies. PMID:29375288
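The PAIR feature construction described above can be sketched as follows: half of the subjects contribute an EC minus EO difference map and the other half contribute EO minus EC, so the classifier sees two balanced classes of paired differences. This is only an illustration under stated assumptions. The function name, the plus and minus one labels, the split by subject order, and the toy ALFF vectors are all hypothetical, and the SVM step is omitted.

```python
def pair_features(ec_maps, eo_maps):
    """Build paired-difference features from per-subject EC and EO maps.

    The first half of the subjects yields EC - EO (label +1);
    the remaining subjects yield EO - EC (label -1).
    """
    half = len(ec_maps) // 2
    features, labels = [], []
    for index, (ec, eo) in enumerate(zip(ec_maps, eo_maps)):
        if index < half:
            features.append([a - b for a, b in zip(ec, eo)])
            labels.append(1)
        else:
            features.append([b - a for a, b in zip(ec, eo)])
            labels.append(-1)
    return features, labels


# Toy data: 31 subjects, two "regions" per ALFF map (values are invented).
ec = [[1.0 + 0.1 * subject, 2.0] for subject in range(31)]
eo = [[0.5, 2.5] for _ in range(31)]
features, labels = pair_features(ec, eo)
print(labels.count(1), labels.count(-1))
```

With 31 subjects the split is 15 and 16, which matches the dataset-1 split reported in the abstract.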
Open Dataset for the Automatic Recognition of Sedentary Behaviors.
Possos, William; Cruz, Robinson; Cerón, Jesús D; López, Diego M; Sierra-Torres, Carlos H
2017-01-01
Sedentarism is associated with the development of noncommunicable diseases (NCD) such as cardiovascular diseases (CVD), type 2 diabetes, and cancer. Therefore, the identification of specific sedentary behaviors (TV viewing, sitting at work, driving, relaxing, etc.) is especially relevant for planning personalized prevention programs. This study aimed to build and evaluate a public dataset for the automatic recognition (classification) of sedentary behaviors. The dataset included data from 30 subjects, who performed 23 sedentary behaviors while wearing a commercial wearable on the wrist, a smartphone on the hip, and another on the thigh. Bluetooth Low Energy (BLE) beacons were used in order to improve the automatic classification of different sedentary behaviors. The study also compared six well-known data mining classification techniques in order to identify the most precise method of solving the classification problem of the 23 defined behaviors. Better classification accuracy was obtained using the Random Forest algorithm and when data were collected from the phone on the hip. Furthermore, the use of beacons as a reference for obtaining the symbolic location of the individual improved the precision of the classification.
New KF-PP-SVM classification method for EEG in brain-computer interfaces.
Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian
2014-01-01
Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. Then the scatter is added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected from laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to the KF-SVM, PP-SVM, and SVM schemes are 2.49%, 5.83%, and 6.49%, respectively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug
2013-05-15
Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of six local lung patterns: normal lung and five regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. 
The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI data obtained from both scanners, the classification accuracies with the SVM and Bayesian classifiers were 92% and 77%, respectively. The selected features resulting from the classification process differed by scanner, with more features included for the classification of the integrated HRCT data than for the classification of the HRCT data from each scanner. For the integrated data, consisting of HRCT images of both scanners, the classification accuracy based on the SVM was statistically similar to the accuracy of the data obtained from each scanner. However, the classification accuracy of the integrated data using the Bayesian classifier was significantly lower than the classification accuracy of the ROI data of each scanner. Conclusions: The use of an integrated dataset along with a SVM classifier rather than a Bayesian classifier has benefits in terms of the classification accuracy of HRCT images acquired with more than one scanner. This finding is of relevance in studies involving a large number of images, as is the case in a multicenter trial with different scanners.
Corn and soybean Landsat MSS classification performance as a function of scene characteristics
NASA Technical Reports Server (NTRS)
Batista, G. T.; Hixson, M. M.; Bauer, M. E.
1982-01-01
In order to fully utilize remote sensing to inventory crop production, it is important to identify the factors that affect the accuracy of Landsat classifications. The objective of this study was to investigate the effect of scene characteristics involving crop, soil, and weather variables on the accuracy of Landsat classifications of corn and soybeans. Segments sampling the U.S. Corn Belt were classified using a Gaussian maximum likelihood classifier on multitemporally registered data from two key acquisition periods. Field size had a strong effect on classification accuracy with small fields tending to have low accuracies even when the effect of mixed pixels was eliminated. Other scene characteristics accounting for variability in classification accuracy included proportions of corn and soybeans, crop diversity index, proportion of all field crops, soil drainage, slope, soil order, long-term average soybean yield, maximum yield, relative position of the segment in the Corn Belt, weather, and crop development stage.
Classification of teeth in cone-beam CT using deep convolutional neural network.
Miki, Yuma; Muramatsu, Chisako; Hayashi, Tatsuro; Zhou, Xiangrong; Hara, Takeshi; Katsumata, Akitoshi; Fujita, Hiroshi
2017-01-01
Dental records play an important role in forensic identification. To this end, postmortem dental findings and teeth conditions are recorded in a dental chart and compared with those of antemortem records. However, most dentists are inexperienced at recording the dental chart for corpses, and it is a physically and mentally laborious task, especially in large scale disasters. Our goal is to automate the dental filing process by using dental x-ray images. In this study, we investigated the application of a deep convolutional neural network (DCNN) for classifying tooth types on dental cone-beam computed tomography (CT) images. Regions of interest (ROIs) including single teeth were extracted from CT slices. Fifty two CT volumes were randomly divided into 42 training and 10 test cases, and the ROIs obtained from the training cases were used for training the DCNN. For examining the sampling effect, random sampling was performed 3 times, and training and testing were repeated. We used the AlexNet network architecture provided in the Caffe framework, which consists of 5 convolution layers, 3 pooling layers, and 2 full connection layers. For reducing the overtraining effect, we augmented the data by image rotation and intensity transformation. The test ROIs were classified into 7 tooth types by the trained network. The average classification accuracy using the augmented training data by image rotation and intensity transformation was 88.8%. Compared with the result without data augmentation, data augmentation resulted in an approximately 5% improvement in classification accuracy. This indicates that the further improvement can be expected by expanding the CT dataset. Unlike the conventional methods, the proposed method is advantageous in obtaining high classification accuracy without the need for precise tooth segmentation. The proposed tooth classification method can be useful in automatic filing of dental charts for forensic identification. Copyright © 2016 Elsevier Ltd. 
All rights reserved.
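The augmentation step described in the abstract above (image rotation and intensity transformation) can be sketched in a few lines. This is a simplified illustration and not the authors' pipeline: it assumes right-angle rotations and a linear gain on pixel intensity, and the function names, gain values, and the 2 × 2 ROI are hypothetical.

```python
def rotate90(image):
    """Rotate a 2-D list of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def scale_intensity(image, gain, offset=0.0, lo=0.0, hi=255.0):
    """Apply a linear intensity transform and clip to the range [lo, hi]."""
    return [[min(hi, max(lo, gain * value + offset)) for value in row]
            for row in image]


def augment(image, gains=(0.9, 1.0, 1.1)):
    """Return every combination of four right-angle rotations and the given gains."""
    variants = []
    current = image
    for _ in range(4):
        for gain in gains:
            variants.append(scale_intensity(current, gain))
        current = rotate90(current)
    return variants


# Hypothetical 2 x 2 region of interest.
roi = [[10.0, 20.0],
       [30.0, 40.0]]
augmented = augment(roi)
print(len(augmented))
```

Each training ROI yields twelve variants here, which shows how augmentation enlarges a small training set without acquiring new CT volumes.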
Nationwide forestry applications program. Analysis of forest classification accuracy
NASA Technical Reports Server (NTRS)
Congalton, R. G.; Mead, R. A.; Oderwald, R. G.; Heinen, J. (Principal Investigator)
1981-01-01
The development of LANDSAT classification accuracy assessment techniques, and of a computerized system for assessing wildlife habitat from land cover maps are considered. A literature review on accuracy assessment techniques and an explanation for the techniques development under both projects are included along with listings of the computer programs. The presentations and discussions at the National Working Conference on LANDSAT Classification Accuracy are summarized. Two symposium papers which were published on the results of this project are appended.
Building confidence and credibility into CAD with belief decision trees
NASA Astrophysics Data System (ADS)
Affenit, Rachael N.; Barns, Erik R.; Furst, Jacob D.; Rasin, Alexander; Raicu, Daniela S.
2017-03-01
Creating classifiers for computer-aided diagnosis in the absence of ground truth is a challenging problem. Using experts' opinions as reference truth is difficult because the variability in the experts' interpretations introduces uncertainty in the labeled diagnostic data. This uncertainty translates into noise, which can significantly affect the performance of any classifier on test data. To address this problem, we propose a new label set weighting approach to combine the experts' interpretations and their variability, as well as a selective iterative classification (SIC) approach that is based on conformal prediction. Using the NIH/NCI Lung Image Database Consortium (LIDC) dataset in which four radiologists interpreted the lung nodule characteristics, including the degree of malignancy, we illustrate the benefits of the proposed approach. Our results show that the proposed 2-label-weighted approach significantly outperforms the accuracy of the original 5-label and 2-label-unweighted classification approaches by 39.9% and 7.6%, respectively. We also found that the weighted 2-label models produce higher skewness values by 1.05 and 0.61 for non-SIC and SIC respectively on root mean square error (RMSE) distributions. When each approach was combined with selective iterative classification, this further improved the accuracy of classification for the 2-weighted-label by 7.5% over the original, and improved the skewness of the 5-label and 2-unweighted-label by 0.22 and 0.44 respectively.
Active microwave responses - An aid in improved crop classification
NASA Technical Reports Server (NTRS)
Rosenthal, W. D.; Blanchard, B. J.
1984-01-01
A study determined the feasibility of using visible, infrared, and active microwave data to classify agricultural crops such as corn, sorghum, alfalfa, wheat stubble, millet, shortgrass pasture and bare soil. Visible through microwave data were collected by instruments on board the NASA C-130 aircraft over 40 agricultural fields near Guymon, OK in 1978 and Dalhart, TX in 1980. Results from stepwise and discriminant analysis techniques indicated 4.75 GHz, 1.6 GHz, and 0.4 GHz cross-polarized microwave frequencies were the microwave frequencies most sensitive to crop type differences. Inclusion of microwave data in visible and infrared classification models improved classification accuracy from 73 percent to 92 percent. Despite the results, further studies are needed during different growth stages to validate the visible, infrared, and active microwave responses to vegetation.
NASA Technical Reports Server (NTRS)
Hoffbeck, Joseph P.; Landgrebe, David A.
1994-01-01
Many analysis algorithms for high-dimensional remote sensing data require that the remotely sensed radiance spectra be transformed to approximate reflectance to allow comparison with a library of laboratory reflectance spectra. In maximum likelihood classification, however, the remotely sensed spectra are compared to training samples, thus a transformation to reflectance may or may not be helpful. The effect of several radiance-to-reflectance transformations on maximum likelihood classification accuracy is investigated in this paper. We show that the empirical line approach, LOWTRAN7, flat-field correction, single spectrum method, and internal average reflectance are all non-singular affine transformations, and that non-singular affine transformations have no effect on discriminant analysis feature extraction and maximum likelihood classification accuracy. (An affine transformation is a linear transformation with an optional offset.) Since the Atmosphere Removal Program (ATREM) and the log residue method are not affine transformations, experiments with Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data were conducted to determine the effect of these transformations on maximum likelihood classification accuracy. The average classification accuracy of the data transformed by ATREM and the log residue method was slightly less than the accuracy of the original radiance data. Since the radiance-to-reflectance transformations allow direct comparison of remotely sensed spectra with laboratory reflectance spectra, they can be quite useful in labeling the training samples required by maximum likelihood classification, but these transformations have only a slight effect or no effect at all on discriminant analysis and maximum likelihood classification accuracy.
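The invariance claim in the abstract above can be checked numerically. Under y = Ax + b with A non-singular, each class covariance becomes AΣAᵀ, so every class log-determinant shifts by the same constant 2 log|det A| and every Mahalanobis distance is unchanged; the maximum likelihood label therefore cannot change. The sketch below demonstrates this for a two-band Gaussian classifier on synthetic data. The matrix A, the offset b, the class names, and the sample data are all hypothetical.

```python
import math
import random


def fit_gaussian(points):
    """Estimate the mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_likelihood(point, mean, cov):
    """Gaussian log-likelihood up to a constant shared by all classes."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    mahalanobis = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (math.log(det) + mahalanobis)


def classify(points, models):
    """Assign each point to the class with the highest log-likelihood."""
    return [max(models, key=lambda k: log_likelihood(p, *models[k]))
            for p in points]


def affine(point):
    """Hypothetical non-singular affine map y = A x + b (det A = 6.5)."""
    a = ((2.0, 0.5), (-1.0, 3.0))
    b = (10.0, -4.0)
    x, y = point
    return (a[0][0] * x + a[0][1] * y + b[0],
            a[1][0] * x + a[1][1] * y + b[1])


rng = random.Random(0)


def sample(cx, cy, n):
    """Draw n synthetic two-band 'spectra' around the center (cx, cy)."""
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.5)) for _ in range(n)]


training = {"class_a": sample(0.0, 0.0, 100), "class_b": sample(3.0, 2.0, 100)}
pixels = sample(0.0, 0.0, 100) + sample(3.0, 2.0, 100)

models_raw = {k: fit_gaussian(v) for k, v in training.items()}
models_affine = {k: fit_gaussian([affine(p) for p in v])
                 for k, v in training.items()}

labels_raw = classify(pixels, models_raw)
labels_affine = classify([affine(p) for p in pixels], models_affine)
print(labels_raw == labels_affine)
```

The same reasoning does not apply to nonlinear transformations, which is consistent with the abstract's separate experimental treatment of ATREM and the log residue method.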
A multiple-point spatially weighted k-NN method for object-based classification
NASA Astrophysics Data System (ADS)
Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.
2016-10-01
Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.
NASA Astrophysics Data System (ADS)
Saran, Sameer; Sterk, Geert; Kumar, Suresh
2009-10-01
Land use/land cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRU), which are spatial units with expected similar hydrological behaviours. The division into HRUs requires good-quality spatial data on land use/land cover. This paper presents different approaches to attain an optimal land use/land cover map based on remote sensing imagery for a Himalayan watershed in northern India. First, digital classifications using a maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better and improved further after post-classification sorting. But the obtained land use/land cover map was not sufficient for the delineation of HRUs, since the agricultural land use/land cover class did not discriminate between the two major crops in the area, i.e. paddy and maize. Subsequently, digital classification of fused data (ASAR and ASTER) was attempted to map land use/land cover classes, with emphasis on delineating the paddy and maize crops, but the supervised classification of the fused datasets did not provide the desired accuracy or a proper delineation of the paddy and maize crops. Eventually, we adopted a visual classification approach on fused data. This second step, with a detailed classification system, resulted in better classification accuracy within the 'agricultural land' class, which will be further combined with topography and soil type to derive HRUs for physically-based hydrological modeling.
Monitoring of "urban villages" in Shenzhen, China from high-resolution GF-1 and TerraSAR-X data
NASA Astrophysics Data System (ADS)
Wei, Chunzhu; Blaschke, Thomas; Taubenböck, Hannes
2015-10-01
Urban villages comprise mainly low-rise and congested, often informal settlements surrounded by new constructions and high-rise buildings whereby structures can be very different between neighboring areas. Monitoring urban villages and analyzing their characteristics are crucial for urban development and sustainability research. In this study, we carried out a combined analysis of multispectral GaoFen-1 (GF-1) and high resolution TerraSAR-X radar (TSX) imagery to extract the urban village information. GF-1 and TSX data were combined with the Gram-Schmidt spectral sharpening method so as to provide new input data for urban village classification. The Grey-Level Co-occurrence Matrix (GLCM) approach was also applied to four directions to provide another four types (all, 0°, 90°, 45° directions) of TSX-based inputs for urban village detection. We analyzed the urban village mapping performance using the Random Forest approach. The results demonstrate that the best overall accuracy and the best producer accuracy for urban villages were reached with the GLCM 90° dataset (82.33% and 68.54%, respectively). Adding single polarization TSX data as input information to the optical image GF-1 provided an average product accuracy improvement of around 7% in formal built-up area classification. The SAR and optical fusion imagery also provided an effective means to eliminate some layover, shadow effects, and dominant scattering at building locations and green spaces, improving the producer accuracy by 7% in urban area classification. To sum up, the added value of SAR information is demonstrated by the enhanced results achievable over built-up areas, including formal and informal settlements.
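The directional GLCM inputs mentioned in the abstract above come from counting how often pairs of grey levels co-occur at a fixed pixel offset. A minimal sketch follows, assuming a non-symmetric count matrix, a one-pixel offset, and the contrast statistic as the texture measure. The abstract does not say which GLCM statistics were used, so these choices, the function names, and the striped patch are hypothetical.

```python
def glcm(image, d_row, d_col, levels):
    """Count grey-level co-occurrences for the pixel offset (d_row, d_col)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: mean squared grey-level difference over counted pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total


# Offsets (row, column) for three directions; rows increase downward.
OFFSETS = {"0": (0, 1), "45": (-1, 1), "90": (-1, 0)}

# Hypothetical patch with horizontal stripes and 4 grey levels.
patch = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
]
texture = {name: contrast(glcm(patch, dr, dc, levels=4))
           for name, (dr, dc) in OFFSETS.items()}
print(texture)
```

For this striped patch the 0° contrast is zero while the 90° contrast is not, which illustrates why a single direction can carry more discriminating texture information than the others for strongly oriented structures.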
Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A
2007-12-15
Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined, secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as the Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and the Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.
A fuzzy hill-climbing algorithm for the development of a compact associative classifier
NASA Astrophysics Data System (ADS)
Mitra, Soumyaroop; Lam, Sarah S.
2012-02-01
Classification, a data mining technique, has widespread applications including medical diagnosis, targeted marketing, and others. Knowledge discovery from databases in the form of association rules is one of the important data mining tasks. An integrated approach, classification based on association rules, has drawn the attention of the data mining community over the last decade. While attention has been mainly focused on increasing classifier accuracies, little effort has been devoted towards building interpretable and less complex models. This paper discusses the development of a compact associative classification model using a hill-climbing approach and fuzzy sets. The proposed methodology builds the rule-base by selecting rules which contribute towards increasing training accuracy, thus balancing classification accuracy with the number of classification association rules. The results indicated that the proposed associative classification model can achieve competitive accuracies on benchmark datasets with continuous attributes and lend better interpretability, when compared with other rule-based systems.
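The idea of growing a rule-base only while training accuracy increases can be sketched as a greedy hill-climbing loop. This is a simplified illustration under stated assumptions: rules are crisp (condition, label) pairs instead of fuzzy rules, the first matching rule decides the class, and the names `classify`, `training_accuracy` and `hill_climb` are hypothetical. It is not the authors' algorithm.

```python
def classify(rule_base, record, default):
    """Return the label of the first rule whose condition matches."""
    for condition, label in rule_base:
        if condition(record):
            return label
    return default


def training_accuracy(rule_base, data, default):
    """Fraction of (record, label) pairs classified correctly."""
    hits = sum(classify(rule_base, rec, default) == lab for rec, lab in data)
    return hits / len(data)


def hill_climb(candidates, data, default):
    """Greedily add the rule that most improves training accuracy.

    Stops as soon as no remaining candidate improves the accuracy,
    which keeps the rule-base compact.
    """
    selected = []
    best = training_accuracy(selected, data, default)
    improved = True
    while improved:
        improved = False
        best_rule = None
        for rule in candidates:
            if rule in selected:
                continue
            score = training_accuracy(selected + [rule], data, default)
            if score > best:
                best, best_rule, improved = score, rule, True
        if improved:
            selected.append(best_rule)
    return selected, best


# Toy data: one useful rule and one rule that never fires.
data = [({"x": 1}, "a"), ({"x": 2}, "a"), ({"x": 8}, "b"), ({"x": 9}, "b")]
useful = (lambda rec: rec["x"] > 5, "b")
useless = (lambda rec: rec["x"] > 100, "b")
rules, accuracy = hill_climb([useless, useful], data, default="a")
```

Because a rule is added only when it strictly improves accuracy, the useless rule is never selected, which mirrors the trade-off between accuracy and rule count described in the abstract.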
User-Independent Motion State Recognition Using Smartphone Sensors
Gu, Fuqiang; Kealy, Allison; Khoshelham, Kourosh; Shang, Jianga
2015-01-01
The recognition of locomotion activities (e.g., walking, running, still) is important for a wide range of applications like indoor positioning, navigation, location-based services, and health monitoring. Recently, there has been a growing interest in activity recognition using accelerometer data. However, when utilizing only acceleration-based features, it is difficult to differentiate varying vertical motion states from horizontal motion states especially when conducting user-independent classification. In this paper, we also make use of the newly emerging barometer built in modern smartphones, and propose a novel feature called pressure derivative from the barometer readings for user motion state recognition, which is proven to be effective for distinguishing vertical motion states and does not depend on specific users’ data. Seven types of motion states are defined and six commonly-used classifiers are compared. In addition, we utilize the motion state history and the characteristics of people’s motion to improve the classification accuracies of those classifiers. Experimental results show that by using the historical information and human’s motion characteristics, we can achieve user-independent motion state classification with an accuracy of up to 90.7%. In addition, we analyze the influence of the window size and smartphone pose on the accuracy. PMID:26690163
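A pressure-derivative feature of the kind described above can be sketched as the rate of change of barometric pressure within one analysis window. The abstract does not give the exact definition, so this sketch assumes a least-squares slope over the window; the function name `pressure_derivative` and the units (seconds, hPa) are hypothetical.

```python
def pressure_derivative(timestamps, pressures):
    """Least-squares slope of pressure (hPa) against time (s) in one window.

    A negative slope means pressure is falling, which corresponds to moving
    upwards; a slope near zero suggests horizontal motion or standing still.
    """
    n = len(timestamps)
    t_mean = sum(timestamps) / n
    p_mean = sum(pressures) / n
    numerator = sum((t - t_mean) * (p - p_mean)
                    for t, p in zip(timestamps, pressures))
    denominator = sum((t - t_mean) ** 2 for t in timestamps)
    return numerator / denominator


# Pressure dropping by 0.1 hPa per second, as when ascending.
slope = pressure_derivative([0, 1, 2, 3], [1013.0, 1012.9, 1012.8, 1012.7])
```

A slope-based feature depends on the change in pressure and not on its absolute level, which is consistent with the claim that the feature does not rely on a specific user's data.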
Sidek, Khairul; Khali, Ibrahim
2012-01-01
In this paper, a person identification mechanism implemented with a Cardioid based graph using electrocardiogram (ECG) is presented. The Cardioid based graph has given a reasonably good classification accuracy in terms of differentiating between individuals. However, the current feature extraction method using Euclidean distance could be further improved by using the Mahalanobis distance measurement, producing extracted coefficients which take into account the correlations of the data set. Identification is then done by applying these extracted features to a Radial Basis Function Network. A total of 30 ECG data from the MIT-BIH Normal Sinus Rhythm database (NSRDB) and MIT-BIH Arrhythmia database (MITDB) were used for development and evaluation purposes. Our experimentation results suggest that the proposed feature extraction method has significantly increased the classification performance of subjects in both databases, with accuracy rising from 97.50% to 99.80% in NSRDB and from 96.50% to 99.40% in MITDB. High sensitivity, specificity and positive predictive value of 99.17%, 99.91% and 99.23% for NSRDB and 99.30%, 99.90% and 99.40% for MITDB also validate the proposed method. This result also indicates that the right feature extraction technique plays a vital role in determining the persistency of the classification accuracy for the Cardioid based person identification mechanism.
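The difference between the two distance measures can be shown with a small sketch. It assumes two-dimensional points so that the covariance matrix can be inverted in closed form; the function name `mahalanobis_2d` and the tuple layout of the covariance are hypothetical, and the sketch does not reproduce the Cardioid feature extraction itself.

```python
import math


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a mean.

    `cov` is given as (var_x, cov_xy, var_y). The inverse of
    [[var_x, cov_xy], [cov_xy, var_y]] is written out explicitly.
    """
    var_x, cov_xy, var_y = cov
    det = var_x * var_y - cov_xy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    squared = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(squared)


# With an identity covariance the result equals the Euclidean distance.
same_as_euclidean = mahalanobis_2d((3, 4), (0, 0), (1, 0, 1))

# A larger variance along x shrinks distances measured along x.
scaled = mahalanobis_2d((2, 0), (0, 0), (4, 0, 1))
```

Unlike the Euclidean distance, the result changes with the variances and covariance of the data, which is the property the abstract refers to when it says the coefficients take the correlations of the data set into account.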
SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru
2014-01-01
Recently, the support vector machine (SVM) has shown excellent performance on classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306
Delineation of marsh types of the Texas coast from Corpus Christi Bay to the Sabine River in 2010
Enwright, Nicholas M.; Hartley, Stephen B.; Brasher, Michael G.; Visser, Jenneke M.; Mitchell, Michael K.; Ballard, Bart M.; Parr, Mark W.; Couvillion, Brady R.; Wilson, Barry C.
2014-01-01
Coastal zone managers and researchers often require detailed information regarding emergent marsh vegetation types for modeling habitat capacities and needs of marsh-reliant wildlife (such as waterfowl and alligator). Detailed information on the extent and distribution of marsh vegetation zones throughout the Texas coast has been historically unavailable. In response, the U.S. Geological Survey, in cooperation and collaboration with the U.S. Fish and Wildlife Service via the Gulf Coast Joint Venture, Texas A&M University-Kingsville, the University of Louisiana-Lafayette, and Ducks Unlimited, Inc., has produced a classification of marsh vegetation types along the middle and upper Texas coast from Corpus Christi Bay to the Sabine River. This study incorporates approximately 1,000 ground reference locations collected via helicopter surveys in coastal marsh areas and about 2,000 supplemental locations from fresh marsh, water, and “other” (that is, nonmarsh) areas. About two-thirds of these data were used for training, and about one-third were used for assessing accuracy. Decision-tree analyses using Rulequest See5 were used to classify emergent marsh vegetation types by using these data, multitemporal satellite-based multispectral imagery from 2009 to 2011, a bare-earth digital elevation model (DEM) based on airborne light detection and ranging (lidar), alternative contemporary land cover classifications, and other spatially explicit variables believed to be important for delineating the extent and distribution of marsh vegetation communities. Image objects were generated from segmentation of high-resolution airborne imagery acquired in 2010 and were used to refine the classification. The classification is dated 2010 because the year is both the midpoint of the multitemporal satellite-based imagery (2009–11) classified and the date of the high-resolution airborne imagery that was used to develop image objects. 
Overall accuracy corrected for bias (accuracy estimate incorporates true marginal proportions) was 91 percent (95 percent confidence interval [CI]: 89.2–92.8), with a kappa statistic of 0.79 (95 percent CI: 0.77–0.81). The classification performed best for saline marsh (user’s accuracy 81.5 percent; producer’s accuracy corrected for bias 62.9 percent) but showed a lesser ability to discriminate intermediate marsh (user’s accuracy 47.7 percent; producer’s accuracy corrected for bias 49.5 percent). Because of confusion in intermediate and brackish marsh classes, an alternative classification containing only three marsh types was created in which intermediate and brackish marshes were combined into a single class. Image objects were reattributed by using this alternative three-marsh-type classification. Overall accuracy, corrected for bias, of this more general classification was 92.4 percent (95 percent CI: 90.7–94.2), and the kappa statistic was 0.83 (95 percent CI: 0.81–0.85). Mean user’s accuracy for marshes within the four-marsh-type and three-marsh-type classifications was 65.4 percent and 75.6 percent, respectively, whereas mean producer’s accuracy was 56.7 percent and 65.1 percent, respectively. This study provides a more objective and repeatable method for classifying marsh types of the middle and upper Texas coast at an extent and greater level of detail than previously available for the study area. The seamless classification produced through this work is now available to help State agencies (such as the Texas Parks and Wildlife Department) and landscape-scale conservation partnerships (such as the Gulf Coast Prairie Landscape Conservation Cooperative and the Gulf Coast Joint Venture) to develop and (or) refine conservation plans targeting priority natural resources. Moreover, these data may improve projections of landscape change and serve as a baseline for monitoring future changes resulting from chronic and episodic stressors.
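The accuracy measures reported above can be computed from a confusion matrix, as in the following minimal sketch. It assumes rows hold the mapped (classified) class and columns hold the reference class, and it does not apply the bias correction by true marginal proportions that the study uses; the function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(matrix):
    """Overall, user's and producer's accuracy plus Cohen's kappa.

    matrix[i][j] is the number of samples mapped as class i whose
    reference class is j (an assumed convention).
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    users = [diagonal[i] / row_totals[i] for i in range(k)]      # commission
    producers = [diagonal[i] / col_totals[i] for i in range(k)]  # omission
    return overall, kappa, users, producers


overall, kappa, users, producers = accuracy_metrics([[40, 10], [10, 40]])
```

User's accuracy answers how often a mapped class is correct on the ground, while producer's accuracy answers how often a reference class was mapped correctly, which is why the two can differ widely for the same class.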
Classification of urban features using airborne hyperspectral data
NASA Astrophysics Data System (ADS)
Ganesh Babu, Bharath
Accurate mapping and modeling of urban environments are critical for their efficient and successful management. Superior understanding of complex urban environments is made possible by using modern geospatial technologies. This research focuses on thematic classification of urban land use and land cover (LULC) using 248 bands of 2.0 meter resolution hyperspectral data acquired from an airborne imaging spectrometer (AISA+) on 24th July 2006 in and near Terre Haute, Indiana. Three distinct study areas including two commercial classes, two residential classes, and two urban parks/recreational classes were selected for classification and analysis. Four commonly used classification methods -- maximum likelihood (ML), extraction and classification of homogeneous objects (ECHO), spectral angle mapper (SAM), and iterative self organizing data analysis (ISODATA) - were applied to each data set. Accuracy assessment was conducted and overall accuracies were compared between the twenty four resulting thematic maps. With the exception of SAM and ISODATA in a complex commercial area, all methods employed classified the designated urban features with more than 80% accuracy. The thematic classification from ECHO showed the best agreement with ground reference samples. The residential area with relatively homogeneous composition was classified consistently with highest accuracy by all four of the classification methods used. The average accuracy amongst the classifiers was 93.60% for this area. When individually observed, the complex recreational area (Deming Park) was classified with the highest accuracy by ECHO, with an accuracy of 96.80% and 96.10% Kappa. The average accuracy amongst all the classifiers was 92.07%. The commercial area with relatively high complexity was classified with the least accuracy by all classifiers. The lowest accuracy was achieved by SAM at 63.90% with 59.20% Kappa. This was also the lowest accuracy in the entire analysis. 
This study demonstrates the potential for using the visible and near infrared (VNIR) bands from AISA+ hyperspectral data in urban LULC classification. Based on their performance, the need for further research using ECHO and SAM is underscored. The importance of incorporating imaging spectrometer data in high resolution urban feature mapping is emphasized.
NASA Astrophysics Data System (ADS)
Erener, A.
2013-04-01
Automatic extraction of urban features from high resolution satellite images is one of the main applications in remote sensing. It is useful for wide scale applications, namely: urban planning, urban mapping, disaster management, GIS (geographic information systems) updating, and military target detection. One common approach to detecting urban features from high resolution images is to use automatic classification methods. This paper has four main objectives with respect to detecting buildings. The first objective is to compare the performance of the most notable supervised classification algorithms, including the maximum likelihood classifier (MLC) and the support vector machine (SVM). In this experiment the primary consideration is the impact of kernel configuration on the performance of the SVM. The second objective of the study is to explore the suitability of integrating additional bands, namely first principal component (1st PC) and the intensity image, for original data for multi classification approaches. The performance evaluation of classification results is done using two different accuracy assessment methods: pixel based and object based approaches, which reflect the third aim of the study. The objective here is to demonstrate the differences in the evaluation of accuracies of classification methods. Considering consistency, the same set of ground truth data which is produced by labeling the building boundaries in the GIS environment is used for accuracy assessment. Lastly, the fourth aim is to experimentally evaluate variation in the accuracy of classifiers for six different real situations in order to identify the impact of spatial and spectral diversity on results. The method is applied to Quickbird images for various urban complexity levels, extending from simple to complex urban patterns. The simple surface type includes a regular urban area with low density and systematic buildings with brick rooftops. 
The complex surface type involves almost all kinds of challenges, such as high-density built-up areas, regions with bare soil, and small and large buildings with different rooftops, such as concrete, brick, and metal. Using the pixel based accuracy assessment it was shown that the percent building detection (PBD) and quality percent (QP) of the MLC and SVM depend on the complexity and texture variation of the region. Generally, PBD values range between 70% and 90% for the MLC and SVM, respectively. No substantial improvements were observed when the SVM and MLC classifications were developed by the addition of more variables, instead of the use of only four bands. In the evaluation of object based accuracy assessment, it was demonstrated that while MLC and SVM provide higher rates of correct detection, they also provide higher rates of false alarms.
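The two pixel based measures named above are not defined in the abstract. The sketch below assumes the definitions commonly used in building-extraction studies, where PBD is the share of reference building pixels that were detected and QP additionally penalises false alarms; the function names are hypothetical.

```python
def percent_building_detection(tp, fn):
    """Assumed definition: 100 * TP / (TP + FN)."""
    return 100.0 * tp / (tp + fn)


def quality_percent(tp, fp, fn):
    """Assumed definition: 100 * TP / (TP + FP + FN)."""
    return 100.0 * tp / (tp + fp + fn)


# 80 building pixels found, 20 missed, 10 false alarms.
pbd = percent_building_detection(tp=80, fn=20)
qp = quality_percent(tp=80, fp=10, fn=20)
```

Under these assumed definitions QP can never exceed PBD, so a classifier with many false alarms shows a high PBD but a clearly lower QP.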
Classification of large-scale fundus image data sets: a cloud-computing framework.
Roychowdhury, Sohini
2016-08-01
Large medical image data sets with high dimensionality require a substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.
NASA Astrophysics Data System (ADS)
Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo
2017-04-01
This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.
Hierarchical Higher Order CRF for the Classification of Airborne LIDAR Point Clouds in Urban Areas
NASA Astrophysics Data System (ADS)
Niemeyer, J.; Rottensteiner, F.; Soergel, U.; Heipke, C.
2016-06-01
We propose a novel hierarchical approach for the classification of airborne 3D lidar points. Spatial and semantic context is incorporated via a two-layer Conditional Random Field (CRF). The first layer operates on a point level and utilises higher order cliques. Segments are generated from the labelling obtained in this way. They are the entities of the second layer, which incorporates larger scale context. The classification result of the segments is introduced as an energy term for the next iteration of the point-based layer. This framework iterates and mutually propagates context to improve the classification results. Potentially wrong decisions can be revised at later stages. The output is a labelled point cloud as well as segments roughly corresponding to object instances. Moreover, we present two new contextual features for the segment classification: the distance and the orientation of a segment with respect to the closest road. It is shown that the classification benefits from these features. In our experiments the hierarchical framework improves the overall accuracies by 2.3% on a point-based level and by 3.0% on a segment-based level, respectively, compared to a purely point-based classification.
A Nonparametric Approach to Estimate Classification Accuracy and Consistency
ERIC Educational Resources Information Center
Lathrop, Quinn N.; Cheng, Ying
2014-01-01
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso
2015-07-01
In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
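The spectral representation that the method starts from can be sketched as a count of every overlapping k-mer in a sequence. This is only the representation step, written under the assumption that the spectrum is a plain k-mer count vector; the function name `kmer_spectrum` is hypothetical, and neither the neural gas clustering nor the signature selection is shown.

```python
from collections import Counter


def kmer_spectrum(sequence, k):
    """Count all overlapping substrings of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# "ACGTAC" contains the 2-mers AC, CG, GT, TA, AC.
spectrum = kmer_spectrum("ACGTAC", 2)
```

Because the counts do not depend on where a k-mer occurs, two sequences can be compared without aligning them, which is what makes the approach alignment-free.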
Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini
2013-01-01
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. 
Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.
Mathieu, Renaud; Aryal, Jagannath; Chong, Albert K
2007-11-20
Effective assessment of biodiversity in cities requires detailed vegetation maps. To date, most remote sensing of urban vegetation has focused on thematically coarse land cover products. Detailed habitat maps are created by manual interpretation of aerial photographs, but this is time consuming and costly at large scale. To address this issue, we tested the effectiveness of object-based classifications that use automated image segmentation to extract meaningful ground features from imagery. We applied these techniques to very high resolution multispectral Ikonos images to produce vegetation community maps in Dunedin City, New Zealand. An Ikonos image was orthorectified and a multi-scale segmentation algorithm used to produce a hierarchical network of image objects. The upper level included four coarse strata: industrial/commercial (commercial buildings), residential (houses and backyard private gardens), vegetation (vegetation patches larger than 0.8/1 ha), and water. We focused on the vegetation stratum that was segmented at more detailed level to extract and classify fifteen classes of vegetation communities. The first classification yielded a moderate overall classification accuracy (64%, κ = 0.52), which led us to consider a simplified classification with ten vegetation classes. The overall classification accuracy from the simplified classification was 77% with a κ value close to the excellent range (κ = 0.74). These results compared favourably with similar studies in other environments. We conclude that this approach does not provide maps as detailed as those produced by manually interpreting aerial photographs, but it can still extract ecologically significant classes. It is an efficient way to generate accurate and detailed maps in significantly shorter time. The final map accuracy could be improved by integrating segmentation, automated and manual classification in the mapping process, especially when considering important vegetation classes with limited spectral contrast.
Low-back electromyography (EMG) data-driven load classification for dynamic lifting tasks
Ojeda, Lauro; Johnson, Daniel D.; Gates, Deanna; Mower Provost, Emily; Barton, Kira
2018-01-01
Objective: Numerous devices have been designed to support the back during lifting tasks. To improve the utility of such devices, this research explores the use of preparatory muscle activity to classify muscle loading and initiate appropriate device activation. The goal of this study was to determine the earliest time window that enabled accurate load classification during a dynamic lifting task. Methods: Nine subjects performed thirty symmetrical lifts, split evenly across three weight conditions (no-weight, 10-lbs and 24-lbs), while low-back muscle activity data was collected. Seven descriptive statistics features were extracted from 100 ms windows of data. A multinomial logistic regression (MLR) classifier was trained and tested, employing leave-one-subject-out cross-validation, to classify lifted load values. Dimensionality reduction was achieved through feature cross-correlation analysis and greedy feedforward selection. The time of full load support by the subject was defined as load-onset. Results: Regions of highest average classification accuracy started at 200 ms before until 200 ms after load-onset with average accuracies ranging from 80% (±10%) to 81% (±7%). The average recall for each class ranged from 69–92%. Conclusion: These inter-subject classification results indicate that preparatory muscle activity can be leveraged to identify the intent to lift a weight up to 100 ms prior to load-onset. The high accuracies shown indicate the potential to utilize intent classification for assistive device applications. Significance: Active assistive devices, e.g. exoskeletons, could prevent back injury by off-loading low-back muscles. Early intent classification allows more time for actuators to respond and integrate seamlessly with the user. PMID:29447252
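Two steps of the Methods, splitting the signal into 100 ms windows and evaluating with leave-one-subject-out cross-validation, can be sketched as follows. The sampling rate, the non-overlapping windows and the function names `split_windows` and `leave_one_subject_out` are assumptions for illustration; the abstract does not state them.

```python
def split_windows(samples, sampling_rate_hz, window_ms=100):
    """Split a signal into consecutive, non-overlapping windows."""
    size = int(sampling_rate_hz * window_ms / 1000)
    if size < 1:
        raise ValueError("window is shorter than one sample")
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_indices, test_indices) for every subject.

    Every sample of the held-out subject goes to the test set, so the
    classifier is always tested on a person it has never seen.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Ten samples at an assumed 20 Hz give five windows of two samples each.
windows = split_windows(list(range(10)), sampling_rate_hz=20)

# Four lifts from three hypothetical subjects give three folds.
folds = list(leave_one_subject_out(["s1", "s1", "s2", "s3"]))
```

Splitting by subject instead of by sample is what makes the reported accuracies inter-subject results.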
Information extraction with object based support vector machines and vegetation indices
NASA Astrophysics Data System (ADS)
Ustuner, Mustafa; Abdikan, Saygin; Balik Sanli, Fusun
2016-07-01
Information extraction from remote sensing data is important for policy and decision makers, as the extracted information provides base layers for many real-world applications. Classification of remotely sensed data is one of the most common methods of extracting information; however, it remains challenging because several factors affect classification accuracy. Resolution of the imagery, number and homogeneity of land cover classes, purity of training data, and characteristics of the adopted classifiers are just some of these factors. Object-based image classification has some advantages over pixel-based classification for high-resolution images, since it uses geometry and structure information in addition to spectral information. Vegetation indices are also commonly used in classification, since they provide additional spectral information for vegetation, forestry, and agricultural areas. In this study, the impacts of the Normalized Difference Vegetation Index (NDVI) and Normalized Difference Red Edge Index (NDRE) on the classification accuracy of RapidEye imagery were investigated. Object-based Support Vector Machines were implemented for the classification of crop types in a study area located in the Aegean region of Turkey. Results demonstrated that incorporating NDRE increased the overall classification accuracy from 79.96% to 86.80%, whereas NDVI decreased it from 79.96% to 78.90%. Moreover, it was shown that object-based classification with RapidEye data gives promising results for crop type mapping and analysis.
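Both indices named in this abstract are normalized differences of two band reflectances. The sketch below uses the standard definitions, NDVI = (NIR - Red) / (NIR + Red) and NDRE = (NIR - RedEdge) / (NIR + RedEdge); the function names and reflectance values are illustrative assumptions, not values from the study.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndvi(nir, red):
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    return normalized_difference(nir, red_edge)


# Hypothetical surface reflectances for a single vegetated pixel.
pixel = {"red": 0.1, "red_edge": 0.3, "nir": 0.5}
print(ndvi(pixel["nir"], pixel["red"]))
print(ndre(pixel["nir"], pixel["red_edge"]))
```

In an object-based workflow such values would typically be averaged per image object and appended to the spectral features as an extra band.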
Seeland, Marco; Rzanny, Michael; Alaqraa, Nedal; Wäldchen, Jana; Mäder, Patrick
2017-01-01
Steady improvements of image description methods induced a growing interest in image-based plant species classification, a task vital to the study of biodiversity and ecological sensitivity. Various techniques have been proposed for general object classification over the past years and several of them have already been studied for plant species classification. However, results of these studies are selective in the evaluated steps of a classification pipeline, in the utilized datasets for evaluation, and in the compared baseline methods. No study is available that evaluates the main competing methods for building an image representation on the same datasets allowing for generalized findings regarding flower-based plant species classification. The aim of this paper is to comparatively evaluate methods, method combinations, and their parameters towards classification accuracy. The investigated methods span from detection, extraction, fusion, pooling, to encoding of local features for quantifying shape and color information of flower images. We selected the flower image datasets Oxford Flower 17 and Oxford Flower 102 as well as our own Jena Flower 30 dataset for our experiments. Findings show large differences among the various studied techniques and that their wisely chosen orchestration allows for high accuracies in species classification. We further found that true local feature detectors in combination with advanced encoding methods yield higher classification results at lower computational costs compared to commonly used dense sampling and spatial pooling methods. Color was found to be an indispensable feature for high classification results, especially while preserving spatial correspondence to gray-level features. In result, our study provides a comprehensive overview of competing techniques and the implications of their main parameters for flower-based plant species classification. PMID:28234999
Bolin, Jocelyn Holden; Finch, W Holmes
2014-01-01
Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zbijewski, W., E-mail: wzbijewski@jhu.edu; Gang, G. J.; Xu, J.
2014-02-15
Purpose: Cone-beam CT (CBCT) with a flat-panel detector (FPD) is finding application in areas such as breast and musculoskeletal imaging, where dual-energy (DE) capabilities offer potential benefit. The authors investigate the accuracy of material classification in DE CBCT using filtered backprojection (FBP) and penalized likelihood (PL) reconstruction and optimize contrast-enhanced DE CBCT of the joints as a function of dose, material concentration, and detail size. Methods: Phantoms consisting of a 15 cm diameter water cylinder with solid calcium inserts (50–200 mg/ml, 3–28.4 mm diameter) and solid iodine inserts (2–10 mg/ml, 3–28.4 mm diameter), as well as a cadaveric knee with intra-articular injection of iodine were imaged on a CBCT bench with a Varian 4343 FPD. The low energy (LE) beam was 70 kVp (+0.2 mm Cu), and the high energy (HE) beam was 120 kVp (+0.2 mm Cu, +0.5 mm Ag). Total dose (LE+HE) was varied from 3.1 to 15.6 mGy with equal dose allocation. Image-based DE classification involved a nearest distance classifier in the space of LE versus HE attenuation values. Recognizing the differences in noise between LE and HE beams, the LE and HE data were differentially filtered (in FBP) or regularized (in PL). Both a quadratic (PLQ) and a total-variation penalty (PLTV) were investigated for PL. The performance of DE CBCT material discrimination was quantified in terms of voxelwise specificity, sensitivity, and accuracy. Results: Noise in the HE image was primarily responsible for classification errors within the contrast inserts, whereas noise in the LE image mainly influenced classification in the surrounding water. For inserts of diameter 28.4 mm, DE CBCT reconstructions were optimized to maximize the total combined accuracy across the range of calcium and iodine concentrations, yielding values of ∼88% for FBP and PLQ, and ∼95% for PLTV at 3.1 mGy total dose, increasing to ∼95% for FBP and PLQ, and ∼98% for PLTV at 15.6 mGy total dose.
For a fixed iodine concentration of 5 mg/ml and reconstructions maximizing overall accuracy across the range of insert diameters, the minimum diameter classified with accuracy >80% was ∼15 mm for FBP and PLQ and ∼10 mm for PLTV, improving to ∼7 mm for FBP and PLQ and ∼3 mm for PLTV at 15.6 mGy. The results indicate similar performance for FBP and PLQ and showed improved classification accuracy with edge-preserving PLTV. A slight preference for increased smoothing of the HE data was found. DE CBCT discrimination of iodine and bone in the knee was demonstrated with FBP and PLTV at 6.2 mGy total dose. Conclusions: For iodine concentrations >5 mg/ml and detail size ∼20 mm, material classification accuracy of >90% was achieved in DE CBCT with both FBP and PL at total doses <10 mGy. Optimal performance was attained by selection of reconstruction parameters based on the differences in noise between HE and LE data, typically favoring stronger smoothing of the HE data, and by using penalties matched to the imaging task (e.g., edge-preserving PLTV in areas of uniform enhancement).
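The voxelwise sensitivity, specificity, and accuracy used to score material discrimination can be computed from paired truth and prediction labels. This is a minimal sketch using the usual binary definitions; the function name and the eight-voxel example are hypothetical, and it assumes both classes are present so that no denominator is zero.

```python
def voxelwise_metrics(truth, predicted):
    """truth, predicted: equal-length sequences of 0/1 labels, one per voxel."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # material found correctly
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # background kept as background
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: 1 = contrast material, 0 = water.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = voxelwise_metrics(truth, predicted)
print(metrics)
```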
Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae.
Krajacich, Benjamin J; Meyers, Jacob I; Alout, Haoues; Dabiré, Roch K; Dowell, Floyd E; Foy, Brian D
2017-11-07
Understanding the age-structure of mosquito populations, especially malaria vectors such as Anopheles gambiae, is important for assessing the risk of infectious mosquitoes, and how vector control interventions may impact this risk. The use of near-infrared spectroscopy (NIRS) for age-grading has been demonstrated previously on laboratory and semi-field mosquitoes, but to date has not been utilized on wild-caught mosquitoes whose age is externally validated via parity status or parasite infection stage. In this study, we developed regression and classification models using NIRS on datasets of wild An. gambiae (s.l.) reared from larvae collected from the field in Burkina Faso, and two laboratory strains. We compared the accuracy of these models for predicting the ages of wild-caught mosquitoes that had been scored for their parity status as well as for positivity for Plasmodium sporozoites. Regression models utilizing variable selection increased predictive accuracy over the more common full-spectrum partial least squares (PLS) approach for cross-validation of the datasets, validation, and independent test sets. Models produced from datasets that included the greatest range of mosquito samples (i.e. different sampling locations and times) had the highest predictive accuracy on independent testing sets, though overall accuracy on these samples was low. For classification, we found that intramodel accuracy ranged between 73.5-97.0% for grouping of mosquitoes into "early" and "late" age classes, with the highest prediction accuracy found in laboratory colonized mosquitoes. However, this accuracy was decreased on test sets, with the highest classification of an independent set of wild-caught larvae reared to set ages being 69.6%. Variation in NIRS data, likely from dietary, genetic, and other factors limits the accuracy of this technique with wild-caught mosquitoes. 
Alternative algorithms may help improve prediction accuracy, but care should be taken to either maximize variety in models or minimize confounders.
Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan
2016-02-01
Accuracy plays a vital role in the medical field, as it concerns the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensembles of classifiers have proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called "HM-BagMoov", overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results shows that the ensemble framework achieved the highest accuracy, sensitivity and F-Measure when compared with individual classifiers for all the diseases. In addition, the ensemble framework achieved the highest accuracy when compared with state-of-the-art techniques. An application named "IntelliHealth" was also developed based on the proposed model; it may be used by hospitals/doctors for diagnostic advice. Copyright © 2015 Elsevier Inc. All rights reserved.
Automatic Fault Characterization via Abnormality-Enhanced Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Laguna, I; de Supinski, B R
Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help to identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.
Karuppiah Ramachandran, Vignesh Raja; Alblas, Huibert J; Le, Duc V; Meratnia, Nirvana
2018-05-24
In the last decade, seizure prediction systems have gained a lot of attention because of their enormous potential to improve the quality of life of epileptic patients. The accuracy of prediction algorithms in detecting seizures in real-world applications is largely limited because brain signals are inherently uncertain and affected by various factors, such as environment, age, drug intake, etc., in addition to the internal artefacts that occur during the process of recording the brain signals. To deal with such ambiguity, researchers traditionally use active learning, which selects the ambiguous data to be annotated by an expert and updates the classification model dynamically. However, selecting the particular data from a pool of large ambiguous datasets to be labelled by an expert is still a challenging problem. In this paper, we propose an active learning-based prediction framework that aims to improve the accuracy of the prediction with a minimum number of labelled data. The core technique of our framework is employing the Bernoulli-Gaussian Mixture model (BGMM) to determine the feature samples that have the most ambiguity to be annotated by an expert. By doing so, our approach facilitates expert intervention as well as increasing medical reliability. We evaluate seven different classifiers in terms of the classification time and memory required. An active learning framework built on top of the best performing classifier is evaluated in terms of the annotation effort required to achieve a high level of prediction accuracy. The results show that our approach can achieve the same accuracy as a Support Vector Machine (SVM) classifier using only 20% of the labelled data and also improve the prediction accuracy even under noisy conditions.
NASA Astrophysics Data System (ADS)
Ghaffarian, S.; Ghaffarian, S.
2014-08-01
This paper presents a novel approach to detecting buildings by automating the training-area collection stage of supervised classification. The method is based on the fact that a 3D building structure should cast a shadow under suitable imaging conditions. The methodology therefore begins by detecting and masking out the shadow areas using the luminance component of the LAB color space, which indicates the lightness of the image, and a novel double thresholding technique. Next, the training areas for supervised classification are selected by automatically determining a buffer zone on each building whose shadow is detected, using the shadow shape and the sun illumination direction. Thereafter, the statistics of each buffer zone collected from the building areas are calculated, and the Improved Parallelepiped Supervised Classification is executed to detect the buildings. Standard deviation thresholding is applied to the Parallelepiped classification method to improve its accuracy. Finally, simple morphological operations are conducted to remove noise and increase the accuracy of the results. The experiments were performed on a set of high resolution Google Earth images. The performance of the proposed approach was assessed by comparing its results with reference data using well-known quality measures (Precision, Recall and F1-score) at both the pixel and object level. The evaluation shows that buildings detected in dense and suburban districts with diverse characteristics and color combinations have 88.4% and 85.3% overall pixel-based and object-based precision, respectively.
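A parallelepiped classifier with standard deviation thresholding can be sketched as follows. This is only an illustration of the general technique: each class gets a box of mean ± k standard deviations per band, and a pixel is assigned to the first class whose box contains it. The abstract does not give the details of the "Improved" variant, so the multiplier `k`, the function names, and the RGB samples are all assumptions.

```python
import statistics


def fit_boxes(training, k=2.0):
    """training: {class_name: [feature tuples]}. Box = mean +/- k * std per band."""
    boxes = {}
    for name, pixels in training.items():
        box = []
        for band in zip(*pixels):  # iterate over bands, not pixels
            mean = statistics.mean(band)
            std = statistics.pstdev(band)
            box.append((mean - k * std, mean + k * std))
        boxes[name] = box
    return boxes


def classify(pixel, boxes):
    """Return the first class whose box contains the pixel, else None."""
    for name, box in boxes.items():
        if all(lo <= v <= hi for v, (lo, hi) in zip(pixel, box)):
            return name
    return None


# Hypothetical RGB samples taken from one rooftop buffer zone.
training = {"roof": [(200, 190, 180), (210, 200, 190), (205, 195, 185)]}
boxes = fit_boxes(training)
print(classify((204, 196, 186), boxes), classify((50, 60, 70), boxes))
```

A smaller `k` tightens the boxes and trades recall for precision, which is one plausible reason for thresholding on the standard deviation.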
Minimum distance classification in remote sensing
NASA Technical Reports Server (NTRS)
Wacker, A. G.; Landgrebe, D. A.
1972-01-01
The utilization of minimum distance classification methods in remote sensing problems, such as crop species identification, is considered. Literature concerning both minimum distance classification problems and distance measures is reviewed. Experimental results are presented for several examples. The objective of these examples is to: (a) compare the sample classification accuracy of a minimum distance classifier, with the vector classification accuracy of a maximum likelihood classifier, and (b) compare the accuracy of a parametric minimum distance classifier with that of a nonparametric one. Results show the minimum distance classifier performance is 5% to 10% better than that of the maximum likelihood classifier. The nonparametric classifier is only slightly better than the parametric version.
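A minimum distance classifier in its simplest form assigns a measurement vector to the class with the nearest mean. The sketch below uses Euclidean distance to class means; this is an assumption, since the abstract reviews several distance measures and does not say which one was used, and the function names and two-band samples are hypothetical.

```python
import math


def class_means(samples, labels):
    """Average the training vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def classify(x, means):
    """Return the label whose mean is closest to x in Euclidean distance."""
    return min(means, key=lambda y: math.dist(x, means[y]))


# Hypothetical two-band reflectance samples.
samples = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
labels = ["soil", "soil", "crop", "crop"]
means = class_means(samples, labels)
print(classify((0.2, 0.3), means), classify((0.7, 0.6), means))
```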
Combining multiple features for color texture classification
NASA Astrophysics Data System (ADS)
Cusano, Claudio; Napoletano, Paolo; Schettini, Raimondo
2016-11-01
The analysis of color and texture has a long history in image analysis and computer vision. These two properties are often considered as independent, even though they are strongly related in images of natural objects and materials. Correlation between color and texture information is especially relevant in the case of variable illumination, a condition that has a crucial impact on the effectiveness of most visual descriptors. We propose an ensemble of hand-crafted image descriptors designed to capture different aspects of color textures. We show that the use of these descriptors in a multiple classifiers framework makes it possible to achieve a very high classification accuracy in classifying texture images acquired under different lighting conditions. A powerful alternative to hand-crafted descriptors is represented by features obtained with deep learning methods. We also show how the proposed combining strategy allows hand-crafted and convolutional neural network features to be used together to further improve the classification accuracy. Experimental results on a food database (raw food texture) demonstrate the effectiveness of the proposed strategy.
A stochastic atmospheric model for remote sensing applications
NASA Technical Reports Server (NTRS)
Turner, R. E.
1983-01-01
There are many factors which reduce the accuracy of classification of objects in the satellite remote sensing of Earth's surface. One important factor is the variability in the scattering and absorptive properties of the atmospheric components such as particulates and the variable gases. For multispectral remote sensing of the Earth's surface in the visible and infrared parts of the spectrum, the atmospheric particulates are a major source of variability in the received signal. It is difficult to design a sensor which will determine the unknown atmospheric components by remote sensing methods, at least to the accuracy needed for multispectral classification. The problem of spatial and temporal variations in the atmospheric quantities which can affect the measured radiances is examined. A method based upon the stochastic nature of the atmospheric components was developed, and, using actual data, the statistical parameters needed for inclusion into a radiometric model were generated. Methods are then described for an improved correction of radiances. These algorithms will then result in a more accurate and consistent classification procedure.
Zhang, Wenyu; Zhang, Zhenjiang
2015-01-01
Decision fusion in sensor networks enables sensors to improve classification accuracy while reducing the energy consumption and bandwidth demand for data transmission. In this paper, we focus on the decentralized multi-class classification fusion problem in wireless sensor networks (WSNs) and a new simple but effective decision fusion rule based on belief function theory is proposed. Unlike existing belief function based decision fusion schemes, the proposed approach is compatible with any type of classifier because the basic belief assignments (BBAs) of each sensor are constructed on the basis of the classifier’s training output confusion matrix and real-time observations. We also derive explicit global BBA in the fusion center under Dempster’s combinational rule, making the decision making operation in the fusion center greatly simplified. Also, sending the whole BBA structure to the fusion center is avoided. Experimental results demonstrate that the proposed fusion rule has better performance in fusion accuracy compared with the naïve Bayes rule and weighted majority voting rule. PMID:26295399
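Dempster's rule of combination, on which the fusion rule in this abstract builds, multiplies the masses of every pair of focal sets, assigns each product to the intersection of the pair, and renormalizes by the non-conflicting mass. The sketch below shows the generic rule only, not the paper's simplified global BBA; the function name and the two-sensor example are hypothetical.

```python
def dempster_combine(m1, m2):
    """m1, m2: dicts mapping frozenset of class labels -> mass. Returns the fused BBA."""
    combined, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical BBAs from two sensors over the frame {A, B}.
A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
sensor1 = {A: 0.6, AB: 0.4}
sensor2 = {A: 0.5, B: 0.3, AB: 0.2}
fused = dempster_combine(sensor1, sensor2)
print(fused)
```

Mass left on the full frame `AB` represents a sensor's ignorance, which is the feature that distinguishes belief functions from a plain Bayesian posterior.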
NASA Astrophysics Data System (ADS)
Yekkehkhany, B.; Safari, A.; Homayouni, S.; Hasanlou, M.
2014-10-01
In this paper, a framework is developed based on Support Vector Machines (SVM) for crop classification using polarimetric features extracted from multi-temporal Synthetic Aperture Radar (SAR) imagery. The multi-temporal integration of data not only improves the overall retrieval accuracy but also provides more reliable estimates than single-date data. Several kernel functions are employed and compared in this study for mapping the input space to a higher-dimensional Hilbert space. These kernel functions include linear, polynomial, and Radial Basis Function (RBF) kernels. The method is applied to several UAVSAR L-band SAR images acquired over an agricultural area near Winnipeg, Manitoba, Canada. In this research, the temporal alpha features of the H/A/α decomposition method are used in classification. The experimental tests show that an SVM classifier with an RBF kernel applied to three dates of data increases the Overall Accuracy (OA) by up to 3% in comparison with a linear kernel function, and by up to 1% in comparison with a 3rd-degree polynomial kernel function.
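The three kernel families compared in this abstract can be written down directly. The sketch below uses the common textbook forms; the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions, since the abstract does not report the settings used.

```python
import math


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-feature vectors.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v))
```

The RBF kernel depends only on the distance between the two vectors, which lets an SVM form closed, local decision regions that a linear kernel cannot.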
Effects of temporal variability in ground data collection on classification accuracy
Hoch, G.A.; Cully, J.F.
1999-01-01
This research tested whether the timing of ground data collection can significantly impact the accuracy of land cover classification. Ft. Riley Military Reservation, Kansas, USA was used to test this hypothesis. The U.S. Army's Land Condition Trend Analysis (LCTA) data annually collected at military bases was used to ground truth disturbance patterns. Ground data collected over an entire growing season and data collected one year after the imagery had a kappa statistic of 0.33. When using ground data from only within two weeks of image acquisition the kappa statistic improved to 0.55. Potential sources of this discrepancy are identified. These data demonstrate that there can be significant amounts of land cover change within a narrow time window on military reservations. To accurately conduct land cover classification at military reservations, ground data need to be collected in as narrow a window of time as possible and be closely synchronized with the date of the satellite imagery.
Optical signal processing using photonic reservoir computing
NASA Astrophysics Data System (ADS)
Salehi, Mohammad Reza; Dehyadegari, Louiza
2014-10-01
As a new approach to recognition and classification problems, photonic reservoir computing has such advantages as parallel information processing, power efficiency, and high speed. In this paper, a photonic structure is proposed for reservoir computing and investigated using a simple yet non-partial noisy time series prediction task. This study includes the application of a suitable topology with self-feedbacks in a network of SOAs, which lends the system a strong memory, and the adjustment of adequate parameters, resulting in perfect recognition accuracy (100%) for noise-free time series, a 3% improvement over previous results. For the classification of noisy time series, accuracy increased by 4% to reach 96%. Furthermore, an analytical approach was suggested to solve the rate equations, which led to a substantial decrease in simulation time, an important parameter in the classification of large signals such as in speech recognition, and to better results than previous works.
Characteristics of Forests in Western Sayani Mountains, Siberia from SAR Data
NASA Technical Reports Server (NTRS)
Ranson, K. Jon; Sun, Guoqing; Kharuk, V. I.; Kovacs, Katalin
1998-01-01
This paper investigated the possibility of using spaceborne radar data to map forest types and logging in the mountainous Western Sayani area in Siberia. L and C band HH, HV, and VV polarized images from the Shuttle Imaging Radar-C instrument were used in the study. Techniques to reduce topographic effects in the radar images were investigated. These included radiometric correction using illumination angle inferred from a digital elevation model, and reducing apparent effects of topography through band ratios. Forest classification was performed after terrain correction utilizing typical supervised techniques and principal component analyses. An ancillary data set of local elevations was also used to improve the forest classification. Map accuracy for each technique was estimated for training sites based on Russian forestry maps, satellite imagery and field measurements. The results indicate that it is necessary to correct for topography when attempting to classify forests in mountainous terrain. Radiometric correction based on a DEM (Digital Elevation Model) improved classification results but required reducing the SAR (Synthetic Aperture Radar) resolution to match the DEM. Using ratios of SAR channels that include cross-polarization improved classification and
Super-resolution mapping using multi-viewing CHRIS/PROBA data
NASA Astrophysics Data System (ADS)
Dwivedi, Manish; Kumar, Vinay
2016-04-01
High-spatial resolution Remote Sensing (RS) data provides detailed information which ensures high-definition visual image analysis of earth surface features. These data sets also support improved information extraction capabilities at a fine scale. To improve the spatial resolution of coarser resolution RS data, the Super Resolution Reconstruction (SRR) technique, which focuses on multi-angular image sequences, has become widely acknowledged. In this study, multi-angle CHRIS/PROBA data of the Kutch area is used for SR image reconstruction to enhance the spatial resolution from 18 m to 6 m in the hope of obtaining a better land cover classification. Various SR approaches, namely Projection onto Convex Sets (POCS), Robust, Iterative Back Projection (IBP), Non-Uniform Interpolation and Structure-Adaptive Normalized Convolution (SANC), were chosen for this study. Subjective assessment through visual interpretation shows substantial improvement in land cover details. Quantitative measures including peak signal to noise ratio and structural similarity are used for the evaluation of the image quality. It was observed that the SANC SR technique, using the Vandewalle algorithm for low resolution image registration, outperformed the other techniques. An SVM based classifier was then used to classify both the SRR data and data resampled to 6 m spatial resolution using bi-cubic interpolation. A comparative analysis of the classified bicubic interpolated and SR derived images of CHRIS/PROBA showed that the SR derived classified data improved overall accuracy by a significant 10-12%. The results demonstrate that SR methods are able to improve the spatial detail of multi-angle images as well as the classification accuracy.
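Peak signal to noise ratio, one of the two image quality measures mentioned, follows from the mean squared error between a reference image and a reconstruction. This is a minimal sketch assuming the usual definition PSNR = 10 log10(MAX² / MSE) and 8-bit data; the function name and the 2x2 images are hypothetical.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """PSNR in dB between two equally sized 2-D images given as lists of rows."""
    ref = [v for row in reference for v in row]
    rec = [v for row in reconstructed for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images with small reconstruction errors.
reference = [[0, 255], [255, 0]]
reconstructed = [[0, 250], [255, 5]]
print(psnr(reference, reconstructed))
```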
Terrain-Moisture Classification Using GPS Surface-Reflected Signals
NASA Technical Reports Server (NTRS)
Grant, Michael S.; Acton, Scott T.; Katzberg, Stephen J.
2006-01-01
In this study we present a novel method of land surface classification using surface-reflected GPS signals in combination with digital imagery. Two GPS-derived classification features are merged with visible image data to create terrain-moisture (TM) classes, defined here as visibly identifiable terrain or landcover classes containing a surface/soil moisture component. As compared to using surface imagery alone, classification accuracy is significantly improved for a number of visible classes when adding the GPS-based signal features. Since the strength of the reflected GPS signal is proportional to the amount of moisture in the surface, use of these GPS features provides information about the surface that is not obtainable using visible wavelengths alone. Application areas include hydrology, precision agriculture, and wetlands mapping.
Unsupervised classification of remote multispectral sensing data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The new unsupervised classification technique for classifying multispectral remote sensing data, which can come either from the multispectral scanner or from digitized color-separation aerial photographs, consists of two parts: (a) a sequential statistical clustering, which is a one-pass sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps by the unsupervised technique and the supervised maximum likelihood technique indicate that the classification accuracies are in agreement.
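The two-part structure described here, a single sequential pass that proposes initial clusters followed by iterative K-means refinement, can be sketched as below. The abstract does not specify the sequential variance analysis, so a simple distance threshold stands in for it; that substitution, the function names, and the sample points are all assumptions.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def sequential_pass(points, threshold):
    """One pass: join the nearest cluster if within threshold, else start a new one."""
    centers, counts = [], []
    for p in points:
        if centers:
            j = nearest(p, centers)
            if math.dist(p, centers[j]) <= threshold:
                counts[j] += 1
                # Running-mean update of the cluster center.
                centers[j] = [c + (v - c) / counts[j] for c, v in zip(centers[j], p)]
                continue
        centers.append(list(p))
        counts.append(1)
    return centers


def kmeans_refine(points, centers, iterations=10):
    """Standard K-means iterations starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers


# Hypothetical two-band samples forming two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
initial = sequential_pass(points, threshold=3.0)
final = kmeans_refine(points, initial)
print(final)
```

Seeding K-means from a one-pass result removes the need to fix the number of clusters in advance, since it falls out of the threshold.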
A supervised learning rule for classification of spatiotemporal spike patterns.
Lilin Guo; Zhenzhong Wang; Adjouadi, Malek
2016-08-01
This study introduces a novel supervised algorithm for spiking neurons that takes into consideration synaptic delays and axonal delays associated with weights. It can be utilized for both classification and association and uses several biologically influenced properties, such as axonal and synaptic delays. This algorithm also takes into consideration spike-timing-dependent plasticity, as in the Remote Supervised Method (ReSuMe). This paper focuses on the classification aspect alone. Spiking neurons trained according to the proposed learning rule are capable of classifying different categories by the associated sequences of precisely timed spikes. Simulation results have shown that the proposed learning method greatly improves classification accuracy when compared to the Spike Pattern Association Neuron (SPAN) and the Tempotron learning rule.
NASA Astrophysics Data System (ADS)
Hramov, Alexander E.; Frolov, Nikita S.; Musatov, Vyachaslav Yu.
2018-02-01
In the present work we studied features of the classification of human brain states corresponding to real movements of the hands and legs. For this purpose we used a supervised learning algorithm based on feed-forward artificial neural networks (ANNs) with error back-propagation, along with the support vector machine (SVM) method. We compared the quality of operator movement classification from experimentally obtained EEG signals without preliminary processing and after filtering in different ranges up to 25 Hz. It was shown that low-frequency filtering of multichannel EEG data significantly improved the accuracy of operator movement classification.
Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.
Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong
2017-01-01
This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier: linear regression classification. Our method selects one axial slice from the 3D brain image and employs pseudo Zernike moments with a maximum order of 15 to extract 256 features from each image. Finally, linear regression classification is used as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches.
Gene selection for tumor classification using neighborhood rough sets and entropy measures.
Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu
2017-03-01
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to the gene selection field, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are usually subjected to discretization preprocessing, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper introduces an entropy measure under the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. Using this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification.
Lewicke, Aaron; Sazonov, Edward; Corwin, Michael J; Neuman, Michael; Schuckers, Stephanie
2008-01-01
Reliability of classification performance is important for many biomedical applications. A classification model which considers reliability in the development of the model such that unreliable segments are rejected would be useful, particularly, in large biomedical data sets. This approach is demonstrated in the development of a technique to reliably determine sleep and wake using only the electrocardiogram (ECG) of infants. Typically, sleep state scoring is a time consuming task in which sleep states are manually derived from many physiological signals. The method was tested with simultaneous 8-h ECG and polysomnogram (PSG) determined sleep scores from 190 infants enrolled in the collaborative home infant monitoring evaluation (CHIME) study. Learning vector quantization (LVQ) neural network, multilayer perceptron (MLP) neural network, and support vector machines (SVMs) are tested as the classifiers. After systematic rejection of difficult to classify segments, the models can achieve 85%-87% correct classification while rejecting only 30% of the data. This corresponds to a Kappa statistic of 0.65-0.68. With rejection, accuracy improves by about 8% over a model without rejection. Additionally, the impact of the PSG scored indeterminate state epochs is analyzed. The advantages of a reliable sleep/wake classifier based only on ECG include high accuracy, simplicity of use, and low intrusiveness. Reliability of the classification can be built directly in the model, such that unreliable segments are rejected.
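The trade-off described in this abstract, where rejecting hard-to-classify segments raises accuracy on what remains, can be illustrated with a small sketch. The abstract does not say how segments are rejected, so the rule below (drop the least-confident fraction) is an assumption, as are the function name and the made-up labels and confidence scores.

```python
def accuracy_with_rejection(true_labels, predicted_labels, confidences, reject_fraction):
    """Drop the least-confident fraction of segments and score the rest.

    Returns (accuracy on the retained segments, number of rejected segments).
    """
    # Indices sorted from least to most confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n_reject = round(len(order) * reject_fraction)
    kept = order[n_reject:]
    correct = sum(true_labels[i] == predicted_labels[i] for i in kept)
    return correct / len(kept), n_reject


# Hypothetical epoch labels and classifier confidences (not from the study).
truth = ["sleep", "sleep", "wake", "wake", "sleep", "wake", "sleep", "wake", "sleep", "wake"]
pred = ["sleep", "wake", "wake", "sleep", "sleep", "wake", "sleep", "wake", "wake", "wake"]
conf = [0.95, 0.55, 0.90, 0.52, 0.85, 0.80, 0.99, 0.75, 0.58, 0.70]

acc_all, _ = accuracy_with_rejection(truth, pred, conf, 0.0)
acc_kept, rejected = accuracy_with_rejection(truth, pred, conf, 0.3)
```

In this toy data the three errors happen to be the three least-confident predictions, so rejection removes all of them; real data will rarely separate that cleanly.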
Peng, Xiang; King, Irwin
2008-01-01
The Biased Minimax Probability Machine (BMPM) constructs a classifier which deals with the imbalanced learning tasks. It provides a worst-case bound on the probability of misclassification of future data points based on reliable estimates of means and covariance matrices of the classes from the training data samples, and achieves promising performance. In this paper, we develop a novel yet critical extension training algorithm for BMPM that is based on Second-Order Cone Programming (SOCP). Moreover, we apply the biased classification model to medical diagnosis problems to demonstrate its usefulness. By removing some crucial assumptions in the original solution to this model, we make the new method more accurate and robust. We outline the theoretical derivatives of the biased classification model, and reformulate it into an SOCP problem which could be efficiently solved with global optima guarantee. We evaluate our proposed SOCP-based BMPM (BMPMSOCP) scheme in comparison with traditional solutions on medical diagnosis tasks where the objectives are to focus on improving the sensitivity (the accuracy of the more important class, say "ill" samples) instead of the overall accuracy of the classification. Empirical results have shown that our method is more effective and robust to handle imbalanced classification problems than traditional classification approaches, and the original Fractional Programming-based BMPM (BMPMFP).
A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM
NASA Astrophysics Data System (ADS)
Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan
2018-03-01
To address the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on three DGA ratios and a support vector machine (SVM) optimized by particle swarm optimization (PSO) is proposed. The support vector machine is extended to a nonlinear, multi-class SVM, particle swarm optimization is used to optimize the multi-class SVM model, and transformer fault diagnosis is carried out in combination with the cross-validation principle. The fault diagnosis results show that the average accuracy of the proposed method is better than that of the standard support vector machine and the genetic algorithm support vector machine, demonstrating that the proposed method can effectively improve the accuracy of transformer fault diagnosis.
Adaptive distributed outlier detection for WSNs.
De Paola, Alessandra; Gaglio, Salvatore; Lo Re, Giuseppe; Milazzo, Fabrizio; Ortolani, Marco
2015-05-01
The paradigm of pervasive computing is gaining more and more attention nowadays, thanks to the possibility of obtaining precise and continuous monitoring. Ease of deployment and adaptivity are typically implemented by adopting autonomous and cooperative sensory devices; however, for such systems to be of any practical use, reliability and fault tolerance must be guaranteed, for instance by detecting corrupted readings amidst the huge amount of gathered sensory data. This paper proposes an adaptive distributed Bayesian approach for detecting outliers in data collected by a wireless sensor network; our algorithm aims at optimizing classification accuracy, time complexity and communication complexity, and also considering externally imposed constraints on such conflicting goals. The performed experimental evaluation showed that our approach is able to improve the considered metrics for latency and energy consumption, with limited impact on classification accuracy.
Comparison of wheat classification accuracy using different classifiers of the image-100 system
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Chen, S. C.; Moreira, M. A.; Delima, A. M.
1981-01-01
Classification results using single-cell and multi-cell signature acquisition options, a point-by-point Gaussian maximum-likelihood classifier, and K-means clustering of the Image-100 system are presented. Conclusions reached are that: a better indication of correct classification can be provided by using a test area which contains various cover types of the study area; classification accuracy should be evaluated considering both the percentages of correct classification and error of commission; supervised classification approaches are better than K-means clustering; Gaussian distribution maximum likelihood classifier is better than Single-cell and Multi-cell Signature Acquisition Options of the Image-100 system; and in order to obtain a high classification accuracy in a large and heterogeneous crop area, using Gaussian maximum-likelihood classifier, homogeneous spectral subclasses of the study crop should be created to derive training statistics.
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier
NASA Astrophysics Data System (ADS)
Wang, Leilei; Cheng, Jinyong
2018-03-01
Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover several aspects, including the construction of the model and the classification of parameters. The data set is the typical CB513 protein data set. Accuracy is evaluated with 3-fold cross-validation, from which we obtain the Q3 accuracy. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
Guo, Xinyu; Dominick, Kelli C; Minai, Ali A; Li, Hailong; Erickson, Craig A; Lu, Long J
2017-01-01
The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t -test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. 
These FCs come from or cross different pre-defined brain networks including the default-mode, cingulo-opercular, frontal-parietal, and cerebellum. Thirteen of them are statistically significant between ASD and TD groups (two sample t -test p < 0.05) while 19 of them are not. The relationship between the statistically significant FCs and the corresponding ASD behavior symptoms is discussed based on the literature and clinician's expert knowledge. Meanwhile, the potential reason for obtaining 19 FCs which are not statistically significant is also provided.
NASA Astrophysics Data System (ADS)
Książek, Judyta
2015-10-01
At present, there is great interest in the development of texture-based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolutions of 5 cm and 25 cm) were used. On both orthoimages, representative samples for two classes were selected: asbestos-cement roofing sheets and other roofing materials. The usefulness of texture analysis was estimated using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. During the calculation of decision trees, different numbers of texture parameter groups were considered. In order to obtain the best settings for the decision tree models, cross-validation was performed. Decision tree models with the lowest mean classification error were selected. The accuracy of the classification was assessed on validation data sets, which were not used for training the classifier. For 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm the potential usefulness of texture parameter image processing for detection of asbestos-cement roofing sheets. In order to improve the accuracy, an extended study should be considered in which additional textural features as well as spectral characteristics are analyzed.
Accuracy of Remotely Sensed Classifications For Stratification of Forest and Nonforest Lands
Raymond L. Czaplewski; Paul L. Patterson
2001-01-01
We specify accuracy standards for remotely sensed classifications used by FIA to stratify landscapes into two categories: forest and nonforest. Accuracy must be highest when forest area approaches 100 percent of the landscape. If forest area is rare in a landscape, then accuracy in the nonforest stratum must be very high, even at the expense of accuracy in the forest...
NASA Astrophysics Data System (ADS)
Shahriari Nia, Morteza; Wang, Daisy Zhe; Bohlman, Stephanie Ann; Gader, Paul; Graves, Sarah J.; Petrovic, Milenko
2015-01-01
Hyperspectral images can be used to identify savannah tree species at the landscape scale, which is a key step in measuring biomass and carbon, and tracking changes in species distributions, including invasive species, in these ecosystems. Before automated species mapping can be performed, image processing and atmospheric correction is often performed, which can potentially affect the performance of classification algorithms. We determine how three processing and correction techniques (atmospheric correction, Gaussian filters, and shade/green vegetation filters) affect the prediction accuracy of classification of tree species at pixel level from airborne visible/infrared imaging spectrometer imagery of longleaf pine savanna in Central Florida, United States. Species classification using fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) atmospheric correction outperformed ATCOR in the majority of cases. Green vegetation (normalized difference vegetation index) and shade (near-infrared) filters did not increase classification accuracy when applied to large and continuous patches of specific species. Finally, applying a Gaussian filter reduces interband noise and increases species classification accuracy. Using the optimal preprocessing steps, our classification accuracy of six species classes is about 75%.
NASA Astrophysics Data System (ADS)
Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie
2018-04-01
The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and canopy characterization of trees from terrestrial laser scanning (TLS) data. The geometry-based approach is one of the most widely used classification methods. In the geometry-based method, it is common practice to extract salient features at one single scale before the features are used for classification. It remains unclear how the scale(s) used affect the classification accuracy and efficiency. To assess the scale effect on the classification accuracy and efficiency, we extracted single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and classified the points as leaf or wood. Our experimental results show that the balanced accuracy of the multi-scale method is higher than the average balanced accuracy of the single-scale method by about 10% for both trees. The average speed-up ratio of the single-scale classifiers over the multi-scale classifier for each tree is higher than 30.
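Balanced accuracy, the metric reported above, is commonly defined as the mean of the per-class recalls, which matters when one class (for example leaf points) far outnumbers the other. The sketch below assumes that common definition; the abstract does not spell out the formula, and the function name and toy labels are hypothetical.

```python
def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so a dominant class cannot mask a weak one."""
    recalls = []
    for cls in sorted(set(true_labels)):
        idx = [i for i, t in enumerate(true_labels) if t == cls]
        hits = sum(predicted_labels[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Imbalanced toy example: 8 leaf points, 2 wood points, one wood point missed.
truth = ["leaf"] * 8 + ["wood"] * 2
pred = ["leaf"] * 8 + ["leaf", "wood"]

plain = sum(t == p for t, p in zip(truth, pred)) / len(truth)
balanced = balanced_accuracy(truth, pred)
```

Here plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the minority class is misclassified.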
Comparing Features for Classification of MEG Responses to Motor Imagery.
Halme, Hanna-Leena; Parkkonen, Lauri
2016-01-01
Motor imagery (MI) with real-time neurofeedback could be a viable approach, e.g., in rehabilitation of cerebral stroke. Magnetoencephalography (MEG) noninvasively measures electric brain activity at high temporal resolution and is well-suited for recording oscillatory brain signals. MI is known to modulate 10- and 20-Hz oscillations in the somatomotor system. In order to provide accurate feedback to the subject, the most relevant MI-related features should be extracted from MEG data. In this study, we evaluated several MEG signal features for discriminating between left- and right-hand MI and between MI and rest. MEG was measured from nine healthy participants imagining either left- or right-hand finger tapping according to visual cues. Data preprocessing, feature extraction and classification were performed offline. The evaluated MI-related features were power spectral density (PSD), Morlet wavelets, short-time Fourier transform (STFT), common spatial patterns (CSP), filter-bank common spatial patterns (FBCSP), spatio-spectral decomposition (SSD), and combined SSD+CSP, CSP+PSD, CSP+Morlet, and CSP+STFT. We also compared four classifiers applied to single trials using 5-fold cross-validation for evaluating the classification accuracy and its possible dependence on the classification algorithm. In addition, we estimated the inter-session left-vs-right accuracy for each subject. The SSD+CSP combination yielded the best accuracy in both left-vs-right (mean 73.7%) and MI-vs-rest (mean 81.3%) classification. CSP+Morlet yielded the best mean accuracy in inter-session left-vs-right classification (mean 69.1%). There were large inter-subject differences in classification accuracy, and the level of the 20-Hz suppression correlated significantly with the subjective MI-vs-rest accuracy. Selection of the classification algorithm had only a minor effect on the results. We obtained good accuracy in sensor-level decoding of MI from single-trial MEG data. 
Feature extraction methods utilizing both the spatial and spectral profile of MI-related signals provided the best classification results, suggesting good performance of these methods in an online MEG neurofeedback system.
Mapping forest tree species over large areas with partially cloudy Landsat imagery
NASA Astrophysics Data System (ADS)
Turlej, K.; Radeloff, V.
2017-12-01
Forests provide numerous services to natural systems and humankind, but which services forests provide depends greatly on their tree species composition. That makes it important not only to track changes in forest extent, something that remote sensing excels in, but also to map tree species. The main goal of our work was to map tree species with Landsat imagery, and to identify how to maximize mapping accuracy by including partially cloudy imagery. Our study area covered one Landsat footprint (26/28) in Northern Wisconsin, USA, with temperate and boreal forests. We selected this area because it contains numerous tree species and variable forest composition, providing an ideal study area to test the limits of Landsat data. We quantified how species-level classification accuracy was affected by a) the number of acquisitions, b) the seasonal distribution of observations, and c) the amount of cloud contamination. We classified a single-year stack of Landsat-7 and -8 image data with a decision tree algorithm to generate a map of dominant tree species at the pixel and stand level. We obtained three important results. First, we achieved producer's accuracies in the range 70-80% and user's accuracies in the range 80-90% for the most abundant tree species in our study area. Second, classification accuracy improved with more acquisitions and when observations were available from all seasons, and was best when images with up to 40% cloud cover were included. Finally, classifications for pure stands were 10 to 30 percentage points better than those for mixed stands. We conclude that including partially cloudy Landsat imagery makes it possible to map forest tree species with accuracies that were previously only possible for rare years with many cloud-free observations. Our approach thus provides important information for both forest management and science.
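Producer's and user's accuracies, as reported above, are both read off a confusion matrix: producer's accuracy divides the correctly mapped count by the reference total for a class, and user's accuracy divides it by the mapped total. The sketch below assumes rows hold reference classes and columns hold mapped classes; the function name and the counts are made up for illustration.

```python
def producers_users_accuracy(confusion):
    """confusion[i][j] = number of reference samples of class i mapped as class j.

    Producer's accuracy: correct / reference total (row sum).
    User's accuracy:     correct / mapped total (column sum).
    """
    n = len(confusion)
    producers = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    users = [confusion[j][j] / sum(row[j] for row in confusion) for j in range(n)]
    return producers, users


# Hypothetical two-species confusion matrix (rows: reference, columns: map).
confusion = [
    [40, 10],
    [5, 45],
]
producers, users = producers_users_accuracy(confusion)
```

A class can have a high producer's accuracy and a lower user's accuracy (or the reverse), which is why the two are usually reported together.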
Local Subspace Classifier with Transform-Invariance for Image Classification
NASA Astrophysics Data System (ADS)
Hotta, Seiji
A family of linear subspace classifiers called local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers very high sensitivity to image transformations because it uses projection and the Euclidean distances for classification. In this paper, I present a combination of a local subspace classifier (LSC) and a tangent distance (TD) for improving accuracy of handwritten digit recognition. In this classification rule, we can deal with transform-invariance easily because we are able to use tangent vectors for approximation of transformations. However, we cannot use tangent vectors in other type of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with the experiments on handwritten digit and color image classification.
Zhang, Jiang; Wang, James Z; Yuan, Zhen; Sobel, Eric S; Jiang, Huabei
2011-01-01
This study presents a computer-aided classification method to distinguish osteoarthritis finger joints from healthy ones based on the functional images captured by x-ray guided diffuse optical tomography. Three imaging features, joint space width, optical absorption, and scattering coefficients, are employed to train a Least Squares Support Vector Machine (LS-SVM) classifier for osteoarthritis classification. The 10-fold validation results show that all osteoarthritis joints are clearly identified and all healthy joints are ruled out by the LS-SVM classifier. The best sensitivity, specificity, and overall accuracy of the classification by experienced technicians based on manual calculation of optical properties and visual examination of optical images are only 85%, 93%, and 90%, respectively. Therefore, our LS-SVM based computer-aided classification is a considerably improved method for osteoarthritis diagnosis.
2011-01-01
Background: The aim of this study was to develop a child-specific classification system for long bone fractures and to examine its reliability and validity on the basis of a prospective multicentre study. Methods: Using the sequentially developed classification system, three samples of between 30 and 185 paediatric limb fractures from a pool of 2308 fractures documented in two multicentre studies were analysed in a blinded fashion by eight orthopaedic surgeons, on a total of 5 occasions. Intra- and interobserver reliability and accuracy were calculated. Results: The reliability improved with successive simplification of the classification. The final version resulted in an overall interobserver agreement of κ = 0.71 with no significant difference between experienced and less experienced raters. Conclusions: The evaluation of the newly proposed classification system resulted in a reliable and routinely applicable system, for which training in its proper use may further improve the reliability. It can be recommended as a useful tool for clinical practice and offers the option for developing treatment recommendations and outcome predictions in the future. PMID:21548939
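The agreement statistic κ quoted above corrects raw agreement for the agreement expected by chance, κ = (p_o - p_e) / (1 - p_e). The abstract does not state which kappa variant was used for its eight raters, so the two-rater Cohen's kappa below is only an assumed illustration; the function name and the fracture codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Ten fractures coded by two raters (made-up codes "A" and "B").
rater_1 = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
rater_2 = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "A"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy case the raters agree on 8 of 10 cases, chance agreement is 0.5, and kappa works out to 0.6.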
EEG channels reduction using PCA to increase XGBoost's accuracy for stroke detection
NASA Astrophysics Data System (ADS)
Fitriah, N.; Wijaya, S. K.; Fanany, M. I.; Badri, C.; Rezal, M.
2017-07-01
In Indonesia, based on the results of Basic Health Research 2013, the number of stroke patients increased from 8.3 ‰ (2007) to 12.1 ‰ (2013). These days, some researchers are using electroencephalography (EEG) results as another option for detecting stroke, besides CT scan images as the gold standard. A previous study on the data of stroke and healthy patients in the National Brain Center Hospital (RS PON) used Brain Symmetry Index (BSI), Delta-Alpha Ratio (DAR), and Delta-Theta-Alpha-Beta Ratio (DTABR) as the features for classification by an Extreme Learning Machine (ELM). The study got 85% accuracy with sensitivity above 86% for acute ischemic stroke detection. Using EEG data means dealing with many data dimensions, which can reduce the accuracy of the classifier (the curse of dimensionality). Principal Component Analysis (PCA) could reduce dimensionality and computation cost without decreasing classification accuracy. XGBoost, as a scalable tree boosting classifier, can solve real-world scale problems (Higgs Boson and Allstate dataset) using a minimal amount of resources. This paper reuses the same data from RS PON and features from previous research, preprocessed with PCA and classified with XGBoost, to increase the accuracy with fewer electrodes. The reduced set of electrodes improved the accuracy of stroke detection. Our future work will examine algorithms other than PCA to get higher accuracy with fewer channels.
A motion-classification strategy based on sEMG-EEG signal combination for upper-limb amputees.
Li, Xiangxin; Samuel, Oluwarotimi Williams; Zhang, Xu; Wang, Hui; Fang, Peng; Li, Guanglin
2017-01-07
Most of the modern motorized prostheses are controlled with the surface electromyography (sEMG) recorded on the residual muscles of amputated limbs. However, the residual muscles are usually limited, especially after above-elbow amputations, which would not provide enough sEMG for the control of prostheses with multiple degrees of freedom. Signal fusion is a possible approach to solve the problem of insufficient control commands, where some non-EMG signals are combined with sEMG signals to provide sufficient information for motion intention decoding. In this study, a motion-classification method that combines sEMG and electroencephalography (EEG) signals was proposed and investigated, in order to improve the control performance of upper-limb prostheses. Four transhumeral amputees without any form of neurological disease were recruited in the experiments. Five motion classes including hand-open, hand-close, wrist-pronation, wrist-supination, and no-movement were specified. During the motion performances, sEMG and EEG signals were simultaneously acquired from the skin surface and scalp of the amputees, respectively. The two types of signals were independently preprocessed and then combined as a parallel control input. Four time-domain features were extracted and fed into a classifier trained by the Linear Discriminant Analysis (LDA) algorithm for motion recognition. In addition, channel selections were performed by using the Sequential Forward Selection (SFS) algorithm to optimize the performance of the proposed method. The classification performance achieved by the fusion of sEMG and EEG signals was significantly better than that obtained by a single signal source of either sEMG or EEG. An increment of more than 14% in classification accuracy was achieved when using a combination of 32-channel sEMG and 64-channel EEG. 
Furthermore, based on the SFS algorithm, two optimized electrode arrangements (10-channel sEMG + 10-channel EEG, 10-channel sEMG + 20-channel EEG) were obtained with classification accuracies of 84.2 and 87.0%, respectively, which were about 7.2 and 10% higher than the accuracy by using only 32-channel sEMG input. This study demonstrated the feasibility of fusing sEMG and EEG signals towards improving motion classification accuracy for above-elbow amputees, which might enhance the control performances of multifunctional myoelectric prostheses in clinical application. The study was approved by the ethics committee of Institutional Review Board of Shenzhen Institutes of Advanced Technology, and the reference number is SIAT-IRB-150515-H0077.
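Sequential Forward Selection, used above for channel selection, greedily adds the one channel that most improves a score and stops when no addition helps. The sketch below shows that generic procedure only; the scoring function is a made-up stand-in for cross-validated classification accuracy, and the channel names, gains, and per-channel penalty are hypothetical rather than taken from the study.

```python
def sequential_forward_selection(candidates, score, max_selected):
    """Greedy forward selection: add the best single candidate each round,
    stopping when the score no longer improves or the budget is reached."""
    selected = []
    best_score = float("-inf")
    while len(selected) < max_selected:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Evaluate every one-channel extension of the current subset.
        trial_score, trial = max((score(selected + [c]), c) for c in remaining)
        if trial_score <= best_score:
            break  # no candidate improves the score
        selected.append(trial)
        best_score = trial_score
    return selected, best_score


# Hypothetical per-channel contributions (in percentage points).
GAIN = {"emg1": 50, "emg2": 30, "eeg1": 10, "eeg2": 0}


def toy_score(channels):
    """Made-up stand-in for cross-validated accuracy: summed gains minus a
    5-point cost per channel, so uninformative channels are not worth adding."""
    return sum(GAIN[c] for c in channels) - 5 * len(channels)


chosen, final_score = sequential_forward_selection(list(GAIN), toy_score, max_selected=4)
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search but can miss combinations of channels that are only useful together.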
Mapping of land cover in northern California with simulated hyperspectral satellite imagery
NASA Astrophysics Data System (ADS)
Clark, Matthew L.; Kilham, Nina E.
2016-09-01
Land-cover maps are important science products needed for natural resource and ecosystem service management, biodiversity conservation planning, and assessing human-induced and natural drivers of land change. Analysis of hyperspectral, or imaging spectrometer, imagery has shown an impressive capacity to map a wide range of natural and anthropogenic land cover. Applications have been mostly with single-date imagery from relatively small spatial extents. Future hyperspectral satellites will provide imagery at greater spatial and temporal scales, and there is a need to assess techniques for mapping land cover with these data. Here we used simulated multi-temporal HyspIRI satellite imagery over a 30,000 km² area in the San Francisco Bay Area, California to assess its capabilities for mapping classes defined by the international Land Cover Classification System (LCCS). We employed a mapping methodology and analysis framework that is applicable to regional and global scales. We used the Random Forests classifier with three sets of predictor variables (reflectance, MNF, hyperspectral metrics), two temporal resolutions (summer, spring-summer-fall), two sample scales (pixel, polygon) and two levels of classification complexity (12, 20 classes). Hyperspectral metrics provided a 16.4-21.8% and 3.1-6.7% increase in overall accuracy relative to MNF and reflectance bands, respectively, depending on pixel or polygon scales of analysis. Multi-temporal metrics improved overall accuracy by 0.9-3.1% over summer metrics, yet increases were only significant at the pixel scale of analysis. Overall accuracy at pixel scales was 72.2% (Kappa 0.70) with three seasons of metrics. Anthropogenic and homogeneous natural vegetation classes had relatively high confidence and producer and user accuracies were over 70%; in comparison, woodland and forest classes had considerable confusion. 
We next focused on plant functional types with relatively pure spectra by removing open-canopy shrublands, woodlands and mixed forests from the classification. This 12-class map had significantly improved accuracy of 85.1% (Kappa 0.83) and most classes had over 70% producer and user accuracies. Finally, we summarized important metrics from the multi-temporal Random Forests to infer the underlying chemical and structural properties that best discriminated our land-cover classes across seasons.
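The overall accuracy, Kappa, and producer and user accuracies quoted in this abstract are all derived from a confusion matrix. A minimal sketch of those calculations follows, assuming rows hold the reference classes and columns the mapped classes; the 3-class matrix is hypothetical and is not data from the study.

```python
def accuracy_metrics(confusion):
    """Summarise a square confusion matrix.

    Assumption: confusion[i][j] counts samples whose reference class is i
    and whose mapped (predicted) class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Overall accuracy: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1 - expected)

    # Producer accuracy: correct / reference total (omission errors).
    producer = [confusion[i][i] / row_totals[i] for i in range(n)]
    # User accuracy: correct / mapped total (commission errors).
    user = [confusion[j][j] / col_totals[j] for j in range(n)]
    return observed, kappa, producer, user


# Hypothetical 3-class example, for illustration only.
example = [[50, 5, 5],
           [4, 40, 6],
           [6, 5, 29]]
overall, kappa, producer, user = accuracy_metrics(example)
print(f"overall={overall:.3f} kappa={kappa:.3f}")
```

Kappa is lower than overall accuracy because it discounts the agreement that the class proportions alone would produce by chance.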
Although remote sensing technology has long been used in wetland inventory and monitoring, the accuracy and detail level of derived wetland maps were limited or often unsatisfactory largely due to the relatively coarse spatial resolution of conventional satellite imagery. This re...
BRAIN TUMOR SEGMENTATION WITH SYMMETRIC TEXTURE AND SYMMETRIC INTENSITY-BASED DECISION FORESTS.
Bianchi, Anthony; Miller, James V; Tan, Ek Tsoon; Montillo, Albert
2013-04-01
Accurate automated segmentation of brain tumors in MR images is challenging due to overlapping tissue intensity distributions and amorphous tumor shape. However, a clinically viable solution providing precise quantification of tumor and edema volume would enable better pre-operative planning, treatment monitoring and drug development. Our contributions are threefold. First, we design efficient gradient and LBPTOP based texture features which improve classification accuracy over standard intensity features. Second, we extend our texture and intensity features to symmetric texture and symmetric intensity which further improve the accuracy for all tissue classes. Third, we demonstrate further accuracy enhancement by extending our long range features from 100mm to a full 200mm. We assess our brain segmentation technique on 20 patients in the BraTS 2012 dataset. Impact from each contribution is measured and the combination of all the features is shown to yield state-of-the-art accuracy and speed.
Yi, Zhenzhen; Strüder-Kypke, Michaela; Hu, Xiaozhong; Lin, Xiaofeng; Song, Weibo
2014-02-01
In order to assess how dataset-selection for multi-gene analyses affects the accuracy of inferred phylogenetic trees in ciliates, we chose five genes and the genus Paramecium, one of the most widely used model protist genera, and compared tree topologies of the single- and multi-gene analyses. Our empirical study shows that: (1) Using multiple genes improves phylogenetic accuracy, even when their one-gene topologies are in conflict with each other. (2) The impact of missing data on phylogenetic accuracy is ambiguous: resolution power and topological similarity, but not number of represented taxa, are the most important criteria of a dataset for inclusion in concatenated analyses. (3) As an example, we tested the three classification models of the genus Paramecium with a multi-gene based approach, and only the monophyly of the subgenus Paramecium is supported. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Lestari, A. W.; Rustam, Z.
2017-07-01
In the last decade, breast cancer has become the focus of world attention as this disease is one of the leading causes of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, the Fuzzy Kernel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high dimensional data of breast cancer using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the candidate features are evaluated using feature selection, where the Laplacian Score is used. The results show the comparison of accuracy and running time of FPCM and NKFPCM with and without feature selection.
Performance Analysis of Classification Methods for Indoor Localization in VLC Networks
NASA Astrophysics Data System (ADS)
Sánchez-Rodríguez, D.; Alonso-González, I.; Sánchez-Medina, J.; Ley-Bosch, C.; Díaz-Vilariño, L.
2017-09-01
Indoor localization has gained considerable attention over the past decade because of the emergence of numerous location-aware services. Research works have been proposed on solving this problem by using wireless networks. Nevertheless, there is still much room for improvement in the quality of the proposed classification models. In recent years, the emergence of Visible Light Communication (VLC) has brought a new approach to high quality indoor positioning. Among its advantages, this new technology is immune to electromagnetic interference and has a smaller variance of received signal power compared to RF based technologies. In this paper, a performance analysis of seventeen machine learning classifiers for indoor localization in VLC networks is carried out. The analysis is accomplished in terms of accuracy, average distance error, computational cost, training size, precision and recall measurements. Results show that most of the classifiers achieve an accuracy above 90 %. The best tested classifier yielded a 99.0 % accuracy, with an average error distance of 0.3 centimetres.
Swiercz, Miroslaw; Kochanowicz, Jan; Weigele, John; Hurst, Robert; Liebeskind, David S; Mariak, Zenon; Melhem, Elias R; Krejza, Jaroslaw
2008-01-01
To determine the performance of an artificial neural network in transcranial color-coded duplex sonography (TCCS) diagnosis of middle cerebral artery (MCA) spasm. TCCS was prospectively acquired within 2 h prior to routine cerebral angiography in 100 consecutive patients (54M:46F, median age 50 years). Angiographic MCA vasospasm was classified as mild (<25% of vessel caliber reduction), moderate (25-50%), or severe (>50%). A Learning Vector Quantization neural network classified MCA spasm based on TCCS peak-systolic, mean, and end-diastolic velocity data. During a four-class discrimination task, accurate classification by the network ranged from 64.9% to 72.3%, depending on the number of neurons in the Kohonen layer. Accurate classification of vasospasm ranged from 79.6% to 87.6%, with an accuracy of 84.7% to 92.1% for the detection of moderate-to-severe vasospasm. An artificial neural network may increase the accuracy of TCCS in diagnosis of MCA spasm.
IMPACTS OF PATCH SIZE AND LANDSCAPE HETEROGENEITY ON THEMATIC IMAGE CLASSIFICATION ACCURACY
Currently, most thematic accuracy assessments of classified remotely sensed images only account for errors between the various classes employed, at particular pixels of interest, thu...
NASA Astrophysics Data System (ADS)
Gómez Giménez, M.; Della Peruta, R.; de Jong, R.; Keller, A.; Schaepman, M. E.
2015-12-01
Agroecosystems play an important role in providing economic and ecosystem services, which directly impact society. Inappropriate land use and unsustainable agricultural management with associated nutrient cycles can jeopardize important soil functions such as food production, livestock feeding and conservation of biodiversity. The objective of this study was to integrate remotely sensed land cover information into a regional Land Management Model (LMM) to improve the assessment of spatially explicit nutrient balances for agroecosystems. Remotely sensed data and an optimized parameter set were fed into the LMM, providing a better spatial allocation of agricultural data aggregated at farm level. The integration of land use information in the land allocation process relied predominantly on three factors: i) spatial resolution, ii) classification accuracy and iii) parcel definition. The best input parameter combination resulted in two different land cover classifications with overall accuracies of 98%, improving the LMM performance by 16% as compared to using non-spatially explicit input. First, the use of spatially explicit information improved the spatial allocation output, resulting in a pattern that better followed parcel boundaries (Figure 1). Second, the high classification accuracies ensured consistency between the datasets used. Third, the use of a suitable spatial unit to define the parcel boundaries influenced the model in terms of computational time and the amount of farmland allocated. We conclude that the combined use of remote sensing (RS) data with the LMM has the potential to provide highly accurate information on spatially explicit nutrient balances that are crucial for policy options concerning sustainable management of agricultural soils. Figure 1. Details of the spatial pattern obtained: a) Using only the farm census data, b) using also land use information.
Framed in black in the left image (a), examples of artifacts that disappeared when using land use information (right image, b). Colors represent different ownership.
Object oriented classification of high resolution data for inventory of horticultural crops
NASA Astrophysics Data System (ADS)
Hebbar, R.; Ravishankar, H. M.; Trivedi, S.; Subramoniam, S. R.; Uday, R.; Dadhwal, V. K.
2014-11-01
High resolution satellite images are associated with large variance, and thus per pixel classifiers often result in poor accuracy, especially in delineation of horticultural crops. In this context, object oriented techniques are powerful and promising methods for classification. In the present study, a semi-automatic object oriented feature extraction model has been used for delineation of horticultural fruit and plantation crops using Erdas Objective Imagine. Multi-resolution data from Resourcesat LISS-IV and Cartosat-1 have been used as source data in the feature extraction model. Spectral and textural information along with NDVI were used as inputs for generation of Spectral Feature Probability (SFP) layers using sample training pixels. The SFP layers were then converted into raster objects using threshold and clump functions, resulting in a pixel probability layer. A set of raster and vector operators was employed in the subsequent steps for generating a thematic layer in vector format. This semi-automatic feature extraction model was employed for classification of major fruit and plantation crops, viz. mango, banana, citrus, coffee and coconut, grown under different agro-climatic conditions. In general, a classification accuracy of about 75-80 per cent was achieved for these crops using object based classification alone, and this was further improved using minimal visual editing of misclassified areas. A comparison of on-screen visual interpretation with the object oriented approach showed good agreement. It was observed that old and mature plantations were classified more accurately, while young and recently planted ones (3 years or less) showed poor classification accuracy due to mixed spectral signature, wider spacing and poor stands of plantations. The results indicated the potential use of the object oriented approach for classification of high resolution data for delineation of horticultural fruit and plantation crops.
The present methodology is applicable at local levels and future development is focused on up-scaling the methodology for generation of fruit and plantation crop maps at regional and national level which is important for creation of database for overall horticultural crop development.
Power System Transient Stability Based on Data Mining Theory
NASA Astrophysics Data System (ADS)
Cui, Zhen; Shi, Jia; Wu, Runsheng; Lu, Dan; Cui, Mingde
2018-01-01
In order to study the stability of power systems, a transient stability assessment method based on data mining theory is designed. By introducing association rule analysis from data mining theory, an association classification method for transient stability assessment is presented. A mathematical model of transient stability assessment based on data mining technology is established. Meanwhile, combining rule reasoning with classification prediction, the method of association classification is proposed to perform transient stability assessment. The transient stability index is used to identify the samples that cannot be correctly classified in association classification. Then, according to the critical stability of each sample, the time domain simulation method is used to determine the state, so as to ensure the accuracy of the final results. The results show that this stability assessment system can improve the speed of operation while keeping the analysis result completely correct, and that the improved algorithm can find the inherent relation between changes in the power system operation mode and changes in the degree of transient stability.
Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports
Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.
2011-01-01
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
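The F-score reported above is the harmonic mean of precision and recall, so it can be checked directly from the two quoted figures. A minimal sketch, assuming the balanced (beta = 1) form of the measure; the function name is illustrative only.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the abstract (95.60% and 79.82%).
reported = f_score(0.9560, 0.7982)
print(f"F-score = {reported:.4f}")  # prints "F-score = 0.8700"
```

The result agrees with the 87.0% F-score stated in the abstract.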
NASA Astrophysics Data System (ADS)
Yan, Dan; Bai, Lianfa; Zhang, Yi; Han, Jing
2018-02-01
For the problems of missing details and performance of the colorization based on sparse representation, we propose a conceptual model framework for colorizing gray-scale images, and then a multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement (CEMDC) is proposed based on this framework. The algorithm can achieve a natural colorized effect for a gray-scale image, and it is consistent with the human vision. First, the algorithm establishes a multi-sparse dictionary classification colorization model. Then, to improve the accuracy rate of the classification, the corresponding local constraint algorithm is proposed. Finally, we propose a detail enhancement based on Laplacian Pyramid, which is effective in solving the problem of missing details and improving the speed of image colorization. In addition, the algorithm not only realizes the colorization of the visual gray-scale image, but also can be applied to the other areas, such as color transfer between color images, colorizing gray fusion images, and infrared images.
Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection
NASA Astrophysics Data System (ADS)
Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu
Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is valuable in medical data mining because the diagnosis of a disease, a patient-care activity, can then be made with a minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
An Extreme Learning Machine-Based Neuromorphic Tactile Sensing System for Texture Recognition.
Rasouli, Mahdi; Chen, Yi; Basu, Arindam; Kukreja, Sunil L; Thakor, Nitish V
2018-04-01
Despite significant advances in computational algorithms and development of tactile sensors, artificial tactile sensing is strikingly less efficient and capable than the human tactile perception. Inspired by efficiency of biological systems, we aim to develop a neuromorphic system for tactile pattern recognition. We particularly target texture recognition as it is one of the most necessary and challenging tasks for artificial sensory systems. Our system consists of a piezoresistive fabric material as the sensor to emulate skin, an interface that produces spike patterns to mimic neural signals from mechanoreceptors, and an extreme learning machine (ELM) chip to analyze spiking activity. Benefiting from intrinsic advantages of biologically inspired event-driven systems and massively parallel and energy-efficient processing capabilities of the ELM chip, the proposed architecture offers a fast and energy-efficient alternative for processing tactile information. Moreover, it provides the opportunity for the development of low-cost tactile modules for large-area applications by integration of sensors and processing circuits. We demonstrate the recognition capability of our system in a texture discrimination task, where it achieves a classification accuracy of 92% for categorization of ten graded textures. Our results confirm that there exists a tradeoff between response time and classification accuracy (and information transfer rate). A faster decision can be achieved at early time steps or by using a shorter time window. This, however, results in deterioration of the classification accuracy and information transfer rate. We further observe that there exists a tradeoff between the classification accuracy and the input spike rate (and thus energy consumption). Our work substantiates the importance of development of efficient sparse codes for encoding sensory data to improve the energy efficiency. 
These results have a significance for a wide range of wearable, robotic, prosthetic, and industrial applications.
Dai, Shengfa; Wei, Qingguo
2017-01-01
The common spatial pattern algorithm is widely used to estimate spatial filters in motor imagery based brain-computer interfaces. However, using a large number of channels makes common spatial pattern prone to over-fitting and makes the classification of electroencephalographic signals time-consuming. To overcome these problems, it is necessary to choose an optimal subset of the whole channels to save computational time and improve the classification accuracy. In this paper, a novel method named backtracking search optimization algorithm is proposed to automatically select the optimal channel set for common spatial pattern. Each individual in the population is an N-dimensional vector, with each component representing one channel. A population of binary codes is generated randomly at the beginning, and then channels are selected according to the evolution of these codes. The number and positions of 1's in the code denote the number and positions of chosen channels. The objective function of the backtracking search optimization algorithm is defined as the combination of classification error rate and relative number of channels. Experimental results suggest that higher classification accuracy can be achieved with much fewer channels compared to standard common spatial pattern with whole channels.
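The binary channel encoding and the two-term objective described above can be sketched as follows. This covers only the encoding and the fitness evaluation, not the backtracking search itself. The weighted-sum form, the `weight` value, and the `toy_error_rate` stand-in for a real common spatial pattern plus classifier evaluation are all assumptions, since the abstract does not say how the two terms are combined.

```python
import random


def fitness(mask, error_rate_fn, weight=0.9):
    """Score a binary channel mask; lower is better.

    Assumption: the objective is a weighted sum of the classification error
    rate and the fraction of channels kept. The weight is hypothetical.
    """
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    if not selected:
        return float("inf")  # a mask with no channels cannot be evaluated
    relative_count = len(selected) / len(mask)
    return weight * error_rate_fn(selected) + (1 - weight) * relative_count


def random_population(size, n_channels, rng):
    """Random binary codes; a 1 at position i means channel i is selected."""
    return [[rng.randint(0, 1) for _ in range(n_channels)] for _ in range(size)]


def toy_error_rate(selected):
    """Hypothetical stand-in for a cross-validated classifier error rate."""
    informative = {2, 5, 7}  # pretend only these channels carry signal
    return 0.5 - 0.1 * len(informative.intersection(selected))


rng = random.Random(0)
population = random_population(size=20, n_channels=10, rng=rng)
best = min(population, key=lambda mask: fitness(mask, toy_error_rate))
print(best, round(fitness(best, toy_error_rate), 3))
```

In a full implementation an evolutionary search would repeatedly mutate and recombine the population, keeping masks whose fitness improves.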
Clemans, Katherine H.; Musci, Rashelle J.; Leoutsakos, Jeannie-Marie S.; Ialongo, Nicholas S.
2014-01-01
Objective: This study compared the ability of teacher, parent, and peer reports of aggressive behavior in early childhood to accurately classify cases of maladaptive outcomes in late adolescence and early adulthood. Method: Weighted kappa analyses determined optimal cut points and relative classification accuracy among teacher, parent, and peer reports of aggression assessed for 691 students (54% male; 84% African American, 13% White) in the fall of first grade. Outcomes included antisocial personality, substance use, incarceration history, risky sexual behavior, and failure to graduate from high school on time. Results: Peer reports were the most accurate classifier of all outcomes in the full sample. For most outcomes, the addition of teacher or parent reports did not improve overall classification accuracy once peer reports were accounted for. Additional gender-specific and adjusted kappa analyses supported the superior classification utility of the peer report measure. Conclusion: The results suggest that peer reports provided the most useful classification information of the three aggression measures. Implications for targeted intervention efforts which use screening measures to identify at-risk children are discussed. PMID:24512126
Improved Diagnostic Multimodal Biomarkers for Alzheimer's Disease and Mild Cognitive Impairment
Martínez-Torteya, Antonio; Treviño, Víctor; Tamez-Peña, José G.
2015-01-01
The early diagnosis of Alzheimer's disease (AD) and mild cognitive impairment (MCI) is very important for treatment research and patient care purposes. Few biomarkers are currently considered in clinical settings, and their use is still optional. The objective of this work was to determine whether multimodal and nonpreviously AD associated features could improve the classification accuracy between AD, MCI, and healthy controls, which may impact future AD biomarkers. For this, Alzheimer's Disease Neuroimaging Initiative database was mined for case-control candidates. At least 652 baseline features extracted from MRI and PET analyses, biological samples, and clinical data up to February 2014 were used. A feature selection methodology that includes a genetic algorithm search coupled to a logistic regression classifier and forward and backward selection strategies was used to explore combinations of features. This generated diagnostic models with sizes ranging from 3 to 8, including well documented AD biomarkers, as well as unexplored image, biochemical, and clinical features. Accuracies of 0.85, 0.79, and 0.80 were achieved for HC-AD, HC-MCI, and MCI-AD classifications, respectively, when evaluated using a blind test set. In conclusion, a set of features provided additional and independent information to well-established AD biomarkers, aiding in the classification of MCI and AD. PMID:26106620
Mental Task Evaluation for Hybrid NIRS-EEG Brain-Computer Interfaces
Gupta, Rishabh; Falk, Tiago H.
2017-01-01
Based on recent electroencephalography (EEG) and near-infrared spectroscopy (NIRS) studies that showed that tasks such as motor imagery and mental arithmetic induce specific neural response patterns, we propose a hybrid brain-computer interface (hBCI) paradigm in which EEG and NIRS data are fused to improve binary classification performance. We recorded simultaneous NIRS-EEG data from nine participants performing seven mental tasks (word generation, mental rotation, subtraction, singing and navigation, and motor and face imagery). Classifiers were trained for each possible pair of tasks using (1) EEG features alone, (2) NIRS features alone, and (3) EEG and NIRS features combined, to identify the best task pairs and assess the usefulness of a multimodal approach. The NIRS-EEG approach led to an average increase in peak kappa of 0.03 when using features extracted from one-second windows (equivalent to an increase of 1.5% in classification accuracy for balanced classes). The increase was much stronger (0.20, corresponding to a 10% accuracy increase) when focusing on time windows of high NIRS performance. The EEG and NIRS analyses further unveiled relevant brain regions and important feature types. This work provides a basis for future NIRS-EEG hBCI studies aiming to improve classification performance toward more efficient and flexible BCIs. PMID:29181021
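The kappa-to-accuracy equivalences quoted above follow from the definition of kappa for a balanced two-class problem, where chance agreement is 0.5 and kappa = 2 × accuracy − 1. A short sketch of that conversion, under the balanced-class assumption stated in the abstract; the function names are illustrative.

```python
def accuracy_from_kappa(kappa, chance=0.5):
    """Invert kappa = (accuracy - chance) / (1 - chance)."""
    return chance + kappa * (1 - chance)


def accuracy_gain(delta_kappa, chance=0.5):
    """Change in accuracy implied by a change in kappa at fixed chance level."""
    return delta_kappa * (1 - chance)


# Kappa increases quoted in the abstract: 0.03 and 0.20.
for delta in (0.03, 0.20):
    print(f"kappa +{delta:.2f} -> accuracy +{100 * accuracy_gain(delta):.1f}%")
```

With balanced binary classes each unit of kappa is worth half a unit of accuracy, which gives the 1.5% and 10% figures in the abstract.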
Cardiac arrhythmia beat classification using DOST and PSO tuned SVM.
Raj, Sandeep; Ray, Kailash Chandra; Shankar, Om
2016-11-01
The increase in the number of deaths due to cardiovascular diseases (CVDs) has drawn significant attention to the study of electrocardiogram (ECG) signals. These ECG signals are studied by experienced cardiologists for accurate and proper diagnosis, but this becomes difficult and time-consuming for long-term recordings. Various signal processing techniques have been studied to analyze the ECG signal, but they bear limitations due to the non-stationary behavior of ECG signals. Hence, this study aims to improve the classification accuracy rate and provide an automated diagnostic solution for the detection of cardiac arrhythmias. The proposed methodology consists of four stages, i.e., filtering, R-peak detection, feature extraction and classification. In this study, a wavelet-based approach is used to filter the raw ECG signal, whereas the Pan-Tompkins algorithm is used for detecting the R-peak inside the ECG signal. In the feature extraction stage, the discrete orthogonal Stockwell transform (DOST) approach is presented for an efficient time-frequency representation (i.e., morphological descriptors) of a time domain signal; it retains the absolute phase information to distinguish the various non-stationary behavior ECG signals. Moreover, these morphological descriptors are further reduced to a lower dimensional space by using principal component analysis and combined with the dynamic features (i.e., based on the RR-interval of the ECG signals) of the input signal. This combination of two different kinds of descriptors represents each feature set of an input signal that is utilized for classification into subsequent categories by employing PSO tuned support vector machines (SVM).
The proposed methodology is validated on the benchmark MIT-BIH arrhythmia database and evaluated under two assessment schemes, yielding improved overall accuracies, relative to state-of-the-art diagnosis methods, of 99.18% for sixteen classes in the category-based scheme and 89.10% for five classes (mapped according to the AAMI standard) in the patient-based scheme. The results reported are further compared to the existing methodologies in the literature. The proposed feature representation of cardiac signals based on symmetrical features, along with the PSO based optimization technique for the SVM classifier, reported an improved classification accuracy in both assessment schemes evaluated on the benchmark MIT-BIH arrhythmia database and hence can be utilized for automated computer-aided diagnosis of cardiac arrhythmia beats. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Graph pyramids for protein function prediction
2015-01-01
Background: Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Graph pyramids for protein function prediction.
Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun
2015-01-01
Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
Öztoprak, Hüseyin; Toycan, Mehmet; Alp, Yaşar Kemal; Arıkan, Orhan; Doğutepe, Elvin; Karakaş, Sirel
2017-12-01
Attention-deficit/hyperactivity disorder (ADHD) is the most frequent diagnosis among children who are referred to psychiatry departments. Although ADHD was discovered at the beginning of the 20th century, its diagnosis is still confronted with many problems. A novel classification approach is presented that discriminates ADHD and non-ADHD groups using time-frequency domain features of event-related potential (ERP) recordings taken during a Stroop task. The Time-Frequency Hermite-Atomizer (TFHA) technique is used to extract high-resolution features that are highly localized in the time-frequency domain. Based on an extensive investigation, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) was used to obtain the best discriminating features. When the best three features were used, the classification accuracy for the training dataset reached 98%, and the use of five features further improved the accuracy to 99.5%. The accuracy was 100% for the testing dataset. Based on extensive experiments, the delta band emerged as the most contributing frequency band and statistical parameters emerged as the most contributing feature group. The classification performance of this study suggests that TFHA can be employed as an auxiliary component of the diagnostic and prognostic procedures for ADHD. The features obtained in this study can potentially contribute to the neuroelectrical understanding and clinical diagnosis of ADHD. Copyright © 2017 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
Comparisons of neural networks to standard techniques for image classification and correlation
NASA Technical Reports Server (NTRS)
Paola, Justin D.; Schowengerdt, Robert A.
1994-01-01
Neural network techniques for multispectral image classification and spatial pattern detection are compared to the standard techniques of maximum-likelihood classification and spatial correlation. The neural network produced a more accurate classification than maximum-likelihood of a Landsat scene of Tucson, Arizona. Some of the errors in the maximum-likelihood classification are illustrated using decision region and class probability density plots. As expected, the main drawback to the neural network method is the long time required for the training stage. The network was trained using several different hidden layer sizes to optimize both the classification accuracy and training speed, and it was found that one node per class was optimal. The performance improved when 3x3 local windows of image data were entered into the net. This modification introduces texture into the classification without explicit calculation of a texture measure. Larger windows were successfully used for the detection of spatial features in Landsat and Magellan synthetic aperture radar imagery.
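The 3x3 local-window input described above can be sketched as a feature-extraction step: instead of feeding the classifier one pixel's band values, the band values of all nine pixels in the neighbourhood are concatenated into a single input vector, so spatial texture is carried implicitly. This is a minimal illustration, not the authors' implementation; the function name `window_features`, the nested-list image layout (`image[row][col]` is a list of band values), and the border handling by clamping are all assumptions, since the abstract does not specify them.

```python
from typing import List

Image = List[List[List[float]]]  # image[row][col] -> list of band values


def window_features(image: Image, row: int, col: int, size: int = 3) -> List[float]:
    """Concatenate the band values of the size x size window centred on (row, col).

    Pixels outside the image are replaced by the nearest border pixel
    (clamping); this edge rule is an assumption made for the sketch.
    """
    half = size // 2
    n_rows, n_cols = len(image), len(image[0])
    features: List[float] = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            rr = min(max(r, 0), n_rows - 1)   # clamp row index to the image
            cc = min(max(c, 0), n_cols - 1)   # clamp column index to the image
            features.extend(image[rr][cc])
    return features


if __name__ == "__main__":
    # Toy single-band 3x3 image with pixel values 0..8.
    toy = [[[0], [1], [2]], [[3], [4], [5]], [[6], [7], [8]]]
    print(window_features(toy, 1, 1))  # centre pixel: the whole image, row by row
```

With a six-band image, each 3x3 window would yield 54 input values per pixel rather than 6, which is one reason training time grows with window size.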
Ensemble methods with simple features for document zone classification
NASA Astrophysics Data System (ADS)
Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing
2012-01-01
Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via feature extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classification of these blocks. We present a comparative evaluation of three ensemble-based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
NASA Astrophysics Data System (ADS)
Zamora Ramos, Ernesto
Artificial intelligence is a big part of automation, and with today's technological advances it has taken great strides towards positioning itself as the technology of the future to control, enhance and perfect automation. Computer vision includes pattern recognition, classification and machine learning. Computer vision is at the core of decision making and it is a vast and fruitful branch of artificial intelligence. In this work, we present novel algorithms and techniques built upon existing technologies to improve pattern recognition and neural network training, initially motivated by a multidisciplinary effort to build a robot that helps maintain and optimize solar panel energy production. Our contributions detail an improved non-linear pre-processing technique to enhance poorly illuminated images based on modifications to the standard histogram equalization for an image. While the original motivation was to improve nocturnal navigation, the results have applications in surveillance, search and rescue, medical image enhancement, and many others. We created a vision system for precise camera distance positioning, motivated by the need to correctly locate the robot for capture of solar panel images for classification. The classification algorithm marks solar panels as clean or dirty for later processing. Our algorithm extends past image classification and, based on historical and experimental data, it identifies the optimal moment in which to perform maintenance on marked solar panels so as to minimize the energy and profit loss. In order to improve upon the classification algorithm, we delved into feedforward neural networks because of their recent advancements, proven universal approximation and classification capabilities, and excellent recognition rates.
We explore state-of-the-art neural network training techniques, offering pointers and insights, culminating in the implementation of a complete library with support for modern deep learning architectures, multilayer perceptrons and convolutional neural networks. Our research with neural networks has encountered many difficulties regarding hyperparameter estimation for good training convergence rate and accuracy. Most hyperparameters, including architecture, learning rate, regularization, trainable parameter (or weight) initialization, and so on, are chosen via a trial and error process with some educated guesses. However, we developed the first quantitative method to compare weight initialization strategies, a critical hyperparameter choice during training, to estimate which among a group of candidate strategies would, with high probability, make the network converge fastest to the highest classification accuracy. Our method provides a quick, objective measure for comparing initialization strategies and selecting the best among them beforehand, without having to complete multiple training sessions for each candidate strategy to compare final results.
Land cover mapping in Latvia using hyperspectral airborne and simulated Sentinel-2 data
NASA Astrophysics Data System (ADS)
Jakovels, Dainis; Filipovs, Jevgenijs; Brauns, Agris; Taskovs, Juris; Erins, Gatis
2016-08-01
Land cover mapping in Latvia is performed as part of the Corine Land Cover (CLC) initiative every six years. The advantage of CLC is the creation of a standardized nomenclature and mapping protocol comparable across all European countries, thereby making it a valuable information source at the European level. However, low spatial resolution and accuracy, infrequent updates and expensive manual production have limited its use at the national level. As of now, there are no remote sensing based high resolution land cover and land use services designed specifically for Latvia which would account for the country's natural and land use specifics and end-user interests. The European Space Agency launched the Sentinel-2 satellite in 2015 aiming to provide continuity of free high resolution multispectral satellite data, thereby presenting an opportunity to develop an adapted land cover and land use algorithm which accounts for national end-user needs. In this study, a land cover mapping scheme according to national end-user needs was developed and tested in two pilot territories (Cesis and Burtnieki). Hyperspectral airborne data covering the spectral range 400-2500 nm was acquired in summer 2015 using the Airborne Surveillance and Environmental Monitoring System (ARSENAL). The gathered data was tested for land cover classification of seven general classes (urban/artificial, bare, forest, shrubland, agricultural/grassland, wetlands, water) and sub-classes specific for Latvia as well as simulation of Sentinel-2 satellite data. Hyperspectral data sets consist of 122 spectral bands in the visible to near infrared spectral range (356-950 nm) and 100 bands in the short wave infrared (950-2500 nm). Classification of land cover was tested separately for each sensor data and fused cross-sensor data.
The best overall classification accuracy of 84.2% and satisfactory classification accuracy (more than 80%) for 9 of 13 classes were obtained using a Support Vector Machine (SVM) classifier with 109-band hyperspectral data. Grassland and agriculture land demonstrated the lowest classification accuracy in the pixel-based approach, but the result improved significantly when agriculture polygons registered in Rural Support Service data were treated as objects. The test of simulated Sentinel-2 bands for land cover mapping using the SVM classifier showed 82.8% overall accuracy and satisfactory separation of 7 classes. SVM provided the highest overall accuracy of 84.2%, in comparison to 75.9% for k-Nearest Neighbor and 79.2% for Linear Discriminant Analysis classifiers.
Zmiri, Dror; Shahar, Yuval; Taieb-Maimon, Meirav
2012-04-01
This study tested the feasibility of classifying emergency department patients into severity grades using data mining methods. Emergency department records of 402 patients were classified into five severity grades by two expert physicians. The Naïve Bayes and C4.5 algorithms were applied to produce classifiers from patient data into severity grades. The classifiers' results over several subsets of the data were compared with the physicians' assessments, with a random classifier, and with a classifier that selects the maximal-prevalence class. The measures were positive predictive value, multiple-class extensions of sensitivity and specificity combinations, and entropy change. The mean accuracy of the data mining classifiers was 52.94 ± 5.89%, significantly better (P < 0.05) than the mean accuracy of a random classifier (34.60 ± 2.40%). The entropy of the input data sets was reduced through classification by a mean of 10.1%. Allowing for classification deviations of one severity grade led to a mean accuracy of 85.42 ± 1.42%. The classifiers' accuracy in that case was similar to the physicians' consensus rate. Learning from consensus records led to better performance. Reducing the number of severity grades improved results in certain cases. The performance of the Naïve Bayes and C4.5 algorithms was similar; in unbalanced data sets, Naïve Bayes performed better. It is possible to produce a computerized classification model for the severity grade of triage patients, using data mining methods. Learning from patient records on which several physicians reached a consensus is preferable to learning from each physician's patients. Either Naïve Bayes or C4.5 can be used; Naïve Bayes is preferable for unbalanced data sets. An ambiguity in the intermediate severity grades seems to hamper both the physicians' agreement and the classifiers' accuracy. © 2010 Blackwell Publishing Ltd.
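Two of the evaluation ideas above, accuracy that tolerates a deviation of one severity grade and the maximal-prevalence baseline, can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy grade lists are hypothetical and are not data or code from the study, and grades are assumed to be integers from 1 to 5.

```python
from collections import Counter
from typing import Sequence


def accuracy_within(true: Sequence[int], pred: Sequence[int], tolerance: int = 0) -> float:
    """Share of cases whose predicted grade is within `tolerance` grades of the true grade."""
    hits = sum(1 for t, p in zip(true, pred) if abs(t - p) <= tolerance)
    return hits / len(true)


def majority_baseline_accuracy(true: Sequence[int]) -> float:
    """Accuracy of a classifier that always predicts the most prevalent grade."""
    most_common_count = Counter(true).most_common(1)[0][1]
    return most_common_count / len(true)


# Hypothetical toy data: physician-assigned grades and classifier output.
true_grades = [1, 2, 3, 3, 4, 5, 3, 2]
pred_grades = [1, 3, 3, 2, 4, 3, 3, 2]

exact_accuracy = accuracy_within(true_grades, pred_grades)               # exact match only
tolerant_accuracy = accuracy_within(true_grades, pred_grades, tolerance=1)
baseline_accuracy = majority_baseline_accuracy(true_grades)

if __name__ == "__main__":
    print(f"exact: {exact_accuracy:.3f}")
    print(f"within one grade: {tolerant_accuracy:.3f}")
    print(f"majority baseline: {baseline_accuracy:.3f}")
```

On this toy data, exact accuracy is 0.625 and rises to 0.875 when a one-grade deviation is allowed, the same qualitative effect as the jump from about 53% to about 85% reported in the abstract.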