Sample records for random forest classification

  1. Predicting temperate forest stand types using only structural profiles from discrete return airborne lidar

    NASA Astrophysics Data System (ADS)

    Fedrigo, Melissa; Newnham, Glenn J.; Coops, Nicholas C.; Culvenor, Darius S.; Bolton, Douglas K.; Nitschke, Craig R.

    2018-02-01

    Light detection and ranging (lidar) data have been increasingly used for forest classification due to its ability to penetrate the forest canopy and provide detail about the structure of the lower strata. In this study we demonstrate forest classification approaches using airborne lidar data as inputs to random forest and linear unmixing classification algorithms. Our results demonstrated that both random forest and linear unmixing models identified a distribution of rainforest and eucalypt stands that was comparable to existing ecological vegetation class (EVC) maps based primarily on manual interpretation of high resolution aerial imagery. Rainforest stands were also identified in the region that have not previously been identified in the EVC maps. The transition between stand types was better characterised by the random forest modelling approach. In contrast, the linear unmixing model placed greater emphasis on field plots selected as endmembers which may not have captured the variability in stand structure within a single stand type. The random forest model had the highest overall accuracy (84%) and Cohen's kappa coefficient (0.62). However, the classification accuracy was only marginally better than linear unmixing. The random forest model was applied to a region in the Central Highlands of south-eastern Australia to produce maps of stand type probability, including areas of transition (the 'ecotone') between rainforest and eucalypt forest. The resulting map provided a detailed delineation of forest classes, which specifically recognised the coalescing of stand types at the landscape scale. This represents a key step towards mapping the structural and spatial complexity of these ecosystems, which is important for both their management and conservation.

  2. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

    PubMed

    Ma, Li; Fan, Suohai

    2017-03-14

    The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.

  3. CW-SSIM kernel based random forest for image classification

    NASA Astrophysics Data System (ADS)

    Fan, Guangzhe; Wang, Zhou; Wang, Jiheng

    2010-07-01

    Complex wavelet structural similarity (CW-SSIM) index has been proposed as a powerful image similarity metric that is robust to translation, scaling and rotation of images, but how to employ it in image classification applications has not been deeply investigated. In this paper, we incorporate CW-SSIM as a kernel function into a random forest learning algorithm. This leads to a novel image classification approach that does not require a feature extraction or dimension reduction stage at the front end. We use hand-written digit recognition as an example to demonstrate our algorithm. We compare the performance of the proposed approach with random forest learning based on other kernels, including the widely adopted Gaussian and the inner product kernels. Empirical evidences show that the proposed method is superior in its classification power. We also compared our proposed approach with the direct random forest method without kernel and the popular kernel-learning method support vector machine. Our test results based on both simulated and realworld data suggest that the proposed approach works superior to traditional methods without the feature selection procedure.

  4. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

    PubMed

    Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

    2017-09-15

    Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code available at http://insilico.utulsa.edu/software/privateEC . brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  5. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  6. Remote sensing based detection of forested wetlands: An evaluation of LiDAR, aerial imagery, and their data fusion

    NASA Astrophysics Data System (ADS)

    Suiter, Ashley Elizabeth

    Multi-spectral imagery provides a robust and low-cost dataset for assessing wetland extent and quality over broad regions and is frequently used for wetland inventories. However in forested wetlands, hydrology is obscured by tree canopy making it difficult to detect with multi-spectral imagery alone. Because of this, classification of forested wetlands often includes greater errors than that of other wetlands types. Elevation and terrain derivatives have been shown to be useful for modelling wetland hydrology. But, few studies have addressed the use of LiDAR intensity data detecting hydrology in forested wetlands. Due the tendency of LiDAR signal to be attenuated by water, this research proposed the fusion of LiDAR intensity data with LiDAR elevation, terrain data, and aerial imagery, for the detection of forested wetland hydrology. We examined the utility of LiDAR intensity data and determined whether the fusion of Lidar derived data with multispectral imagery increased the accuracy of forested wetland classification compared with a classification performed with only multi-spectral image. Four classifications were performed: Classification A -- All Imagery, Classification B -- All LiDAR, Classification C -- LiDAR without Intensity, and Classification D -- Fusion of All Data. These classifications were performed using random forest and each resulted in a 3-foot resolution thematic raster of forested upland and forested wetland locations in Vermilion County, Illinois. The accuracies of these classifications were compared using Kappa Coefficient of Agreement. Importance statistics produced within the random forest classifier were evaluated in order to understand the contribution of individual datasets. Classification D, which used the fusion of LiDAR and multi-spectral imagery as input variables, had moderate to strong agreement between reference data and classification results. It was found that Classification A performed using all the LiDAR data and its derivatives (intensity, elevation, slope, aspect, curvatures, and Topographic Wetness Index) was the most accurate classification with Kappa: 78.04%, indicating moderate to strong agreement. However, Classification C, performed with LiDAR derivative without intensity data had less agreement than would be expected by chance, indicating that LiDAR contributed significantly to the accuracy of Classification B.

  7. Discriminant forest classification method and system

    DOEpatents

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.

  8. Spectroscopic diagnosis of laryngeal carcinoma using near-infrared Raman spectroscopy and random recursive partitioning ensemble techniques.

    PubMed

    Teh, Seng Khoon; Zheng, Wei; Lau, David P; Huang, Zhiwei

    2009-06-01

    In this work, we evaluated the diagnostic ability of near-infrared (NIR) Raman spectroscopy associated with the ensemble recursive partitioning algorithm based on random forests for identifying cancer from normal tissue in the larynx. A rapid-acquisition NIR Raman system was utilized for tissue Raman measurements at 785 nm excitation, and 50 human laryngeal tissue specimens (20 normal; 30 malignant tumors) were used for NIR Raman studies. The random forests method was introduced to develop effective diagnostic algorithms for classification of Raman spectra of different laryngeal tissues. High-quality Raman spectra in the range of 800-1800 cm(-1) can be acquired from laryngeal tissue within 5 seconds. Raman spectra differed significantly between normal and malignant laryngeal tissues. Classification results obtained from the random forests algorithm on tissue Raman spectra yielded a diagnostic sensitivity of 88.0% and specificity of 91.4% for laryngeal malignancy identification. The random forests technique also provided variables importance that facilitates correlation of significant Raman spectral features with cancer transformation. This study shows that NIR Raman spectroscopy in conjunction with random forests algorithm has a great potential for the rapid diagnosis and detection of malignant tumors in the larynx.

  9. Mapping Deforestation area in North Korea Using Phenology-based Multi-Index and Random Forest

    NASA Astrophysics Data System (ADS)

    Jin, Y.; Sung, S.; Lee, D. K.; Jeong, S.

    2016-12-01

    Forest ecosystem provides ecological benefits to both humans and wildlife. Growing global demand for food and fiber is accelerating the pressure on the forest ecosystem in whole world from agriculture and logging. In recently, North Korea lost almost 40 % of its forests to crop fields for food production and cut-down of forest for fuel woods between 1990 and 2015. It led to the increased damage caused by natural disasters and is known to be one of the most forest degraded areas in the world. The characteristic of forest landscape in North Korea is complex and heterogeneous, the major landscape types in the forest are hillside farm, unstocked forest, natural forest and plateau vegetation. Remote sensing can be used for the forest degradation mapping of a dynamic landscape at a broad scale of detail and spatial distribution. Confusion mostly occurred between hillside farmland and unstocked forest, but also between unstocked forest and forest. Most previous forest degradation that used focused on the classification of broad types such as deforests area and sand from the perspective of land cover classification. The objective of this study is using random forest for mapping degraded forest in North Korea by phenological based vegetation index derived from MODIS products, which has various environmental factors such as vegetation, soil and water at a regional scale for improving accuracy. The model created by random forest resulted in an overall accuracy was 91.44%. Class user's accuracy of hillside farmland and unstocked forest were 97.2% and 84%%, which indicate the degraded forest. Unstocked forest had relative low user accuracy due to misclassified hillside farmland and forest samples. Producer's accuracy of hillside farmland and unstocked forest were 85.2% and 93.3%, repectly. In this case hillside farmland had lower produce accuracy mainly due to confusion with field, unstocked forest and forest. Such a classification of degraded forest could supply essential information to decide the priority of forest management and restoration in degraded forest area.

  10. Random Forest Application for NEXRAD Radar Data Quality Control

    NASA Astrophysics Data System (ADS)

    Keem, M.; Seo, B. C.; Krajewski, W. F.

    2017-12-01

    Identification and elimination of non-meteorological radar echoes (e.g., returns from ground, wind turbines, and biological targets) are the basic data quality control steps before radar data use in quantitative applications (e.g., precipitation estimation). Although WSR-88Ds' recent upgrade to dual-polarization has enhanced this quality control and echo classification, there are still challenges to detect some non-meteorological echoes that show precipitation-like characteristics (e.g., wind turbine or anomalous propagation clutter embedded in rain). With this in mind, a new quality control method using Random Forest is proposed in this study. This classification algorithm is known to produce reliable results with less uncertainty. The method introduces randomness into sampling and feature selections and integrates consequent multiple decision trees. The multidimensional structure of the trees can characterize the statistical interactions of involved multiple features in complex situations. The authors explore the performance of Random Forest method for NEXRAD radar data quality control. Training datasets are selected using several clear cases of precipitation and non-precipitation (but with some non-meteorological echoes). The model is structured using available candidate features (from the NEXRAD data) such as horizontal reflectivity, differential reflectivity, differential phase shift, copolar correlation coefficient, and their horizontal textures (e.g., local standard deviation). The influence of each feature on classification results are quantified by variable importance measures that are automatically estimated by the Random Forest algorithm. Therefore, the number and types of features in the final forest can be examined based on the classification accuracy. The authors demonstrate the capability of the proposed approach using several cases ranging from distinct to complex rain/no-rain events and compare the performance with the existing algorithms (e.g., MRMS). They also discuss operational feasibility based on the observed strength and weakness of the method.

  11. Multi-label spacecraft electrical signal classification method based on DBN and random forest

    PubMed Central

    Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng

    2017-01-01

    In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then, classification is applied. Firstly, we use the method of wavelet denoising, which was used to pre-process the data. Secondly, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we used the random forest algorithm to classify the data and comparing it with other algorithms. The experimental results show that compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data. PMID:28486479

  12. Multi-label spacecraft electrical signal classification method based on DBN and random forest.

    PubMed

    Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng

    2017-01-01

    In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then, classification is applied. Firstly, we use the method of wavelet denoising, which was used to pre-process the data. Secondly, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we used the random forest algorithm to classify the data and comparing it with other algorithms. The experimental results show that compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data.

  13. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043

  14. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  15. Aspen, climate, and sudden decline in western USA

    Treesearch

    Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

    2009-01-01

    A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...

  16. Exploring diversity in ensemble classification: Applications in large area land cover mapping

    NASA Astrophysics Data System (ADS)

    Mellor, Andrew; Boukir, Samia

    2017-07-01

    Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.

  17. Object-based random forest classification of Landsat ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities in Tasmania, Australia

    NASA Astrophysics Data System (ADS)

    Melville, Bethany; Lucieer, Arko; Aryal, Jagannath

    2018-04-01

    This paper presents a random forest classification approach for identifying and mapping three types of lowland native grassland communities found in the Tasmanian Midlands region. Due to the high conservation priority assigned to these communities, there has been an increasing need to identify appropriate datasets that can be used to derive accurate and frequently updateable maps of community extent. Therefore, this paper proposes a method employing repeat classification and statistical significance testing as a means of identifying the most appropriate dataset for mapping these communities. Two datasets were acquired and analysed; a Landsat ETM+ scene, and a WorldView-2 scene, both from 2010. Training and validation data were randomly subset using a k-fold (k = 50) approach from a pre-existing field dataset. Poa labillardierei, Themeda triandra and lowland native grassland complex communities were identified in addition to dry woodland and agriculture. For each subset of randomly allocated points, a random forest model was trained based on each dataset, and then used to classify the corresponding imagery. Validation was performed using the reciprocal points from the independent subset that had not been used to train the model. Final training and classification accuracies were reported as per class means for each satellite dataset. Analysis of Variance (ANOVA) was undertaken to determine whether classification accuracy differed between the two datasets, as well as between classifications. Results showed mean class accuracies between 54% and 87%. Class accuracy only differed significantly between datasets for the dry woodland and Themeda grassland classes, with the WorldView-2 dataset showing higher mean classification accuracies. The results of this study indicate that remote sensing is a viable method for the identification of lowland native grassland communities in the Tasmanian Midlands, and that repeat classification and statistical significant testing can be used to identify optimal datasets for vegetation community mapping.

  18. Adapting GNU random forest program for Unix and Windows

    NASA Astrophysics Data System (ADS)

    Jirina, Marcel; Krayem, M. Said; Jirina, Marcel, Jr.

    2013-10-01

    The Random Forest is a well-known method and also a program for data clustering and classification. Unfortunately, the original Random Forest program is rather difficult to use. Here we describe a new version of this program originally written in Fortran 77. The modified program in Fortran 95 needs to be compiled only once and information for different tasks is passed with help of arguments. The program was tested with 24 data sets from UCI MLR and results are available on the net.

  19. Land Covers Classification Based on Random Forest Method Using Features from Full-Waveform LIDAR Data

    NASA Astrophysics Data System (ADS)

    Ma, L.; Zhou, M.; Li, C.

    2017-09-01

    In this study, a Random Forest (RF) based land covers classification method is presented to predict the types of land covers in Miyun area. The returned full-waveforms which were acquired by a LiteMapper 5600 airborne LiDAR system were processed, including waveform filtering, waveform decomposition and features extraction. The commonly used features that were distance, intensity, Full Width at Half Maximum (FWHM), skewness and kurtosis were extracted. These waveform features were used as attributes of training data for generating the RF prediction model. The RF prediction model was applied to predict the types of land covers in Miyun area as trees, buildings, farmland and ground. The classification results of these four types of land covers were obtained according to the ground truth information acquired from CCD image data of the same region. The RF classification results were compared with that of SVM method and show better results. The RF classification accuracy reached 89.73% and the classification Kappa was 0.8631.

  20. Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble

    NASA Astrophysics Data System (ADS)

    Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher

    2012-10-01

    Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics was used as feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as useŕs and produceŕs accuracy.

  1. Random-Forest Classification of High-Resolution Remote Sensing Images and Ndsm Over Urban Areas

    NASA Astrophysics Data System (ADS)

    Sun, X. F.; Lin, X. G.

    2017-09-01

    As an intermediate step between raw remote sensing data and digital urban maps, remote sensing data classification has been a challenging and long-standing research problem in the community of remote sensing. In this work, an effective classification method is proposed for classifying high-resolution remote sensing data over urban areas. Starting from high resolution multi-spectral images and 3D geometry data, our method proceeds in three main stages: feature extraction, classification, and classified result refinement. First, we extract color, vegetation index and texture features from the multi-spectral image and compute the height, elevation texture and differential morphological profile (DMP) features from the 3D geometry data. Then in the classification stage, multiple random forest (RF) classifiers are trained separately, then combined to form a RF ensemble to estimate each sample's category probabilities. Finally the probabilities along with the feature importance indicator outputted by RF ensemble are used to construct a fully connected conditional random field (FCCRF) graph model, by which the classification results are refined through mean-field based statistical inference. Experiments on the ISPRS Semantic Labeling Contest dataset show that our proposed 3-stage method achieves 86.9% overall accuracy on the test data.

  2. Faster Trees: Strategies for Accelerated Training and Prediction of Random Forests for Classification of Polsar Images

    NASA Astrophysics Data System (ADS)

    Hänsch, Ronny; Hellwich, Olaf

    2018-04-01

    Random Forests have continuously proven to be one of the most accurate, robust, as well as efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focus on improving classification accuracy, we aim for accelerating the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches we mainly consider algorithmic changes to stay as much as possible independent of platform and programming language. The final model achieves an approximately 60 times faster training and a 500 times faster prediction, while the accuracy is only marginally decreased by roughly 1 %.

  3. Object based image analysis for the classification of the growth stages of Avocado crop, in Michoacán State, Mexico

    NASA Astrophysics Data System (ADS)

    Gao, Yan; Marpu, Prashanth; Morales Manila, Luis M.

    2014-11-01

    This paper assesses the suitability of 8-band Worldview-2 (WV2) satellite data and object-based random forest algorithm for the classification of avocado growth stages in Mexico. We tested both pixel-based with minimum distance (MD) and maximum likelihood (MLC) and object-based with Random Forest (RF) algorithm for this task. Training samples and verification data were selected by visual interpreting the WV2 images for seven thematic classes: fully grown, middle stage, and early stage of avocado crops, bare land, two types of natural forests, and water body. To examine the contribution of the four new spectral bands of WV2 sensor, all the tested classifications were carried out with and without the four new spectral bands. Classification accuracy assessment results show that object-based classification with RF algorithm obtained higher overall higher accuracy (93.06%) than pixel-based MD (69.37%) and MLC (64.03%) method. For both pixel-based and object-based methods, the classifications with the four new spectral bands (overall accuracy obtained higher accuracy than those without: overall accuracy of object-based RF classification with vs without: 93.06% vs 83.59%, pixel-based MD: 69.37% vs 67.2%, pixel-based MLC: 64.03% vs 36.05%, suggesting that the four new spectral bands in WV2 sensor contributed to the increase of the classification accuracy.

  4. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.

  5. THE WESTERN LAKE SUPERIOR COMPARATIVE WATERSHED FRAMEWORK: A FIELD TEST OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED GEOGRAPHICALLY-INDEPENDENT CLASSIFICATION

    EPA Science Inventory

    Stratified random selection of watersheds allowed us to compare geographically-independent classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme within the Northern Lakes a...

  6. Object-based forest classification to facilitate landscape-scale conservation in the Mississippi Alluvial Valley

    USGS Publications Warehouse

    Mitchell, Michael; Wilson, R. Randy; Twedt, Daniel J.; Mini, Anne E.; James, J. Dale

    2016-01-01

    The Mississippi Alluvial Valley is a floodplain along the southern extent of the Mississippi River extending from southern Missouri to the Gulf of Mexico. This area once encompassed nearly 10 million ha of floodplain forests, most of which has been converted to agriculture over the past two centuries. Conservation programs in this region revolve around protection of existing forest and reforestation of converted lands. Therefore, an accurate and up to date classification of forest cover is essential for conservation planning, including efforts that prioritize areas for conservation activities. We used object-based image analysis with Random Forest classification to quickly and accurately classify forest cover. We used Landsat band, band ratio, and band index statistics to identify and define similar objects as our training sets instead of selecting individual training points. This provided a single rule-set that was used to classify each of the 11 Landsat 5 Thematic Mapper scenes that encompassed the Mississippi Alluvial Valley. We classified 3,307,910±85,344 ha (32% of this region) as forest. Our overall classification accuracy was 96.9% with Kappa statistic of 0.96. Because this method of forest classification is rapid and accurate, assessment of forest cover can be regularly updated and progress toward forest habitat goals identified in conservation plans can be periodically evaluated.

  7. The Efficiency of Random Forest Method for Shoreline Extraction from LANDSAT-8 and GOKTURK-2 Imageries

    NASA Astrophysics Data System (ADS)

    Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.

    2017-11-01

    Coastal monitoring plays a vital role in environmental planning and hazard management related issues. Since shorelines are fundamental data for environment management, disaster management, coastal erosion studies, modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract shorelines. Random Forest is one of these techniques which is used in this study for shoreline extraction.. This algorithm is a machine learning method based on decision trees. Decision trees analyse classes of training data creates rules for classification. In this study, Terkos region has been chosen for the proposed method within the scope of "TUBITAK Project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". Random Forest algorithm has been implemented to extract the shoreline of the Black Sea where near the lake from LANDSAT-8 and GOKTURK-2 satellite imageries taken in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method has been applied to NIR bands of LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imageries. Each image has been digitized manually and shorelines obtained for accuracy assessment. According to accuracy assessment results, Random Forest method is efficient for both medium and high resolution images for shoreline extraction studies.

  8. Random Bits Forest: a Strong Classifier/Regressor for Big Data

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-07-01

    Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).

  9. Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer's Disease: A Systematic Review.

    PubMed

    Sarica, Alessia; Cerasa, Antonio; Quattrone, Aldo

    2017-01-01

    Objective: Machine learning classification has been the most important computational development in the last years to satisfy the primary need of clinicians for automatic early diagnosis and prognosis. Nowadays, Random Forest (RF) algorithm has been successfully applied for reducing high dimensional and multi-source data in many scientific realms. Our aim was to explore the state of the art of the application of RF on single and multi-modal neuroimaging data for the prediction of Alzheimer's disease. Methods: A systematic review following PRISMA guidelines was conducted on this field of study. In particular, we constructed an advanced query using boolean operators as follows: ("random forest" OR "random forests") AND neuroimaging AND ("alzheimer's disease" OR alzheimer's OR alzheimer) AND (prediction OR classification) . The query was then searched in four well-known scientific databases: Pubmed, Scopus, Google Scholar and Web of Science. Results: Twelve articles-published between the 2007 and 2017-have been included in this systematic review after a quantitative and qualitative selection. The lesson learnt from these works suggest that when RF was applied on multi-modal data for prediction of Alzheimer's disease (AD) conversion from the Mild Cognitive Impairment (MCI), it produces one of the best accuracies to date. Moreover, the RF has important advantages in terms of robustness to overfitting, ability to handle highly non-linear data, stability in the presence of outliers and opportunity for efficient parallel processing mainly when applied on multi-modality neuroimaging data, such as, MRI morphometric, diffusion tensor imaging, and PET images. Conclusions: We discussed the strengths of RF, considering also possible limitations and by encouraging further studies on the comparisons of this algorithm with other commonly used classification approaches, particularly in the early prediction of the progression from MCI to AD.

  10. Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment

    NASA Astrophysics Data System (ADS)

    Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.

    2012-04-01

    The accurate classification and mapping of individual trees at species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush encroaching species, which can threaten the integrity of the environment and livelihoods of the local communities. Species level mapping is particularly challenging in African savannas which are complex, heterogeneous, and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation are also highly irregular in canopy and crown shape, height and other structural dimensions with a combination of open grassland patches and dense woody thicket - a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were species tree height; NDVI; the chlorophyll b wavelength (466 nm) and a selection of raw, continuum removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species with an overall classification accuracy of 87.68% and KHAT value of 0.843.

  11. Influence of multi-source and multi-temporal remotely sensed and ancillary data on the accuracy of random forest classification of wetlands in northern Minnesota

    USGS Publications Warehouse

    Corcoran, Jennifer M.; Knight, Joseph F.; Gallant, Alisa L.

    2013-01-01

    Wetland mapping at the landscape scale using remotely sensed data requires both affordable data and an efficient accurate classification method. Random forest classification offers several advantages over traditional land cover classification techniques, including a bootstrapping technique to generate robust estimations of outliers in the training data, as well as the capability of measuring classification confidence. Though the random forest classifier can generate complex decision trees with a multitude of input data and still not run a high risk of over fitting, there is a great need to reduce computational and operational costs by including only key input data sets without sacrificing a significant level of accuracy. Our main questions for this study site in Northern Minnesota were: (1) how does classification accuracy and confidence of mapping wetlands compare using different remote sensing platforms and sets of input data; (2) what are the key input variables for accurate differentiation of upland, water, and wetlands, including wetland type; and (3) which datasets and seasonal imagery yield the best accuracy for wetland classification. Our results show the key input variables include terrain (elevation and curvature) and soils descriptors (hydric), along with an assortment of remotely sensed data collected in the spring (satellite visible, near infrared, and thermal bands; satellite normalized vegetation index and Tasseled Cap greenness and wetness; and horizontal-horizontal (HH) and horizontal-vertical (HV) polarization using L-band satellite radar). We undertook this exploratory analysis to inform decisions by natural resource managers charged with monitoring wetland ecosystems and to aid in designing a system for consistent operational mapping of wetlands across landscapes similar to those found in Northern Minnesota.

  12. An assessment of the effectiveness of a random forest classifier for land-cover classification

    NASA Astrophysics Data System (ADS)

    Rodriguez-Galiano, V. F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J. P.

    2012-01-01

    Land cover monitoring using remotely sensed data requires robust classification methods which allow for the accurate mapping of complex land cover and land use categories. Random forest (RF) is a powerful machine learning classifier that is relatively unknown in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques. Key advantages of RF include: their non-parametric nature; high classification accuracy; and capability to determine variable importance. However, the split rules for classification are unknown, therefore RF can be considered to be black box type classifier. RF provides an algorithm for estimating missing values; and flexibility to perform several types of data analysis, including regression, classification, survival analysis, and unsupervised learning. In this paper, the performance of the RF classifier for land cover classification of a complex area is explored. Evaluation was based on several criteria: mapping accuracy, sensitivity to data set size and noise. Landsat-5 Thematic Mapper data captured in European spring and summer were used with auxiliary variables derived from a digital terrain model to classify 14 different land categories in the south of Spain. Results show that the RF algorithm yields accurate land cover classifications, with 92% overall accuracy and a Kappa index of 0.92. RF is robust to training data reduction and noise because significant differences in kappa values were only observed for data reduction and noise addition values greater than 50 and 20%, respectively. Additionally, variables that RF identified as most important for classifying land cover coincided with expectations. A McNemar test indicates an overall better performance of the random forest model over a single decision tree at the 0.00001 significance level.

  13. Mapping forested wetlands in the Great Zhan River Basin through integrating optical, radar, and topographical data classification techniques.

    PubMed

    Na, X D; Zang, S Y; Wu, C S; Li, W L

    2015-11-01

    Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.

  14. Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

    Treesearch

    E. Freeman; G. Moisen; J. Coulston; B. Wilson

    2014-01-01

    Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...

  15. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923

  16. Forest community classification of the Porcupine River drainage, interior Alaska, and its application to forest management.

    Treesearch

    John Yarie

    1983-01-01

    The forest vegetation of 3,600,000 hectares in northeast interior Alaska was classified. A total of 365 plots located in a stratified random design were run through the ordination programs SIMORD and TWINSPAN. A total of 40 forest communities were described vegetatively and, to a limited extent, environmentally. The area covered by each community was similar, ranging...

  17. FOCIS: A forest classification and inventory system using LANDSAT and digital terrain data

    NASA Technical Reports Server (NTRS)

    Strahler, A. H.; Franklin, J.; Woodcook, C. E.; Logan, T. L.

    1981-01-01

    Accurate, cost-effective stratification of forest vegetation and timber inventory is the primary goal of a Forest Classification and Inventory System (FOCIS). Conventional timber stratification using photointerpretation can be time-consuming, costly, and inconsistent from analyst to analyst. FOCIS was designed to overcome these problems by using machine processing techniques to extract and process tonal, textural, and terrain information from registered LANDSAT multispectral and digital terrain data. Comparison of samples from timber strata identified by conventional procedures showed that both have about the same potential to reduce the variance of timber volume estimates over simple random sampling.

  18. Field evaluation of a random forest activity classifier for wrist-worn accelerometer data.

    PubMed

    Pavey, Toby G; Gilson, Nicholas D; Gomersall, Sjaan R; Clark, Bronwyn; Trost, Stewart G

    2017-01-01

    Wrist-worn accelerometers are convenient to wear and associated with greater wear-time compliance. Previous work has generally relied on choreographed activity trials to train and test classification models. However, validity in free-living contexts is starting to emerge. Study aims were: (1) train and test a random forest activity classifier for wrist accelerometer data; and (2) determine if models trained on laboratory data perform well under free-living conditions. Twenty-one participants (mean age=27.6±6.2) completed seven lab-based activity trials and a 24h free-living trial (N=16). Participants wore a GENEActiv monitor on the non-dominant wrist. Classification models recognising four activity classes (sedentary, stationary+, walking, and running) were trained using time and frequency domain features extracted from 10-s non-overlapping windows. Model performance was evaluated using leave-one-out-cross-validation. Models were implemented using the randomForest package within R. Classifier accuracy during the 24h free living trial was evaluated by calculating agreement with concurrently worn activPAL monitors. Overall classification accuracy for the random forest algorithm was 92.7%. Recognition accuracy for sedentary, stationary+, walking, and running was 80.1%, 95.7%, 91.7%, and 93.7%, respectively for the laboratory protocol. Agreement with the activPAL data (stepping vs. non-stepping) during the 24h free-living trial was excellent and, on average, exceeded 90%. The ICC for stepping time was 0.92 (95% CI=0.75-0.97). However, sensitivity and positive predictive values were modest. Mean bias was 10.3min/d (95% LOA=-46.0 to 25.4min/d). The random forest classifier for wrist accelerometer data yielded accurate group-level predictions under controlled conditions, but was less accurate at identifying stepping verse non-stepping behaviour in free living conditions Future studies should conduct more rigorous field-based evaluations using observation as a criterion measure. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  19. Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.

    PubMed

    Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold

    2016-12-07

    Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.

  20. Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healhy elderly, MCI, cMCI and alzheimer's disease patients: From the alzheimer's disease neuroimaging initiative (ADNI) database.

    PubMed

    Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N

    2018-05-15

    In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, 3 we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 CE were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Random forests for classification in ecology

    USGS Publications Warehouse

    Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

    2007-01-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. ?? 2007 by the Ecological Society of America.

  2. Beluga whale (Delphinapterus leucas) vocalizations and call classification from the eastern Beaufort Sea population.

    PubMed

    Garland, Ellen C; Castellote, Manuel; Berchok, Catherine L

    2015-06-01

    Beluga whales, Delphinapterus leucas, have a graded call system; call types exist on a continuum making classification challenging. A description of vocalizations from the eastern Beaufort Sea beluga population during its spring migration are presented here, using both a non-parametric classification tree analysis (CART), and a Random Forest analysis. Twelve frequency and duration measurements were made on 1019 calls recorded over 14 days off Icy Cape, Alaska, resulting in 34 identifiable call types with 83% agreement in classification for both CART and Random Forest analyses. This high level of agreement in classification, with an initial subjective classification of calls into 36 categories, demonstrates that the methods applied here provide a quantitative analysis of a graded call dataset. Further, as calls cannot be attributed to individuals using single sensor passive acoustic monitoring efforts, these methods provide a comprehensive analysis of data where the influence of pseudo-replication of calls from individuals is unknown. This study is the first to describe the vocal repertoire of a beluga population using a robust and repeatable methodology. A baseline eastern Beaufort Sea beluga population repertoire is presented here, against which the call repertoire of other seasonally sympatric Alaskan beluga populations can be compared.

  3. Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

    PubMed

    Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

    2010-03-19

    This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  4. Linear Subpixel Learning Algorithm for Land Cover Classification from WELD using High Performance Computing

    NASA Technical Reports Server (NTRS)

    Kumar, Uttam; Nemani, Ramakrishna R.; Ganguly, Sangram; Kalia, Subodh; Michaelis, Andrew

    2017-01-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS-national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91 percent was achieved, which is a 6 percent improvement in unmixing based classification relative to per-pixel-based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  5. Linear Subpixel Learning Algorithm for Land Cover Classification from WELD using High Performance Computing

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.

    2017-12-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  6. A comparison of rule-based and machine learning approaches for classifying patient portal messages.

    PubMed

    Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Rosenbloom, S Trent; Jackson, Gretchen Purcell

    2017-09-01

    Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening' and Idea_or_Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean'). This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Decision-Tree, Rule-Based, and Random Forest Classification of High-Resolution Multispectral Imagery for Wetland Mapping and Inventory

    EPA Science Inventory

    Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...

  8. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies.

    PubMed

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1-98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.

  9. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

    PubMed Central

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O`Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681

  10. Integrating support vector machines and random forests to classify crops in time series of Worldview-2 images

    NASA Astrophysics Data System (ADS)

    Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

    2017-10-01

    Crop maps are essential inputs for the agricultural planning done at various governmental and agribusinesses agencies. Remote sensing offers timely and costs efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, a RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are of 83, 82 and 83% respectively. Adding vegetation indices to the analysis result in the classification accuracy of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.

  11. Advanced Subspace Techniques for Modeling Channel and Session Variability in a Speaker Recognition System

    DTIC Science & Technology

    2012-03-01

    with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random

  12. Computer-Aided Screening of Conjugated Polymers for Organic Solar Cell: Classification by Random Forest.

    PubMed

    Nagasawa, Shinji; Al-Naamani, Eman; Saeki, Akinori

    2018-05-17

    Owing to the diverse chemical structures, organic photovoltaic (OPV) applications with a bulk heterojunction framework have greatly evolved over the last two decades, which has produced numerous organic semiconductors exhibiting improved power conversion efficiencies (PCEs). Despite the recent fast progress in materials informatics and data science, data-driven molecular design of OPV materials remains challenging. We report a screening of conjugated molecules for polymer-fullerene OPV applications by supervised learning methods (artificial neural network (ANN) and random forest (RF)). Approximately 1000 experimental parameters including PCE, molecular weight, and electronic properties are manually collected from the literature and subjected to machine learning with digitized chemical structures. Contrary to the low correlation coefficient in ANN, RF yields an acceptable accuracy, which is twice that of random classification. We demonstrate the application of RF screening for the design, synthesis, and characterization of a conjugated polymer, which facilitates a rapid development of optoelectronic materials.

  13. Evaluating data mining algorithms using molecular dynamics trajectories.

    PubMed

    Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis

    2013-01-01

    Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the mus time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations, of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using 'classification' errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups, when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.

  14. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

    NASA Astrophysics Data System (ADS)

    Tesoriero, Anthony J.; Gronberg, Jo Ann; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

    2017-08-01

    Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.

  15. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

    USGS Publications Warehouse

    Tesoriero, Anthony J.; Gronberg, Jo Ann M.; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

    2017-01-01

    Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.

  16. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ~97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  17. Using Classification and Regression Trees (CART) and random forests to analyze attrition: Results from two simulations.

    PubMed

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J

    2015-12-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved).

  18. Using Classification and Regression Trees (CART) and Random Forests to Analyze Attrition: Results From Two Simulations

    PubMed Central

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.

    2016-01-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526

  19. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

    PubMed

    Ozçift, Akin

    2011-05-01

    Supervised classification algorithms are commonly used in the designing of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the class's output by individual trees. In this way, an RF ensemble classifier performs better than a single tree from classification performance point of view. In general, multiclass datasets having unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset that has multiple classes with small sample sizes and it is therefore adequate to test our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from cardiac arrhythmia dataset. (ii) RF machine learning algorithm is used to evaluate the performance of selected features with and without simple random sampling to evaluate the efficiency of proposed training strategy. The resultant accuracy of the classifier is found to be 90.0% and this is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of random sampling strategy in training RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.

  20. Optimizing classification performance in an object-based very-high-resolution land use-land cover urban application

    NASA Astrophysics Data System (ADS)

    Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore

    2017-10-01

    This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.

  1. Combination of support vector machine, artificial neural network and random forest for improving the classification of convective and stratiform rain using spectral features of SEVIRI data

    NASA Astrophysics Data System (ADS)

    Lazri, Mourad; Ameur, Soltane

    2018-05-01

    A model combining three classifiers, namely Support vector machine, Artificial neural network and Random forest (SAR) is designed for improving the classification of convective and stratiform rain. This model (SAR model) has been trained and then tested on a datasets derived from MSG-SEVIRI (Meteosat Second Generation-Spinning Enhanced Visible and Infrared Imager). Well-classified, mid-classified and misclassified pixels are determined from the combination of three classifiers. Mid-classified and misclassified pixels that are considered unreliable pixels are reclassified by using a novel training of the developed scheme. In this novel training, only the input data corresponding to the pixels in question to are used. This whole process is repeated a second time and applied to mid-classified and misclassified pixels separately. Learning and validation of the developed scheme are realized against co-located data observed by ground radar. The developed scheme outperformed different classifiers used separately and reached 97.40% of overall accuracy of classification.

  2. An integrated classifier for computer-aided diagnosis of colorectal polyps based on random forest and location index strategies

    NASA Astrophysics Data System (ADS)

    Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong

    2016-03-01

    Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifier [1], has also a distinguished role in the classification for CADx. Therefore, in this paper, based on the property that the LI will achieve a very high accuracy, we design an algorithm to integrate the LI into RF for improved or higher value of AUC (area under the curve of receiver operating characteristics -- ROC). Experiments were performed by the use of a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers of RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.

  3. Mapping Species Composition of Forests and Tree Plantations in Northeastern Costa Rica with an Integration of Hyperspectral and Multitemporal Landsat Imagery

    NASA Technical Reports Server (NTRS)

    Fagan, Matthew E.; Defries, Ruth S.; Sesnie, Steven E.; Arroyo-Mora, J. Pablo; Soto, Carlomagno; Singh, Aditya; Townsend, Philip A.; Chazdon, Robin L.

    2015-01-01

    An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p less than 0.0001) of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; adding multitemporal spectral data increased accuracy only for two species with dense canopies. Non-native tree species had higher classification accuracy overall and made up the majority of tree plantations in this landscape. Our results indicate that combining occasionally acquired hyperspectral data with widely available multitemporal satellite imagery enhances mapping and monitoring of reforestation in tropical landscapes.

  4. Evaluating the Effectiveness of Flood Control Strategies in Contrasting Urban Watersheds and Implications for Houston's Future Flood Vulnerability

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.

    2016-12-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  5. Quantifying the abundance of co-occurring conifers along Inland Northwest (USA) climate gradients

    Treesearch

    Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

    2008-01-01

    The occurrence and abundance of conifers along climate gradients in the Inland Northwest (USA) was assessed using data from 5082 field plots, 81% of which were forested. Analyses using the Random Forests classification tree revealed that the sequential distribution of species along an altitudinal gradient could be predicted with reasonable accuracy from a single...

  6. Patterns among the ashes: Exploring the relationship between landscape pattern and the emerald ash borer

    Treesearch

    Susan J. Crocker; Dacia M. Meneguzzo; Greg C. Liknes

    2010-01-01

    Landscape metrics, including host abundance and population density, were calculated using forest inventory and land cover data to assess the relationship between landscape pattern and the presence or absence of the emerald ash borer (EAB) (Agrilus planipennis Fairmaire). The Random Forests classification algorithm in the R statistical environment was...

  7. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease

    PubMed Central

    Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation. PMID:26180536

  8. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease.

    PubMed

    Guo, Rui; Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation.

  9. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.

  10. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors.

    PubMed

    Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

    2016-01-14

    Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR systems.

  11. Tree species classification in subtropical forests using small-footprint full-waveform LiDAR data

    NASA Astrophysics Data System (ADS)

    Cao, Lin; Coops, Nicholas C.; Innes, John L.; Dai, Jinsong; Ruan, Honghua; She, Guanghui

    2016-07-01

    The accurate classification of tree species is critical for the management of forest ecosystems, particularly subtropical forests, which are highly diverse and complex ecosystems. While airborne Light Detection and Ranging (LiDAR) technology offers significant potential to estimate forest structural attributes, the capacity of this new tool to classify species is less well known. In this research, full-waveform metrics were extracted by a voxel-based composite waveform approach and examined with a Random Forests classifier to discriminate six subtropical tree species (i.e., Masson pine (Pinus massoniana Lamb.)), Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), Slash pines (Pinus elliottii Engelm.), Sawtooth oak (Quercus acutissima Carruth.) and Chinese holly (Ilex chinensis Sims.) at three levels of discrimination. As part of the analysis, the optimal voxel size for modelling the composite waveforms was investigated, the most important predictor metrics for species classification assessed and the effect of scan angle on species discrimination examined. Results demonstrate that all tree species were classified with relatively high accuracy (68.6% for six classes, 75.8% for four main species and 86.2% for conifers and broadleaved trees). Full-waveform metrics (based on height of median energy, waveform distance and number of waveform peaks) demonstrated high classification importance and were stable among various voxel sizes. The results also suggest that the voxel based approach can alleviate some of the issues associated with large scan angles. In summary, the results indicate that full-waveform LIDAR data have significant potential for tree species classification in the subtropical forests.

  12. Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.

    PubMed

    Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo

    2016-01-01

    Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers-Logistic Regression, Naïve Bayes and Random Forest-with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970's and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.

  13. Aggregation of Sentinel-2 time series classifications as a solution for multitemporal analysis

    NASA Astrophysics Data System (ADS)

    Lewiński, Stanislaw; Nowakowski, Artur; Malinowski, Radek; Rybicki, Marcin; Kukawska, Ewa; Krupiński, Michał

    2017-10-01

    The general aim of this work was to elaborate efficient and reliable aggregation method that could be used for creating a land cover map at a global scale from multitemporal satellite imagery. The study described in this paper presents methods for combining results of land cover/land use classifications performed on single-date Sentinel-2 images acquired at different time periods. For that purpose different aggregation methods were proposed and tested on study sites spread on different continents. The initial classifications were performed with Random Forest classifier on individual Sentinel-2 images from a time series. In the following step the resulting land cover maps were aggregated pixel by pixel using three different combinations of information on the number of occurrences of a certain land cover class within a time series and the posterior probability of particular classes resulting from the Random Forest classification. From the proposed methods two are shown superior and in most cases were able to reach or outperform the accuracy of the best individual classifications of single-date images. Moreover, the aggregations results are very stable when used on data with varying cloudiness. They also enable to reduce considerably the number of cloudy pixels in the resulting land cover map what is significant advantage for mapping areas with frequent cloud coverage.

  14. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors.

    PubMed

    Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  15. Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

    NASA Astrophysics Data System (ADS)

    Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

    2018-03-01

    In recent decade, analyzing the remotely sensed imagery is considered as one of the most common and widely used procedures in the environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high resolution Worldview-3 over a mixed urbanized landscape in Iran, three less applied image classification methods including Bagged CART, Stochastic gradient boosting model and Neural network with feature extraction were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten time and three validation techniques was used to estimate the accuracy statistics consist of cross validation, independent validation and validation with total of train data. Moreover, using ANOVA and Tukey test, statistical difference significance between the classification methods was significantly surveyed. In general, the results showed that random forest with marginal difference compared to Bagged CART and stochastic gradient boosting model is the best performing method whilst based on independent validation there was no significant difference between the performances of classification methods. It should be finally noted that neural network with feature extraction and linear support vector machine had better processing speed than other.

  16. Gender classification in low-resolution surveillance video: in-depth comparison of random forests and SVMs

    NASA Astrophysics Data System (ADS)

    Geelen, Christopher D.; Wijnhoven, Rob G. J.; Dubbelman, Gijs; de With, Peter H. N.

    2015-03-01

    This research considers gender classification in surveillance environments, typically involving low-resolution images and a large amount of viewpoint variations and occlusions. Gender classification is inherently difficult due to the large intra-class variation and interclass correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets, which realistically consider the above conditions, typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, proving a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9 0.2%, while classification computation time is negligible compared to the detection time of pedestrians.

  17. Automatic classification of time-variable X-ray sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, andmore » other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ∼97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.« less

  18. Integrating human and machine intelligence in galaxy morphology classification tasks

    NASA Astrophysics Data System (ADS)

    Beck, Melanie R.; Scarlata, Claudia; Fortson, Lucy F.; Lintott, Chris J.; Simmons, B. D.; Galloway, Melanie A.; Willett, Kyle W.; Dickinson, Hugh; Masters, Karen L.; Marshall, Philip J.; Wright, Darryl

    2018-06-01

    Quantifying galaxy morphology is a challenging yet scientifically rewarding task. As the scale of data continues to increase with upcoming surveys, traditional classification methods will struggle to handle the load. We present a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme, we increase the classification rate nearly 5-fold classifying 226 124 galaxies in 92 d of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7 per cent accuracy. We next combine this with a Random Forest machine learning algorithm that learns on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides at least a factor of 8 increase in the classification rate, classifying 210 803 galaxies in just 32 d of GZ2 project time with 93.1 per cent accuracy. As the Random Forest algorithm requires a minimal amount of computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.

  19. Comparison of Hyperspectral and Multispectral Satellites for Forest Alliance Classification in the San Francisco Bay Area

    NASA Astrophysics Data System (ADS)

    Clark, M. L.

    2016-12-01

    The goal of this study was to assess multi-temporal, Hyperspectral Infrared Imager (HyspIRI) satellite imagery for improved forest class mapping relative to multispectral satellites. The study area was the western San Francisco Bay Area, California and forest alliances (e.g., forest communities defined by dominant or co-dominant trees) were defined using the U.S. National Vegetation Classification System. Simulated 30-m HyspIRI, Landsat 8 and Sentinel-2 imagery were processed from image data acquired by NASA's AVIRIS airborne sensor in year 2015, with summer and multi-temporal (spring, summer, fall) data analyzed separately. HyspIRI reflectance was used to generate a suite of hyperspectral metrics that targeted key spectral features related to chemical and structural properties. The Random Forests classifier was applied to the simulated images and overall accuracies (OA) were compared to those from real Landsat 8 images. For each image group, broad land cover (e.g., Needle-leaf Trees, Broad-leaf Trees, Annual agriculture, Herbaceous, Built-up) was classified first, followed by a finer-detail forest alliance classification for pixels mapped as closed-canopy forest. There were 5 needle-leaf tree alliances and 16 broad-leaf tree alliances, including 7 Quercus (oak) alliance types. No forest alliance classification exceeded 50% OA, indicating that there was broad spectral similarity among alliances, most of which were not spectrally pure but rather a mix of tree species. In general, needle-leaf (Pine, Redwood, Douglas Fir) alliances had better class accuracies than broad-leaf alliances (Oaks, Madrone, Bay Laurel, Buckeye, etc). Multi-temporal data classifications all had 5-6% greater OA than with comparable summer data. For simulated data, HyspIRI metrics had 4-5% greater OA than Landsat 8 and Sentinel-2 multispectral imagery and 3-4% greater OA than HyspIRI reflectance. Finally, HyspIRI metrics had 8% greater OA than real Landsat 8 imagery. In conclusion, forest alliance classification was found to be a difficult remote sensing application with moderate resolution (30 m) satellite imagery; however, of the data tested, HyspIRI spectral metrics had the best performance relative to multispectral satellites.

  20. Mapping ecological systems with a random foret model: tradeoffs between errors and bias

    Treesearch

    Emilie Grossmann; Janet Ohmann; James Kagan; Heather May; Matthew Gregory

    2010-01-01

    New methods for predictive vegetation mapping allow improved estimations of plant community composition across large regions. Random Forest (RF) models limit over-fitting problems of other methods, and are known for making accurate classification predictions from noisy, nonnormal data, but can be biased when plot samples are unbalanced. We developed two contrasting...

  1. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification is performed task with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Comparative genetic responses to climate for the varieties of Pinus ponderosa and Pseudotsuga menziesii: realized climate niches

    Treesearch

    Gerald E. Rehfeldt; Barry C. Jaquish; Javier Lopez-Upton; Cuauhtemoc Saenz-Romero; J. Bradley St Clair; Laura P. Leites; Dennis G. Joyce

    2014-01-01

    The Random Forests classification algorithm was used to predict the occurrence of the realized climate niche for two sub-specific varieties of Pinus ponderosa and three varieties of Pseudotsuga menziesii from presence-absence data in forest inventory ground plots. Analyses were based on ca. 271,000 observations for P. ponderosa and ca. 426,000 observations for P....

  3. Dynamic species classification of microorganisms across time, abiotic and biotic environments—A sliding window approach

    PubMed Central

    Griffiths, Jason I.; Fronhofer, Emanuel A.; Garnier, Aurélie; Seymour, Mathew; Altermatt, Florian; Petchey, Owen L.

    2017-01-01

    The development of video-based monitoring methods allows for rapid, dynamic and accurate monitoring of individuals or communities, compared to slower traditional methods, with far reaching ecological and evolutionary applications. Large amounts of data are generated using video-based methods, which can be effectively processed using machine learning (ML) algorithms into meaningful ecological information. ML uses user defined classes (e.g. species), derived from a subset (i.e. training data) of video-observed quantitative features (e.g. phenotypic variation), to infer classes in subsequent observations. However, phenotypic variation often changes due to environmental conditions, which may lead to poor classification, if environmentally induced variation in phenotypes is not accounted for. Here we describe a framework for classifying species under changing environmental conditions based on the random forest classification. A sliding window approach was developed that restricts temporal and environmentally conditions to improve the classification. We tested our approach by applying the classification framework to experimental data. The experiment used a set of six ciliate species to monitor changes in community structure and behavior over hundreds of generations, in dozens of species combinations and across a temperature gradient. Differences in biotic and abiotic conditions caused simplistic classification approaches to be unsuccessful. In contrast, the sliding window approach allowed classification to be highly successful, as phenotypic differences driven by environmental change, could be captured by the classifier. Importantly, classification using the random forest algorithm showed comparable success when validated against traditional, slower, manual identification. Our framework allows for reliable classification in dynamic environments, and may help to improve strategies for long-term monitoring of species in changing environments. Our classification pipeline can be applied in fields assessing species community dynamics, such as eco-toxicology, ecology and evolutionary ecology. PMID:28472193

  4. North American vegetation model for land-use planning in a changing climate: A solution to large classification problems

    Treesearch

    Gerald E. Rehfeldt; Nicholas L. Crookston; Cuauhtemoc Saenz-Romero; Elizabeth M. Campbell

    2012-01-01

    Data points intensively sampling 46 North American biomes were used to predict the geographic distribution of biomes from climate variables using the Random Forests classification tree. Techniques were incorporated to accommodate a large number of classes and to predict the future occurrence of climates beyond the contemporary climatic range of the biomes. Errors of...

  5. Time Series of Images to Improve Tree Species Classification

    NASA Astrophysics Data System (ADS)

    Miyoshi, G. T.; Imai, N. N.; de Moraes, M. V. A.; Tommaselli, A. M. G.; Näsi, R.

    2017-10-01

    Tree species classification provides valuable information to forest monitoring and management. The high floristic variation of the tree species appears as a challenging issue in the tree species classification because the vegetation characteristics changes according to the season. To help to monitor this complex environment, the imaging spectroscopy has been largely applied since the development of miniaturized sensors attached to Unmanned Aerial Vehicles (UAV). Considering the seasonal changes in forests and the higher spectral and spatial resolution acquired with sensors attached to UAV, we present the use of time series of images to classify four tree species. The study area is an Atlantic Forest area located in the western part of São Paulo State. Images were acquired in August 2015 and August 2016, generating three data sets of images: only with the image spectra of 2015; only with the image spectra of 2016; with the layer stacking of images from 2015 and 2016. Four tree species were classified using Spectral angle mapper (SAM), Spectral information divergence (SID) and Random Forest (RF). The results showed that SAM and SID caused an overfitting of the data whereas RF showed better results and the use of the layer stacking improved the classification achieving a kappa coefficient of 18.26 %.

  6. Mapping permafrost in the boreal forest with Thematic Mapper satellite data

    NASA Technical Reports Server (NTRS)

    Morrissey, L. A.; Strong, L. L.; Card, D. H.

    1986-01-01

    A geographic data base incorporating Landsat TM data was used to develop and evaluate logistic discriminant functions for predicting the distribution of permafrost in a boreal forest watershed. The data base included both satellite-derived information and ancillary map data. Five permafrost classifications were developed from a stratified random sample of the data base and evaluated by comparison with a photo-interpreted permafrost map using contingency table analysis and soil temperatures recorded at sites within the watershed. A classification using a TM thermal band and a TM-derived vegetation map as independent variables yielded the highest mapping accuracy for all permafrost categories.

  7. Classification of coronary artery tissues using optical coherence tomography imaging in Kawasaki disease

    NASA Astrophysics Data System (ADS)

    Abdolmanafi, Atefeh; Prasad, Arpan Suravi; Duong, Luc; Dahdah, Nagib

    2016-03-01

    Intravascular imaging modalities, such as Optical Coherence Tomography (OCT) allow nowadays improving diagnosis, treatment, follow-up, and even prevention of coronary artery disease in the adult. OCT has been recently used in children following Kawasaki disease (KD), the most prevalent acquired coronary artery disease during childhood with devastating complications. The assessment of coronary artery layers with OCT and early detection of coronary sequelae secondary to KD is a promising tool for preventing myocardial infarction in this population. More importantly, OCT is promising for tissue quantification of the inner vessel wall, including neo intima luminal myofibroblast proliferation, calcification, and fibrous scar deposits. The goal of this study is to classify the coronary artery layers of OCT imaging obtained from a series of KD patients. Our approach is focused on developing a robust Random Forest classifier built on the idea of randomly selecting a subset of features at each node and based on second- and higher-order statistical texture analysis which estimates the gray-level spatial distribution of images by specifying the local features of each pixel and extracting the statistics from their distribution. The average classification accuracy for intima and media are 76.36% and 73.72% respectively. Random forest classifier with texture analysis promises for classification of coronary artery tissue.

  8. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

    PubMed

    Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

    2017-08-28

    The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.

  9. SNP selection and classification of genome-wide SNP data using stratified sampling random forests.

    PubMed

    Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K

    2012-09-01

    For high dimensional genome-wide association (GWA) case-control data of complex disease, there are usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant with the disease. A simple random sampling method in random forest using default mtry parameter to choose feature subspace, will select too many subspaces without informative SNPs. Exhaustive searching an optimal mtry is often required in order to include useful and relevant SNPs and get rid of vast of non-informative SNPs. However, it is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure can make sure each subspace contains enough useful SNPs, but can avoid a very high computational cost of exhaustive search of an optimal mtry, and maintain the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and it can generate better random forest with higher accuracy and lower error bound than those by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.

  10. Accuracy assessments and areal estimates using two-phase stratified random sampling, cluster plots, and the multivariate composite estimator

    Treesearch

    Raymond L. Czaplewski

    2000-01-01

    Consider the following example of an accuracy assessment. Landsat data are used to build a thematic map of land cover for a multicounty region. The map classifier (e.g., a supervised classification algorithm) assigns each pixel into one category of land cover. The classification system includes 12 different types of forest and land cover: black spruce, balsam fir,...

  11. Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different Machine Learning Algorithms in the Kurdistan Region, NE Iraq

    NASA Astrophysics Data System (ADS)

    Othman, Arsalan A.; Gloaguen, Richard

    2017-09-01

    Lithological mapping in mountainous regions is often impeded by limited accessibility due to relief. This study aims to evaluate (1) the performance of different supervised classification approaches using remote sensing data and (2) the use of additional information such as geomorphology. We exemplify the methodology in the Bardi-Zard area in NE Iraq, a part of the Zagros Fold - Thrust Belt, known for its chromite deposits. We highlighted the improvement of remote sensing geological classification by integrating geomorphic features and spatial information in the classification scheme. We performed a Maximum Likelihood (ML) classification method besides two Machine Learning Algorithms (MLA): Support Vector Machine (SVM) and Random Forest (RF) to allow the joint use of geomorphic features, Band Ratio (BR), Principal Component Analysis (PCA), spatial information (spatial coordinates) and multispectral data of the Advanced Space-borne Thermal Emission and Reflection radiometer (ASTER) satellite. The RF algorithm showed reliable results and discriminated serpentinite, talus and terrace deposits, red argillites with conglomerates and limestone, limy conglomerates and limestone conglomerates, tuffites interbedded with basic lavas, limestone and Metamorphosed limestone and reddish green shales. The best overall accuracy (∼80%) was achieved by Random Forest (RF) algorithms in the majority of the sixteen tested combination datasets.

  12. Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

    PubMed

    Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

    2017-09-01

    To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.

  13. A neural network approach for enhancing information extraction from multispectral image data

    USGS Publications Warehouse

    Liu, J.; Shao, G.; Zhu, H.; Liu, S.

    2005-01-01

    A back-propagation artificial neural network (ANN) was applied to classify multispectral remote sensing imagery data. The classification procedure included four steps: (i) noisy training that adds minor random variations to the sampling data to make the data more representative and to reduce the training sample size; (ii) iterative or multi-tier classification that reclassifies the unclassified pixels by making a subset of training samples from the original training set, which means the neural model can focus on fewer classes; (iii) spectral channel selection based on neural network weights that can distinguish the relative importance of each channel in the classification process to simplify the ANN model; and (iv) voting rules that adjust the accuracy of classification and produce outputs of different confidence levels. The Purdue Forest, located west of Purdue University, West Lafayette, Indiana, was chosen as the test site. The 1992 Landsat thematic mapper imagery was used as the input data. High-quality airborne photographs of the same Lime period were used for the ground truth. A total of 11 land use and land cover classes were defined, including water, broadleaved forest, coniferous forest, young forest, urban and road, and six types of cropland-grassland. The experiment, indicated that the back-propagation neural network application was satisfactory in distinguishing different land cover types at US Geological Survey levels II-III. The single-tier classification reached an overall accuracy of 85%. and the multi-tier classification an overall accuracy of 95%. For the whole test, region, the final output of this study reached an overall accuracy of 87%. ?? 2005 CASI.

  14. Foliar and woody materials discriminated using terrestrial LiDAR in a mixed natural forest

    NASA Astrophysics Data System (ADS)

    Zhu, Xi; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Niemann, K. Olaf; Liu, Jing; Shi, Yifang; Wang, Tiejun

    2018-02-01

    Separation of foliar and woody materials using remotely sensed data is crucial for the accurate estimation of leaf area index (LAI) and woody biomass across forest stands. In this paper, we present a new method to accurately separate foliar and woody materials using terrestrial LiDAR point clouds obtained from ten test sites in a mixed forest in Bavarian Forest National Park, Germany. Firstly, we applied and compared an adaptive radius near-neighbor search algorithm with a fixed radius near-neighbor search method in order to obtain both radiometric and geometric features derived from terrestrial LiDAR point clouds. Secondly, we used a random forest machine learning algorithm to classify foliar and woody materials and examined the impact of understory and slope on the classification accuracy. An average overall accuracy of 84.4% (Kappa = 0.75) was achieved across all experimental plots. The adaptive radius near-neighbor search method outperformed the fixed radius near-neighbor search method. The classification accuracy was significantly higher when the combination of both radiometric and geometric features was utilized. The analysis showed that increasing slope and understory coverage had a significant negative effect on the overall classification accuracy. Our results suggest that the utilization of the adaptive radius near-neighbor search method coupling both radiometric and geometric features has the potential to accurately discriminate foliar and woody materials from terrestrial LiDAR data in a mixed natural forest.

  15. Testing random forest classification for identifying lava flows and mapping age groups on a single Landsat 8 image

    NASA Astrophysics Data System (ADS)

    Li, Long; Solana, Carmen; Canters, Frank; Kervyn, Matthieu

    2017-10-01

    Mapping lava flows using satellite images is an important application of remote sensing in volcanology. Several volcanoes have been mapped through remote sensing using a wide range of data, from optical to thermal infrared and radar images, using techniques such as manual mapping, supervised/unsupervised classification, and elevation subtraction. So far, spectral-based mapping applications mainly focus on the use of traditional pixel-based classifiers, without much investigation into the added value of object-based approaches and into advantages of using machine learning algorithms. In this study, Nyamuragira, characterized by a series of > 20 overlapping lava flows erupted over the last century, was used as a case study. The random forest classifier was tested to map lava flows based on pixels and objects. Image classification was conducted for the 20 individual flows and for 8 groups of flows of similar age using a Landsat 8 image and a DEM of the volcano, both at 30-meter spatial resolution. Results show that object-based classification produces maps with continuous and homogeneous lava surfaces, in agreement with the physical characteristics of lava flows, while lava flows mapped through the pixel-based classification are heterogeneous and fragmented including much "salt and pepper noise". In terms of accuracy, both pixel-based and object-based classification performs well but the former results in higher accuracies than the latter except for mapping lava flow age groups without using topographic features. It is concluded that despite spectral similarity, lava flows of contrasting age can be well discriminated and mapped by means of image classification. The classification approach demonstrated in this study only requires easily accessible image data and can be applied to other volcanoes as well if there is sufficient information to calibrate the mapping.

  16. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms in the context of large-sized hyperspectral images classification from the theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to the support vector machines. To give the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as the computational cost and robustness against the noise are also discussed.

  17. Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

    PubMed

    Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact in improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI datasets that consist of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.

    PubMed

    Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi

    2014-01-01

    In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.

  19. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest are analysed. Among the various decision tree classifiers Random forest performs well in less time with good accuracy of 96.35%. Another inference is RUS boost decision tree classifier is able to classify one or two samples in the class with very less samples while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive for the classes with fewer samples. Also the performance of decision tree classifiers is compared with SVM (Support Vector Machine) and Naive Bayes classifier. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Mapping the distribution of the main host for plague in a complex landscape in Kazakhstan: An object-based approach using SPOT-5 XS, Landsat 7 ETM+, SRTM and multiple Random Forests

    NASA Astrophysics Data System (ADS)

    Wilschut, L. I.; Addink, E. A.; Heesterbeek, J. A. P.; Dubyanskiy, V. M.; Davis, S. A.; Laudisoit, A.; Begon, M.; Burdelov, L. A.; Atshabar, B. B.; de Jong, S. M.

    2013-08-01

    Plague is a zoonotic infectious disease present in great gerbil populations in Kazakhstan. Infectious disease dynamics are influenced by the spatial distribution of the carriers (hosts) of the disease. The great gerbil, the main host in our study area, lives in burrows, which can be recognized on high resolution satellite imagery. In this study, using earth observation data at various spatial scales, we map the spatial distribution of burrows in a semi-desert landscape. The study area consists of various landscape types. To evaluate whether identification of burrows by classification is possible in these landscape types, the study area was subdivided into eight landscape units, on the basis of Landsat 7 ETM+ derived Tasselled Cap Greenness and Brightness, and SRTM derived standard deviation in elevation. In the field, 904 burrows were mapped. Using two segmented 2.5 m resolution SPOT-5 XS satellite scenes, reference object sets were created. Random Forests were built for both SPOT scenes and used to classify the images. Additionally, a stratified classification was carried out, by building separate Random Forests per landscape unit. Burrows were successfully classified in all landscape units. In the ‘steppe on floodplain’ areas, classification worked best: producer's and user's accuracy in those areas reached 88% and 100%, respectively. In the ‘floodplain’ areas with a more heterogeneous vegetation cover, classification worked least well; there, accuracies were 86 and 58% respectively. Stratified classification improved the results in all landscape units where comparison was possible (four), increasing kappa coefficients by 13, 10, 9 and 1%, respectively. In this study, an innovative stratification method using high- and medium resolution imagery was applied in order to map host distribution on a large spatial scale. The burrow maps we developed will help to detect changes in the distribution of great gerbil populations and, moreover, serve as a unique empirical data set which can be used as input for epidemiological plague models. This is an important step in understanding the dynamics of plague.

  1. Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Cánovas-García, Fulgencio; Alonso-Sarría, Francisco; Gomariz-Castillo, Francisco; Oñate-Valdivieso, Fernando

    2017-06-01

    Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimation of classification accuracy based on the so called out-of-bag cross-validation method. It is usually assumed that such estimation is not biased and may be used instead of validation based on an external data-set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, in a training patch, pixels or objects are not independent (from a statistical point of view) of each other; however, they are split by bootstrapping into in-bag and out-of-bag as if they were really independent. We believe that putting whole patch, rather than pixels/objects, in one or the other set would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm to split training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external data-set, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel and object based); in the three cases reported, the modification we propose produces a less biased accuracy estimation.

  2. An AUC-based permutation variable importance measure for random forests

    PubMed Central

    2013-01-01

    Background The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. Results We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings. Conclusions The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html. PMID:23560875

  3. An AUC-based permutation variable importance measure for random forests.

    PubMed

    Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure

    2013-04-05

    The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings. The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html.

  4. A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

    NASA Astrophysics Data System (ADS)

    García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

    2016-10-01

    This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.

  5. Texture analysis of common renal masses in multiple MR sequences for prediction of pathology

    NASA Astrophysics Data System (ADS)

    Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua

    2017-03-01

    This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from one single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.

  6. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features

    PubMed Central

    Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-01-01

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet’s effects on fish skin. PMID:29596375

  7. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features.

    PubMed

    Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-03-29

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout ( Oncorhynchus mykiss ) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k -Nearest neighbours ( k -NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k -NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.

  8. Mapping Sub-Antarctic Cushion Plants Using Random Forests to Combine Very High Resolution Satellite Imagery and Terrain Modelling

    PubMed Central

    Bricher, Phillippa K.; Lucieer, Arko; Shaw, Justine; Terauds, Aleks; Bergstrom, Dana M.

    2013-01-01

    Monitoring changes in the distribution and density of plant species often requires accurate and high-resolution baseline maps of those species. Detecting such change at the landscape scale is often problematic, particularly in remote areas. We examine a new technique to improve accuracy and objectivity in mapping vegetation, combining species distribution modelling and satellite image classification on a remote sub-Antarctic island. In this study, we combine spectral data from very high resolution WorldView-2 satellite imagery and terrain variables from a high resolution digital elevation model to improve mapping accuracy, in both pixel- and object-based classifications. Random forest classification was used to explore the effectiveness of these approaches on mapping the distribution of the critically endangered cushion plant Azorella macquariensis Orchard (Apiaceae) on sub-Antarctic Macquarie Island. Both pixel- and object-based classifications of the distribution of Azorella achieved very high overall validation accuracies (91.6–96.3%, κ = 0.849–0.924). Both two-class and three-class classifications were able to accurately and consistently identify the areas where Azorella was absent, indicating that these maps provide a suitable baseline for monitoring expected change in the distribution of the cushion plants. Detecting such change is critical given the threats this species is currently facing under altering environmental conditions. The method presented here has applications to monitoring a range of species, particularly in remote and isolated environments. PMID:23940805

  9. Classification of acoustic emission signals using wavelets and Random Forests : Application to localized corrosion

    NASA Astrophysics Data System (ADS)

    Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.

    2016-03-01

    This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.

  10. Supervised classification in the presence of misclassified training data: a Monte Carlo simulation study in the three group case.

    PubMed

    Bolin, Jocelyn Holden; Finch, W Holmes

    2014-01-01

    Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.

  11. Characterizing channel change along a multithread gravel-bed river using random forest image classification

    NASA Astrophysics Data System (ADS)

    Overstreet, B. T.; Legleiter, C. J.

    2012-12-01

    The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impacts river dynamics; and 3) how the Snake River will respond to future human and climate driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest derived land cover maps with LiDAR derived topography, bathymetry, and canopy height to explore physical drivers contributing to observed channel changes on the Snake River. In conclusion we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change. Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and a 2011 WorldView2 satellite image. This pair of maps thus captures changes that occurred during the 2011 runoff

  12. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods

    PubMed Central

    Burlina, Philippe; Billings, Seth; Joshi, Neil

    2017-01-01

    Objective To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Methods Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and “engineered” features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. Results The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). Conclusions This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification. PMID:28854220

  13. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    PubMed

    Burlina, Philippe; Billings, Seth; Joshi, Neil; Albayda, Jemima

    2017-01-01

    To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  14. A novel transferable individual tree crown delineation model based on Fishing Net Dragging and boundary classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Im, Jungho; Quackenbush, Lindi J.

    2015-12-01

    This study provides a novel approach to individual tree crown delineation (ITCD) using airborne Light Detection and Ranging (LiDAR) data in dense natural forests using two main steps: crown boundary refinement based on a proposed Fishing Net Dragging (FiND) method, and segment merging based on boundary classification. FiND starts with approximate tree crown boundaries derived using a traditional watershed method with Gaussian filtering and refines these boundaries using an algorithm that mimics how a fisherman drags a fishing net. Random forest machine learning is then used to classify boundary segments into two classes: boundaries between trees and boundaries between branches that belong to a single tree. Three groups of LiDAR-derived features-two from the pseudo waveform generated along with crown boundaries and one from a canopy height model (CHM)-were used in the classification. The proposed ITCD approach was tested using LiDAR data collected over a mountainous region in the Adirondack Park, NY, USA. Overall accuracy of boundary classification was 82.4%. Features derived from the CHM were generally more important in the classification than the features extracted from the pseudo waveform. A comprehensive accuracy assessment scheme for ITCD was also introduced by considering both area of crown overlap and crown centroids. Accuracy assessment using this new scheme shows the proposed ITCD achieved 74% and 78% as overall accuracy, respectively, for deciduous and mixed forest.

  15. Temporal optimisation of image acquisition for land cover classification with Random Forest and MODIS time-series

    NASA Astrophysics Data System (ADS)

    Nitze, Ingmar; Barrett, Brian; Cawkwell, Fiona

    2015-02-01

    The analysis and classification of land cover is one of the principal applications in terrestrial remote sensing. Due to the seasonal variability of different vegetation types and land surface characteristics, the ability to discriminate land cover types changes over time. Multi-temporal classification can help to improve the classification accuracies, but different constraints, such as financial restrictions or atmospheric conditions, may impede their application. The optimisation of image acquisition timing and frequencies can help to increase the effectiveness of the classification process. For this purpose, the Feature Importance (FI) measure of the state-of-the art machine learning method Random Forest was used to determine the optimal image acquisition periods for a general (Grassland, Forest, Water, Settlement, Peatland) and Grassland specific (Improved Grassland, Semi-Improved Grassland) land cover classification in central Ireland based on a 9-year time-series of MODIS Terra 16 day composite data (MOD13Q1). Feature Importances for each acquisition period of the Enhanced Vegetation Index (EVI) and Normalised Difference Vegetation Index (NDVI) were calculated for both classification scenarios. In the general land cover classification, the months December and January showed the highest, and July and August the lowest separability for both VIs over the entire nine-year period. This temporal separability was reflected in the classification accuracies, where the optimal choice of image dates outperformed the worst image date by 13% using NDVI and 5% using EVI on a mono-temporal analysis. With the addition of the next best image periods to the data input the classification accuracies converged quickly to their limit at around 8-10 images. The binary classification schemes, using two classes only, showed a stronger seasonal dependency with a higher intra-annual, but lower inter-annual variation. Nonetheless anomalous weather conditions, such as the cold winter of 2009/2010 can alter the temporal separability pattern significantly. Due to the extensive use of the NDVI for land cover discrimination, the findings of this study should be transferrable to data from other optical sensors with a higher spatial resolution. However, the high impact of outliers from the general climatic pattern highlights the limitation of spatial transferability to locations with different climatic and land cover conditions. The use of high-temporal, moderate resolution data such as MODIS in conjunction with machine-learning techniques proved to be a good base for the prediction of image acquisition timing for optimal land cover classification results.

  16. Multi-Temporal Classification and Change Detection Using Uav Images

    NASA Astrophysics Data System (ADS)

    Makuti, S.; Nex, F.; Yang, M. Y.

    2018-05-01

    In this paper different methodologies for the classification and change detection of UAV image blocks are explored. UAV is not only the cheapest platform for image acquisition but it is also the easiest platform to operate in repeated data collections over a changing area like a building construction site. Two change detection techniques have been evaluated in this study: the pre-classification and the post-classification algorithms. These methods are based on three main steps: feature extraction, classification and change detection. A set of state of the art features have been used in the tests: colour features (HSV), textural features (GLCM) and 3D geometric features. For classification purposes Conditional Random Field (CRF) has been used: the unary potential was determined using the Random Forest algorithm while the pairwise potential was defined by the fully connected CRF. In the performed tests, different feature configurations and settings have been considered to assess the performance of these methods in such challenging task. Experimental results showed that the post-classification approach outperforms the pre-classification change detection method. This was analysed using the overall accuracy, where by post classification have an accuracy of up to 62.6 % and the pre classification change detection have an accuracy of 46.5 %. These results represent a first useful indication for future works and developments.

  17. Identification of an Efficient Gene Expression Panel for Glioblastoma Classification

    PubMed Central

    Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni

    2016-01-01

    We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170

  18. Random forests as cumulative effects models: A case study of lakes and rivers in Muskoka, Canada.

    PubMed

    Jones, F Chris; Plewes, Rachel; Murison, Lorna; MacDougall, Mark J; Sinclair, Sarah; Davies, Christie; Bailey, John L; Richardson, Murray; Gunn, John

    2017-10-01

    Cumulative effects assessment (CEA) - a type of environmental appraisal - lacks effective methods for modeling cumulative effects, evaluating indicators of ecosystem condition, and exploring the likely outcomes of development scenarios. Random forests are an extension of classification and regression trees, which model response variables by recursive partitioning. Random forests were used to model a series of candidate ecological indicators that described lakes and rivers from a case study watershed (The Muskoka River Watershed, Canada). Suitability of the candidate indicators for use in cumulative effects assessment and watershed monitoring was assessed according to how well they could be predicted from natural habitat features and how sensitive they were to human land-use. The best models explained 75% of the variation in a multivariate descriptor of lake benthic-macroinvertebrate community structure, and 76% of the variation in the conductivity of river water. Similar results were obtained by cross-validation. Several candidate indicators detected a simulated doubling of urban land-use in their catchments, and a few were able to detect a simulated doubling of agricultural land-use. The paper demonstrates that random forests can be used to describe the combined and singular effects of multiple stressors and natural environmental factors, and furthermore, that random forests can be used to evaluate the performance of monitoring indicators. The numerical methods presented are applicable to any ecosystem and indicator type, and therefore represent a step forward for CEA. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.

  19. Gradient modeling of conifer species using random forests

    Treesearch

    Jeffrey S. Evans; Samuel A. Cushman

    2009-01-01

    Landscape ecology often adopts a patch mosaic model of ecological patterns. However, many ecological attributes are inherently continuous and classification of species composition into vegetation communities and discrete patches provides an overly simplistic view of the landscape. If one adopts a nichebased, individualistic concept of biotic communities then it may...

  20. Automated source classification of new transient sources

    NASA Astrophysics Data System (ADS)

    Oertel, M.; Kreikenbohm, A.; Wilms, J.; DeLuca, A.

    2017-10-01

    The EXTraS project harvests the hitherto unexplored temporal domain information buried in the serendipitous data collected by the European Photon Imaging Camera (EPIC) onboard the ESA XMM-Newton mission since its launch. This includes a search for fast transients, missed by standard image analysis, and a search and characterization of variability in hundreds of thousands of sources. We present an automated classification scheme for new transient sources in the EXTraS project. The method is as follows: source classification features of a training sample are used to train machine learning algorithms (performed in R; randomForest (Breiman, 2001) in supervised mode) which are then tested on a sample of known source classes and used for classification.

  1. Automated segmentation of thyroid gland on CT images with multi-atlas label fusion and random classification forest

    NASA Astrophysics Data System (ADS)

    Liu, Jiamin; Chang, Kevin; Kim, Lauren; Turkbey, Evrim; Lu, Le; Yao, Jianhua; Summers, Ronald

    2015-03-01

    The thyroid gland plays an important role in clinical practice, especially for radiation therapy treatment planning. For patients with head and neck cancer, radiation therapy requires a precise delineation of the thyroid gland to be spared on the pre-treatment planning CT images to avoid thyroid dysfunction. In the current clinical workflow, the thyroid gland is normally manually delineated by radiologists or radiation oncologists, which is time consuming and error prone. Therefore, a system for automated segmentation of the thyroid is desirable. However, automated segmentation of the thyroid is challenging because the thyroid is inhomogeneous and surrounded by structures that have similar intensities. In this work, the thyroid gland segmentation is initially estimated by multi-atlas label fusion algorithm. The segmentation is refined by supervised statistical learning based voxel labeling with a random forest algorithm. Multiatlas label fusion (MALF) transfers expert-labeled thyroids from atlases to a target image using deformable registration. Errors produced by label transfer are reduced by label fusion that combines the results produced by all atlases into a consensus solution. Then, random forest (RF) employs an ensemble of decision trees that are trained on labeled thyroids to recognize features. The trained forest classifier is then applied to the thyroid estimated from the MALF by voxel scanning to assign the class-conditional probability. Voxels from the expert-labeled thyroids in CT volumes are treated as positive classes; background non-thyroid voxels as negatives. We applied this automated thyroid segmentation system to CT scans of 20 patients. The results showed that the MALF achieved an overall 0.75 Dice Similarity Coefficient (DSC) and the RF classification further improved the DSC to 0.81.

  2. Unbiased feature selection in learning random forests for high-dimensional data.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.

  3. A random forest learning assisted "divide and conquer" approach for peptide conformation search.

    PubMed

    Chen, Xin; Yang, Bing; Lin, Zijing

    2018-06-11

    Computational determination of peptide conformations is challenging as it is a problem of finding minima in a high-dimensional space. The "divide and conquer" approach is promising for reliably reducing the search space size. A random forest learning model is proposed here to expand the scope of applicability of the "divide and conquer" approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units ("words"). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units ("grammar"). It is found that amino acid residues may be grouped as equivalent "words", while the φ-ψ combinations in low-energy peptide conformations follow a distinct "grammar". The finding of equivalent words empowers the "divide and conquer" method with the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the "divide and conquer" method by removing unfavorable φ-ψ combinations without the need of dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG through assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.

  4. Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach

    NASA Astrophysics Data System (ADS)

    Du, Shihong; Zhang, Fangli; Zhang, Xiuyuan

    2015-07-01

    While most existing studies have focused on extracting geometric information on buildings, only a few have concentrated on semantic information. The lack of semantic information cannot satisfy many demands on resolving environmental and social issues. This study presents an approach to semantically classify buildings into much finer categories than those of existing studies by learning random forest (RF) classifier from a large number of imbalanced samples with high-dimensional features. First, a two-level segmentation mechanism combining GIS and VHR image produces single image objects at a large scale and intra-object components at a small scale. Second, a semi-supervised method chooses a large number of unbiased samples by considering the spatial proximity and intra-cluster similarity of buildings. Third, two important improvements in RF classifier are made: a voting-distribution ranked rule for reducing the influences of imbalanced samples on classification accuracy and a feature importance measurement for evaluating each feature's contribution to the recognition of each category. Fourth, the semantic classification of urban buildings is practically conducted in Beijing city, and the results demonstrate that the proposed approach is effective and accurate. The seven categories used in the study are finer than those in existing work and more helpful to studying many environmental and social problems.

  5. Building rooftop classification using random forests for large-scale PV deployment

    NASA Astrophysics Data System (ADS)

    Assouline, Dan; Mohajeri, Nahid; Scartezzini, Jean-Louis

    2017-10-01

    Large scale solar Photovoltaic (PV) deployment on existing building rooftops has proven to be one of the most efficient and viable sources of renewable energy in urban areas. As it usually requires a potential analysis over the area of interest, a crucial step is to estimate the geometric characteristics of the building rooftops. In this paper, we introduce a multi-layer machine learning methodology to classify 6 roof types, 9 aspect (azimuth) classes and 5 slope (tilt) classes for all building rooftops in Switzerland, using GIS processing. We train Random Forests (RF), an ensemble learning algorithm, to build the classifiers. We use (2 × 2) [m2 ] LiDAR data (considering buildings and vegetation) to extract several rooftop features, and a generalised footprint polygon data to localize buildings. The roof classifier is trained and tested with 1252 labeled roofs from three different urban areas, namely Baden, Luzern, and Winterthur. The results for roof type classification show an average accuracy of 67%. The aspect and slope classifiers are trained and tested with 11449 labeled roofs in the Zurich periphery area. The results for aspect and slope classification show different accuracies depending on the classes: while some classes are well identified, other under-represented classes remain challenging to detect.

  6. Predicting human liver microsomal stability with machine learning techniques.

    PubMed

    Sakiyama, Yojiro; Yuki, Hitomi; Moriya, Takashi; Hattori, Kazunari; Suzuki, Misaki; Shimada, Kaoru; Honma, Teruki

    2008-02-01

    To ensure a continuing pipeline in pharmaceutical research, lead candidates must possess appropriate metabolic stability in the drug discovery process. In vitro ADMET (absorption, distribution, metabolism, elimination, and toxicity) screening provides us with useful information regarding the metabolic stability of compounds. However, before the synthesis stage, an efficient process is required in order to deal with the vast quantity of data from large compound libraries and high-throughput screening. Here we have derived a relationship between the chemical structure and its metabolic stability for a data set of in-house compounds by means of various in silico machine learning such as random forest, support vector machine (SVM), logistic regression, and recursive partitioning. For model building, 1952 proprietary compounds comprising two classes (stable/unstable) were used with 193 descriptors calculated by Molecular Operating Environment. The results using test compounds have demonstrated that all classifiers yielded satisfactory results (accuracy > 0.8, sensitivity > 0.9, specificity > 0.6, and precision > 0.8). Above all, classification by random forest as well as SVM yielded kappa values of approximately 0.7 in an independent validation set, slightly higher than other classification tools. These results suggest that nonlinear/ensemble-based classification methods might prove useful in the area of in silico ADME modeling.

  7. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    PubMed

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

    According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.

    PubMed

    Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S

    2018-04-10

    Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that when machine learning-based classifiers are applied to such data sets for risk stratification, leads to lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers if replaced by a median configuration will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from the six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers when replaced by computed medians will improve the risk stratification accuracy. Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that on replacing the missing values and outliers by group median and median values, respectively and further using the combination of random forest feature selection and random forest classification technique yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve as: 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in literature. The system was validated for its stability and reliability. RF-based model showed the best performance when outliers are replaced by median values.

  9. An Activity Recognition Framework Deploying the Random Forest Classifier and A Single Optical Heart Rate Monitoring and Triaxial AccelerometerWrist-Band.

    PubMed

    Mehrang, Saeed; Pietilä, Julia; Korhonen, Ilkka

    2018-02-22

    Wrist-worn sensors have better compliance for activity monitoring compared to hip, waist, ankle or chest positions. However, wrist-worn activity monitoring is challenging due to the wide degree of freedom for the hand movements, as well as similarity of hand movements in different activities such as varying intensities of cycling. To strengthen the ability of wrist-worn sensors in detecting human activities more accurately, motion signals can be complemented by physiological signals such as optical heart rate (HR) based on photoplethysmography. In this paper, an activity monitoring framework using an optical HR sensor and a triaxial wrist-worn accelerometer is presented. We investigated a range of daily life activities including sitting, standing, household activities and stationary cycling with two intensities. A random forest (RF) classifier was exploited to detect these activities based on the wrist motions and optical HR. The highest overall accuracy of 89.6 ± 3.9% was achieved with a forest of a size of 64 trees and 13-s signal segments with 90% overlap. Removing the HR-derived features decreased the classification accuracy of high-intensity cycling by almost 7%, but did not affect the classification accuracies of other activities. A feature reduction utilizing the feature importance scores of RF was also carried out and resulted in a shrunken feature set of only 21 features. The overall accuracy of the classification utilizing the shrunken feature set was 89.4 ± 4.2%, which is almost equivalent to the above-mentioned peak overall accuracy.

  10. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    NASA Astrophysics Data System (ADS)

    Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong

    2013-01-01

    Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.

  11. Random forest wetland classification using ALOS-2 L-band, RADARSAT-2 C-band, and TerraSAR-X imagery

    NASA Astrophysics Data System (ADS)

    Mahdianpari, Masoud; Salehi, Bahram; Mohammadimanesh, Fariba; Motagh, Mahdi

    2017-08-01

    Wetlands are important ecosystems around the world, although they are degraded due both to anthropogenic and natural process. Newfoundland is among the richest Canadian province in terms of different wetland classes. Herbaceous wetlands cover extensive areas of the Avalon Peninsula, which are the habitat of a number of animal and plant species. In this study, a novel hierarchical object-based Random Forest (RF) classification approach is proposed for discriminating between different wetland classes in a sub-region located in the north eastern portion of the Avalon Peninsula. Particularly, multi-polarization and multi-frequency SAR data, including X-band TerraSAR-X single polarized (HH), L-band ALOS-2 dual polarized (HH/HV), and C-band RADARSAT-2 fully polarized images, were applied in different classification levels. First, a SAR backscatter analysis of different land cover types was performed by training data and used in Level-I classification to separate water from non-water classes. This was followed by Level-II classification, wherein the water class was further divided into shallow- and deep-water classes, and the non-water class was partitioned into herbaceous and non-herbaceous classes. In Level-III classification, the herbaceous class was further divided into bog, fen, and marsh classes, while the non-herbaceous class was subsequently partitioned into urban, upland, and swamp classes. In Level-II and -III classifications, different polarimetric decomposition approaches, including Cloude-Pottier, Freeman-Durden, Yamaguchi decompositions, and Kennaugh matrix elements were extracted to aid the RF classifier. The overall accuracy and kappa coefficient were determined in each classification level for evaluating the classification results. The importance of input features was also determined using the variable importance obtained by RF. It was found that the Kennaugh matrix elements, Yamaguchi, and Freeman-Durden decompositions were the most important parameters for wetland classification in this study. Using this new hierarchical RF classification approach, an overall accuracy of up to 94% was obtained for classifying different land cover types in the study area.

  12. Identification of terrain cover using the optimum polarimetric classifier

    NASA Technical Reports Server (NTRS)

    Kong, J. A.; Swartz, A. A.; Yueh, H. A.; Novak, L. M.; Shin, R. T.

    1988-01-01

    A systematic approach for the identification of terrain media such as vegetation canopy, forest, and snow-covered fields is developed using the optimum polarimetric classifier. The covariance matrices for various terrain cover are computed from theoretical models of random medium by evaluating the scattering matrix elements. The optimal classification scheme makes use of a quadratic distance measure and is applied to classify a vegetation canopy consisting of both trees and grass. Experimentally measured data are used to validate the classification scheme. Analytical and Monte Carlo simulated classification errors using the fully polarimetric feature vector are compared with classification based on single features which include the phase difference between the VV and HH polarization returns. It is shown that the full polarimetric results are optimal and provide better classification performance than single feature measurements.

  13. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  14. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

  15. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    PubMed

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests' features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.

  16. Neonatal Seizure Detection Using Deep Convolutional Neural Networks.

    PubMed

    Ansari, Amir H; Cherian, Perumpillichira J; Caicedo, Alexander; Naulaers, Gunnar; De Vos, Maarten; Van Huffel, Sabine

    2018-04-02

    Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most of the published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is using deep convolutional neural networks (CNNs) and random forest to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/nonseizure. By training this network, the required features are optimized, while fitting a nonlinear classifier on the features. After training the network with EEG recordings of 26 neonates, five end layers performing the classification were replaced with a random forest classifier in order to improve the performance. This resulted in a false alarm rate of 0.9 per hour and seizure detection rate of 77% using a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similar to a previously developed heuristic method.

  17. Random forest classification of stars in the Galactic Centre

    NASA Astrophysics Data System (ADS)

    Plewa, P. M.

    2018-05-01

    Near-infrared high-angular resolution imaging observations of the Milky Way's nuclear star cluster have revealed all luminous members of the existing stellar population within the central parsec. Generally, these stars are either evolved late-type giants or massive young, early-type stars. We revisit the problem of stellar classification based on intermediate-band photometry in the K band, with the primary aim of identifying faint early-type candidate stars in the extended vicinity of the central massive black hole. A random forest classifier, trained on a subsample of spectroscopically identified stars, performs similarly well as competitive methods (F1 = 0.85), without involving any model of stellar spectral energy distributions. Advantages of using such a machine-trained classifier are a minimum of required calibration effort, a predictive accuracy expected to improve as more training data become available, and the ease of application to future, larger data sets. By applying this classifier to archive data, we are also able to reproduce the results of previous studies of the spatial distribution and the K-band luminosity function of both the early- and late-type stars.

  18. Quantifying and Characterizing Tonic Thermal Pain Across Subjects From EEG Data Using Random Forest Models.

    PubMed

    Vijayakumar, Vishal; Case, Michelle; Shirinpour, Sina; He, Bin

    2017-12-01

    Effective pain assessment and management strategies are needed to better manage pain. In addition to self-report, an objective pain assessment system can provide a more complete picture of the neurophysiological basis for pain. In this study, a robust and accurate machine learning approach is developed to quantify tonic thermal pain across healthy subjects into a maximum of ten distinct classes. A random forest model was trained to predict pain scores using time-frequency wavelet representations of independent components obtained from electroencephalography (EEG) data, and the relative importance of each frequency band to pain quantification is assessed. The mean classification accuracy for predicting pain on an independent test subject for a range of 1-10 is 89.45%, highest among existing state of the art quantification algorithms for EEG. The gamma band is the most important to both intersubject and intrasubject classification accuracy. The robustness and generalizability of the classifier are demonstrated. Our results demonstrate the potential of this tool to be used clinically to help us to improve chronic pain treatment and establish spectral biomarkers for future pain-related studies using EEG.

  19. Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.

    PubMed

    Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S

    2014-03-01

    In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.

  20. Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds.

    PubMed

    Bertolini, F; Galimberti, G; Schiavo, G; Mastrangelo, S; Di Gerlando, R; Strillacci, M G; Bagnato, A; Portolano, B; Fontanesi, L

    2018-01-01

    Commercial single nucleotide polymorphism (SNP) arrays have been recently developed for several species and can be used to identify informative markers to differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several methods of SNPs preselection (Delta, F st and principal component analyses (PCA)) in addition to Random Forest classifications to analyse SNP data from six dairy cattle breeds, including cosmopolitan (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara, Modicana, raised only in Sicily and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs that contain the most discriminant SNPs were created for each preselection method. These panels were evaluated in terms of the ability to discriminate as a whole and breed-by-breed, as well as linkage disequilibrium within each panel. The obtained results showed that for the 48-SNP panel, the error rate increased mainly for autochthonous breeds, probably as a consequence of their admixed origin lower selection pressure and by ascertaining bias in the construction of the SNP chip. The 96-SNP panels were generally more able to discriminate all breeds. The panel derived by PCA-chrom (obtained by a preselection chromosome by chromosome) could identify informative SNPs that were particularly useful for the assignment of minor breeds that reached the lowest value of Out Of Bag error even in the Cinisara, whose value was quite high in all other panels. Moreover, this panel contained also the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located nearby genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits. In general, our results demonstrated the usefulness of Random Forest in combination to other reduction techniques to identify population informative SNPs.

  1. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.

  2. Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest

    PubMed Central

    Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

    2018-01-01

    Mechanical faults of high-voltage circuit breakers (HVCBs) always happen over long-term operation, so extracting the fault features and identifying the fault type have become a key issue for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variable and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method with the origin feature space reached 93.33% and reached up to 95.56% with optimized input feature vector of classifier. This indicates that feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548

  3. GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.

    PubMed

    Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A

    2016-01-01

    In-silico methods are an integral part of modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large scale computations, making it a highly time consuming task. This process can be speeded up by implementing parallelized algorithms on a Graphical Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in the virtual screening. A ligand based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems has been proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of results produced by our tool on GPU is same as that on a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in training and prediction phases.

  4. Method: automatic segmentation of mitochondria utilizing patch classification, contour pair classification, and automatically seeded level sets

    PubMed Central

    2012-01-01

    Background While progress has been made to develop automatic segmentation techniques for mitochondria, there remains a need for more accurate and robust techniques to delineate mitochondria in serial blockface scanning electron microscopic data. Previously developed texture based methods are limited for solving this problem because texture alone is often not sufficient to identify mitochondria. This paper presents a new three-step method, the Cytoseg process, for automated segmentation of mitochondria contained in 3D electron microscopic volumes generated through serial block face scanning electron microscopic imaging. The method consists of three steps. The first is a random forest patch classification step operating directly on 2D image patches. The second step consists of contour-pair classification. At the final step, we introduce a method to automatically seed a level set operation with output from previous steps. Results We report accuracy of the Cytoseg process on three types of tissue and compare it to a previous method based on Radon-Like Features. At step 1, we show that the patch classifier identifies mitochondria texture but creates many false positive pixels. At step 2, our contour processing step produces contours and then filters them with a second classification step, helping to improve overall accuracy. We show that our final level set operation, which is automatically seeded with output from previous steps, helps to smooth the results. Overall, our results show that use of contour pair classification and level set operations improve segmentation accuracy beyond patch classification alone. We show that the Cytoseg process performs well compared to another modern technique based on Radon-Like Features. Conclusions We demonstrated that texture based methods for mitochondria segmentation can be enhanced with multiple steps that form an image processing pipeline. While we used a random-forest based patch classifier to recognize texture, it would be possible to replace this with other texture identifiers, and we plan to explore this in future work. PMID:22321695

  5. Hydrologic Landscape Regionalisation Using Deductive Classification and Random Forests

    PubMed Central

    Brown, Stuart C.; Lester, Rebecca E.; Versace, Vincent L.; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of catchments and continents. PMID:25396410

  6. Hydrologic landscape regionalisation using deductive classification and random forests.

    PubMed

    Brown, Stuart C; Lester, Rebecca E; Versace, Vincent L; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of catchments and continents.

  7. The brain MRI classification problem from wavelets perspective

    NASA Astrophysics Data System (ADS)

    Bendib, Mohamed M.; Merouani, Hayet F.; Diaba, Fatma

    2015-02-01

    Haar and Daubechies 4 (DB4) are the most used wavelets for brain MRI (Magnetic Resonance Imaging) classification. The former is simple and fast to compute while the latter is more complex and offers a better resolution. This paper explores the potential of both of them in performing Normal versus Pathological discrimination on the one hand, and Multiclassification on the other hand. The Whole Brain Atlas is used as a validation database, and the Random Forest (RF) algorithm is employed as a learning approach. The achieved results are discussed and statistically compared.

  8. Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases

    PubMed Central

    2016-01-01

    Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers–Logistic Regression, Naïve Bayes and Random Forest–with a range of social network measures and the necessary databases to model the verdicts in two real–world cases: the U.S. Watergate Conspiracy of the 1970’s and the now–defunct Canada–based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures. PMID:26824351

  9. Machine learning methods for the classification of gliomas: Initial results using features extracted from MR spectroscopy.

    PubMed

    Ranjith, G; Parvathy, R; Vikas, V; Chandrasekharan, Kesavadas; Nair, Suresh

    2015-04-01

    With the advent of new imaging modalities, radiologists are faced with handling increasing volumes of data for diagnosis and treatment planning. The use of automated and intelligent systems is becoming essential in such a scenario. Machine learning, a branch of artificial intelligence, is increasingly being used in medical image analysis applications such as image segmentation, registration and computer-aided diagnosis and detection. Histopathological analysis is currently the gold standard for classification of brain tumors. The use of machine learning algorithms along with extraction of relevant features from magnetic resonance imaging (MRI) holds promise of replacing conventional invasive methods of tumor classification. The aim of the study is to classify gliomas into benign and malignant types using MRI data. Retrospective data from 28 patients who were diagnosed with glioma were used for the analysis. WHO Grade II (low-grade astrocytoma) was classified as benign while Grade III (anaplastic astrocytoma) and Grade IV (glioblastoma multiforme) were classified as malignant. Features were extracted from MR spectroscopy. The classification was done using four machine learning algorithms: multilayer perceptrons, support vector machine, random forest and locally weighted learning. Three of the four machine learning algorithms gave an area under ROC curve in excess of 0.80. Random forest gave the best performance in terms of AUC (0.911) while sensitivity was best for locally weighted learning (86.1%). The performance of different machine learning algorithms in the classification of gliomas is promising. An even better performance may be expected by integrating features extracted from other MR sequences. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  10. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.

  11. Source identification of western Oregon Douglas-Fir wood cores using mass spectrometry and random forest Classification

    Treesearch

    Kristen Finch; Edgard Espinoza; F. Andrew Jones; Richard Cronn

    2017-01-01

    Premise of the study: We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Methods: Three annual ring mass...

  12. Classification of forest land attributes using multi-source remotely sensed data

    NASA Astrophysics Data System (ADS)

    Pippuri, Inka; Suvanto, Aki; Maltamo, Matti; Korhonen, Kari T.; Pitkänen, Juho; Packalen, Petteri

    2016-02-01

    The aim of the study was to (1) examine the classification of forest land using airborne laser scanning (ALS) data, satellite images and sample plots of the Finnish National Forest Inventory (NFI) as training data and to (2) identify best performing metrics for classifying forest land attributes. Six different schemes of forest land classification were studied: land use/land cover (LU/LC) classification using both national classes and FAO (Food and Agricultural Organization of the United Nations) classes, main type, site type, peat land type and drainage status. Special interest was to test different ALS-based surface metrics in classification of forest land attributes. Field data consisted of 828 NFI plots collected in 2008-2012 in southern Finland and remotely sensed data was from summer 2010. Multinomial logistic regression was used as the classification method. Classification of LU/LC classes were highly accurate (kappa-values 0.90 and 0.91) but also the classification of site type, peat land type and drainage status succeeded moderately well (kappa-values 0.51, 0.69 and 0.52). ALS-based surface metrics were found to be the most important predictor variables in classification of LU/LC class, main type and drainage status. In best classification models of forest site types both spectral metrics from satellite data and point cloud metrics from ALS were used. In turn, in the classification of peat land types ALS point cloud metrics played the most important role. Results indicated that the prediction of site type and forest land category could be incorporated into stand level forest management inventory system in Finland.

  13. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.

    PubMed

    Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J

    2015-02-01

    The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine-learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) a LR model that assumed linearity and additivity (simple LR model) (2) a LR model incorporating restricted cubic splines and interactions (flexible LR model) (3) a support vector machine, (4) a random forest and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Bayes classification of interferometric TOPSAR data

    NASA Technical Reports Server (NTRS)

    Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.

    1995-01-01

    We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied on multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites, and subsequent assignments of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the set of features that was chosen. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single polarization Interferometric data classification is contrasted against classification schemes based on polarimetric SAR data.

  15. Computed tomography synthesis from magnetic resonance images in the pelvis using multiple random forests and auto-context features

    NASA Astrophysics Data System (ADS)

    Andreasen, Daniel; Edmund, Jens M.; Zografos, Vasileios; Menze, Bjoern H.; Van Leemput, Koen

    2016-03-01

    In radiotherapy treatment planning that is only based on magnetic resonance imaging (MRI), the electron density information usually obtained from computed tomography (CT) must be derived from the MRI by synthesizing a so-called pseudo CT (pCT). This is a non-trivial task since MRI intensities are neither uniquely nor quantitatively related to electron density. Typical approaches involve either a classification or regression model requiring specialized MRI sequences to solve intensity ambiguities, or an atlas-based model necessitating multiple registrations between atlases and subject scans. In this work, we explore a machine learning approach for creating a pCT of the pelvic region from conventional MRI sequences without using atlases. We use a random forest provided with information about local texture, edges and spatial features derived from the MRI. This helps to solve intensity ambiguities. Furthermore, we use the concept of auto-context by sequentially training a number of classification forests to create and improve context features, which are finally used to train a regression forest for pCT prediction. We evaluate the pCT quality in terms of the voxel-wise error and the radiologic accuracy as measured by water-equivalent path lengths. We compare the performance of our method against two baseline pCT strategies, which either set all MRI voxels in the subject equal to the CT value of water, or in addition transfer the bone volume from the real CT. We show an improved performance compared to both baseline pCTs suggesting that our method may be useful for MRI-only radiotherapy.

  16. Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.

    PubMed

    Kilic, Niyazi; Hosgormez, Erkan

    2016-03-01

    Ensemble learning methods are one of the most powerful tools for the pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed including different physical parameters and they fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting and random subspace (RSM) were used. Instance based learning (IBk) and random forest (RF) classifiers applied to six feature set models. The patients were classified into three groups such as osteoporosis, osteopenia and control (healthy), using ensemble classifiers. Total classification accuracy and f-measure were also used to evaluate diagnostic performance of the proposed ensemble classification system. The classification accuracy has reached to 98.85 % by the combination of model 6 (five BMD + five T-score values) using RSM-RF classifier. The findings of this paper suggest that the patients will be able to be warned before a bone fracture occurred, by just examining some physical parameters that can easily be measured without invasive operations.

  17. Application of random forests methods to diabetic retinopathy classification analyses.

    PubMed

    Casanova, Ramon; Saldana, Santiago; Chew, Emily Y; Danis, Ronald P; Greven, Craig M; Ambrosius, Walter T

    2014-01-01

    Diabetic retinopathy (DR) is one of the leading causes of blindness in the United States and world-wide. DR is a silent disease that may go unnoticed until it is too late for effective treatment. Therefore, early detection could improve the chances of therapeutic interventions that would alleviate its effects. Graded fundus photography and systemic data from 3443 ACCORD-Eye Study participants were used to estimate Random Forest (RF) and logistic regression classifiers. We studied the impact of sample size on classifier performance and the possibility of using RF generated class conditional probabilities as metrics describing DR risk. RF measures of variable importance are used to detect factors that affect classification performance. Both types of data were informative when discriminating participants with or without DR. RF based models produced much higher classification accuracy than those based on logistic regression. Combining both types of data did not increase accuracy but did increase statistical discrimination of healthy participants who subsequently did or did not have DR events during four years of follow-up. RF variable importance criteria revealed that microaneurysms counts in both eyes seemed to play the most important role in discrimination among the graded fundus variables, while the number of medicines and diabetes duration were the most relevant among the systemic variables. We have introduced RF methods to DR classification analyses based on fundus photography data. In addition, we propose an approach to DR risk assessment based on metrics derived from graded fundus photography and systemic data. Our results suggest that RF methods could be a valuable tool to diagnose DR diagnosis and evaluate its progression.

  18. Automatic medical image annotation and keyword-based image retrieval using relevance feedback.

    PubMed

    Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal

    2012-08-01

    This paper presents novel multiple keywords annotation for medical images, keyword-based medical image retrieval, and relevance feedback method for image retrieval for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center symmetric-local binary patterns with random forests. For keyword-based image retrieval, our retrieval system use the confidence score that is assigned to each annotated keyword by combining probabilities of random forests with predefined body relation graph. To overcome the limitation of keyword-based image retrieval, we combine our image retrieval system with relevance feedback mechanism based on visual feature and pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.

  19. Forested land cover classification on the Cumberland Plateau, Jackson County, Alabama: a comparison of Landsat ETM+ and SPOT5 images

    Treesearch

    Yong Wang; Shanta Parajuli; Callie Schweitzer; Glendon Smalley; Dawn Lemke; Wubishet Tadesse; Xiongwen Chen

    2010-01-01

    Forest cover classifications focus on the overall growth form (physiognomy) of the community, dominant vegetation, and species composition of the existing forest. Accurately classifying the forest cover type is important for forest inventory and silviculture. We compared classification accuracy based on Landsat Enhanced Thematic Mapper Plus (Landsat ETM+) and Satellite...

  20. Refining Time-Activity Classification of Human Subjects Using the Global Positioning System.

    PubMed

    Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun

    2016-01-01

    Detailed spatial location information is important in accurately estimating personal exposure to air pollution. Global Position System (GPS) has been widely used in tracking personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, most of them were developed for specific regions. An adaptive model for time-location classification can be widely applied to air pollution studies that use GPS to track individual level time-activity patterns. Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e. indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e. roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed considerable loss in sensitivity for outdoor static and outdoor walking conditions. The random forests classification model can achieve high accuracy for the four major time-activity categories. The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations.

  1. Preliminary classification of forest vegetation of the Kenai Peninsula, Alaska.

    Treesearch

    K.M. Reynolds

    1990-01-01

    A total of 5,597 photo points was systematically located on 1:60,000-scale high altitude photographs of the Kenai Peninsula, Alaska; photo interpretation was used to classify the vegetation at each grid position. Of the total grid points, 12.3 percent were classified as timberland; 129 photo points within the timberland class were randomly selected for field survey....

  2. Source identification of western Oregon Douglas-fir wood cores using mass spectrometry and random forest classification.

    PubMed

    Finch, Kristen; Espinoza, Edgard; Jones, F Andrew; Cronn, Richard

    2017-05-01

    We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Three annual ring mass spectra were obtained from 188 adult Douglas-fir trees, and these were analyzed using random forest models to determine whether samples could be classified to geographic origin, growth year, or growth year and geographic origin. Specific wood molecules that contributed to geographic discrimination were identified. Douglas-fir mass spectra could be differentiated into two geographic classes with an accuracy between 70% and 76%. Classification models could not accurately classify sample mass spectra based on growth year. Thirty-two molecules were identified as key for classifying western Oregon Douglas-fir wood cores to geographic origin. DART-TOFMS is capable of detecting minute but regionally informative differences in wood molecules over a small geographic scale, and these differences made it possible to predict the geographic origin of Douglas-fir wood with moderate accuracy. Studies involving DART-TOFMS, alone and in combination with other technologies, will be relevant for identifying the geographic origin of illegally harvested wood.

  3. Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

    PubMed

    Ishwaran, Hemant; Lu, Min

    2018-06-04

    Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.

  4. Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery

    PubMed Central

    Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S.; Pusey, Marc L.; Aygün, Ramazan S.

    2015-01-01

    In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset. PMID:25914518

  5. Assessments of SENTINEL-2 Vegetation Red-Edge Spectral Bands for Improving Land Cover Classification

    NASA Astrophysics Data System (ADS)

    Qiu, S.; He, B.; Yin, C.; Liao, Z.

    2017-09-01

    The Multi Spectral Instrument (MSI) onboard Sentinel-2 can record the information in Vegetation Red-Edge (VRE) spectral domains. In this study, the performance of the VRE bands on improving land cover classification was evaluated based on a Sentinel-2A MSI image in East Texas, USA. Two classification scenarios were designed by excluding and including the VRE bands. A Random Forest (RF) classifier was used to generate land cover maps and evaluate the contributions of different spectral bands. The combination of VRE bands increased the overall classification accuracy by 1.40 %, which was statistically significant. Both confusion matrices and land cover maps indicated that the most beneficial increase was from vegetation-related land cover types, especially agriculture. Comparison of the relative importance of each band showed that the most beneficial VRE bands were Band 5 and Band 6. These results demonstrated the value of VRE bands for land cover classification.

  6. Characterization and detection of Anopheles vestitipennis and Anopheles punctimacula (Diptera: Culicidae) larval habitats in Belize with field survey and SPOT satellite imagery

    NASA Technical Reports Server (NTRS)

    Rejmankova, E.; Pope, K. O.; Roberts, D. R.; Lege, M. G.; Andre, R.; Greico, J.; Alonzo, Y.

    1998-01-01

    Surveys of larval habitats of Anopheles vestitipennis and Anopheles punctimacula were conducted in Belize, Central America. Habitat analysis and classification resulted in delineation of eight habitat types defined by dominant life forms and hydrology. Percent cover of tall dense macrophytes, shrubs, open water, and pH were significantly different between sites with and without An. vestitipennis. For An. punctimacula, percent cover of tall dense macrophytes, trees, detritus, open water, and water depth were significantly different between larvae positive and negative sites. The discriminant function for An. vestitipennis correctly predicted the presence of larvae in 65% of sites and correctly predicted the absence of larvae in 88% of sites. The discriminant function for An. punctimacula correctly predicted 81% of sites for the presence of larvae and 45% for the absence of larvae. Canonical discriminant analysis of the three groups of habitats (An. vestitipennis positive; An. punctimacula positive; all negative) confirmed that while larval habitats of An. punctimacula are clustered in the tree dominated area, larval habitats of An. vestitipennis were found in both tree dominated and tall dense macrophyte dominated environments. The forest larval habitats of An. vestitipennis and An. punctimacula seem to be randomly distributed among different forest types. Both species tend to occur in denser forests with more detritus, shallower water, and slightly higher pH. Classification of dry season (February) SPOT multispectral satellite imagery produced 10 land cover types with the swamp forest and tall dense marsh classes being of particular interest. The accuracy assessment showed that commission errors for the tall, dense marsh and swamp forest appeared to be minor; but omission errors were significant, especially for the swamp forest (perhaps because no swamp forests are flooded in February). This means that where the classification indicates there are An. vestitipennis breeding sites, they probably do exist; but breeding sites in many locations are not identified and could be more abundant than indicated.

  7. Sleep state classification using pressure sensor mats.

    PubMed

    Baran Pouyan, M; Nourani, M; Pompeo, M

    2015-08-01

    Sleep state detection is valuable in assessing patient's sleep quality and in-bed general behavior. In this paper, a novel classification approach of sleep states (sleep, pre-wake, wake) is proposed that uses only surface pressure sensors. In our method, a mobility metric is defined based on successive pressure body maps. Then, suitable statistical features are computed based on the mobility metric. Finally, a customized random forest classifier is employed to identify various classes including a new class for pre-wake state. Our algorithm achieves 96.1% and 88% accuracies for two (sleep, wake) and three (sleep, pre-wake, wake) class identification, respectively.

  8. Cascade Classification with Adaptive Feature Extraction for Arrhythmia Detection.

    PubMed

    Park, Juyoung; Kang, Mingon; Gao, Jean; Kim, Younghoon; Kang, Kyungtae

    2017-01-01

    Detecting arrhythmia from ECG data is now feasible on mobile devices, but in this environment it is necessary to trade computational efficiency against accuracy. We propose an adaptive strategy for feature extraction that only considers normalized beat morphology features when running in a resource-constrained environment; but in a high-performance environment it takes account of a wider range of ECG features. This process is augmented by a cascaded random forest classifier. Experiments on data from the MIT-BIH Arrhythmia Database showed classification accuracies from 96.59% to 98.51%, which are comparable to state-of-the art methods.

  9. Assessment of various supervised learning algorithms using different performance metrics

    NASA Astrophysics Data System (ADS)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.

  10. Development of machine learning models for diagnosis of glaucoma.

    PubMed

    Kim, Seong Jae; Cho, Kyong Jin; Oh, Sejong

    2017-01-01

    The study aimed to develop machine learning models that have strong prediction power and interpretability for diagnosis of glaucoma based on retinal nerve fiber layer (RNFL) thickness and visual field (VF). We collected various candidate features from the examination of retinal nerve fiber layer (RNFL) thickness and visual field (VF). We also developed synthesized features from original features. We then selected the best features proper for classification (diagnosis) through feature evaluation. We used 100 cases of data as a test dataset and 399 cases of data as a training and validation dataset. To develop the glaucoma prediction model, we considered four machine learning algorithms: C5.0, random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). We repeatedly composed a learning model using the training dataset and evaluated it by using the validation dataset. Finally, we got the best learning model that produces the highest validation accuracy. We analyzed quality of the models using several measures. The random forest model shows best performance and C5.0, SVM, and KNN models show similar accuracy. In the random forest model, the classification accuracy is 0.98, sensitivity is 0.983, specificity is 0.975, and AUC is 0.979. The developed prediction models show high accuracy, sensitivity, specificity, and AUC in classifying among glaucoma and healthy eyes. It will be used for predicting glaucoma against unknown examination records. Clinicians may reference the prediction results and be able to make better decisions. We may combine multiple learning models to increase prediction accuracy. The C5.0 model includes decision rules for prediction. It can be used to explain the reasons for specific predictions.

  11. Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification

    NASA Astrophysics Data System (ADS)

    Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua

    2016-03-01

    There are many diseases that affect the distribution of muscles, including Duchenne and fascioscapulohumeral dystrophy among other myopathies. In these disease cases, it is important to quantify both the muscle and fat volumes to track the disease progression. There has also been evidence that abnormal signal intensity on the MR images, which often is an indication of edema or inflammation can be a good predictor for muscle deterioration. We present a fully-automated method that examines magnetic resonance (MR) images of the thigh and identifies the fat, muscle, and edema using a random forest classifier. First the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts were corrected using the N3 technique. The T1 and STIR (short tau inverse recovery) images are then aligned using landmark based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. This method was evaluated on MR images of 9 patients. The precision values are 0.91+/-0.06, 0.98+/-0.01 and 0.50+/-0.29 for muscle, fat, and edema, respectively. The recall values are 0.95+/-0.02, 0.96+/-0.03 and 0.43+/-0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of utilizing information from multiple MR sequences for the accurate quantification of fat, muscle and edema.

  12. Studies of the DIII-D disruption database using Machine Learning algorithms

    NASA Astrophysics Data System (ADS)

    Rea, Cristina; Granetz, Robert; Meneghini, Orso

    2017-10-01

    A Random Forests Machine Learning algorithm, trained on a large database of both disruptive and non-disruptive DIII-D discharges, predicts disruptive behavior in DIII-D with about 90% of accuracy. Several algorithms have been tested and Random Forests was found superior in performances for this particular task. Over 40 plasma parameters are included in the database, with data for each of the parameters taken from 500k time slices. We focused on a subset of non-dimensional plasma parameters, deemed to be good predictors based on physics considerations. Both binary (disruptive/non-disruptive) and multi-label (label based on the elapsed time before disruption) classification problems are investigated. The Random Forests algorithm provides insight on the available dataset by ranking the relative importance of the input features. It is found that q95 and Greenwald density fraction (n/nG) are the most relevant parameters for discriminating between DIII-D disruptive and non-disruptive discharges. A comparison with the Gradient Boosted Trees algorithm is shown and the first results coming from the application of regression algorithms are presented. Work supported by the US Department of Energy under DE-FC02-04ER54698, DE-SC0014264 and DE-FG02-95ER54309.

  13. Analysis of landslide hazard area in Ludian earthquake based on Random Forests

    NASA Astrophysics Data System (ADS)

    Xie, J.-C.; Liu, R.; Li, H.-W.; Lai, Z.-L.

    2015-04-01

    With the development of machine learning theory, more and more algorithms are evaluated for seismic landslides. After the Ludian earthquake, the research team combine with the special geological structure in Ludian area and the seismic filed exploration results, selecting SLOPE(PODU); River distance(HL); Fault distance(DC); Seismic Intensity(LD) and Digital Elevation Model(DEM), the normalized difference vegetation index(NDVI) which based on remote sensing images as evaluation factors. But the relationships among these factors are fuzzy, there also exists heavy noise and high-dimensional, we introduce the random forest algorithm to tolerate these difficulties and get the evaluation result of Ludian landslide areas, in order to verify the accuracy of the result, using the ROC graphs for the result evaluation standard, AUC covers an area of 0.918, meanwhile, the random forest's generalization error rate decreases with the increase of the classification tree to the ideal 0.08 by using Out Of Bag(OOB) Estimation. Studying the final landslides inversion results, paper comes to a statistical conclusion that near 80% of the whole landslides and dilapidations are in areas with high susceptibility and moderate susceptibility, showing the forecast results are reasonable and adopted.

  14. Fire detection system using random forest classification for image sequences of complex background

    NASA Astrophysics Data System (ADS)

    Kim, Onecue; Kang, Dong-Joong

    2013-06-01

    We present a fire alarm system based on image processing that detects fire accidents in various environments. To reduce false alarms that frequently appeared in earlier systems, we combined image features including color, motion, and blinking information. We specifically define the color conditions of fires in hue, saturation and value, and RGB color space. Fire features are represented as intensity variation, color mean and variance, motion, and image differences. Moreover, blinking fire features are modeled by using crossing patches. We propose an algorithm that classifies patches into fire or nonfire areas by using random forest supervised learning. We design an embedded surveillance device made with acrylonitrile butadiene styrene housing for stable fire detection in outdoor environments. The experimental results show that our algorithm works robustly in complex environments and is able to detect fires in real time.

  15. Combined rule extraction and feature elimination in supervised classification.

    PubMed

    Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E

    2012-09-01

    There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.

  16. ERTS-1 data applications to Minnesota forest land use classification

    NASA Technical Reports Server (NTRS)

    Sizer, J. E. (Principal Investigator); Eller, R. G.; Meyer, M. P.; Ulliman, J. J.

    1973-01-01

    The author has identified the following significant results. Color-combined ERTS-1 MSS spectral slices were analyzed to determine the maximum (repeatable) level of meaningful forest resource classification data visually attainable by skilled forest photointerpreters for the following purposes: (1) periodic updating of the Minnesota Land Management Information System (MLMIS) statewide computerized land use data bank, and (2) to provide first-stage forest resources survey data for large area forest land management planning. Controlled tests were made of two forest classification schemes by experienced professional foresters with special photointerpretation training and experience. The test results indicate it is possible to discriminate the MLMIS forest class from the MLMIS nonforest classes, but that it is not possible, under average circumstances, to further stratify the forest classification into species components with any degree of reliability with ERTS-1 imagery. An ongoing test of the resulting classification scheme involves the interpretation, and mapping, of the south half of Itasca County, Minnesota, with ERTS-1 imagery. This map is undergoing field checking by on the ground field cooperators, whose evaluation will be completed in the fall of 1973.

  17. Discrimination of raw and processed Dipsacus asperoides by near infrared spectroscopy combined with least squares-support vector machine and random forests

    NASA Astrophysics Data System (ADS)

    Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin

    2012-04-01

    Most herbal medicines could be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including linear kernel, polynomial kernel and radial basis function kernel (RBF), were checked for optimization of LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building an LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the model's efficiency. The performance of the LS-SVM with RBF kernel (RBF LS-SVM) was better than the other two kernels. The RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using LS-SVM with RBF kernel, while RF was fast in the training and making predictions.

  18. Local receptive field constrained stacked sparse autoencoder for classification of hyperspectral images.

    PubMed

    Wan, Xiaoqing; Zhao, Chunhui

    2017-06-01

    As a competitive machine learning algorithm, the stacked sparse autoencoder (SSA) has achieved outstanding popularity in exploiting high-level features for classification of hyperspectral images (HSIs). In general, in the SSA architecture, the nodes between adjacent layers are fully connected and need to be iteratively fine-tuned during the pretraining stage; however, the nodes of previous layers further away may be less likely to have a dense correlation to the given node of subsequent layers. Therefore, to reduce the classification error and increase the learning rate, this paper proposes the general framework of locally connected SSA; that is, the biologically inspired local receptive field (LRF) constrained SSA architecture is employed to simultaneously characterize the local correlations of spectral features and extract high-level feature representations of hyperspectral data. In addition, the appropriate receptive field constraint is concurrently updated by measuring the spatial distances from the neighbor nodes to the corresponding node. Finally, the efficient random forest classifier is cascaded to the last hidden layer of the SSA architecture as a benchmark classifier. Experimental results on two real HSI datasets demonstrate that the proposed hierarchical LRF constrained stacked sparse autoencoder and random forest (SSARF) provides encouraging results with respect to other contrastive methods, for instance, the improvements of overall accuracy in a range of 0.72%-10.87% for the Indian Pines dataset and 0.74%-7.90% for the Kennedy Space Center dataset; moreover, it generates lower running time compared with the result provided by similar SSARF based methodology.

  19. Application of Random Forests Methods to Diabetic Retinopathy Classification Analyses

    PubMed Central

    Casanova, Ramon; Saldana, Santiago; Chew, Emily Y.; Danis, Ronald P.; Greven, Craig M.; Ambrosius, Walter T.

    2014-01-01

    Background Diabetic retinopathy (DR) is one of the leading causes of blindness in the United States and world-wide. DR is a silent disease that may go unnoticed until it is too late for effective treatment. Therefore, early detection could improve the chances of therapeutic interventions that would alleviate its effects. Methodology Graded fundus photography and systemic data from 3443 ACCORD-Eye Study participants were used to estimate Random Forest (RF) and logistic regression classifiers. We studied the impact of sample size on classifier performance and the possibility of using RF generated class conditional probabilities as metrics describing DR risk. RF measures of variable importance are used to detect factors that affect classification performance. Principal Findings Both types of data were informative when discriminating participants with or without DR. RF based models produced much higher classification accuracy than those based on logistic regression. Combining both types of data did not increase accuracy but did increase statistical discrimination of healthy participants who subsequently did or did not have DR events during four years of follow-up. RF variable importance criteria revealed that microaneurysms counts in both eyes seemed to play the most important role in discrimination among the graded fundus variables, while the number of medicines and diabetes duration were the most relevant among the systemic variables. Conclusions and Significance We have introduced RF methods to DR classification analyses based on fundus photography data. In addition, we propose an approach to DR risk assessment based on metrics derived from graded fundus photography and systemic data. Our results suggest that RF methods could be a valuable tool to diagnose DR diagnosis and evaluate its progression. PMID:24940623

  20. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

    PubMed

    Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

    2017-06-21

    Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k -Nearest Neighbors ( k -NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k -NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k -NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend of better performance of SVM and Random Forest classifier performance was observed.

  1. Multidecadal Rates of Disturbance- and Climate Change-Induced Land Cover Change in Arctic and Boreal Ecosystems over Western Canada and Alaska Inferred from Dense Landsat Time Series

    NASA Astrophysics Data System (ADS)

    Wang, J.; Sulla-menashe, D. J.; Woodcock, C. E.; Sonnentag, O.; Friedl, M. A.

    2017-12-01

    Rapid climate change in arctic and boreal ecosystems is driving changes to land cover composition, including woody expansion in the arctic tundra, successional shifts following boreal fires, and thaw-induced wetland expansion and forest collapse along the southern limit of permafrost. The impacts of these land cover transformations on the physical climate and the carbon cycle are increasingly well-documented from field and model studies, but there have been few attempts to empirically estimate rates of land cover change at decadal time scale and continental spatial scale. Previous studies have used too coarse spatial resolution or have been too limited in temporal range to enable broad multi-decadal assessment of land cover change. As part of NASA's Arctic Boreal Vulnerability Experiment (ABoVE), we are using dense time series of Landsat remote sensing data to map disturbances and classify land cover types across the ABoVE extended domain (spanning western Canada and Alaska) over the last three decades (1982-2014) at 30 m resolution. We utilize regionally-complete and repeated acquisition high-resolution (<2 m) DigitalGlobe imagery to generate training data from across the region that follows a nested, hierarchical classification scheme encompassing plant functional type and cover density, understory type, wetland status, and land use. Additionally, we crosswalk plot-level field data into our scheme for additional high quality training sites. We use the Continuous Change Detection and Classification algorithm to estimate land cover change dates and temporal-spectral features in the Landsat data. These features are used to train random forest classification models and map land cover and analyze land cover change processes, focusing primarily on tundra "shrubification", post-fire succession, and boreal wetland expansion. We will analyze the high resolution data based on stratified random sampling of our change maps to validate and assess the accuracy of our model predictions. In this paper, we present initial results from this effort, including sub-regional analyses focused on several key areas, such as the Taiga Plains and the Southern Arctic ecozones, to calibrate our random forest models and assess results.

  2. Reading Profiles in Multi-Site Data With Missingness.

    PubMed

    Eckert, Mark A; Vaden, Kenneth I; Gebregziabher, Mulugeta

    2018-01-01

    Children with reading disability exhibit varied deficits in reading and cognitive abilities that contribute to their reading comprehension problems. Some children exhibit primary deficits in phonological processing, while others can exhibit deficits in oral language and executive functions that affect comprehension. This behavioral heterogeneity is problematic when missing data prevent the characterization of different reading profiles, which often occurs in retrospective data sharing initiatives without coordinated data collection. Here we show that reading profiles can be reliably identified based on Random Forest classification of incomplete behavioral datasets, after the missForest method is used to multiply impute missing values. Results from simulation analyses showed that reading profiles could be accurately classified across degrees of missingness (e.g., ∼5% classification error for 30% missingness across the sample). The application of missForest to a real multi-site dataset with missingness ( n = 924) showed that reading disability profiles significantly and consistently differed in reading and cognitive abilities for cases with and without missing data. The results of validation analyses indicated that the reading profiles (cases with and without missing data) exhibited significant differences for an independent set of behavioral variables that were not used to classify reading profiles. Together, the results show how multiple imputation can be applied to the classification of cases with missing data and can increase the integrity of results from multi-site open access datasets.

  3. Ecological Land Classification: Applications to Identify the Productive Potential of Southern Forests

    Treesearch

    Dennis L. Mengel; D. Thompson Tew; [Editors

    1991-01-01

    Eighteen papers representing four categories-Regional Overviews; Classification System Development; Classification System Interpretation; Mapping/GIS Applications in Classification Systems-present the state of the art in forest-land classification and evaluation in the South. In addition, nine poster papers are presented.

  4. Semi-supervised manifold learning with affinity regularization for Alzheimer's disease identification using positron emission tomography imaging.

    PubMed

    Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan

    2015-01-01

    Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.

  5. Mapping of land cover in northern California with simulated hyperspectral satellite imagery

    NASA Astrophysics Data System (ADS)

    Clark, Matthew L.; Kilham, Nina E.

    2016-09-01

    Land-cover maps are important science products needed for natural resource and ecosystem service management, biodiversity conservation planning, and assessing human-induced and natural drivers of land change. Analysis of hyperspectral, or imaging spectrometer, imagery has shown an impressive capacity to map a wide range of natural and anthropogenic land cover. Applications have been mostly with single-date imagery from relatively small spatial extents. Future hyperspectral satellites will provide imagery at greater spatial and temporal scales, and there is a need to assess techniques for mapping land cover with these data. Here we used simulated multi-temporal HyspIRI satellite imagery over a 30,000 km2 area in the San Francisco Bay Area, California to assess its capabilities for mapping classes defined by the international Land Cover Classification System (LCCS). We employed a mapping methodology and analysis framework that is applicable to regional and global scales. We used the Random Forests classifier with three sets of predictor variables (reflectance, MNF, hyperspectral metrics), two temporal resolutions (summer, spring-summer-fall), two sample scales (pixel, polygon) and two levels of classification complexity (12, 20 classes). Hyperspectral metrics provided a 16.4-21.8% and 3.1-6.7% increase in overall accuracy relative to MNF and reflectance bands, respectively, depending on pixel or polygon scales of analysis. Multi-temporal metrics improved overall accuracy by 0.9-3.1% over summer metrics, yet increases were only significant at the pixel scale of analysis. Overall accuracy at pixel scales was 72.2% (Kappa 0.70) with three seasons of metrics. Anthropogenic and homogenous natural vegetation classes had relatively high confidence and producer and user accuracies were over 70%; in comparison, woodland and forest classes had considerable confusion. We next focused on plant functional types with relatively pure spectra by removing open-canopy shrublands, woodlands and mixed forests from the classification. This 12-class map had significantly improved accuracy of 85.1% (Kappa 0.83) and most classes had over 70% producer and user accuracies. Finally, we summarized important metrics from the multi-temporal Random Forests to infer the underlying chemical and structural properties that best discriminated our land-cover classes across seasons.

  6. [The study of M dwarf spectral classification].

    PubMed

    Yi, Zhen-Ping; Pan, Jing-Chang; Luo, A-Li

    2013-08-01

    As the most common stars in the galaxy, M dwarfs can be used to trace the structure and evolution of the Milky Way. Besides, investigating M dwarfs is important for searching for habitability of extrasolar planets orbiting M dwarfs. Spectral classification of M dwarfs is a fundamental work. The authors used DR7 M dwarf sample of SLOAN to extract important features from the range of 600-900 nm by random forest method. Compared to the features used in Hammer Code, the authors added three new indices. Our test showed that the improved Hammer with new indices is more accurate. Our method has been applied to classify M dwarf spectra of LAMOST.

  7. Methods for Real-Time Prediction of the Mode of Travel Using Smartphone-Based GPS and Accelerometer Data

    PubMed Central

    Martin, Bryan D.; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling

    2017-01-01

    We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy. PMID:28885550

  8. Methods for Real-Time Prediction of the Mode of Travel Using Smartphone-Based GPS and Accelerometer Data.

    PubMed

    Martin, Bryan D; Addona, Vittorio; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling

    2017-09-08

    We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy.

  9. Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.

    PubMed

    Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G

    2013-09-23

    The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to a relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine based learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected since they have been shown to be able to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high confidence assignments, international consensus classifications, ASTER (ASessment Tools for the Evaluation of Risk) predictions, and weight of evidence professional judgment based an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and were able to produce overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine the acute toxicity MOAs across a large range of structures and mechanisms.

  10. Quantitative classification of a historic northern Wisconsin (U.S.A.) landscape: mapping forests at regional scales

    Treesearch

    Lisa A. Schulte; David J. Mladenoff; Erik V. Nordheim

    2002-01-01

    We developed a quantitative and replicable classification system to improve understanding of historical composition and structure within northern Wisconsin's forests. The classification system was based on statistical cluster analysis and two forest metrics, relative dominance (% basal area) and relative importance (mean of relative dominance and relative density...

  11. Pigmented skin lesion detection using random forest and wavelet-based texture

    NASA Astrophysics Data System (ADS)

    Hu, Ping; Yang, Tie-jun

    2016-10-01

    The incidence of cutaneous malignant melanoma, a disease of worldwide distribution and is the deadliest form of skin cancer, has been rapidly increasing over the last few decades. Because advanced cutaneous melanoma is still incurable, early detection is an important step toward a reduction in mortality. Dermoscopy photographs are commonly used in melanoma diagnosis and can capture detailed features of a lesion. A great variability exists in the visual appearance of pigmented skin lesions. Therefore, in order to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, an automatic detection approach is required. The objectives of this paper were to propose a hybrid method using random forest and Gabor wavelet transformation to accurately differentiate which part belong to lesion area and the other is not in a dermoscopy photographs and analyze segmentation accuracy. A random forest classifier consisting of a set of decision trees was used for classification. Gabor wavelets transformation are the mathematical model of visual cortical cells of mammalian brain and an image can be decomposed into multiple scales and multiple orientations by using it. The Gabor function has been recognized as a very useful tool in texture analysis, due to its optimal localization properties in both spatial and frequency domain. Texture features based on Gabor wavelets transformation are found by the Gabor filtered image. Experiment results indicate the following: (1) the proposed algorithm based on random forest outperformed the-state-of-the-art in pigmented skin lesions detection (2) and the inclusion of Gabor wavelet transformation based texture features improved segmentation accuracy significantly.

  12. Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

    PubMed

    Hu, Chen; Steingrimsson, Jon Arni

    2018-01-01

    A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.

  13. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology.

    PubMed

    Sharma, Harshita; Zerbe, Norman; Klempert, Iris; Hellwich, Olaf; Hufnagl, Peter

    2017-11-01

    Deep learning using convolutional neural networks is an actively emerging field in histological image analysis. This study explores deep learning methods for computer-aided classification in H&E stained histopathological whole slide images of gastric carcinoma. An introductory convolutional neural network architecture is proposed for two computerized applications, namely, cancer classification based on immunohistochemical response and necrosis detection based on the existence of tumor necrosis in the tissue. Classification performance of the developed deep learning approach is quantitatively compared with traditional image analysis methods in digital histopathology requiring prior computation of handcrafted features, such as statistical measures using gray level co-occurrence matrix, Gabor filter-bank responses, LBP histograms, gray histograms, HSV histograms and RGB histograms, followed by random forest machine learning. Additionally, the widely known AlexNet deep convolutional framework is comparatively analyzed for the corresponding classification problems. The proposed convolutional neural network architecture reports favorable results, with an overall classification accuracy of 0.6990 for cancer classification and 0.8144 for necrosis detection. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Supernova Photometric Lightcurve Classification

    NASA Astrophysics Data System (ADS)

    Zaidi, Tayeb; Narayan, Gautham

    2016-01-01

    This is a preliminary report on photometric supernova classification. We first explore the properties of supernova light curves, and attempt to restructure the unevenly sampled and sparse data from assorted datasets to allow for processing and classification. The data was primarily drawn from the Dark Energy Survey (DES) simulated data, created for the Supernova Photometric Classification Challenge. This poster shows a method for producing a non-parametric representation of the light curve data, and applying a Random Forest classifier algorithm to distinguish between supernovae types. We examine the impact of Principal Component Analysis to reduce the dimensionality of the dataset, for future classification work. The classification code will be used in a stage of the ANTARES pipeline, created for use on the Large Synoptic Survey Telescope alert data and other wide-field surveys. The final figure-of-merit for the DES data in the r band was 60% for binary classification (Type I vs II).Zaidi was supported by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program (AST-1262829).

  15. Using random forest for reliable classification and cost-sensitive learning for medical diagnosis.

    PubMed

    Yang, Fan; Wang, Hua-zhen; Mi, Hong; Lin, Cheng-de; Cai, Wei-wen

    2009-01-30

    Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, like medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis. In this paper, we present a modified random forest classifier which is incorporated into the conformal predictor scheme. A conformal predictor is a transductive learning scheme, using Kolmogorov complexity to test the randomness of a particular sample with respect to the training sets. Our method show well-calibrated property that the performance can be set prior to classification and the accurate rate is exactly equal to the predefined confidence level. Further, to address the cost sensitive problem, we extend our method to a label-conditional predictor which takes into account different costs for misclassifications in different class and allows different confidence level to be specified for each class. Intensive experiments on benchmark datasets and real world applications show the resultant classifier is well-calibrated and able to control the specific risk of different class. The method of using RF outlier measure to design a nonconformity measure benefits the resultant predictor. Further, a label-conditional classifier is developed and turn to be an alternative approach to the cost sensitive learning problem that relies on label-wise predefined confidence level. The target of minimizing the risk of misclassification is achieved by specifying the different confidence level for different class.

  16. Automatic detection of freezing of gait events in patients with Parkinson's disease.

    PubMed

    Tripoliti, Evanthia E; Tzallas, Alexandros T; Tsipouras, Markos G; Rigas, George; Bougia, Panagiota; Leontiou, Michael; Konitsiotis, Spiros; Chondrogiorgi, Maria; Tsouli, Sofia; Fotiadis, Dimitrios I

    2013-04-01

    The aim of this study is to detect freezing of gait (FoG) events in patients suffering from Parkinson's disease (PD) using signals received from wearable sensors (six accelerometers and two gyroscopes) placed on the patients' body. For this purpose, an automated methodology has been developed which consists of four stages. In the first stage, missing values due to signal loss or degradation are replaced and then (second stage) low frequency components of the raw signal are removed. In the third stage, the entropy of the raw signal is calculated. Finally (fourth stage), four classification algorithms have been tested (Naïve Bayes, Random Forests, Decision Trees and Random Tree) in order to detect the FoG events. The methodology has been evaluated using several different configurations of sensors in order to conclude to the set of sensors which can produce optimal FoG episode detection. Signals recorded from five healthy subjects, five patients with PD who presented the symptom of FoG and six patients who suffered from PD but they do not present FoG events. The signals included 93 FoG events with 405.6s total duration. The results indicate that the proposed methodology is able to detect FoG events with 81.94% sensitivity, 98.74% specificity, 96.11% accuracy and 98.6% area under curve (AUC) using the signals from all sensors and the Random Forests classification algorithm. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  17. Random forest feature selection approach for image segmentation

    NASA Astrophysics Data System (ADS)

    Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina; Vaida, Mircea Florin

    2017-03-01

    In the field of image segmentation, discriminative models have shown promising performance. Generally, every such model begins with the extraction of numerous features from annotated images. Most authors create their discriminative model by using many features without using any selection criteria. A more reliable model can be built by using a framework that selects the important variables, from the point of view of the classification, and eliminates the unimportant once. In this article we present a framework for feature selection and data dimensionality reduction. The methodology is built around the random forest (RF) algorithm and its variable importance evaluation. In order to deal with datasets so large as to be practically unmanageable, we propose an algorithm based on RF that reduces the dimension of the database by eliminating irrelevant features. Furthermore, this framework is applied to optimize our discriminative model for brain tumor segmentation.

  18. Combination of lateral and PA view radiographs to study development of knee OA and associated pain

    NASA Astrophysics Data System (ADS)

    Minciullo, Luca; Thomson, Jessie; Cootes, Timothy F.

    2017-03-01

    Knee Osteoarthritis (OA) is the most common form of arthritis, affecting millions of people around the world. The effects of the disease have been studied using the shape and texture features of bones in PosteriorAnterior (PA) and Lateral radiographs separately. In this work we compare the utility of features from each view, and evaluate whether combining features from both is advantageous. We built a fully automated system to independently locate landmark points in both radiographic images using Random Forest Constrained Local Models. We extracted discriminative features from the two bony outlines using Appearance Models. The features were used to train Random Forest classifiers to solve three specific tasks: (i) OA classification, distinguishing patients with structural signs of OA from the others; (ii) predicting future onset of the disease and (iii) predicting which patients with no current pain will have a positive pain score later in a follow-up visit. Using a subset of the MOST dataset we show that the PA view has more discriminative features to classify and predict OA, while the lateral view contains features that achieve better performance in predicting pain, and that combining the features from both views gives a small improvement in accuracy of the classification compared to the individual views.

  19. Source identification of western Oregon Douglas-fir wood cores using mass spectrometry and random forest classification1

    PubMed Central

    Finch, Kristen; Espinoza, Edgard; Jones, F. Andrew; Cronn, Richard

    2017-01-01

    Premise of the study: We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Methods: Three annual ring mass spectra were obtained from 188 adult Douglas-fir trees, and these were analyzed using random forest models to determine whether samples could be classified to geographic origin, growth year, or growth year and geographic origin. Specific wood molecules that contributed to geographic discrimination were identified. Results: Douglas-fir mass spectra could be differentiated into two geographic classes with an accuracy between 70% and 76%. Classification models could not accurately classify sample mass spectra based on growth year. Thirty-two molecules were identified as key for classifying western Oregon Douglas-fir wood cores to geographic origin. Discussion: DART-TOFMS is capable of detecting minute but regionally informative differences in wood molecules over a small geographic scale, and these differences made it possible to predict the geographic origin of Douglas-fir wood with moderate accuracy. Studies involving DART-TOFMS, alone and in combination with other technologies, will be relevant for identifying the geographic origin of illegally harvested wood. PMID:28529831

  20. Classification of nanoparticle diffusion processes in vital cells by a multifeature random forests approach: application to simulated data, darkfield, and confocal laser scanning microscopy

    NASA Astrophysics Data System (ADS)

    Wagner, Thorsten; Kroll, Alexandra; Wiemann, Martin; Lipinski, Hans-Gerd

    2016-04-01

    Darkfield and confocal laser scanning microscopy both allow for a simultaneous observation of live cells and single nanoparticles. Accordingly, a characterization of nanoparticle uptake and intracellular mobility appears possible within living cells. Single particle tracking makes it possible to characterize the particle and the surrounding cell. In case of free diffusion, the mean squared displacement for each trajectory of a nanoparticle can be measured which allows computing the corresponding diffusion coefficient and, if desired, converting it into the hydrodynamic diameter using the Stokes-Einstein equation and the viscosity of the fluid. However, within the more complex system of a cell's cytoplasm unrestrained diffusion is scarce and several other types of movements may occur. Thus, confined or anomalous diffusion (e.g. diffusion in porous media), active transport, and combinations thereof were described by several authors. To distinguish between these types of particle movement we developed an appropriate classification method, and simulated three types of particle motion in a 2D plane using a Monte Carlo approach: (1) normal diffusion, using random direction and step-length, (2) subdiffusion, using confinements like a reflective boundary with defined radius or reflective objects in the closer vicinity, and (3) superdiffusion, using a directed flow added to the normal diffusion. To simulate subdiffusion we devised a new method based on tracks of different length combined with equally probable obstacle interaction. Next we estimated the fractal dimension, elongation and the ratio of long-time / short-time diffusion coefficients. These features were used to train a random forests classification algorithm. The accuracy for simulated trajectories with 180 steps was 97% (95%-CI: 0.9481-0.9884). The balanced accuracy was 94%, 99% and 98% for normal-, sub- and superdiffusion, respectively. Nanoparticle tracking analysis was used with 100 nm polystyrene particles to get trajectories for normal diffusion. As a next step we identified diffusion types of nanoparticles in vital cells and incubated V79 fibroblasts with 50 nm gold nanoparticles, which appeared as intensely bright objects due to their surface plasmon resonance. The movement of particles in both the extracellular and intracellular space was observed by dark field and confocal laser scanning microscopy. After reducing background noise from the video it became possible to identify individual particle spots by a maximum detection algorithm and trace them using the robust single-particle tracking algorithm proposed by Jaqaman, which is able to handle motion heterogeneity and particle disappearance. The particle trajectories inside cells indicated active transport (superdiffusion) as well as subdiffusion. Eventually, the random forest classification algorithm, after being trained by the above simulations, successfully classified the trajectories observed in live cells.

  1. Object-Based Random Forest Classification of Land Cover from Remotely Sensed Imagery for Industrial and Mining Reclamation

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Luo, M.; Xu, L.; Zhou, X.; Ren, J.; Zhou, J.

    2018-04-01

    The RF method based on grid-search parameter optimization could achieve a classification accuracy of 88.16 % in the classification of images with multiple feature variables. This classification accuracy was higher than that of SVM and ANN under the same feature variables. In terms of efficiency, the RF classification method performs better than SVM and ANN, it is more capable of handling multidimensional feature variables. The RF method combined with object-based analysis approach could highlight the classification accuracy further. The multiresolution segmentation approach on the basis of ESP scale parameter optimization was used for obtaining six scales to execute image segmentation, when the segmentation scale was 49, the classification accuracy reached the highest value of 89.58 %. The classification accuracy of object-based RF classification was 1.42 % higher than that of pixel-based classification (88.16 %), and the classification accuracy was further improved. Therefore, the RF classification method combined with object-based analysis approach could achieve relatively high accuracy in the classification and extraction of land use information for industrial and mining reclamation areas. Moreover, the interpretation of remotely sensed imagery using the proposed method could provide technical support and theoretical reference for remotely sensed monitoring land reclamation.

  2. Mathematical models application for mapping soils spatial distribution on the example of the farm from the North of Udmurt Republic of Russia

    NASA Astrophysics Data System (ADS)

    Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.

    2018-01-01

    Comparative analysis of soils geospatial modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machines algorithms was conducted. The visual interpretation of the digital maps obtained and their comparison with the existing map, as well as the quantitative assessment of the individual soil groups detection overall accuracy and of the models kappa showed that multiple logistic regression, support vector method, and random forest models application with spatial prediction of the conditional soil groups distribution can be reliably used for mapping of the study area. It has shown the most accurate detection for sod-podzolics soils (Phaeozems Albic) lightly eroded and moderately eroded soils. In second place, according to the mean overall accuracy of the prediction, there are sod-podzolics soils - non-eroded and warp one, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavy eroded sod-podzolics and gray forest soils (Phaeozems Albic) were detected by methods of automatic classification worst of all.

  3. K-Nearest Neighbor Estimation of Forest Attributes: Improving Mapping Efficiency

    Treesearch

    Andrew O. Finley; Alan R. Ek; Yun Bai; Marvin E. Bauer

    2005-01-01

    This paper describes our efforts in refining k-nearest neighbor forest attributes classification using U.S. Department of Agriculture Forest Service Forest Inventory and Analysis plot data and Landsat 7 Enhanced Thematic Mapper Plus imagery. The analysis focuses on FIA-defined forest type classification across St. Louis County in northeastern Minnesota. We outline...

  4. Assessment of geostatistical features for object-based image classification of contrasted landscape vegetation cover

    NASA Astrophysics Data System (ADS)

    de Oliveira Silveira, Eduarda Martiniano; de Menezes, Michele Duarte; Acerbi Júnior, Fausto Weimar; Castro Nunes Santos Terra, Marcela; de Mello, José Márcio

    2017-07-01

    Accurate mapping and monitoring of savanna and semiarid woodland biomes are needed to support the selection of areas of conservation, to provide sustainable land use, and to improve the understanding of vegetation. The potential of geostatistical features, derived from medium spatial resolution satellite imagery, to characterize contrasted landscape vegetation cover and improve object-based image classification is studied. The study site in Brazil includes cerrado sensu stricto, deciduous forest, and palm swamp vegetation cover. Sentinel 2 and Landsat 8 images were acquired and divided into objects, for each of which a semivariogram was calculated using near-infrared (NIR) and normalized difference vegetation index (NDVI) to extract the set of geostatistical features. The features selected by principal component analysis were used as input data to train a random forest algorithm. Tests were conducted, combining spectral and geostatistical features. Change detection evaluation was performed using a confusion matrix and its accuracies. The semivariogram curves were efficient to characterize spatial heterogeneity, with similar results using NIR and NDVI from Sentinel 2 and Landsat 8. Accuracy was significantly greater when combining geostatistical features with spectral data, suggesting that this method can improve image classification results.

  5. Training echo state networks for rotation-invariant bone marrow cell classification.

    PubMed

    Kainz, Philipp; Burgsteiner, Harald; Asslaber, Martin; Ahammer, Helmut

    2017-01-01

    The main principle of diagnostic pathology is the reliable interpretation of individual cells in context of the tissue architecture. Especially a confident examination of bone marrow specimen is dependent on a valid classification of myeloid cells. In this work, we propose a novel rotation-invariant learning scheme for multi-class echo state networks (ESNs), which achieves very high performance in automated bone marrow cell classification. Based on representing static images as temporal sequence of rotations, we show how ESNs robustly recognize cells of arbitrary rotations by taking advantage of their short-term memory capacity. The performance of our approach is compared to a classification random forest that learns rotation-invariance in a conventional way by exhaustively training on multiple rotations of individual samples. The methods were evaluated on a human bone marrow image database consisting of granulopoietic and erythropoietic cells in different maturation stages. Our ESN approach to cell classification does not rely on segmentation of cells or manual feature extraction and can therefore directly be applied to image data.

  6. Analysis of Landsat-4 Thematic Mapper data for classification of forest stands in Baldwin County, Alabama

    NASA Technical Reports Server (NTRS)

    Hill, C. L.

    1984-01-01

    A computer-implemented classification has been derived from Landsat-4 Thematic Mapper data acquired over Baldwin County, Alabama on January 15, 1983. One set of spectral signatures was developed from the data by utilizing a 3x3 pixel sliding window approach. An analysis of the classification produced from this technique identified forested areas. Additional information regarding only the forested areas. Additional information regarding only the forested areas was extracted by employing a pixel-by-pixel signature development program which derived spectral statistics only for pixels within the forested land covers. The spectral statistics from both approaches were integrated and the data classified. This classification was evaluated by comparing the spectral classes produced from the data against corresponding ground verification polygons. This iterative data analysis technique resulted in an overall classification accuracy of 88.4 percent correct for slash pine, young pine, loblolly pine, natural pine, and mixed hardwood-pine. An accuracy assessment matrix has been produced for the classification.

  7. Forest site classification in the interior uplands

    Treesearch

    Glendon W. Smalley

    1989-01-01

    Classification and evaluation of forest sites is an essential step in managing central hardwood forests. In Note 4.01, The Importance of Site Quality, the usefulness of land classification systems was discussed. The present Note describes one of those systems in more detail. It is an easy-to-use system developed for the Cumberland Plateau and Highland Rim-Pennyroyal...

  8. Nationwide classification of forest types of India using remote sensing and GIS.

    PubMed

    Reddy, C Sudhakar; Jha, C S; Diwakar, P G; Dadhwal, V K

    2015-12-01

    India, a mega-diverse country, possesses a wide range of climate and vegetation types along with a varied topography. The present study has classified forest types of India based on multi-season IRS Resourcesat-2 Advanced Wide Field Sensor (AWiFS) data. The study has characterized 29 land use/land cover classes including 14 forest types and seven scrub types. Hybrid classification approach has been used for the classification of forest types. The classification of vegetation has been carried out based on the ecological rule bases followed by Champion and Seth's (1968) scheme of forest types in India. The present classification scheme has been compared with the available global and national level land cover products. The natural vegetation cover was estimated to be 29.36% of total geographical area of India. The predominant forest types of India are tropical dry deciduous and tropical moist deciduous. Of the total forest cover, tropical dry deciduous forests occupy an area of 2,17,713 km(2) (34.80%) followed by 2,07,649 km(2) (33.19%) under tropical moist deciduous forests, 48,295 km(2) (7.72%) under tropical semi-evergreen forests and 47,192 km(2) (7.54%) under tropical wet evergreen forests. The study has brought out a comprehensive vegetation cover and forest type maps based on inputs critical in defining the various categories of vegetation and forest types. This spatially explicit database will be highly useful for the studies related to changes in various forest types, carbon stocks, climate-vegetation modeling and biogeochemical cycles.

  9. A Robust Random Forest-Based Approach for Heart Rate Monitoring Using Photoplethysmography Signal Contaminated by Intense Motion Artifacts.

    PubMed

    Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin

    2017-02-16

    The estimation of heart rate (HR) based on wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to estimate HR due to low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust approach based on random forest is proposed for accurately estimating HR from the photoplethysmography signal contaminated by intense motion artifacts, consisting of two stages. Stage 1 proposes a hybrid method to effectively remove MA with a low computation complexity, where two MA removal algorithms are combined by an accurate binary decision algorithm whose aim is to decide whether or not to adopt the second MA removal algorithm. Stage 2 proposes a random forest-based spectral peak-tracking algorithm, whose aim is to locate the spectral peak corresponding to HR, formulating the problem of spectral peak tracking into a pattern classification problem. Experiments on the PPG datasets including 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved the average absolute error of 1.65 beats per minute (BPM) on the 22 PPG datasets. Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking.

  10. CNN-BLPred: a Convolutional neural network based predictor for β-Lactamases (BL) and their classes.

    PubMed

    White, Clarence; Ismail, Hamid D; Saigo, Hiroto; Kc, Dukka B

    2017-12-28

    The β-Lactamase (BL) enzyme family is an important class of enzymes that plays a key role in bacterial resistance to antibiotics. As the newly identified number of BL enzymes is increasing daily, it is imperative to develop a computational tool to classify the newly identified BL enzymes into one of its classes. There are two types of classification of BL enzymes: Molecular Classification and Functional Classification. Existing computational methods only address Molecular Classification and the performance of these existing methods is unsatisfactory. We addressed the unsatisfactory performance of the existing methods by implementing a Deep Learning approach called Convolutional Neural Network (CNN). We developed CNN-BLPred, an approach for the classification of BL proteins. The CNN-BLPred uses Gradient Boosted Feature Selection (GBFS) in order to select the ideal feature set for each BL classification. Based on the rigorous benchmarking of CCN-BLPred using both leave-one-out cross-validation and independent test sets, CCN-BLPred performed better than the other existing algorithms. Compared with other architectures of CNN, Recurrent Neural Network, and Random Forest, the simple CNN architecture with only one convolutional layer performs the best. After feature extraction, we were able to remove ~95% of the 10,912 features using Gradient Boosted Trees. During 10-fold cross validation, we increased the accuracy of the classic BL predictions by 7%. We also increased the accuracy of Class A, Class B, Class C, and Class D performance by an average of 25.64%. The independent test results followed a similar trend. We implemented a deep learning algorithm known as Convolutional Neural Network (CNN) to develop a classifier for BL classification. Combined with feature selection on an exhaustive feature set and using balancing method such as Random Oversampling (ROS), Random Undersampling (RUS) and Synthetic Minority Oversampling Technique (SMOTE), CNN-BLPred performs significantly better than existing algorithms for BL classification.

  11. Discrimination of crop types with TerraSAR-X-derived information

    NASA Astrophysics Data System (ADS)

    Sonobe, Rei; Tani, Hiroshi; Wang, Xiufeng; Kobayashi, Nobuyuki; Shimamura, Hideki

    Although classification maps are required for management and for the estimation of agricultural disaster compensation, those techniques have yet to be established. This paper describes the comparison of three different classification algorithms for mapping crops in Hokkaido, Japan, using TerraSAR-X (including TanDEM-X) dual-polarimetric data. In the study area, beans, beets, grasslands, maize, potatoes and winter wheat were cultivated. In this study, classification using TerraSAR-X-derived information was performed. Coherence values, polarimetric parameters and gamma nought values were also obtained and evaluated regarding their usefulness in crop classification. Accurate classification may be possible with currently existing supervised learning models. A comparison between the classification and regression tree (CART), support vector machine (SVM) and random forests (RF) algorithms was performed. Even though J-M distances were lower than 1.0 on all TerraSAR-X acquisition days, good results were achieved (e.g., separability between winter wheat and grass) due to the characteristics of the machine learning algorithm. It was found that SVM performed best, achieving an overall accuracy of 95.0% based on the polarimetric parameters and gamma nought values for HH and VV polarizations. The misclassified fields were less than 100 a in area and 79.5-96.3% were less than 200 a with the exception of grassland. When some feature such as a road or windbreak forest is present in the TerraSAR-X data, the ratio of its extent to that of the field is relatively higher for the smaller fields, which leads to misclassifications.

  12. Improving Oil Palm Classification in the Peruvian Amazon by Combining Active and Passive Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Gutierrez-Velez, V. H.; DeFries, R. S.

    2011-12-01

    Oil palm expansion has led to clearing of extensive forest areas in the tropics. However quantitative assessments of the magnitude of oil palm expansion to deforestation have been challenging due in large part to the limitations presented by conventional optical data sets for discriminating plantations from forests and other tree cover vegetations. Recently available information from active remote sensors has opened the possibility of using these data sources to overcome these limitations. The purpose of this analysis is to evaluate the accuracy of oil palm classification when using ALOS/PALSAR active satellite data in conjunction with Landsat information, compared to the use of Landsat data only. The analysis takes place in a focused region around the city of Pucallpa in the Ucayali province of the Peruvian Amazon for the year 2010. Oil palm plantations were separated in five categories consisting of four age classes (0-3, 3-5, 5-10 and > 10 yrs) and an additional class accounting for degraded plantations older than 15 yr. Other land covers were water bodies, unvegetated land, short and tall grass, fallow, secondary vegetation, and forest. Classifications were performed using random forests. Training points for calibration and validation consisted of 411 polygons measured in areas representative of the land covers of interest and totaled 6,367 ha. Overall classification accuracy increased from 89.9% using only Landsat data sets to 94.3% using both Landast and ALOS/PALSAR. Both user's and producer's accuracy increased in all classes when using both data sets except for producer's accuracy in short grass which decreased by 1%. The largest increase in user's accuracy was obtained in oil palm plantations older than 10 years from 62 to 80% while producer's accuracy improved the most in plantations in age class 3-5 from 63 to 80%. Results demonstrate the suitability of data from ALOS/PALSAR and other active remote sensors to improve classification of oil palm plantations in age classes and discriminate them from other land covers. Results suggest a potential for improving discrimination of other tree cover types using a combination of active and conventional optical remote sensors.

  13. Refining Time-Activity Classification of Human Subjects Using the Global Positioning System

    PubMed Central

    Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun

    2016-01-01

    Background Detailed spatial location information is important in accurately estimating personal exposure to air pollution. Global Position System (GPS) has been widely used in tracking personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, most of them were developed for specific regions. An adaptive model for time-location classification can be widely applied to air pollution studies that use GPS to track individual level time-activity patterns. Methods Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e. indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e. roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Results Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed considerable loss in sensitivity for outdoor static and outdoor walking conditions. Conclusions The random forests classification model can achieve high accuracy for the four major time-activity categories. The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations. PMID:26919723

  14. Forested plant associations of the Colville National Forest.

    Treesearch

    Clinton K. Williams; Brian F. Kelley; Bradley G. Smith; Terry R. Lillybridge

    1995-01-01

    A classification of forest vegetation is presented for the Colville National Forest in northeastern Washington State. It is based on potential vegetation with the plant association as the basic unit. The classification is based on a sample of approximately 229 intensive plots and 282 reconnaissance plots distributed across the forest from 1980 to 1983. The hierarchical...

  15. Effectiveness of repeated examination to diagnose enterobiasis in nursery school groups.

    PubMed

    Remm, Mare; Remm, Kalle

    2009-09-01

    The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions.

  16. Effectiveness of Repeated Examination to Diagnose Enterobiasis in Nursery School Groups

    PubMed Central

    Remm, Kalle

    2009-01-01

    The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions. PMID:19724696

  17. An evaluation of supervised classifiers for indirectly detecting salt-affected areas at irrigation scheme level

    NASA Astrophysics Data System (ADS)

    Muller, Sybrand Jacobus; van Niekerk, Adriaan

    2016-07-01

    Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electro conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (<0.4 R squared). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crops types at different growing stages, coupled with their individual tolerances to saline conditions.

  18. Classification of California streams using combined deductive and inductive approaches: Setting the foundation for analysis of hydrologic alteration

    USGS Publications Warehouse

    Pyne, Matthew I.; Carlisle, Daren M.; Konrad, Christopher P.; Stein, Eric D.

    2017-01-01

    Regional classification of streams is an early step in the Ecological Limits of Hydrologic Alteration framework. Many stream classifications are based on an inductive approach using hydrologic data from minimally disturbed basins, but this approach may underrepresent streams from heavily disturbed basins or sparsely gaged arid regions. An alternative is a deductive approach, using watershed climate, land use, and geomorphology to classify streams, but this approach may miss important hydrological characteristics of streams. We classified all stream reaches in California using both approaches. First, we used Bayesian and hierarchical clustering to classify reaches according to watershed characteristics. Streams were clustered into seven classes according to elevation, sedimentary rock, and winter precipitation. Permutation-based analysis of variance and random forest analyses were used to determine which hydrologic variables best separate streams into their respective classes. Stream typology (i.e., the class that a stream reach is assigned to) is shaped mainly by patterns of high and mean flow behavior within the stream's landscape context. Additionally, random forest was used to determine which hydrologic variables best separate minimally disturbed reference streams from non-reference streams in each of the seven classes. In contrast to stream typology, deviation from reference conditions is more difficult to detect and is largely defined by changes in low-flow variables, average daily flow, and duration of flow. Our combined deductive/inductive approach allows us to estimate flow under minimally disturbed conditions based on the deductive analysis and compare to measured flow based on the inductive analysis in order to estimate hydrologic change.

  19. Classification of melanoma lesions using sparse coded features and random forests

    NASA Astrophysics Data System (ADS)

    Rastgoo, Mojdeh; Lemaître, Guillaume; Morel, Olivier; Massich, Joan; Garcia, Rafael; Meriaudeau, Fabrice; Marzani, Franck; Sidibé, Désiré

    2016-03-01

    Malignant melanoma is the most dangerous type of skin cancer, yet it is the most treatable kind of cancer, conditioned by its early diagnosis which is a challenging task for clinicians and dermatologists. In this regard, CAD systems based on machine learning and image processing techniques are developed to differentiate melanoma lesions from benign and dysplastic nevi using dermoscopic images. Generally, these frameworks are composed of sequential processes: pre-processing, segmentation, and classification. This architecture faces mainly two challenges: (i) each process is complex with the need to tune a set of parameters, and is specific to a given dataset; (ii) the performance of each process depends on the previous one, and the errors are accumulated throughout the framework. In this paper, we propose a framework for melanoma classification based on sparse coding which does not rely on any pre-processing or lesion segmentation. Our framework uses Random Forests classifier and sparse representation of three features: SIFT, Hue and Opponent angle histograms, and RGB intensities. The experiments are carried out on the public PH2 dataset using a 10-fold cross-validation. The results show that SIFT sparse-coded feature achieves the highest performance with sensitivity and specificity of 100% and 90.3% respectively, with a dictionary size of 800 atoms and a sparsity level of 2. Furthermore, the descriptor based on RGB intensities achieves similar results with sensitivity and specificity of 100% and 71.3%, respectively for a smaller dictionary size of 100 atoms. In conclusion, dictionary learning techniques encode strong structures of dermoscopic images and provide discriminant descriptors.

  20. Proteomics Versus Clinical Data and Stochastic Local Search Based Feature Selection for Acute Myeloid Leukemia Patients' Classification.

    PubMed

    Chebouba, Lokmane; Boughaci, Dalila; Guziolowski, Carito

    2018-06-04

    The use of data issued from high throughput technologies in drug target problems is widely widespread during the last decades. This study proposes a meta-heuristic framework using stochastic local search (SLS) combined with random forest (RF) where the aim is to specify the most important genes and proteins leading to the best classification of Acute Myeloid Leukemia (AML) patients. First we use a stochastic local search meta-heuristic as a feature selection technique to select the most significant proteins to be used in the classification task step. Then we apply RF to classify new patients into their corresponding classes. The evaluation technique is to run the RF classifier on the training data to get a model. Then, we apply this model on the test data to find the appropriate class. We use as metrics the balanced accuracy (BAC) and the area under the receiver operating characteristic curve (AUROC) to measure the performance of our model. The proposed method is evaluated on the dataset issued from DREAM 9 challenge. The comparison is done with a pure random forest (without feature selection), and with the two best ranked results of the DREAM 9 challenge. We used three types of data: only clinical data, only proteomics data, and finally clinical and proteomics data combined. The numerical results show that the highest scores are obtained when using clinical data alone, and the lowest is obtained when using proteomics data alone. Further, our method succeeds in finding promising results compared to the methods presented in the DREAM challenge.

  1. Disruption prediction investigations using Machine Learning tools on DIII-D and Alcator C-Mod

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rea, C.; Granetz, R. S.; Montes, K.

    Using data-driven methodology, we exploit the time series of relevant plasma parameters for a large set of disrupted and non-disrupted discharges to develop a classification algorithm for detecting disruptive phases in shots that eventually disrupt. Comparing the same methodology on different devices is crucial in order to have information on the portability of the developed algorithm and the possible extrapolation to ITER. Therefore, we use data from two very different tokamaks, DIII-D and Alcator C-Mod. We then focus on a subset of disruption predictors, most of which are dimensionless and/or machine-independent parameters, coming from both plasma diagnostics and equilibrium reconstructions,more » such as the normalized plasma internal inductance ℓ and the n = 1 mode amplitude normalized to the toroidal magnetic field. Using such dimensionless indicators facilitates a more direct comparison between DIII-D and C-Mod. We then choose a shallow Machine Learning technique, called Random Forests, to explore the databases available for the two devices. We show results from the classification task, where we introduce a time dependency through the definition of class labels on the basis of the elapsed time before the disruption (i.e. ‘far from a disruption’ and ‘close to a disruption’). The performances of the different Random Forest classifiers are discussed in terms of several metrics, by showing the number of successfully detected samples, as well as the misclassifications. The overall model accuracies are above 97% when identifying a ‘far from disruption’ and a ‘disruptive’ phase for disrupted discharges. Nevertheless, the Forests are intrinsically different in their capability of predicting a disruptive behavior, with C-Mod predictions comparable to random guesses. Indeed, we show that C-Mod recall index, i.e. the sensitivity to a disruptive behavior, is as low as 0.47, while DIII-D recall is ~0.72. The portability of the developed algorithm is also tested across the two devices, by using DIII-D data for training the forests and C-Mod for testing and vice versa.« less

  2. Disruption prediction investigations using Machine Learning tools on DIII-D and Alcator C-Mod

    DOE PAGES

    Rea, C.; Granetz, R. S.; Montes, K.; ...

    2018-06-18

    Using data-driven methodology, we exploit the time series of relevant plasma parameters for a large set of disrupted and non-disrupted discharges to develop a classification algorithm for detecting disruptive phases in shots that eventually disrupt. Comparing the same methodology on different devices is crucial in order to have information on the portability of the developed algorithm and the possible extrapolation to ITER. Therefore, we use data from two very different tokamaks, DIII-D and Alcator C-Mod. We then focus on a subset of disruption predictors, most of which are dimensionless and/or machine-independent parameters, coming from both plasma diagnostics and equilibrium reconstructions,more » such as the normalized plasma internal inductance ℓ and the n = 1 mode amplitude normalized to the toroidal magnetic field. Using such dimensionless indicators facilitates a more direct comparison between DIII-D and C-Mod. We then choose a shallow Machine Learning technique, called Random Forests, to explore the databases available for the two devices. We show results from the classification task, where we introduce a time dependency through the definition of class labels on the basis of the elapsed time before the disruption (i.e. ‘far from a disruption’ and ‘close to a disruption’). The performances of the different Random Forest classifiers are discussed in terms of several metrics, by showing the number of successfully detected samples, as well as the misclassifications. The overall model accuracies are above 97% when identifying a ‘far from disruption’ and a ‘disruptive’ phase for disrupted discharges. Nevertheless, the Forests are intrinsically different in their capability of predicting a disruptive behavior, with C-Mod predictions comparable to random guesses. Indeed, we show that C-Mod recall index, i.e. the sensitivity to a disruptive behavior, is as low as 0.47, while DIII-D recall is ~0.72. The portability of the developed algorithm is also tested across the two devices, by using DIII-D data for training the forests and C-Mod for testing and vice versa.« less

  3. A non-parametric, supervised classification of vegetation types on the Kaibab National Forest using decision trees

    Treesearch

    Suzanne M. Joy; R. M. Reich; Richard T. Reynolds

    2003-01-01

    Traditional land classification techniques for large areas that use Landsat Thematic Mapper (TM) imagery are typically limited to the fixed spatial resolution of the sensors (30m). However, the study of some ecological processes requires land cover classifications at finer spatial resolutions. We model forest vegetation types on the Kaibab National Forest (KNF) in...

  4. Open Dataset for the Automatic Recognition of Sedentary Behaviors.

    PubMed

    Possos, William; Cruz, Robinson; Cerón, Jesús D; López, Diego M; Sierra-Torres, Carlos H

    2017-01-01

    Sedentarism is associated with the development of noncommunicable diseases (NCD) such as cardiovascular diseases (CVD), type 2 diabetes, and cancer. Therefore, the identification of specific sedentary behaviors (TV viewing, sitting at work, driving, relaxing, etc.) is especially relevant for planning personalized prevention programs. To build and evaluate a public a dataset for the automatic recognition (classification) of sedentary behaviors. The dataset included data from 30 subjects, who performed 23 sedentary behaviors while wearing a commercial wearable on the wrist, a smartphone on the hip and another in the thigh. Bluetooth Low Energy (BLE) beacons were used in order to improve the automatic classification of different sedentary behaviors. The study also compared six well know data mining classification techniques in order to identify the more precise method of solving the classification problem of the 23 defined behaviors. A better classification accuracy was obtained using the Random Forest algorithm and when data were collected from the phone on the hip. Furthermore, the use of beacons as a reference for obtaining the symbolic location of the individual improved the precision of the classification.

  5. Feasibility of Multispectral Airborne Laser Scanning for Land Cover Classification, Road Mapping and Map Updating

    NASA Astrophysics Data System (ADS)

    Matikainen, L.; Karila, K.; Hyyppä, J.; Puttonen, E.; Litkey, P.; Ahokas, E.

    2017-10-01

    This article summarises our first results and experiences on the use of multispectral airborne laser scanner (ALS) data. Optech Titan multispectral ALS data over a large suburban area in Finland were acquired on three different dates in 2015-2016. We investigated the feasibility of the data from the first date for land cover classification and road mapping. Object-based analyses with segmentation and random forests classification were used. The potential of the data for change detection of buildings and roads was also demonstrated. The overall accuracy of land cover classification results with six classes was 96 % compared with validation points. The data also showed high potential for road detection, road surface classification and change detection. The multispectral intensity information appeared to be very important for automated classifications. Compared to passive aerial images, the intensity images have interesting advantages, such as the lack of shadows. Currently, we focus on analyses and applications with the multitemporal multispectral data. Important questions include, for example, the potential and challenges of the multitemporal data for change detection.

  6. Voice based gender classification using machine learning

    NASA Astrophysics Data System (ADS)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  7. Investigating the limitations of tree species classification using the Combined Cluster and Discriminant Analysis method for low density ALS data from a dense forest region in Aggtelek (Hungary)

    NASA Astrophysics Data System (ADS)

    Koma, Zsófia; Deák, Márton; Kovács, József; Székely, Balázs; Kelemen, Kristóf; Standovár, Tibor

    2016-04-01

    Airborne Laser Scanning (ALS) is a widely used technology for forestry classification applications. However, single tree detection and species classification from low density ALS point cloud is limited in a dense forest region. In this study we investigate the division of a forest into homogenous groups at stand level. The study area is located in the Aggtelek karst region (Northeast Hungary) with a complex relief topography. The ALS dataset contained only 4 discrete echoes (at 2-4 pt/m2 density) from the study area during leaf-on season. Ground-truth measurements about canopy closure and proportion of tree species cover are available for every 70 meter in 500 square meter circular plots. In the first step, ALS data were processed and geometrical and intensity based features were calculated into a 5×5 meter raster based grid. The derived features contained: basic statistics of relative height, canopy RMS, echo ratio, openness, pulse penetration ratio, basic statistics of radiometric feature. In the second step the data were investigated using Combined Cluster and Discriminant Analysis (CCDA, Kovács et al., 2014). The CCDA method first determines a basic grouping for the multiple circle shaped sampling locations using hierarchical clustering and then for the arising grouping possibilities a core cycle is executed comparing the goodness of the investigated groupings with random ones. Out of these comparisons difference values arise, yielding information about the optimal grouping out of the investigated ones. If sub-groups are then further investigated, one might even find homogeneous groups. We found that low density ALS data classification into homogeneous groups are highly dependent on canopy closure, and the proportion of the dominant tree species. The presented results show high potential using CCDA for determination of homogenous separable groups in LiDAR based tree species classification. Aggtelek Karst/Slovakian Karst Caves" (HUSK/1101/221/0180, Aggtelek NP), data evaluation: 'Multipurpose assessment serving forest biodiversity conservation in the Carpathian region of Hungary', Swiss-Hungarian Cooperation Programme (SH/4/13 Project). BS contributed as an Alexander von Humboldt Research Fellow. J. Kovács, S. Kovács, N. Magyar, P. Tanos, I. G. Hatvani, and A. Anda (2014), Classification into homogeneous groups using combined cluster and discriminant analysis, Environmental Modelling & Software, 57, 52-59.

  8. Can a Forest/Nonforest Change Map Improve the Precision of Forest Area, Volume, Growth, Removals, and Mortality Estimates?

    Treesearch

    Dale D. Gormanson; Mark H. Hansen; Ronald E. McRoberts

    2005-01-01

    In an extensive forest inventory, stratifications that use dual-date forest/nonforest classifications of Landsat Thematic Mapper data approximately 10 years apart are tested against similar classifications that use data from only one date. Alternative stratifications that further define edge strata as pixels adjacent to a forest/nonforest boundary are included in the...

  9. An initial analysis of LANDSAT 4 Thematic Mapper data for the classification of agricultural, forested wetland, and urban land covers

    NASA Technical Reports Server (NTRS)

    Quattrochi, D. A.; Anderson, J. E.; Brannon, D. P.; Hill, C. L.

    1982-01-01

    An initial analysis of LANDSAT 4 thematic mapper (TM) data for the delineation and classification of agricultural, forested wetland, and urban land covers was conducted. A study area in Poinsett County, Arkansas was used to evaluate a classification of agricultural lands derived from multitemporal LANDSAT multispectral scanner (MSS) data in comparison with a classification of TM data for the same area. Data over Reelfoot Lake in northwestern Tennessee were utilized to evaluate the TM for delineating forested wetland species. A classification of the study area was assessed for accuracy in discriminating five forested wetland categories. Finally, the TM data were used to identify urban features within a small city. A computer generated classification of Union City, Tennessee was analyzed for accuracy in delineating urban land covers. An evaluation of digitally enhanced TM data using principal components analysis to facilitate photointerpretation of urban features was also performed.

  10. Android Malware Classification Using K-Means Clustering Algorithm

    NASA Astrophysics Data System (ADS)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.

  11. Wavelet-based energy features for glaucomatous image classification.

    PubMed

    Dua, Sumeet; Acharya, U Rajendra; Chowriappa, Pradeep; Sree, S Vinitha

    2012-01-01

    Texture features within images are actively pursued for accurate and efficient glaucoma classification. Energy distribution over wavelet subbands is applied to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the daubechies (db3), symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures obtained using 2-D discrete wavelet transform, and subject these signatures to different feature ranking and feature selection strategies. We have gauged the effectiveness of the resultant ranked and selected subsets of features using a support vector machine, sequential minimal optimization, random forest, and naïve Bayes classification strategies. We observed an accuracy of around 93% using tenfold cross validations to demonstrate the effectiveness of these methods.

  12. Classification and management of aquatic, riparian, and wetland sites on the national forests of eastern Washington: series description.

    Treesearch

    Bernard L. Kovalchik; Rodrick R. Clausnitzer

    2004-01-01

    This is a classification of aquatic, wetland, and riparian series and plant associations found within the Colville, Okanogan, and Wenatchee National Forests. It is based on the potential vegetation occurring on lake and pond margins, wetland fens and bogs, and fluvial surfaces along streams and rivers within Forest Service lands. Data used in the classification were...

  13. Comparison of classification algorithms for various methods of preprocessing radar images of the MSTAR base

    NASA Astrophysics Data System (ADS)

    Borodinov, A. A.; Myasnikov, V. V.

    2018-04-01

    The present work is devoted to comparing the accuracy of the known qualification algorithms in the task of recognizing local objects on radar images for various image preprocessing methods. Preprocessing involves speckle noise filtering and normalization of the object orientation in the image by the method of image moments and by a method based on the Hough transform. In comparison, the following classification algorithms are used: Decision tree; Support vector machine, AdaBoost, Random forest. The principal component analysis is used to reduce the dimension. The research is carried out on the objects from the base of radar images MSTAR. The paper presents the results of the conducted studies.

  14. Hyperspectral detection of a subsurface CO2 leak in the presence of water stressed vegetation.

    PubMed

    Bellante, Gabriel J; Powell, Scott L; Lawrence, Rick L; Repasky, Kevin S; Dougher, Tracy

    2014-01-01

    Remote sensing of vegetation stress has been posed as a possible large area monitoring tool for surface CO2 leakage from geologic carbon sequestration (GCS) sites since vegetation is adversely affected by elevated CO2 levels in soil. However, the extent to which remote sensing could be used for CO2 leak detection depends on the spectral separability of the plant stress signal caused by various factors, including elevated soil CO2 and water stress. This distinction is crucial to determining the seasonality and appropriateness of remote GCS site monitoring. A greenhouse experiment tested the degree to which plants stressed by elevated soil CO2 could be distinguished from plants that were water stressed. A randomized block design assigned Alfalfa plants (Medicago sativa) to one of four possible treatment groups: 1) a CO2 injection group; 2) a water stress group; 3) an interaction group that was subjected to both water stress and CO2 injection; or 4) a group that received adequate water and no CO2 injection. Single date classification trees were developed to identify individual spectral bands that were significant in distinguishing between CO2 and water stress agents, in addition to a random forest classifier that was used to further understand and validate predictive accuracies. Overall peak classification accuracy was 90% (Kappa of 0.87) for the classification tree analysis and 83% (Kappa of 0.77) for the random forest classifier, demonstrating that vegetation stressed from an underground CO2 leak could be accurately discerned from healthy vegetation and areas of co-occurring water stressed vegetation at certain times. Plants appear to hit a stress threshold, however, that would render detection of a CO2 leak unlikely during severe drought conditions. Our findings suggest that early detection of a CO2 leak with an aerial or ground-based hyperspectral imaging system is possible and could be an important GCS monitoring tool.

  15. Hyperspectral Detection of a Subsurface CO2 Leak in the Presence of Water Stressed Vegetation

    PubMed Central

    Bellante, Gabriel J.; Powell, Scott L.; Lawrence, Rick L.; Repasky, Kevin S.; Dougher, Tracy

    2014-01-01

    Remote sensing of vegetation stress has been posed as a possible large area monitoring tool for surface CO2 leakage from geologic carbon sequestration (GCS) sites since vegetation is adversely affected by elevated CO2 levels in soil. However, the extent to which remote sensing could be used for CO2 leak detection depends on the spectral separability of the plant stress signal caused by various factors, including elevated soil CO2 and water stress. This distinction is crucial to determining the seasonality and appropriateness of remote GCS site monitoring. A greenhouse experiment tested the degree to which plants stressed by elevated soil CO2 could be distinguished from plants that were water stressed. A randomized block design assigned Alfalfa plants (Medicago sativa) to one of four possible treatment groups: 1) a CO2 injection group; 2) a water stress group; 3) an interaction group that was subjected to both water stress and CO2 injection; or 4) a group that received adequate water and no CO2 injection. Single date classification trees were developed to identify individual spectral bands that were significant in distinguishing between CO2 and water stress agents, in addition to a random forest classifier that was used to further understand and validate predictive accuracies. Overall peak classification accuracy was 90% (Kappa of 0.87) for the classification tree analysis and 83% (Kappa of 0.77) for the random forest classifier, demonstrating that vegetation stressed from an underground CO2 leak could be accurately discerned from healthy vegetation and areas of co-occurring water stressed vegetation at certain times. Plants appear to hit a stress threshold, however, that would render detection of a CO2 leak unlikely during severe drought conditions. Our findings suggest that early detection of a CO2 leak with an aerial or ground-based hyperspectral imaging system is possible and could be an important GCS monitoring tool. PMID:25330232

  16. SU-D-207B-02: Early Grade Classification in Meningioma Patients Combining Radiomics and Semantics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coroller, T; Bi, W; Abedalthagafi, M

    Purpose: The clinical management of meningioma is guided by its grade and biologic behavior. Currently, diagnosis of tumor grade follows surgical resection and histopathologic review. Reliable techniques for pre-operative determination of tumor behavior are needed. We investigated the association between imaging features extracted from preoperative gadolinium-enhanced T1-weighted MRI and meningioma grade. Methods: We retrospectively examined the pre-operative MRI for 139 patients with de novo WHO grade I (63%) and grade II (37%) meningiomas. We investigated the predictive power of ten semantic radiologic features as determined by a neuroradiologist, fifteen radiomic features, and tumor location. Conventional (volume and diameter) imaging featuresmore » were added for comparison. AUC was computed for continuous and χ{sup 2} for discrete variables. Classification was done using random forest. Performance was evaluated using cross validation (1000 iterations, 75% training and 25% validation). All p-values were adjusted for multiple testing. Results: Significant association was observed between meningioma grade and tumor location (p<0.001) and two semantic features including intra-tumoral heterogeneity (p<0.001) and overt hemorrhage (p=0.01). Conventional (AUC 0.61–0.67) and eleven radiomic (AUC 0.60–0.70) features were significant from random (p<0.05, Noether test). Median AUC values for classification of tumor grade were 0.57, 0.71, 0.72 and 0.77 respectively for conventional, radiomic, location, and semantic features after using random forest. By combining all imaging data (semantic, radiomic, and location), the median AUC was 0.81, which offers superior predicting power to that of conventional imaging descriptors for meningioma as well as radiomic features alone (p<0.05, permutation test). Conclusion: We demonstrate a strong association between radiologic features and meningioma grade. Pre-operative prediction of tumor behavior based on imaging features offers promise for guiding personalized medicine and improving patient management.« less

  17. Subtyping cognitive profiles in Autism Spectrum Disorder using a Functional Random Forest algorithm.

    PubMed

    Feczko, E; Balba, N M; Miranda-Dominguez, O; Cordova, M; Karalunas, S L; Irwin, L; Demeter, D V; Hill, A P; Langhorst, B H; Grieser Painter, J; Van Santen, J; Fombonne, E J; Nigg, J T; Fair, D A

    2018-05-15

    DSM-5 Autism Spectrum Disorder (ASD) comprises a set of neurodevelopmental disorders characterized by deficits in social communication and interaction and repetitive behaviors or restricted interests, and may both affect and be affected by multiple cognitive mechanisms. This study attempts to identify and characterize cognitive subtypes within the ASD population using our Functional Random Forest (FRF) machine learning classification model. This model trained a traditional random forest model on measures from seven tasks that reflect multiple levels of information processing. 47 ASD diagnosed and 58 typically developing (TD) children between the ages of 9 and 13 participated in this study. Our RF model was 72.7% accurate, with 80.7% specificity and 63.1% sensitivity. Using the random forest model, the FRF then measures the proximity of each subject to every other subject, generating a distance matrix between participants. This matrix is then used in a community detection algorithm to identify subgroups within the ASD and TD groups, and revealed 3 ASD and 4 TD putative subgroups with unique behavioral profiles. We then examined differences in functional brain systems between diagnostic groups and putative subgroups using resting-state functional connectivity magnetic resonance imaging (rsfcMRI). Chi-square tests revealed a significantly greater number of between group differences (p < .05) within the cingulo-opercular, visual, and default systems as well as differences in inter-system connections in the somato-motor, dorsal attention, and subcortical systems. Many of these differences were primarily driven by specific subgroups suggesting that our method could potentially parse the variation in brain mechanisms affected by ASD. Copyright © 2017. Published by Elsevier Inc.

  18. Simple to complex modeling of breathing volume using a motion sensor.

    PubMed

    John, Dinesh; Staudenmayer, John; Freedson, Patty

    2013-06-01

    To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical axis ActiGraph™ GT1M activity counts, oxygen consumption and VE were measured during treadmill walking and running, sports, household chores and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min) and high (>35.4 l/min) VEs were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs and vigorous >6.0 METs). We examined the accuracy of two simple techniques (multiple regression and activity count cut-point analyses) and one complex (random forest technique) modeling technique in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted the high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). Actigraph™ cut-points for light, medium and high VEs were <1381, 1381 to 3660 and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Managing salinity in Upper Colorado River Basin streams: Selecting catchments for sediment control efforts using watershed characteristics and random forests models

    USGS Publications Warehouse

    Tillman, Fred; Anning, David W.; Heilman, Julian A.; Buto, Susan G.; Miller, Matthew P.

    2018-01-01

    Elevated concentrations of dissolved-solids (salinity) including calcium, sodium, sulfate, and chloride, among others, in the Colorado River cause substantial problems for its water users. Previous efforts to reduce dissolved solids in upper Colorado River basin (UCRB) streams often focused on reducing suspended-sediment transport to streams, but few studies have investigated the relationship between suspended sediment and salinity, or evaluated which watershed characteristics might be associated with this relationship. Are there catchment properties that may help in identifying areas where control of suspended sediment will also reduce salinity transport to streams? A random forests classification analysis was performed on topographic, climate, land cover, geology, rock chemistry, soil, and hydrologic information in 163 UCRB catchments. Two random forests models were developed in this study: one for exploring stream and catchment characteristics associated with stream sites where dissolved solids increase with increasing suspended-sediment concentration, and the other for predicting where these sites are located in unmonitored reaches. Results of variable importance from the exploratory random forests models indicate that no simple source, geochemical process, or transport mechanism can easily explain the relationship between dissolved solids and suspended sediment concentrations at UCRB monitoring sites. Among the most important watershed characteristics in both models were measures of soil hydraulic conductivity, soil erodibility, minimum catchment elevation, catchment area, and the silt component of soil in the catchment. Predictions at key locations in the basin were combined with observations from selected monitoring sites, and presented in map-form to give a complete understanding of where catchment sediment control practices would also benefit control of dissolved solids in streams.

  20. Accuracy of Remotely Sensed Classifications For Stratification of Forest and Nonforest Lands

    Treesearch

    Raymond L. Czaplewski; Paul L. Patterson

    2001-01-01

    We specify accuracy standards for remotely sensed classifications used by FIA to stratify landscapes into two categories: forest and nonforest. Accuracy must be highest when forest area approaches 100 percent of the landscape. If forest area is rare in a landscape, then accuracy in the nonforest stratum must be very high, even at the expense of accuracy in the forest...

  1. Forest Classification Accuracy as Influenced by Multispectral Scanner Spatial Resolution. [Sam Houston National Forest, Texas

    NASA Technical Reports Server (NTRS)

    Nalepka, R. F. (Principal Investigator); Sadowski, F. E.; Sarno, J. E.

    1976-01-01

    The author has identified the following significant results. A supervised classification within two separate ground areas of the Sam Houston National Forest was carried out for two sq meters spatial resolution MSS data. Data were progressively coarsened to simulate five additional cases of spatial resolution ranging up to 64 sq meters. Similar processing and analysis of all spatial resolutions enabled evaluations of the effect of spatial resolution on classification accuracy for various levels of detail and the effects on area proportion estimation for very general forest features. For very coarse resolutions, a subset of spectral channels which simulated the proposed thematic mapper channels was used to study classification accuracy.

  2. A comparative study: classification vs. user-based collaborative filtering for clinical prediction.

    PubMed

    Hao, Fang; Blair, Rachael Hageman

    2016-12-08

    Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individuals' prior satisfaction with items, as well as the satisfaction of individuals that are "similar". Recently, there have been applications of collaborative filtering based recommender systems for clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, which includes an outcome. Application of recommender systems to a problem of this type requires the recasting a supervised learning problem as unsupervised. The rationale is that patients with similar clinical features carry a similar disease risk. As the "Big Data" era progresses, it is likely that approaches of this type will be reached for as biomedical data continues to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four different publicly available medical data sets: National Health and Nutrition Examination Survey (NHANES, 2011-2012 on Obesity), Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance using simulated data with observations that are Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable. Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests with different imputations on real and simulated data. The results warrant caution for the collaborative filtering for the purpose of clinical risk prediction when traditional classification is feasible and practical. CF may not be desirable in datasets where classification is an acceptable alternative. We describe some natural applications related to "Big Data" where CF would be preferred and conclude with some insights as to why caution may be warranted in this context.

  3. The use of space and high altitude aerial photography to classify forest land and to detect forest disturbances

    NASA Technical Reports Server (NTRS)

    Aldrich, R. C.; Greentree, W. J.; Heller, R. C.; Norick, N. X.

    1970-01-01

    In October 1969, an investigation was begun near Atlanta, Georgia, to explore the possibilities of developing predictors for forest land and stand condition classifications using space photography. It has been found that forest area can be predicted with reasonable accuracy on space photographs using ocular techniques. Infrared color film is the best single multiband sensor for this purpose. Using the Apollo 9 infrared color photographs taken in March 1969 photointerpreters were able to predict forest area for small units consistently within 5 to 10 percent of ground truth. Approximately 5,000 density data points were recorded for 14 scan lines selected at random from five study blocks. The mean densities and standard deviations were computed for 13 separate land use classes. The results indicate that forest area cannot be separated from other land uses with a high degree of accuracy using optical film density alone. If, however, densities derived by introducing red, green, and blue cutoff filters in the optical system of the microdensitometer are combined with their differences and their ratios in regression analysis techniques, there is a good possibility of discriminating forest from all other classes.

  4. Prediction of body mass index status from voice signals based on machine learning for automated medical applications.

    PubMed

    Lee, Bum Ju; Kim, Keun Ho; Ku, Boncho; Jang, Jun-Su; Kim, Jong Yeol

    2013-05-01

    The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of diseases such as cardiovascular disease, diabetes, and stroke. We propose a method for the prediction of normal, overweight, and obese classes based only on the combination of voice features that are associated with BMI status, independently of weight and height measurements. A total of 1568 subjects were divided into 4 groups according to age and gender differences. We performed statistical analyses by analysis of variance (ANOVA) and Scheffe test to find significant features in each group. We predicted BMI status (normal, overweight, and obese) by a logistic regression algorithm and two ensemble classification algorithms (bagging and random forests) based on statistically significant features. In the Female-2030 group (females aged 20-40 years), classification experiments using an imbalanced (original) data set gave area under the receiver operating characteristic curve (AUC) values of 0.569-0.731 by logistic regression, whereas experiments using a balanced data set gave AUC values of 0.893-0.994 by random forests. AUC values in Female-4050 (females aged 41-60 years), Male-2030 (males aged 20-40 years), and Male-4050 (males aged 41-60 years) groups by logistic regression in imbalanced data were 0.585-0.654, 0.581-0.614, and 0.557-0.653, respectively. AUC values in Female-4050, Male-2030, and Male-4050 groups in balanced data were 0.629-0.893 by bagging, 0.707-0.916 by random forests, and 0.695-0.854 by bagging, respectively. In each group, we found discriminatory features showing statistical differences among normal, overweight, and obese classes. The results showed that the classification models built by logistic regression in imbalanced data were better than those built by the other two algorithms, and significant features differed according to age and gender groups. Our results could support the development of BMI diagnosis tools for real-time monitoring; such tools are considered helpful in improving automated BMI status diagnosis in remote healthcare or telemedicine and are expected to have applications in forensic and medical science. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. The Classification of Tongue Colors with Standardized Acquisition and ICC Profile Correction in Traditional Chinese Medicine

    PubMed Central

    Tu, Li-ping; Chen, Jing-bo; Hu, Xiao-juan; Zhang, Zhi-feng

    2016-01-01

    Background and Goal. The application of digital image processing techniques and machine learning methods in tongue image classification in Traditional Chinese Medicine (TCM) has been widely studied nowadays. However, it is difficult for the outcomes to generalize because of lack of color reproducibility and image standardization. Our study aims at the exploration of tongue colors classification with a standardized tongue image acquisition process and color correction. Methods. Three traditional Chinese medical experts are chosen to identify the selected tongue pictures taken by the TDA-1 tongue imaging device in TIFF format through ICC profile correction. Then we compare the mean value of L * a * b * of different tongue colors and evaluate the effect of the tongue color classification by machine learning methods. Results. The L * a * b * values of the five tongue colors are statistically different. Random forest method has a better performance than SVM in classification. SMOTE algorithm can increase classification accuracy by solving the imbalance of the varied color samples. Conclusions. At the premise of standardized tongue acquisition and color reproduction, preliminary objectification of tongue color classification in Traditional Chinese Medicine (TCM) is feasible. PMID:28050555

  6. The Classification of Tongue Colors with Standardized Acquisition and ICC Profile Correction in Traditional Chinese Medicine.

    PubMed

    Qi, Zhen; Tu, Li-Ping; Chen, Jing-Bo; Hu, Xiao-Juan; Xu, Jia-Tuo; Zhang, Zhi-Feng

    2016-01-01

    Background and Goal . The application of digital image processing techniques and machine learning methods in tongue image classification in Traditional Chinese Medicine (TCM) has been widely studied nowadays. However, it is difficult for the outcomes to generalize because of lack of color reproducibility and image standardization. Our study aims at the exploration of tongue colors classification with a standardized tongue image acquisition process and color correction. Methods . Three traditional Chinese medical experts are chosen to identify the selected tongue pictures taken by the TDA-1 tongue imaging device in TIFF format through ICC profile correction. Then we compare the mean value of L * a * b * of different tongue colors and evaluate the effect of the tongue color classification by machine learning methods. Results . The L * a * b * values of the five tongue colors are statistically different. Random forest method has a better performance than SVM in classification. SMOTE algorithm can increase classification accuracy by solving the imbalance of the varied color samples. Conclusions . At the premise of standardized tongue acquisition and color reproduction, preliminary objectification of tongue color classification in Traditional Chinese Medicine (TCM) is feasible.

  7. Classification and evaluation for forest sites in the Cumberland Mountains

    Treesearch

    Glendon W. Smalley

    1984-01-01

    This report classifies and evaluates forest sites in the Cumberland Mountains (fig. 1) for the management of several commercially valuable tree species. It provides forest managers with a land classification system that will enable them to subdivide forest land into logical segments (landtypes), allow them to rate productivity, and alert them to any limitations and...

  8. Comparing Forest/Nonforest Classifications of Landsat TM Imagery for Stratifying FIA Estimates of Forest Land Area

    Treesearch

    Mark D. Nelson; Ronald E. McRoberts; Greg C. Liknes; Geoffrey R. Holden

    2005-01-01

    Landsat Thematic Mapper (TM) satellite imagery and Forest Inventory and Analysis (FIA) plot data were used to construct forest/nonforest maps of Mapping Zone 41, National Land Cover Dataset 2000 (NLCD 2000). Stratification approaches resulting from Maximum Likelihood, Fuzzy Convolution, Logistic Regression, and k-Nearest Neighbors classification/prediction methods were...

  9. Crown-condition classification: a guide to data collection and analysis

    Treesearch

    Michael E. Schomaker; Stanley J. Zarnoch; William A. Bechtold; David J. Latelle; William G. Burkman; Susan M. Cox

    2007-01-01

    The Forest Inventory and Analysis (FIA) Program of the Forest Service, U.S. Department of Agriculture, conducts a national inventory of forests across the United States. A systematic subset of permanent inventory plots in 38 States is currently sampled every year for numerous forest health indicators. One of these indicators, crown-condition classification, is designed...

  10. Ecological classification and management characteristics of montane forest land in southwestern Washington.

    Treesearch

    D.G. Brockway; C. Topik

    1984-01-01

    Vegetation, soil, and site data werecollectedthroughout the forested portion of the Pacific silver fir and mountain hemlock zones of the Gifford Pinchot National Forest as part of the Forest Service program to develop anecoIogicallybasedplant association classification system for the Pacific Northwest Region. The major objective of sampling was to include a wide...

  11. Field sampling and data analysis methods for development of ecological land classifications: an application on the Manistee National Forest.

    Treesearch

    George E. Host; Carl W. Ramm; Eunice A. Padley; Kurt S. Pregitzer; James B. Hart; David T. Cleland

    1992-01-01

    Presents technical documentation for development of an Ecological Classification System for the Manistee National Forest in northwest Lower Michigan, and suggests procedures applicable to other ecological land classification projects. Includes discussion of sampling design, field data collection, data summarization and analyses, development of classification units,...

  12. Satellite inventory of Minnesota forest resources

    NASA Technical Reports Server (NTRS)

    Bauer, Marvin E.; Burk, Thomas E.; Ek, Alan R.; Coppin, Pol R.; Lime, Stephen D.; Walsh, Terese A.; Walters, David K.; Befort, William; Heinzen, David F.

    1993-01-01

    The methods and results of using Landsat Thematic Mapper (TM) data to classify and estimate the acreage of forest covertypes in northeastern Minnesota are described. Portions of six TM scenes covering five counties with a total area of 14,679 square miles were classified into six forest and five nonforest classes. The approach involved the integration of cluster sampling, image processing, and estimation. Using cluster sampling, 343 plots, each 88 acres in size, were photo interpreted and field mapped as a source of reference data for classifier training and calibration of the TM data classifications. Classification accuracies of up to 75 percent were achieved; most misclassification was between similar or related classes. An inverse method of calibration, based on the error rates obtained from the classifications of the cluster plots, was used to adjust the classification class proportions for classification errors. The resulting area estimates for total forest land in the five-county area were within 3 percent of the estimate made independently by the USDA Forest Service. Area estimates for conifer and hardwood forest types were within 0.8 and 6.0 percent respectively, of the Forest Service estimates. A trial of a second method of estimating the same classes as the Forest Service resulted in standard errors of 0.002 to 0.015. A study of the use of multidate TM data for change detection showed that forest canopy depletion, canopy increment, and no change could be identified with greater than 90 percent accuracy. The project results have been the basis for the Minnesota Department of Natural Resources and the Forest Service to define and begin to implement an annual system of forest inventory which utilizes Landsat TM data to detect changes in forest cover.

  13. Sub-pixel image classification for forest types in East Texas

    NASA Astrophysics Data System (ADS)

    Westbrook, Joey

    Sub-pixel classification is the extraction of information about the proportion of individual materials of interest within a pixel. Landcover classification at the sub-pixel scale provides more discrimination than traditional per-pixel multispectral classifiers for pixels where the material of interest is mixed with other materials. It allows for the un-mixing of pixels to show the proportion of each material of interest. The materials of interest for this study are pine, hardwood, mixed forest and non-forest. The goal of this project was to perform a sub-pixel classification, which allows a pixel to have multiple labels, and compare the result to a traditional supervised classification, which allows a pixel to have only one label. The satellite image used was a Landsat 5 Thematic Mapper (TM) scene of the Stephen F. Austin Experimental Forest in Nacogdoches County, Texas and the four cover type classes are pine, hardwood, mixed forest and non-forest. Once classified, a multi-layer raster datasets was created that comprised four raster layers where each layer showed the percentage of that cover type within the pixel area. Percentage cover type maps were then produced and the accuracy of each was assessed using a fuzzy error matrix for the sub-pixel classifications, and the results were compared to the supervised classification in which a traditional error matrix was used. The overall accuracy of the sub-pixel classification using the aerial photo for both training and reference data had the highest (65% overall) out of the three sub-pixel classifications. This was understandable because the analyst can visually observe the cover types actually on the ground for training data and reference data, whereas using the FIA (Forest Inventory and Analysis) plot data, the analyst must assume that an entire pixel contains the exact percentage of a cover type found in a plot. An increase in accuracy was found after reclassifying each sub-pixel classification from nine classes with 10 percent interval each to five classes with 20 percent interval each. When compared to the supervised classification which has a satisfactory overall accuracy of 90%, none of the sub-pixel classification achieved the same level. However, since traditional per-pixel classifiers assign only one label to pixels throughout the landscape while sub-pixel classifications assign multiple labels to each pixel, the traditional 85% accuracy of acceptance for pixel-based classifications should not apply to sub-pixel classifications. More research is needed in order to define the level of accuracy that is deemed acceptable for sub-pixel classifications.

  14. Data Field Modeling and Spectral-Spatial Feature Fusion for Hyperspectral Data Classification.

    PubMed

    Liu, Da; Li, Jianxun

    2016-12-16

    Classification is a significant subject in hyperspectral remote sensing image processing. This study proposes a spectral-spatial feature fusion algorithm for the classification of hyperspectral images (HSI). Unlike existing spectral-spatial classification methods, the influences and interactions of the surroundings on each measured pixel were taken into consideration in this paper. Data field theory was employed as the mathematical realization of the field theory concept in physics, and both the spectral and spatial domains of HSI were considered as data fields. Therefore, the inherent dependency of interacting pixels was modeled. Using data field modeling, spatial and spectral features were transformed into a unified radiation form and further fused into a new feature by using a linear model. In contrast to the current spectral-spatial classification methods, which usually simply stack spectral and spatial features together, the proposed method builds the inner connection between the spectral and spatial features, and explores the hidden information that contributed to classification. Therefore, new information is included for classification. The final classification result was obtained using a random forest (RF) classifier. The proposed method was tested with the University of Pavia and Indian Pines, two well-known standard hyperspectral datasets. The experimental results demonstrate that the proposed method has higher classification accuracies than those obtained by the traditional approaches.

  15. Flood induced phenotypic plasticity in amphibious genus Elatine (Elatinaceae).

    PubMed

    Molnár V, Attila; Tóth, János Pál; Sramkó, Gábor; Horváth, Orsolya; Popiela, Agnieszka; Mesterházy, Attila; Lukács, Balázs András

    2015-01-01

    Vegetative characters are widely used in the taxonomy of the amphibious genus Elatine L. However, these usually show great variation not just between species but between their aquatic and terrestrial forms. In the present study we examine the variation of seed and vegetative characters in nine Elatine species (E. brachysperma, E. californica, E. gussonei, E. hexandra, E. hungarica, E. hydropiper, E. macropoda, E. orthosperma and E. triandra) to reveal the extension of plasticity induced by the amphibious environment, and to test character reliability for species identification. Cultivated plant clones were kept under controlled conditions exposed to either aquatic or terrestrial environmental conditions. Six vegetative characters (length of stem, length of internodium, length of lamina, width of lamina, length of petioles, length of pedicel) and four seed characters (curvature, number of pits / lateral row, 1st and 2nd dimension) were measured on 50 fruiting stems of the aquatic and on 50 stems of the terrestrial form of the same clone. MDA, NPMANOVA Random Forest classification and cluster analysis were used to unravel the morphological differences between aquatic and terrestrial forms. The results of MDA cross-validated and Random Forest classification clearly indicated that only seed traits are stable within species (i.e., different forms of the same species keep similar morphology). Consequently, only seed morphology is valuable for taxonomic purposes since vegetative traits are highly influenced by environmental factors.

  16. Flood induced phenotypic plasticity in amphibious genus Elatine (Elatinaceae)

    PubMed Central

    Sramkó, Gábor; Horváth, Orsolya; Popiela, Agnieszka; Mesterházy, Attila; Lukács, Balázs András

    2015-01-01

    Vegetative characters are widely used in the taxonomy of the amphibious genus Elatine L. However, these usually show great variation not just between species but between their aquatic and terrestrial forms. In the present study we examine the variation of seed and vegetative characters in nine Elatine species (E. brachysperma, E. californica, E. gussonei, E. hexandra, E. hungarica, E. hydropiper, E. macropoda, E. orthosperma and E. triandra) to reveal the extension of plasticity induced by the amphibious environment, and to test character reliability for species identification. Cultivated plant clones were kept under controlled conditions exposed to either aquatic or terrestrial environmental conditions. Six vegetative characters (length of stem, length of internodium, length of lamina, width of lamina, length of petioles, length of pedicel) and four seed characters (curvature, number of pits / lateral row, 1st and 2nd dimension) were measured on 50 fruiting stems of the aquatic and on 50 stems of the terrestrial form of the same clone. MDA, NPMANOVA Random Forest classification and cluster analysis were used to unravel the morphological differences between aquatic and terrestrial forms. The results of MDA cross-validated and Random Forest classification clearly indicated that only seed traits are stable within species (i.e., different forms of the same species keep similar morphology). Consequently, only seed morphology is valuable for taxonomic purposes since vegetative traits are highly influenced by environmental factors. PMID:26713235

  17. Chemical Characterization of Young Virgin Queens and Mated Egg-Laying Queens in the Ant Cataglyphis cursor: Random Forest Classification Analysis for Multivariate Datasets.

    PubMed

    Monnin, Thibaud; Helft, Florence; Leroy, Chloé; d'Ettorre, Patrizia; Doums, Claudie

    2018-02-01

    Social insects are well known for their extremely rich chemical communication, yet their sex pheromones remain poorly studied. In the thermophilic and thelytokous ant, Cataglyphis cursor, we analyzed the cuticular hydrocarbon profiles and Dufour's gland contents of queens of different age and reproductive status (sexually immature gynes, sexually mature gynes, mated and egg-laying queens) and of workers. Random forest classification analyses showed that the four groups of individuals were well separated for both chemical sources, except mature gynes that clustered with queens for cuticular hydrocarbons and with immature gynes for Dufour's gland secretions. Analyses carried out with two groups of females only allowed identification of candidate chemicals for queen signal and for sexual attractant. In particular, gynes produced more undecane in the Dufour's gland. This chemical is both the sex pheromone and the alarm pheromone of the ant Formica lugubris. It may therefore act as sex pheromone in C. cursor, and/or be involved in the restoration of monogyny that occurs rapidly following colony fission. Indeed, new colonies often start with several gynes and all but one are rapidly culled by workers, and this process likely involves chemical signals between gynes and workers. These findings open novel opportunities for experimental studies of inclusive mate choice and queen choice in C. cursor.

  18. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

    PubMed

    Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better ( R 2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better ( R 2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.

  19. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    PubMed Central

    Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969

  20. Intra-regional classification of grape seeds produced in Mendoza province (Argentina) by multi-elemental analysis and chemometrics tools.

    PubMed

    Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G

    2018-03-01

    The feasibility of the application of chemometric techniques associated with multi-element analysis for the classification of grape seeds according to their provenance vineyard soil was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used for the determination of twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least square discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and Random Forest (RF) were applied to construct classification/discrimination rules. The results indicated that nonlinear methods, RF and SVM, perform best with up to 98% and 93% accuracy rate, respectively, and therefore are excellent tools for classification of grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Classification Algorithms for Big Data Analysis, a Map Reduce Approach

    NASA Astrophysics Data System (ADS)

    Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.

    2015-03-01

    Since many years ago, the scientific community is concerned about how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data that is being generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), which is an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using a SVM classifier on data sets of different sizes for different cluster configurations demonstrates the potential of the tool, as well as aspects that affect its performance.

  2. Digital classification of Landsat data for vegetation and land-cover mapping in the Blackfoot River watershed, southeastern Idaho

    USGS Publications Warehouse

    Pettinger, L.R.

    1982-01-01

    This paper documents the procedures, results, and final products of a digital analysis of Landsat data used to produce a vegetation and landcover map of the Blackfoot River watershed in southeastern Idaho. Resource classes were identified at two levels of detail: generalized Level I classes (for example, forest land and wetland) and detailed Levels II and III classes (for example, conifer forest, aspen, wet meadow, and riparian hardwoods). Training set statistics were derived using a modified clustering approach. Environmental stratification that separated uplands from lowlands improved discrimination between resource classes having similar spectral signatures. Digital classification was performed using a maximum likelihood algorithm. Classification accuracy was determined on a single-pixel basis from a random sample of 25-pixel blocks. These blocks were transferred to small-scale color-infrared aerial photographs, and the image area corresponding to each pixel was interpreted. Classification accuracy, expressed as percent agreement of digital classification and photo-interpretation results, was 83.0:t 2.1 percent (0.95 probability level) for generalized (Level I) classes and 52.2:t 2.8 percent (0.95 probability level) for detailed (Levels II and III) classes. After the classified images were geometrically corrected, two types of maps were produced of Level I and Levels II and III resource classes: color-coded maps at a 1:250,000 scale, and flatbed-plotter overlays at a 1:24,000 scale. The overlays are more useful because of their larger scale, familiar format to users, and compatibility with other types of topographic and thematic maps of the same scale.

  3. Determining successional stage of temperate coniferous forests with Landsat satellite data

    NASA Technical Reports Server (NTRS)

    Fiorella, Maria; Ripple, William J.

    1993-01-01

    Thematic Mapper (TM) digital imagery was used to map forest successional stages and to evaluate spectral differences between old-growth and mature forests in the central Cascade Range of Oregon. Relative sun incidence values were incorporated into the successional stage classification to compensate for topographic induced variation. Relative sun incidence improved the classification accuracy of young successional stages, but did not improve the classification accuracy of older, closed canopy forest classes or overall accuracy. TM bands 1, 2, and 4; the normalized difference vegetation index; and TM 4/3, 4/5, and 4/7 band ratio values for o|d-growth forests were found to be significantly lower than the values of mature forests. The Tasseled Cap features of brightness, greenness, and wetness also had significantly lower old-growth values as compared to mature forest values .

  4. An unsupervised two-stage clustering approach for forest structure classification based on X-band InSAR data - A case study in complex temperate forest stands

    NASA Astrophysics Data System (ADS)

    Abdullahi, Sahra; Schardt, Mathias; Pretzsch, Hans

    2017-05-01

    Forest structure at stand level plays a key role for sustainable forest management, since the biodiversity, productivity, growth and stability of the forest can be positively influenced by managing its structural diversity. In contrast to field-based measurements, remote sensing techniques offer a cost-efficient opportunity to collect area-wide information about forest stand structure with high spatial and temporal resolution. Especially Interferometric Synthetic Aperture Radar (InSAR), which facilitates worldwide acquisition of 3d information independent from weather conditions and illumination, is convenient to capture forest stand structure. This study purposes an unsupervised two-stage clustering approach for forest structure classification based on height information derived from interferometric X-band SAR data which was performed in complex temperate forest stands of Traunstein forest (South Germany). In particular, a four dimensional input data set composed of first-order height statistics was non-linearly projected on a two-dimensional Self-Organizing Map, spatially ordered according to similarity (based on the Euclidean distance) in the first stage and classified using the k-means algorithm in the second stage. The study demonstrated that X-band InSAR data exhibits considerable capabilities for forest structure classification. Moreover, the unsupervised classification approach achieved meaningful and reasonable results by means of comparison to aerial imagery and LiDAR data.

  5. Evaluating the potential for site-specific modification of LiDAR DEM derivatives to improve environmental planning-scale wetland identification using Random Forest classification

    NASA Astrophysics Data System (ADS)

    O'Neil, Gina L.; Goodall, Jonathan L.; Watson, Layne T.

    2018-04-01

    Wetlands are important ecosystems that provide many ecological benefits, and their quality and presence are protected by federal regulations. These regulations require wetland delineations, which can be costly and time-consuming to perform. Computer models can assist in this process, but lack the accuracy necessary for environmental planning-scale wetland identification. In this study, the potential for improvement of wetland identification models through modification of digital elevation model (DEM) derivatives, derived from high-resolution and increasingly available light detection and ranging (LiDAR) data, at a scale necessary for small-scale wetland delineations is evaluated. A novel approach of flow convergence modelling is presented where Topographic Wetness Index (TWI), curvature, and Cartographic Depth-to-Water index (DTW), are modified to better distinguish wetland from upland areas, combined with ancillary soil data, and used in a Random Forest classification. This approach is applied to four study sites in Virginia, implemented as an ArcGIS model. The model resulted in significant improvement in average wetland accuracy compared to the commonly used National Wetland Inventory (84.9% vs. 32.1%), at the expense of a moderately lower average non-wetland accuracy (85.6% vs. 98.0%) and average overall accuracy (85.6% vs. 92.0%). From this, we concluded that modifying TWI, curvature, and DTW provides more robust wetland and non-wetland signatures to the models by improving accuracy rates compared to classifications using the original indices. The resulting ArcGIS model is a general tool able to modify these local LiDAR DEM derivatives based on site characteristics to identify wetlands at a high resolution.

  6. Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach.

    PubMed

    Hettige, Nuwan C; Nguyen, Thai Binh; Yuan, Chen; Rajakulendran, Thanara; Baddour, Jermeen; Bhagwat, Nikhil; Bani-Fatemi, Ali; Voineskos, Aristotle N; Mallar Chakravarty, M; De Luca, Vincenzo

    2017-07-01

    Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric interventions. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool to estimate the probability of a patient with schizophrenia for attempting suicide. We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed four classification algorithms using a regularized regression, random forest, elastic net and support vector machine models with sociocultural and clinical variables as features to train the models. All classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 67% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 66% accuracy and an AUC of 0.67. Support vector classifier (SVC) model demonstrated an accuracy of 67% and an AUC of 0.70, and the elastic net model demonstrated and accuracy of 65% and an AUC of 0.71. Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.

  8. Categorizing renal oncocytic neoplasms on core needle biopsy: a morphologic and immunophenotypic study of 144 cases with clinical follow-up. Alderman MA, Daignault S, Wolf JS Jr, Palapattu GS, Weizer AZ, Hafez KS, Kunju LP, Wu AJ. Hum Pathol.September 2016;55:1-10.

    PubMed

    Kryvenko, Oleksandr N

    2017-06-01

    There is limited literature on renal oncocytic neoplasms diagnosed on core biopsy. All renal oncocytic neoplasm core biopsies from 2006 to 2013 were, retrospectively, reviewed. Morphologic features and an immunohistochemical panel of CK7, c-KIT, and S100A1 were assessed. Concordance with resection diagnosis, statistical analysis including a random forest classification, and follow-up were recorded. The postimmunohistochemical diagnoses of 144 renal oncocytic core biopsies were favor oncocytoma (67%), favor renal cell carcinoma (RCC) (12%), and cannot exclude RCC (21%). Diagnosis was revised following immunohistochemistry in 7% of cases. The most common features for oncocytoma (excluding dense granular cytoplasm) were nested architecture, edematous stroma, binucleation and tubular architecture; the most common features for favor RCC were sheet-like architecture, nuclear pleomorphism, papillary architecture, and prominent cell borders. High nuclear grade, necrosis, extensive papillary architecture, raisinoid nuclei, and frequent mitoses were not seen in oncocytomas. Comparing the pathologist and random forest classification, the overall out-of-bag estimate of classification error dropped from 23% to 13% when favor RCC and cannot exclude RCC was combined into 1 category. Resection was performed in 19% (28 cases) with a 94% concordance (100% of favor RCC biopsies and 90% of cannot exclude RCC biopsies confirmed as RCC; 83% of favor oncocytomas confirmed); ablation in 23%; and surveillance in 46%. Follow-up was available in 92% (median follow-up, 33 months) with no adverse outcomes. Renal oncocytic neoplasms comprise a significant subset (16%) of all core biopsies, and the majority (78%) can be classified as favor oncocytoma or favor RCC. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets.

    PubMed

    McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne

    2018-04-01

    Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogleNet convolutional neural networks (CNNs), initially trained using ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with MatConvNet package, to extract features from food image datasets; Food 5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from CNNs and used to train machine learning classifiers including artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with SVM with RBF kernel can accurately detect food items with 99.4% accuracy using Food-5K validation food image dataset and 98.8% with Food-5K evaluation dataset using ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, ANN can achieve 91.34%, 99.28% when applied to Food-11 and RawFooT-DB food image datasets respectively and SVM with RBF kernel can achieve 64.98% with Food-101 image dataset. From this research it is clear that using deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Data-driven mapping of the potential mountain permafrost distribution.

    PubMed

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2017-07-15

    Existing mountain permafrost distribution models generally offer a good overview of the potential extent of this phenomenon at a regional scale. They are however not always able to reproduce the high spatial discontinuity of permafrost at the micro-scale (scale of a specific landform; ten to several hundreds of meters). To overcome this lack, we tested an alternative modelling approach using three classification algorithms belonging to statistics and machine learning: Logistic regression, Support Vector Machines and Random forests. These supervised learning techniques infer a classification function from labelled training data (pixels of permafrost absence and presence) with the aim of predicting the permafrost occurrence where it is unknown. The research was carried out in a 588km 2 area of the Western Swiss Alps. Permafrost evidences were mapped from ortho-image interpretation (rock glacier inventorying) and field data (mainly geoelectrical and thermal data). The relationship between selected permafrost evidences and permafrost controlling factors was computed with the mentioned techniques. Classification performances, assessed with AUROC, range between 0.81 for Logistic regression, 0.85 with Support Vector Machines and 0.88 with Random forests. The adopted machine learning algorithms have demonstrated to be efficient for permafrost distribution modelling thanks to consistent results compared to the field reality. The high resolution of the input dataset (10m) allows elaborating maps at the micro-scale with a modelled permafrost spatial distribution less optimistic than classic spatial models. Moreover, the probability output of adopted algorithms offers a more precise overview of the potential distribution of mountain permafrost than proposing simple indexes of the permafrost favorability. These encouraging results also open the way to new possibilities of permafrost data analysis and mapping. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

    PubMed

    Gao, Xiang; Lin, Huaiying; Dong, Qunfeng

    2017-01-01

    Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.

  12. Differences in forest area classification based on tree tally from variable- and fixed-radius plots

    Treesearch

    David Azuma; Vicente J. Monleon

    2011-01-01

    In forest inventory, it is not enough to formulate a definition; it is also necessary to define the "measurement procedure." In the classification of forestland by dominant cover type, the measurement design (the plot) can affect the outcome of the classification. We present results of a simulation study comparing classification of the dominant cover type...

  13. Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset

    PubMed

    Jeyasingh, Suganthi; Veluchamy, Malathi

    2017-05-01

    Early diagnosis of breast cancer is essential to save lives of patients. Usually, medical datasets include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery on Database (KDD) process helps to improve efficiency. It requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining. Feature selection is considered as a vital step to increase the classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select the random instances from the dataset. Ranking was with the global best features to recognize the predominant features available in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used for estimating the performance analysis of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of Kappa statistic, Mathew’s Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE). Creative Commons Attribution License

  14. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or priori-defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide an valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. Statistical Approaches to Type Determination of the Ejector Marks on Cartridge Cases.

    PubMed

    Warren, Eric M; Sheets, H David

    2018-03-01

    While type determination on bullets has been performed for over a century, type determination on cartridge cases is often overlooked. Presented here is an example of type determination of ejector marks on cartridge cases from Glock and Smith & Wesson Sigma series pistols using Naïve Bayes and Random Forest classification methods. The shapes of ejector marks were captured from images of test-fired cartridge cases and subjected to multivariate analysis. Naïve Bayes and Random Forest methods were used to assign the ejector shapes to the correct class of firearm with success rates as high as 98%. This method is easily implemented with equipment already available in crime laboratories and can serve as an investigative lead in the form of a list of firearms that could have fired the evidence. Paired with the FBI's General Rifling Characteristics (GRC) database, this could be an invaluable resource for firearm evidence at crime scenes. © 2017 American Academy of Forensic Sciences.

  16. Automated Classification of Consumer Health Information Needs in Patient Portal Messages.

    PubMed

    Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Jackson, Gretchen Purcell

    2015-01-01

    Patients have diverse health information needs, and secure messaging through patient portals is an emerging means by which such needs are expressed and met. As patient portal adoption increases, growing volumes of secure messages may burden healthcare providers. Automated classification could expedite portal message triage and answering. We created four automated classifiers based on word content and natural language processing techniques to identify health information needs in 1000 patient-generated portal messages. Logistic regression and random forest classifiers detected single information needs well, with area under the curves of 0.804-0.914. A logistic regression classifier accurately found the set of needs within a message, with a Jaccard index of 0.859 (95% Confidence Interval: (0.847, 0.871)). Automated classification of consumer health information needs expressed in patient portal messages is feasible and may allow direct linking to relevant resources or creation of institutional resources for commonly expressed needs.

  17. Automated Classification of Consumer Health Information Needs in Patient Portal Messages

    PubMed Central

    Cronin, Robert M.; Fabbri, Daniel; Denny, Joshua C.; Jackson, Gretchen Purcell

    2015-01-01

    Patients have diverse health information needs, and secure messaging through patient portals is an emerging means by which such needs are expressed and met. As patient portal adoption increases, growing volumes of secure messages may burden healthcare providers. Automated classification could expedite portal message triage and answering. We created four automated classifiers based on word content and natural language processing techniques to identify health information needs in 1000 patient-generated portal messages. Logistic regression and random forest classifiers detected single information needs well, with area under the curves of 0.804–0.914. A logistic regression classifier accurately found the set of needs within a message, with a Jaccard index of 0.859 (95% Confidence Interval: (0.847, 0.871)). Automated classification of consumer health information needs expressed in patient portal messages is feasible and may allow direct linking to relevant resources or creation of institutional resources for commonly expressed needs. PMID:26958285

  18. Mapping of taiga forest units using AIRSAR data and/or optical data, and retrieval of forest parameters

    NASA Technical Reports Server (NTRS)

    Rignot, Eric; Williams, Cynthia; Way, Jobea; Viereck, Leslie

    1993-01-01

    A maximum a posteriori Bayesian classifier for multifrequency polarimetric SAR data is used to perform a supervised classification of forest types in the floodplains of Alaska. The image classes include white spruce, balsam poplar, black spruce, alder, non-forests, and open water. The authors investigate the effect on classification accuracy of changing environmental conditions, and of frequency and polarization of the signal. The highest classification accuracy (86 percent correctly classified forest pixels, and 91 percent overall) is obtained combining L- and C-band frequencies fully polarimetric on a date where the forest is just recovering from flooding. The forest map compares favorably with a vegetation map assembled from digitized aerial photos which took five years for completion, and address the state of the forest in 1978, ignoring subsequent fires, changes in the course of the river, clear-cutting of trees, and tree growth. HV-polarization is the most useful polarization at L- and C-band for classification. C-band VV (ERS-1 mode) and L-band HH (J-ERS-1 mode) alone or combined yield unsatisfactory classification accuracies. Additional data acquired in the winter season during thawed and frozen days yield classification accuracies respectively 20 percent and 30 percent lower due to a greater confusion between conifers and deciduous trees. Data acquired at the peak of flooding in May 1991 also yield classification accuracies 10 percent lower because of dominant trunk-ground interactions which mask out finer differences in radar backscatter between tree species. Combination of several of these dates does not improve classification accuracy. For comparison, panchromatic optical data acquired by SPOT in the summer season of 1991 are used to classify the same area. The classification accuracy (78 percent for the forest types and 90 percent if open water is included) is lower than that obtained with AIRSAR although conifers and deciduous trees are better separated due to the presence of leaves on the deciduous trees. Optical data do not separate black spruce and white spruce as well as SAR data, cannot separate alder from balsam poplar, and are of course limited by the frequent cloud cover in the polar regions. Yet, combining SPOT and AIRSAR offers better chances to identify vegetation types independent of ground truth information using a combination of NDVI indexes from SPOT, biomass numbers from AIRSAR, and a segmentation map from either one.

  19. Evolving optimised decision rules for intrusion detection using particle swarm paradigm

    NASA Astrophysics Data System (ADS)

    Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

    2012-12-01

    The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic pattern and classify them as normal or anomalous class. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Baye's Tree, Random Forest, Random Tree and Representative Tree model to perform the detection of anomalous network pattern is introduced. In particular, the proposed swarm optimisation-based approach selects instances that compose training set and optimised decision tree operate over this trained set producing classification rules with improved coverage, classification capability and generalisation ability. Experiment with the Knowledge Discovery and Data mining (KDD) data set which have information on traffic pattern, during normal and intrusive behaviour shows that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithm.

  20. Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.

    PubMed

    Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan

    2017-12-01

    It is common that a trained classification model is applied to the operating data that is deviated from the training data because of noise. This paper will test an ensemble method, Diversified Multiple Tree (DMT), on its capability for classifying instances in a new laboratory using the classifier built on the instances of another laboratory. DMT is tested on three real world biomedical data sets from different laboratories in comparison with four benchmark ensemble methods, AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted on studying the limitation of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than other benchmark ensemble classifiers on classifying new instances of a different laboratory from the laboratory where instances are used to build the classifier. This paper demonstrates that an ensemble classifier, DMT, is more robust in classifying noisy data than other widely used ensemble methods. DMT works on the data set that supports multiple simple trees.

  1. Capability of Integrated MODIS Imagery and ALOS for Oil Palm, Rubber and Forest Areas Mapping in Tropical Forest Regions

    PubMed Central

    Razali, Sheriza Mohd; Marin, Arnaldo; Nuruddin, Ahmad Ainuddin; Shafri, Helmi Zulhaidi Mohd; Hamid, Hazandy Abdul

    2014-01-01

    Various classification methods have been applied for low resolution of the entire Earth's surface from recorded satellite images, but insufficient study has determined which method, for which satellite data, is economically viable for tropical forest land use mapping. This study employed Iterative Self Organizing Data Analysis Techniques (ISODATA) and K-Means classification techniques to classified Moderate Resolution Imaging Spectroradiometer (MODIS) Surface Reflectance satellite image into forests, oil palm groves, rubber plantations, mixed horticulture, mixed oil palm and rubber and mixed forest and rubber. Even though frequent cloud cover has been a challenge for mapping tropical forests, our MODIS land use classification map found that 2008 ISODATA-1 performed well with overall accuracy of 94%, with the highest Producer's Accuracy of Forest with 86%, and were consistent with MODIS Land Cover 2008 (MOD12Q1), respectively. The MODIS land use classification was able to distinguish young oil palm groves from open areas, rubber and mature oil palm plantations, on the Advanced Land Observing Satellite (ALOS) map, whereas rubber was more easily distinguished from an open area than from mixed rubber and forest. This study provides insight on the potential for integrating regional databases and temporal MODIS data, in order to map land use in tropical forest regions. PMID:24811079

  2. Capability of integrated MODIS imagery and ALOS for oil palm, rubber and forest areas mapping in tropical forest regions.

    PubMed

    Razali, Sheriza Mohd; Marin, Arnaldo; Nuruddin, Ahmad Ainuddin; Shafri, Helmi Zulhaidi Mohd; Hamid, Hazandy Abdul

    2014-05-07

    Various classification methods have been applied for low resolution of the entire Earth's surface from recorded satellite images, but insufficient study has determined which method, for which satellite data, is economically viable for tropical forest land use mapping. This study employed Iterative Self Organizing Data Analysis Techniques (ISODATA) and K-Means classification techniques to classified Moderate Resolution Imaging Spectroradiometer (MODIS) Surface Reflectance satellite image into forests, oil palm groves, rubber plantations, mixed horticulture, mixed oil palm and rubber and mixed forest and rubber. Even though frequent cloud cover has been a challenge for mapping tropical forests, our MODIS land use classification map found that 2008 ISODATA-1 performed well with overall accuracy of 94%, with the highest Producer's Accuracy of Forest with 86%, and were consistent with MODIS Land Cover 2008 (MOD12Q1), respectively. The MODIS land use classification was able to distinguish young oil palm groves from open areas, rubber and mature oil palm plantations, on the Advanced Land Observing Satellite (ALOS) map, whereas rubber was more easily distinguished from an open area than from mixed rubber and forest. This study provides insight on the potential for integrating regional databases and temporal MODIS data, in order to map land use in tropical forest regions.

  3. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.

  4. Automatic Classification of Aerial Imagery for Urban Hydrological Applications

    NASA Astrophysics Data System (ADS)

    Paul, A.; Yang, C.; Breitkopf, U.; Liu, Y.; Wang, Z.; Rottensteiner, F.; Wallner, M.; Verworn, A.; Heipke, C.

    2018-04-01

    In this paper we investigate the potential of automatic supervised classification for urban hydrological applications. In particular, we contribute to runoff simulations using hydrodynamic urban drainage models. In order to assess whether the capacity of the sewers is sufficient to avoid surcharge within certain return periods, precipitation is transformed into runoff. The transformation of precipitation into runoff requires knowledge about the proportion of drainage-effective areas and their spatial distribution in the catchment area. Common simulation methods use the coefficient of imperviousness as an important parameter to estimate the overland flow, which subsequently contributes to the pipe flow. The coefficient of imperviousness is the percentage of area covered by impervious surfaces such as roofs or road surfaces. It is still common practice to assign the coefficient of imperviousness for each particular land parcel manually by visual interpretation of aerial images. Based on classification results of these imagery we contribute to an objective automatic determination of the coefficient of imperviousness. In this context we compare two classification techniques: Random Forests (RF) and Conditional Random Fields (CRF). Experimental results performed on an urban test area show good results and confirm that the automated derivation of the coefficient of imperviousness, apart from being more objective and, thus, reproducible, delivers more accurate results than the interactive estimation. We achieve an overall accuracy of about 85 % for both classifiers. The root mean square error of the differences of the coefficient of imperviousness compared to the reference is 4.4 % for the CRF-based classification, and 3.8 % for the RF-based classification.

  5. Coast redwood ecological types of southern Monterey County, California

    Treesearch

    Mark Borchert; Daniel Segotta; Michael D. Purser

    1988-01-01

    An ecological classification system has been developed for the Pacific Southwest Region of the Forest Service. As part of this classification effort, coast redwood (Sequoia sempervirens) forests of southern Monterey County in the Los Padres National Forest were classified into six ecological types using vegetation, soils and geomorphology taken from...

  6. Applications of satellite remote sensing to forested ecosystems

    Treesearch

    Louis R. Iverson; Robin Lambert Graham; Elizabeth A. Cook; Elizabeth A. Cook

    1989-01-01

    Since the launch of the first civilian earth-observing satellite in 1972, satellite remote sensing has provided increasingly sophisticated information on the structure and function of forested ecosystems. Forest classification and mapping, common uses of satellite data, have improved over the years as a result of more discriminating sensors, better classification...

  7. Habitat typing versus advanced vegetation classification in western forests

    Treesearch

    Tony Kusbach; John Shaw; James Long; Helga Van Miegroet

    2012-01-01

    Major habitat and community types in northern Utah were compared with plant alliances and associations that were derived from fidelity- and diagnostic-species classification concepts. Each of these classification approaches was associated with important environmental factors. Within a 20,000-ha watershed, 103 forest ecosystems were described by physiographic features,...

  8. Multi-fractal detrended texture feature for brain tumor classification

    NASA Astrophysics Data System (ADS)

    Reza, Syed M. S.; Mays, Randall; Iftekharuddin, Khan M.

    2015-03-01

    We propose a novel non-invasive brain tumor type classification using Multi-fractal Detrended Fluctuation Analysis (MFDFA) [1] in structural magnetic resonance (MR) images. This preliminary work investigates the efficacy of the MFDFA features along with our novel texture feature known as multifractional Brownian motion (mBm) [2] in classifying (grading) brain tumors as High Grade (HG) and Low Grade (LG). Based on prior performance, Random Forest (RF) [3] is employed for tumor grading using two different datasets such as BRATS-2013 [4] and BRATS-2014 [5]. Quantitative scores such as precision, recall, accuracy are obtained using the confusion matrix. On an average 90% precision and 85% recall from the inter-dataset cross-validation confirm the efficacy of the proposed method.

  9. A preliminary test of an ecological classification system for the Oconee National Forest using forest inventory and analysis data

    Treesearch

    W. Henry McNab; Ronald B. Stephens; Richard D. Rightmyer; Erika M. Mavity; Samuel G. Lambert

    2012-01-01

    An ecological classification system (ECS) has been developed for use in evaluating management, conservation and restoration options for forest and wildlife resources on the Oconee National Forest. Our study was the initial evaluation of the ECS to determine if the units at each level differed in potential productivity. We used loblolly pine (Pinus taeda...

  10. Forest tree species clssification based on airborne hyper-spectral imagery

    NASA Astrophysics Data System (ADS)

    Dian, Yuanyong; Li, Zengyuan; Pang, Yong

    2013-10-01

    Forest precision classification products were the basic data for surveying of forest resource, updating forest subplot information, logging and design of forest. However, due to the diversity of stand structure, complexity of the forest growth environment, it's difficult to discriminate forest tree species using multi-spectral image. The airborne hyperspectral images can achieve the high spatial and spectral resolution imagery of forest canopy, so it will good for tree species level classification. The aim of this paper was to test the effective of combining spatial and spectral features in airborne hyper-spectral image classification. The CASI hyper spectral image data were acquired from Liangshui natural reserves area. Firstly, we use the MNF (minimum noise fraction) transform method for to reduce the hyperspectral image dimensionality and highlighting variation. And secondly, we use the grey level co-occurrence matrix (GLCM) to extract the texture features of forest tree canopy from the hyper-spectral image, and thirdly we fused the texture and the spectral features of forest canopy to classify the trees species using support vector machine (SVM) with different kernel functions. The results showed that when using the SVM classifier, MNF and texture-based features combined with linear kernel function can achieve the best overall accuracy which was 85.92%. It was also confirm that combine the spatial and spectral information can improve the accuracy of tree species classification.

  11. The influence of multispectral scanner spatial resolution on forest feature classification

    NASA Technical Reports Server (NTRS)

    Sadowski, F. G.; Malila, W. A.; Sarno, J. E.; Nalepka, R. F.

    1977-01-01

    Inappropriate spatial resolution and corresponding data processing techniques may be major causes for non-optimal forest classification results frequently achieved from multispectral scanner (MSS) data. Procedures and results of empirical investigations are studied to determine the influence of MSS spatial resolution on the classification of forest features into levels of detail or hierarchies of information that might be appropriate for nationwide forest surveys and detailed in-place inventories. Two somewhat different, but related studies are presented. The first consisted of establishing classification accuracies for several hierarchies of features as spatial resolution was progressively coarsened from (2 meters) squared to (64 meters) squared. The second investigated the capabilities for specialized processing techniques to improve upon the results of conventional processing procedures for both coarse and fine resolution data.

  12. Machine Learning Methods for Production Cases Analysis

    NASA Astrophysics Data System (ADS)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    Approach to analysis of events occurring during the production process were proposed. Described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of applied models. k-Nearest Neighbors and Random forest methods were used to illustrate and analyze proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.

  13. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

    PubMed

    Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn

    2018-04-11

    The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed as an alternative of deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been proved that the deep forest model has competitive or even better performance than deep neural networks in some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data. In this paper, we propose a deep learning model, so-called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. The BCDForest distinguishes from the standard deep forest model with the following two main contributions: First, a named multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.

  14. Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing

    PubMed Central

    Cai, Tianxi; Karlson, Elizabeth W.

    2013-01-01

    Objectives To test whether data extracted from full text patient visit notes from an electronic medical record (EMR) would improve the classification of PsA compared to an algorithm based on codified data. Methods From the > 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and three random forest algorithms trained using coded, narrative, and combined predictors. The receiver operator curve (ROC) was used to identify the optimal algorithm and a cut point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results The PPV of a single PsA code was 57% (95%CI 55%–58%). Using a combination of coded data and NLP the random forest algorithm reached a PPV of 90% (95%CI 86%–93%) at sensitivity of 87% (95% CI 83% – 91%) in the training data. The PPV was 93% (95%CI 89%–96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC (p < 0.001). Conclusions Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. PMID:20701955

  15. Fangorn Forest (F2): a machine learning approach to classify genes and genera in the family Geminiviridae.

    PubMed

    Silva, José Cleydson F; Carvalho, Thales F M; Fontes, Elizabeth P B; Cerqueira, Fabio R

    2017-09-30

    Geminiviruses infect a broad range of cultivated and non-cultivated plants, causing significant economic losses worldwide. The studies of the diversity of species, taxonomy, mechanisms of evolution, geographic distribution, and mechanisms of interaction of these pathogens with the host have greatly increased in recent years. Furthermore, the use of rolling circle amplification (RCA) and advanced metagenomics approaches have enabled the elucidation of viromes and the identification of many viral agents in a large number of plant species. As a result, determining the nomenclature and taxonomically classifying geminiviruses turned into complex tasks. In addition, the gene responsible for viral replication (particularly, the viruses belonging to the genus Mastrevirus) may be spliced due to the use of the transcriptional/splicing machinery in the host cells. However, the current tools have limitations concerning the identification of introns. This study proposes a new method, designated Fangorn Forest (F2), based on machine learning approaches to classify genera using an ab initio approach, i.e., using only the genomic sequence, as well as to predict and classify genes in the family Geminiviridae. In this investigation, nine genera of the family Geminiviridae and their related satellite DNAs were selected. We obtained two training sets, one for genus classification, containing attributes extracted from the complete genome of geminiviruses, while the other was made up to classify geminivirus genes, containing attributes extracted from ORFs taken from the complete genomes cited above. Three ML algorithms were applied on those datasets to build the predictive models: support vector machines, using the sequential minimal optimization training approach, random forest (RF), and multilayer perceptron. RF demonstrated a very high predictive power, achieving 0.966, 0.964, and 0.995 of precision, recall, and area under the curve (AUC), respectively, for genus classification. For gene classification, RF could reach 0.983, 0.983, and 0.998 of precision, recall, and AUC, respectively. Therefore, Fangorn Forest is proven to be an efficient method for classifying genera of the family Geminiviridae with high precision and effective gene prediction and classification. The method is freely accessible at www.geminivirus.org:8080/geminivirusdw/discoveryGeminivirus.jsp .

  16. [Object-oriented segmentation and classification of forest gap based on QuickBird remote sensing image.

    PubMed

    Mao, Xue Gang; Du, Zi Han; Liu, Jia Qian; Chen, Shu Xin; Hou, Ji Yu

    2018-01-01

    Traditional field investigation and artificial interpretation could not satisfy the need of forest gaps extraction at regional scale. High spatial resolution remote sensing image provides the possibility for regional forest gaps extraction. In this study, we used object-oriented classification method to segment and classify forest gaps based on QuickBird high resolution optical remote sensing image in Jiangle National Forestry Farm of Fujian Province. In the process of object-oriented classification, 10 scales (10-100, with a step length of 10) were adopted to segment QuickBird remote sensing image; and the intersection area of reference object (RA or ) and intersection area of segmented object (RA os ) were adopted to evaluate the segmentation result at each scale. For segmentation result at each scale, 16 spectral characteristics and support vector machine classifier (SVM) were further used to classify forest gaps, non-forest gaps and others. The results showed that the optimal segmentation scale was 40 when RA or was equal to RA os . The accuracy difference between the maximum and minimum at different segmentation scales was 22%. At optimal scale, the overall classification accuracy was 88% (Kappa=0.82) based on SVM classifier. Combining high resolution remote sensing image data with object-oriented classification method could replace the traditional field investigation and artificial interpretation method to identify and classify forest gaps at regional scale.

  17. High spatial resolution mapping of land cover types in a priority area for conservation in the Brazilian savanna

    NASA Astrophysics Data System (ADS)

    Ribeiro, F.; Roberts, D. A.; Hess, L. L.; Davis, F. W.; Caylor, K. K.; Nackoney, J.; Antunes Daldegan, G.

    2017-12-01

    Savannas are heterogeneous landscapes consisting of highly mixed land cover types that lack clear distinct boundaries. The Brazilian Cerrado is a Neotropical savanna considered a biodiversity hotspot for conservation due to its biodiversity richness and rapid transformation of its landscape by crop and pasture activities. The Cerrado is one of the most threatened Brazilian biomes and only 2.2% of its original extent is strictly protected. Accurate mapping and monitoring of its ecosystems and adjacent land use are important to select areas for conservation and to improve our understanding of the dynamics in this biome. Land cover mapping of savannas is difficult due to spectral similarity between land cover types resulting from similar vegetation structure, floristically similar components, generalization of land cover classes, and heterogeneity usually expressed as small patch sizes within the natural landscape. These factors are the major contributor to misclassification and low map accuracies among remote sensing studies in savannas. Specific challenges to map the Cerrado's land cover types are related to the spectral similarity between classes of land use and natural vegetation, such as natural grassland vs. cultivated pasture, and forest ecosystem vs. crops. This study seeks to classify and evaluate the land cover patterns across an area ranked as having extremely high priority for future conservation in the Cerrado. The main objective of this study is to identify the representativeness of each vegetation type across the landscape using high to moderate spatial resolution imagery using an automated scheme. A combination of pixel-based and object-based approaches were tested using RapidEye 3A imagery (5m spatial resolution) to classify the Cerrado's major land cover types. The random forest classifier was used to map the major ecosystems present across the area, and demonstrated to have an effective result with 68% of overall accuracy. Post-classification modification was performed to refine information to the major physiognomic groups of each ecosystem type. In this step, we used segmentation in eCognition, considering the random forest classification as input as well as other environmental layers (e.g. slope, soil types), which improved the overall classification to 75%.

  18. Five-class differential diagnostics of neurodegenerative diseases using random undersampling boosting.

    PubMed

    Tong, Tong; Ledig, Christian; Guerrero, Ricardo; Schuh, Andreas; Koikkalainen, Juha; Tolonen, Antti; Rhodius, Hanneke; Barkhof, Frederik; Tijms, Betty; Lemstra, Afina W; Soininen, Hilkka; Remes, Anne M; Waldemar, Gunhild; Hasselbalch, Steen; Mecocci, Patrizia; Baroni, Marta; Lötjönen, Jyrki; Flier, Wiesje van der; Rueckert, Daniel

    2017-01-01

    Differentiating between different types of neurodegenerative diseases is not only crucial in clinical practice when treatment decisions have to be made, but also has a significant potential for the enrichment of clinical trials. The purpose of this study is to develop a classification framework for distinguishing the four most common neurodegenerative diseases, including Alzheimer's disease, frontotemporal lobe degeneration, Dementia with Lewy bodies and vascular dementia, as well as patients with subjective memory complaints. Different biomarkers including features from images (volume features, region-wise grading features) and non-imaging features (CSF measures) were extracted for each subject. In clinical practice, the prevalence of different dementia types is imbalanced, posing challenges for learning an effective classification model. Therefore, we propose the use of the RUSBoost algorithm in order to train classifiers and to handle the class imbalance training problem. Furthermore, a multi-class feature selection method based on sparsity is integrated into the proposed framework to improve the classification performance. It also provides a way for investigating the importance of different features and regions. Using a dataset of 500 subjects, the proposed framework achieved a high accuracy of 75.2% with a balanced accuracy of 69.3% for the five-class classification using ten-fold cross validation, which is significantly better than the results using support vector machine or random forest, demonstrating the feasibility of the proposed framework to support clinical decision making.

  19. An application of quantile random forests for predictive mapping of forest attributes

    Treesearch

    E.A. Freeman; G.G. Moisen

    2015-01-01

    Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...

  20. Assessing urban forest canopy cover using airborne or satellite imagery

    Treesearch

    Jeffrey T. Walton; David J. Nowak; Eric J. Greenfield

    2008-01-01

    With the availability of many sources of imagery and various digital classification techniques, assessing urban forest canopy cover is readily accessible to most urban forest managers. Understanding the capability and limitations of various types of imagery and classification methods is essential to interpreting canopy cover values. An overview of several remote...

  1. Forest habitat types of central Idaho

    Treesearch

    Robert Steele; Robert D. Pfister; Russell A. Ryker; Jay A. Kittams

    1981-01-01

    A land-classification system based upon potential natural vegetation is presented for the forests of central Idaho. It is based on reconnaissance sampling of about 800 stands. A hierarchical taxonomic classification of forest sites was developed using the habitat type concept. A total of eight climax series, 64 habitat types, and 55 additional phases of habitat types...

  2. Forest statistics for Arkansas' Ouachita counties - 1988

    Treesearch

    F. Dee Hines

    1988-01-01

    Tabulated results were derived from data obtained during a recent inventory of 10 counties comprising the Ouachita Unit of Arkansas. Data on forest acreage and timber volume were secured by a three-step process. A forest-nonforest classification using aerial photographs was accomplished for points representing approximately 230 acres. These photo classifications were...

  3. Classification and evaluation for forest sites on the Natchez Trace State Forest, State Resort Park, and Wildlife Management Area in West Tennessee

    Treesearch

    Glendon W. Smalley

    1991-01-01

    Presents comprehensive forest site classification system for the 45,084-acre Natchez Trace State Forest, State Resort park, and WIldlife Management Area in the highly dissected and predominantly hilly Upper Coastal Plain of west Tennessee. Twenty-five landtypes are identified. Each landtype is defined in terms of nine elements and evaluated on the baiss of...

  4. Accuracy assessment of biomass and forested area classification from modis, landstat-tm satellite imagery and forest inventory plot data

    Treesearch

    Dumitru Salajanu; Dennis M. Jacobs

    2007-01-01

    The objective of this study was to determine how well forestfnon-forest and biomass classifications obtained from Landsat-TM and MODIS satellite data modeled with FIA plots, compare to each other and with forested area and biomass estimates from the national inventory data, as well as whether there is an increase in overall accuracy when pixel size (spatial resolution...

  5. An ecological classification system for the central hardwoods region: The Hoosier National Forest

    Treesearch

    James E. Van Kley; George R. Parker

    1993-01-01

    This study, a multifactor ecological classification system, using vegetation, soil characteristics, and physiography, was developed for the landscape of the Hoosier National Forest in Southern Indiana. Measurements of ground flora, saplings, and canopy trees from selected stands older than 80 years were subjected to TWINSPAN classification and DECORANA ordination....

  6. Application of satellite data and LARS's data processing techniques to mapping vegetation of the Dismal Swamp. M.S. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Messmore, J. A.

    1976-01-01

    The feasibility of using digital satellite imagery and automatic data processing techniques as a means of mapping swamp forest vegetation was considered, using multispectral scanner data acquired by the LANDSAT-1 satellite. The site for this investigation was the Dismal Swamp, a 210,000 acre swamp forest located south of Suffolk, Va. on the Virginia-North Carolina border. Two basic classification strategies were employed. The initial classification utilized unsupervised techniques which produced a map of the swamp indicating the distribution of thirteen forest spectral classes. These classes were later combined into three informational categories: Atlantic white cedar (Chamaecyparis thyoides), Loblolly pine (Pinus taeda), and deciduous forest. The subsequent classification employed supervised techniques which mapped Atlantic white cedar, Loblolly pine, deciduous forest, water and agriculture within the study site. A classification accuracy of 82.5% was produced by unsupervised techniques compared with 89% accuracy using supervised techniques.

  7. A Two-Layer Method for Sedentary Behaviors Classification Using Smartphone and Bluetooth Beacons.

    PubMed

    Cerón, Jesús D; López, Diego M; Hofmann, Christian

    2017-01-01

    Among the factors that outline the health of populations, person's lifestyle is the more important one. This work focuses on the caracterization and prevention of sedentary lifestyles. A sedentary behavior is defined as "any waking behavior characterized by an energy expenditure of 1.5 METs (Metabolic Equivalent) or less while in a sitting or reclining posture". To propose a method for sedentary behaviors classification using a smartphone and Bluetooth beacons considering different types of classification models: personal, hybrid or impersonal. Following the CRISP-DM methodology, a method based on a two-layer approach for the classification of sedentary behaviors is proposed. Using data collected from a smartphones' accelerometer, gyroscope and barometer; the first layer classifies between performing a sedentary behavior and not. The second layer of the method classifies the specific sedentary activity performed using only the smartphone's accelerometer and barometer data, but adding indoor location data, using Bluetooth Low Energy (BLE) beacons. To improve the precision of the classification, both layers implemented the Random Forest algorithm and the personal model. This study presents the first available method for the automatic classification of specific sedentary behaviors. The layered classification approach has the potential to improve processing, memory and energy consumption of mobile devices and wearables used.

  8. SAR-based change detection using hypothesis testing and Markov random field modelling

    NASA Astrophysics Data System (ADS)

    Cao, W.; Martinis, S.

    2015-04-01

    The objective of this study is to automatically detect changed areas caused by natural disasters from bi-temporal co-registered and calibrated TerraSAR-X data. The technique in this paper consists of two steps: Firstly, an automatic coarse detection step is applied based on a statistical hypothesis test for initializing the classification. The original analytical formula as proposed in the constant false alarm rate (CFAR) edge detector is reviewed and rewritten in a compact form of the incomplete beta function, which is a builtin routine in commercial scientific software such as MATLAB and IDL. Secondly, a post-classification step is introduced to optimize the noisy classification result in the previous step. Generally, an optimization problem can be formulated as a Markov random field (MRF) on which the quality of a classification is measured by an energy function. The optimal classification based on the MRF is related to the lowest energy value. Previous studies provide methods for the optimization problem using MRFs, such as the iterated conditional modes (ICM) algorithm. Recently, a novel algorithm was presented based on graph-cut theory. This method transforms a MRF to an equivalent graph and solves the optimization problem by a max-flow/min-cut algorithm on the graph. In this study this graph-cut algorithm is applied iteratively to improve the coarse classification. At each iteration the parameters of the energy function for the current classification are set by the logarithmic probability density function (PDF). The relevant parameters are estimated by the method of logarithmic cumulants (MoLC). Experiments are performed using two flood events in Germany and Australia in 2011 and a forest fire on La Palma in 2009 using pre- and post-event TerraSAR-X data. The results show convincing coarse classifications and considerable improvement by the graph-cut post-classification step.

  9. Mapping vegetation heights in China using slope correction ICESat data, SRTM, MODIS-derived and climate data

    NASA Astrophysics Data System (ADS)

    Huang, Huabing; Liu, Caixia; Wang, Xiaoyi; Biging, Gregory S.; Chen, Yanlei; Yang, Jun; Gong, Peng

    2017-07-01

    Vegetation height is an important parameter for biomass assessment and vegetation classification. However, vegetation height data over large areas are difficult to obtain. The existing vegetation height data derived from the Ice, Cloud and land Elevation Satellite (ICESat) data only include laser footprints in relatively flat forest regions (<5°). Thus, a large portion of ICESat data over sloping areas has not been used. In this study, we used a new slope correction method to improve the accuracy of estimates of vegetation heights for regions where slopes fall between 5° and 15°. The new method enabled us to use more than 20% additional laser data compared with the existing vegetation height data which only uses ICESat data in relatively flat areas (slope < 5°) in China. With the vegetation height data extracted from ICESat footprints and ancillary data including Moderate Resolution Imaging Spectroradiometer (MODIS) derived data (canopy cover, reflectances and leaf area index), climate data, and topographic data, we developed a wall to wall vegetation height map of China using the Random Forest algorithm. We used the data from 416 field measurements to validate the new vegetation height product. The coefficient of determination (R2) and RMSE of the new vegetation height product were 0.89 and 4.73 m respectively. The accuracy of the product is significantly better than that of the two existing global forest height products produced by Lefsky (2010) and Simard et al. (2011), when compared with the data from 227 field measurements in our study area. The new vegetation height data demonstrated clear distinctions among forest, shrub and grassland, which is promising for improving the classification of vegetation and above-ground forest biomass assessment in China.

  10. Forest above ground biomass estimation and forest/non-forest classification for Odisha, India, using L-band Synthetic Aperture Radar (SAR) data

    NASA Astrophysics Data System (ADS)

    Suresh, M.; Kiran Chand, T. R.; Fararoda, R.; Jha, C. S.; Dadhwal, V. K.

    2014-11-01

    Tropical forests contribute to approximately 40 % of the total carbon found in terrestrial biomass. In this context, forest/non-forest classification and estimation of forest above ground biomass over tropical regions are very important and relevant in understanding the contribution of tropical forests in global biogeochemical cycles, especially in terms of carbon pools and fluxes. Information on the spatio-temporal biomass distribution acts as a key input to Reducing Emissions from Deforestation and forest Degradation Plus (REDD+) action plans. This necessitates precise and reliable methods to estimate forest biomass and to reduce uncertainties in existing biomass quantification scenarios. The use of backscatter information from a host of allweather capable Synthetic Aperture Radar (SAR) systems during the recent past has demonstrated the potential of SAR data in forest above ground biomass estimation and forest / nonforest classification. In the present study, Advanced Land Observing Satellite (ALOS) / Phased Array L-band Synthetic Aperture Radar (PALSAR) data along with field inventory data have been used in forest above ground biomass estimation and forest / non-forest classification over Odisha state, India. The ALOSPALSAR 50 m spatial resolution orthorectified and radiometrically corrected HH/HV dual polarization data (digital numbers) for the year 2010 were converted to backscattering coefficient images (Schimada et al., 2009). The tree level measurements collected during field inventory (2009-'10) on Girth at Breast Height (GBH at 1.3 m above ground) and height of all individual trees at plot (plot size 0.1 ha) level were converted to biomass density using species specific allometric equations and wood densities. The field inventory based biomass estimations were empirically integrated with ALOS-PALSAR backscatter coefficients to derive spatial forest above ground biomass estimates for the study area. Further, The Support Vector Machines (SVM) based Radial Basis Function classification technique was employed to carry out binary (forest-non forest) classification using ALOSPALSAR HH and HV backscatter coefficient images and field inventory data. The textural Haralick's Grey Level Cooccurrence Matrix (GLCM) texture measures are determined on HV backscatter image for Odisha, for the year 2010. PALSAR HH, HV backscatter coefficient images, their difference (HHHV) and HV backscatter coefficient based eight textural parameters (Mean, Variance, Dissimilarity, Contrast, Angular second moment, Homogeneity, Correlation and Contrast) are used as input parameters for Support Vector Machines (SVM) tool. Ground based inputs for forest / non-forest were taken from field inventory data and high resolution Google maps. Results suggested significant relationship between HV backscatter coefficient and field based biomass (R2 = 0.508, p = 0.55) compared to HH with biomass values ranging from 5 to 365 t/ha. The spatial variability of biomass with reference to different forest types is in good agreement. The forest / nonforest classified map suggested a total forest cover of 50214 km2 with an overall accuracy of 92.54 %. The forest / non-forest information derived from the present study showed a good spatial agreement with the standard forest cover map of Forest Survey of India (FSI) and corresponding published area of 50575 km2. Results are discussed in the paper.

  11. Spatial and Temporal Patterns of Unburned Areas within Fire Perimeters in the Northwestern United States from 1984 to 2014

    NASA Astrophysics Data System (ADS)

    Meddens, A. J.; Kolden, C.; Lutz, J. A.; Abatzoglou, J. T.; Hudak, A. T.

    2016-12-01

    Recently, there has been concern about increasing extent and severity of wildfires across the globe given rapid climate change. Areas that do not burn within fire perimeters can act as fire refugia, providing (1) protection from the detrimental effects of the fire, (2) seed sources, and (3) post-fire habitat on the landscape. However, recent studies have mainly focused on the higher end of the burn severity spectrum whereas the lower end of the burn severity spectrum has been largely ignored. We developed a spatially explicit database for 2,200 fires across the inland northwestern USA, delineating unburned areas within fire perimeters from 1984 to 2014. We used 1,600 Landsat scenes with one or two scenes before and one or two scenes after the fires to capture the unburned proportion of the fire. Subsequently, we characterized the spatial and temporal patterns of unburned areas and related the unburned proportion to interannual climate variability. The overall classification accuracy detecting unburned locations was 89.2% using a 10-fold cross-validation classification tree approach in combination with 719 randomly located field plots. The unburned proportion ranged from 2% to 58% with an average of 19% for a select number of fires. We find that using both an immediate post-fire image and a one-year post fire image improves classification accuracy of unburned islands over using just a single post-fire image. The spatial characteristics of the unburned islands differ between forested and non-forested regions with a larger amount of unburned area within non-forest. In addition, we show trends of unburned proportion related primarily to concurrent climatic drought conditions across the entire region. This database is important for subsequent analyses of fire refugia prioritization, vegetation recovery studies, ecosystem resilience, and forest management to facilitate unburned islands through fuels breaks, prescribed burning, and fire suppression strategies.

  12. Forest land cover change (1975-2000) in the Greater Border Lakes region

    Treesearch

    Peter T. Wolter; Brian R. Sturtevant; Brian R. Miranda; Sue M. Lietz; Phillip A. Townsend; John Pastor

    2012-01-01

    This document and accompanying maps describe land cover classifications and change detection for a 13.8 million ha landscape straddling the border between Minnesota, and Ontario, Canada (greater Border Lakes Region). Land cover classifications focus on discerning Anderson Level II forest and nonforest cover to track spatiotemporal changes in forest cover. Multi-...

  13. Forest habitat types of eastern Idaho-western Wyoming

    Treesearch

    Robert Steele; Stephen V. Cooper; David M. Ondov; David W. Roberts; Robert D. Pfister

    1983-01-01

    A land-classification system based upon potential natural vegetation is presented for the forests of central Idaho. It is based on reconnaissance sampling of about 980 stands. A hierarchical taxonomic classification of forest sites was developed using the habitat type concept. A total of six climax series, 58 habitat types, and 24 additional phases of habitat types are...

  14. Blue oak plant communities of southern San Luis Obispo and northern Santa Barbara Counties, California

    Treesearch

    Mark I. Borchert; Nancy D. Cunha; Patricia C. Krosse; Marcee L. Lawrence

    1993-01-01

    An ecological classification system has been developed for the Pacific Southwest Region of the Forest Service. As part of that classification effort, blue oak (Quercus douglasii) woodlands and forests of southern San Luis Obispo and northern Santa Barbara Counties in Los Padres National Forest were classified into I3 plant communities using...

  15. Coniferous forest habitat types of central and southern Utah

    Treesearch

    Andrew P. Youngblood; Ronald L. Mauk

    1985-01-01

    A land-classification system based upon potential natural vegetation is presented for the coniferous forests of central and southern Utah. It is based on reconnaissance sampling of about 720 stands. A hierarchical taxonomic classification of forest sites was developed using the habitat type concept. Seven climax series, 37 habitat types, and six additional phases of...

  16. Forest habitat types of Montana

    Treesearch

    Robert D. Pfister; Bernard L. Kovalchik; Stephen F. Arno; Richard C. Presby

    1977-01-01

    A land-classification system based upon potential natural vegetation is presented for the forests of Montana. It is based on an intensive 4-year study and reconnaissance sampling of about 1,500 stands. A hierarchical classification of forest sites was developed using the habitat type concept. A total of 9 climax series, 64 habitat types, and 37 additional phases of...

  17. Development of an ecological classification system for the Wayne National Forest

    Treesearch

    David M. Hix; Andrea M. Chech

    1993-01-01

    In 1991, a collaborative research project was initiated to create an ecological classification system for the Wayne National Forest of southeastern Ohio. The work focuses on the ecological land type (ELT) level of ecosystem classification. The most common ELTs are being identified and described using information from intensive field sampling and multivariate data...

  18. Simultaneous comparison and assessment of eight remotely sensed maps of Philippine forests

    NASA Astrophysics Data System (ADS)

    Estoque, Ronald C.; Pontius, Robert G.; Murayama, Yuji; Hou, Hao; Thapa, Rajesh B.; Lasco, Rodel D.; Villar, Merlito A.

    2018-05-01

    This article compares and assesses eight remotely sensed maps of Philippine forest cover in the year 2010. We examined eight Forest versus Non-Forest maps reclassified from eight land cover products: the Philippine Land Cover, the Climate Change Initiative (CCI) Land Cover, the Landsat Vegetation Continuous Fields (VCF), the MODIS VCF, the MODIS Land Cover Type product (MCD12Q1), the Global Tree Canopy Cover, the ALOS-PALSAR Forest/Non-Forest Map, and the GlobeLand30. The reference data consisted of 9852 randomly distributed sample points interpreted from Google Earth. We created methods to assess the maps and their combinations. Results show that the percentage of the Philippines covered by forest ranges among the maps from a low of 23% for the Philippine Land Cover to a high of 67% for GlobeLand30. Landsat VCF estimates 36% forest cover, which is closest to the 37% estimate based on the reference data. The eight maps plus the reference data agree unanimously on 30% of the sample points, of which 11% are attributable to forest and 19% to non-forest. The overall disagreement between the reference data and Philippine Land Cover is 21%, which is the least among the eight Forest versus Non-Forest maps. About half of the 9852 points have a nested structure such that the forest in a given dataset is a subset of the forest in the datasets that have more forest than the given dataset. The variation among the maps regarding forest quantity and allocation relates to the combined effects of the various definitions of forest and classification errors. Scientists and policy makers must consider these insights when producing future forest cover maps and when establishing benchmarks for forest cover monitoring.

  19. Predicting Long-Term Cognitive Outcome Following Breast Cancer with Pre-Treatment Resting State fMRI and Random Forest Machine Learning.

    PubMed

    Kesler, Shelli R; Rao, Arvind; Blayney, Douglas W; Oakley-Girvan, Ingrid A; Karuturi, Meghan; Palesh, Oxana

    2017-01-01

    We aimed to determine if resting state functional magnetic resonance imaging (fMRI) acquired at pre-treatment baseline could accurately predict breast cancer-related cognitive impairment at long-term follow-up. We evaluated 31 patients with breast cancer (age 34-65) prior to any treatment, post-chemotherapy and 1 year later. Cognitive testing scores were normalized based on data obtained from 43 healthy female controls and then used to categorize patients as impaired or not based on longitudinal changes. We measured clustering coefficient, a measure of local connectivity, by applying graph theory to baseline resting state fMRI and entered these metrics along with relevant patient-related and medical variables into random forest classification. Incidence of cognitive impairment at 1 year follow-up was 55% and was predicted by classification algorithms with up to 100% accuracy ( p < 0.0001). The neuroimaging-based model was significantly more accurate than a model involving patient-related and medical variables ( p = 0.005). Hub regions belonging to several distinct functional networks were the most important predictors of cognitive outcome. Characteristics of these hubs indicated potential spread of brain injury from default mode to other networks over time. These findings suggest that resting state fMRI is a promising tool for predicting future cognitive impairment associated with breast cancer. This information could inform treatment decision making by identifying patients at highest risk for long-term cognitive impairment.

  20. Predicting Long-Term Cognitive Outcome Following Breast Cancer with Pre-Treatment Resting State fMRI and Random Forest Machine Learning

    PubMed Central

    Kesler, Shelli R.; Rao, Arvind; Blayney, Douglas W.; Oakley-Girvan, Ingrid A.; Karuturi, Meghan; Palesh, Oxana

    2017-01-01

    We aimed to determine if resting state functional magnetic resonance imaging (fMRI) acquired at pre-treatment baseline could accurately predict breast cancer-related cognitive impairment at long-term follow-up. We evaluated 31 patients with breast cancer (age 34–65) prior to any treatment, post-chemotherapy and 1 year later. Cognitive testing scores were normalized based on data obtained from 43 healthy female controls and then used to categorize patients as impaired or not based on longitudinal changes. We measured clustering coefficient, a measure of local connectivity, by applying graph theory to baseline resting state fMRI and entered these metrics along with relevant patient-related and medical variables into random forest classification. Incidence of cognitive impairment at 1 year follow-up was 55% and was predicted by classification algorithms with up to 100% accuracy (p < 0.0001). The neuroimaging-based model was significantly more accurate than a model involving patient-related and medical variables (p = 0.005). Hub regions belonging to several distinct functional networks were the most important predictors of cognitive outcome. Characteristics of these hubs indicated potential spread of brain injury from default mode to other networks over time. These findings suggest that resting state fMRI is a promising tool for predicting future cognitive impairment associated with breast cancer. This information could inform treatment decision making by identifying patients at highest risk for long-term cognitive impairment. PMID:29187817

  1. Random forest classification of large volume structures for visuo-haptic rendering in CT images

    NASA Astrophysics Data System (ADS)

    Mastmeyer, Andre; Fortmeier, Dirk; Handels, Heinz

    2016-03-01

    For patient-specific voxel-based visuo-haptic rendering of CT scans of the liver area, the fully automatic segmentation of large volume structures such as skin, soft tissue, lungs and intestine (risk structures) is important. Using a machine learning based approach, several existing segmentations from 10 segmented gold-standard patients are learned by random decision forests individually and collectively. The core of this paper is feature selection and the application of the learned classifiers to a new patient data set. In a leave-some-out cross-validation, the obtained full volume segmentations are compared to the gold-standard segmentations of the untrained patients. The proposed classifiers use a multi-dimensional feature space to estimate the hidden truth, instead of relying on clinical standard threshold and connectivity based methods. The result of our efficient whole-body section classification are multi-label maps with the considered tissues. For visuo-haptic simulation, other small volume structures would have to be segmented additionally. We also take a look into these structures (liver vessels). For an experimental leave-some-out study consisting of 10 patients, the proposed method performs much more efficiently compared to state of the art methods. In two variants of leave-some-out experiments we obtain best mean DICE ratios of 0.79, 0.97, 0.63 and 0.83 for skin, soft tissue, hard bone and risk structures. Liver structures are segmented with DICE 0.93 for the liver, 0.43 for blood vessels and 0.39 for bile vessels.

  2. Differentiation of Crataegus spp. guided by nuclear magnetic resonance spectrometry with chemometric analyses.

    PubMed

    Lund, Jensen A; Brown, Paula N; Shipley, Paul R

    2017-09-01

    For compliance with US Current Good Manufacturing Practice regulations for dietary supplements, manufacturers must provide identity of source plant material. Despite the popularity of hawthorn as a dietary supplement, relatively little is known about the comparative phytochemistry of different hawthorn species, and in particular North American hawthorns. The combination of NMR spectrometry with chemometric analyses offers an innovative approach to differentiating hawthorn species and exploring the phytochemistry. Two European and two North American species, harvested from a farm trial in late summer 2008, were analyzed by standard 1D 1 H and J-resolved (JRES) experiments. The data were preprocessed and modelled by principal component analysis (PCA). A supervised model was then generated by partial least squares-discriminant analysis (PLS-DA) for classification and evaluated by cross validation. Supervised random forests models were constructed from the dataset to explore the potential of machine learning for identification of unique patterns across species. 1D 1 H NMR data yielded increased differentiation over the JRES data. The random forests results correlated with PLS-DA results and outperformed PLS-DA in classification accuracy. In all of these analyses differentiation of the Crataegus spp. was best achieved by focusing on the NMR spectral region that contains signals unique to plant phenolic compounds. Identification of potentially significant metabolites for differentiation between species was approached using univariate techniques including significance analysis of microarrays and Kruskall-Wallis tests. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.

  4. Improved classification accuracy of powdery mildew infection levels of wine grapes by spatial-spectral analysis of hyperspectral images.

    PubMed

    Knauer, Uwe; Matros, Andrea; Petrovic, Tijana; Zanker, Timothy; Scott, Eileen S; Seiffert, Udo

    2017-01-01

    Hyperspectral imaging is an emerging means of assessing plant vitality, stress parameters, nutrition status, and diseases. Extraction of target values from the high-dimensional datasets either relies on pixel-wise processing of the full spectral information, appropriate selection of individual bands, or calculation of spectral indices. Limitations of such approaches are reduced classification accuracy, reduced robustness due to spatial variation of the spectral information across the surface of the objects measured as well as a loss of information intrinsic to band selection and use of spectral indices. In this paper we present an improved spatial-spectral segmentation approach for the analysis of hyperspectral imaging data and its application for the prediction of powdery mildew infection levels (disease severity) of intact Chardonnay grape bunches shortly before veraison. Instead of calculating texture features (spatial features) for the huge number of spectral bands independently, dimensionality reduction by means of Linear Discriminant Analysis (LDA) was applied first to derive a few descriptive image bands. Subsequent classification was based on modified Random Forest classifiers and selective extraction of texture parameters from the integral image representation of the image bands generated. Dimensionality reduction, integral images, and the selective feature extraction led to improved classification accuracies of up to [Formula: see text] for detached berries used as a reference sample (training dataset). Our approach was validated by predicting infection levels for a sample of 30 intact bunches. Classification accuracy improved with the number of decision trees of the Random Forest classifier. These results corresponded with qPCR results. An accuracy of 0.87 was achieved in classification of healthy, infected, and severely diseased bunches. However, discrimination between visually healthy and infected bunches proved to be challenging for a few samples, perhaps due to colonized berries or sparse mycelia hidden within the bunch or airborne conidia on the berries that were detected by qPCR. An advanced approach to hyperspectral image classification based on combined spatial and spectral image features, potentially applicable to many available hyperspectral sensor technologies, has been developed and validated to improve the detection of powdery mildew infection levels of Chardonnay grape bunches. The spatial-spectral approach improved especially the detection of light infection levels compared with pixel-wise spectral data analysis. This approach is expected to improve the speed and accuracy of disease detection once the thresholds for fungal biomass detected by hyperspectral imaging are established; it can also facilitate monitoring in plant phenotyping of grapevine and additional crops.

  5. Classification and evaluation for forest sites on the Mid-Cumberland Plateau

    Treesearch

    Glendon W. Smalley

    1982-01-01

    Presents a comprehensive forest site classification system for the central portion of the Cumberland Plateau in northeast Alabama, and east-central Tennessee. The system is based on physiography, geology, soils, topography, and vegetation.

  6. Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

    NASA Astrophysics Data System (ADS)

    Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

    2018-03-01

    Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.

  7. Forest Tree Species Distribution Mapping Using Landsat Satellite Imagery and Topographic Variables with the Maximum Entropy Method in Mongolia

    NASA Astrophysics Data System (ADS)

    Hao Chiang, Shou; Valdez, Miguel; Chen, Chi-Farn

    2016-06-01

    Forest is a very important ecosystem and natural resource for living things. Based on forest inventories, government is able to make decisions to converse, improve and manage forests in a sustainable way. Field work for forestry investigation is difficult and time consuming, because it needs intensive physical labor and the costs are high, especially surveying in remote mountainous regions. A reliable forest inventory can give us a more accurate and timely information to develop new and efficient approaches of forest management. The remote sensing technology has been recently used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species are unavoidably required to be considered. In this study the aim is to classify forest tree species in Erdenebulgan County, Huwsgul province in Mongolia, using Maximum Entropy method. The study area is covered by a dense forest which is almost 70% of total territorial extension of Erdenebulgan County and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was collected from the Forest Division of the Mongolian Ministry of Nature and Environment as training data and also used as ground truth to perform the accuracy assessment of the tree species classification. Landsat images and DEM were processed for maximum entropy modeling, and this study applied the model with two experiments. The first one is to use Landsat surface reflectance for tree species classification; and the second experiment incorporates terrain variables in addition to the Landsat surface reflectance to perform the tree species classification. All experimental results were compared with the tree species inventory to assess the classification accuracy. Results show that the second one which uses Landsat surface reflectance coupled with terrain variables produced better result, with the higher overall accuracy and kappa coefficient than first experiment. The results indicate that the Maximum Entropy method is an applicable, and to classify tree species using satellite imagery data coupled with terrain information can improve the classification of tree species in the study area.

  8. Lesion segmentation from multimodal MRI using random forest following ischemic stroke.

    PubMed

    Mitra, Jhimli; Bourgeat, Pierrick; Fripp, Jurgen; Ghose, Soumya; Rose, Stephen; Salvado, Olivier; Connelly, Alan; Campbell, Bruce; Palmer, Susan; Sharma, Gagan; Christensen, Soren; Carey, Leeanne

    2014-09-01

    Understanding structure-function relationships in the brain after stroke is reliant not only on the accurate anatomical delineation of the focal ischemic lesion, but also on previous infarcts, remote changes and the presence of white matter hyperintensities. The robust definition of primary stroke boundaries and secondary brain lesions will have significant impact on investigation of brain-behavior relationships and lesion volume correlations with clinical measures after stroke. Here we present an automated approach to identify chronic ischemic infarcts in addition to other white matter pathologies, that may be used to aid the development of post-stroke management strategies. Our approach uses Bayesian-Markov Random Field (MRF) classification to segment probable lesion volumes present on fluid attenuated inversion recovery (FLAIR) MRI. Thereafter, a random forest classification of the information from multimodal (T1-weighted, T2-weighted, FLAIR, and apparent diffusion coefficient (ADC)) MRI images and other context-aware features (within the probable lesion areas) was used to extract areas with high likelihood of being classified as lesions. The final segmentation of the lesion was obtained by thresholding the random forest probabilistic maps. The accuracy of the automated lesion delineation method was assessed in a total of 36 patients (24 male, 12 female, mean age: 64.57±14.23yrs) at 3months after stroke onset and compared with manually segmented lesion volumes by an expert. Accuracy assessment of the automated lesion identification method was performed using the commonly used evaluation metrics. The mean sensitivity of segmentation was measured to be 0.53±0.13 with a mean positive predictive value of 0.75±0.18. The mean lesion volume difference was observed to be 32.32%±21.643% with a high Pearson's correlation of r=0.76 (p<0.0001). The lesion overlap accuracy was measured in terms of Dice similarity coefficient with a mean of 0.60±0.12, while the contour accuracy was observed with a mean surface distance of 3.06mm±3.17mm. The results signify that our method was successful in identifying most of the lesion areas in FLAIR with a low false positive rate. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. TH-CD-206-05: Machine-Learning Based Segmentation of Organs at Risks for Head and Neck Radiotherapy Planning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibragimov, B; Pernus, F; Strojan, P

    Purpose: Accurate and efficient delineation of tumor target and organs-at-risks is essential for the success of radiotherapy. In reality, despite of decades of intense research efforts, auto-segmentation has not yet become clinical practice. In this study, we present, for the first time, a deep learning-based classification algorithm for autonomous segmentation in head and neck (HaN) treatment planning. Methods: Fifteen HN datasets of CT, MR and PET images with manual annotation of organs-at-risk (OARs) including spinal cord, brainstem, optic nerves, chiasm, eyes, mandible, tongue, parotid glands were collected and saved in a library of plans. We also have ten super-resolution MRmore » images of the tongue area, where the genioglossus and inferior longitudinalis tongue muscles are defined as organs of interest. We applied the concepts of random forest- and deep learning-based object classification for automated image annotation with the aim of using machine learning to facilitate head and neck radiotherapy planning process. In this new paradigm of segmentation, random forests were used for landmark-assisted segmentation of super-resolution MR images. Alternatively to auto-segmentation with random forest-based landmark detection, deep convolutional neural networks were developed for voxel-wise segmentation of OARs in single and multi-modal images. The network consisted of three pairs of convolution and pooing layer, one RuLU layer and a softmax layer. Results: We present a comprehensive study on using machine learning concepts for auto-segmentation of OARs and tongue muscles for the HaN radiotherapy planning. An accuracy of 81.8% in terms of Dice coefficient was achieved for segmentation of genioglossus and inferior longitudinalis tongue muscles. Preliminary results of OARs regimentation also indicate that deep-learning afforded an unprecedented opportunities to improve the accuracy and robustness of radiotherapy planning. Conclusion: A novel machine learning framework has been developed for image annotation and structure segmentation. Our results indicate the great potential of deep learning in radiotherapy treatment planning.« less

  10. Machine learning models in breast cancer survival prediction.

    PubMed

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With the early diagnosis of breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of Breast cancer survival. We use a dataset with eight attributes that include the records of 900 patients in which 876 patients (97.3%) and 24 (2.7%) patients were females and males respectively. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-cross fold technique were used with the proposed model for the prediction of breast cancer survival. The performance of machine learning techniques were evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, Trees Random Forest (TRF) technique showed better results in comparison to other techniques (NB, 1NN, AD, SVM and RBFN, MLP). The accuracy, sensitivity and the area under ROC curve of TRF are 96%, 96%, 93%, respectively. However, 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of accuracy. Therefore, this model is recommended as a useful tool for breast cancer survival prediction as well as medical decision making.

  11. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

    PubMed

    Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane ( Grus monacha , n  = 33), White-naped Crane ( Grus vipio , n  = 40), and Black-necked Crane ( Grus nigricollis , n  = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging predicted probability of the above four models results. Commonly used model performance metrics (Area under ROC (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found Random Forest demonstrated the best performance for the most assessment method, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.

  12. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

    PubMed Central

    Mi, Chunrong; Huettmann, Falk; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging predicted probability of the above four models results. Commonly used model performance metrics (Area under ROC (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found Random Forest demonstrated the best performance for the most assessment method, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation. PMID:28097060

  13. New Tree-Classification System Used by the Southern Forest Inventory and Analysis Unit

    Treesearch

    Dennis M. May; John S. Vissage; D. Vince Few

    1990-01-01

    Trees at USDA Forest Service, Southern Forest Inventory and Analysis, sample locations are classified as growing stock or cull based on their ability to produce sawlogs. The old and new classification systems are compared, and the impacts of the new system on the reporting of tree volumes are illustrated with inventory data from north Alabama.

  14. Automatic Screening and Grading of Age-Related Macular Degeneration from Texture Analysis of Fundus Images

    PubMed Central

    Phan, Thanh Vân; Seoud, Lama; Chakor, Hadi; Cheriet, Farida

    2016-01-01

    Age-related macular degeneration (AMD) is a disease which causes visual deficiency and irreversible blindness to the elderly. In this paper, an automatic classification method for AMD is proposed to perform robust and reproducible assessments in a telemedicine context. First, a study was carried out to highlight the most relevant features for AMD characterization based on texture, color, and visual context in fundus images. A support vector machine and a random forest were used to classify images according to the different AMD stages following the AREDS protocol and to evaluate the features' relevance. Experiments were conducted on a database of 279 fundus images coming from a telemedicine platform. The results demonstrate that local binary patterns in multiresolution are the most relevant for AMD classification, regardless of the classifier used. Depending on the classification task, our method achieves promising performances with areas under the ROC curve between 0.739 and 0.874 for screening and between 0.469 and 0.685 for grading. Moreover, the proposed automatic AMD classification system is robust with respect to image quality. PMID:27190636

  15. Multiple-Primitives Hierarchical Classification of Airborne Laser Scanning Data in Urban Areas

    NASA Astrophysics Data System (ADS)

    Ni, H.; Lin, X. G.; Zhang, J. X.

    2017-09-01

    A hierarchical classification method for Airborne Laser Scanning (ALS) data of urban areas is proposed in this paper. This method is composed of three stages among which three types of primitives are utilized, i.e., smooth surface, rough surface, and individual point. In the first stage, the input ALS data is divided into smooth surfaces and rough surfaces by employing a step-wise point cloud segmentation method. In the second stage, classification based on smooth surfaces and rough surfaces is performed. Points in the smooth surfaces are first classified into ground and buildings based on semantic rules. Next, features of rough surfaces are extracted. Then, points in rough surfaces are classified into vegetation and vehicles based on the derived features and Random Forests (RF). In the third stage, point-based features are extracted for the ground points, and then, an individual point classification procedure is performed to classify the ground points into bare land, artificial ground and greenbelt. Moreover, the shortages of the existing studies are analyzed, and experiments show that the proposed method overcomes these shortages and handles more types of objects.

  16. Spectral resampling based on user-defined inter-band correlation filter: C3 and C4 grass species classification

    NASA Astrophysics Data System (ADS)

    Adjorlolo, Clement; Mutanga, Onisimo; Cho, Moses A.; Ismail, Riyad

    2013-04-01

    In this paper, a user-defined inter-band correlation filter function was used to resample hyperspectral data and thereby mitigate the problem of multicollinearity in classification analysis. The proposed resampling technique convolves the spectral dependence information between a chosen band-centre and its shorter and longer wavelength neighbours. Weighting threshold of inter-band correlation (WTC, Pearson's r) was calculated, whereby r = 1 at the band-centre. Various WTC (r = 0.99, r = 0.95 and r = 0.90) were assessed, and bands with coefficients beyond a chosen threshold were assigned r = 0. The resultant data were used in the random forest analysis to classify in situ C3 and C4 grass canopy reflectance. The respective WTC datasets yielded improved classification accuracies (kappa = 0.82, 0.79 and 0.76) with less correlated wavebands when compared to resampled Hyperion bands (kappa = 0.76). Overall, the results obtained from this study suggested that resampling of hyperspectral data should account for the spectral dependence information to improve overall classification accuracy as well as reducing the problem of multicollinearity.

  17. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance locating transcription factor, binding sites, and start site initiation in genomic DNA. Motivated by a recently conducted study, where multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in analysis, where conserved coding genes were found by using pan-genomics concept for each considered prokaryotic species. PLS uses position specific scoring matrix (PSSM) to study the characteristics of upstream region. Results obtained by PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which is much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and suggested approach identifies prokaryotic upstream region significantly better to RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.

  18. Predicting alpine headwater stream intermittency: a case study in the northern Rocky Mountains

    USGS Publications Warehouse

    Sando, Thomas R.; Blasch, Kyle W.

    2015-01-01

    This investigation used climatic, geological, and environmental data coupled with observational stream intermittency data to predict alpine headwater stream intermittency. Prediction was made using a random forest classification model. Results showed that the most important variables in the prediction model were snowpack persistence, represented by average snow extent from March through July, mean annual mean monthly minimum temperature, and surface geology types. For stream catchments with intermittent headwater streams, snowpack, on average, persisted until early June, whereas for stream catchments with perennial headwater streams, snowpack, on average, persisted until early July. Additionally, on average, stream catchments with intermittent headwater streams were about 0.7 °C warmer than stream catchments with perennial headwater streams. Finally, headwater stream catchments primarily underlain by coarse, permeable sediment are significantly more likely to have intermittent headwater streams than those primarily underlain by impermeable bedrock. Comparison of the predicted streamflow classification with observed stream status indicated a four percent classification error for first-order streams and a 21 percent classification error for all stream orders in the study area.

  19. Thyroid Nodule Classification in Ultrasound Images by Fine-Tuning Deep Convolutional Neural Network.

    PubMed

    Chi, Jianning; Walia, Ekta; Babyn, Paul; Wang, Jimmy; Groot, Gary; Eramian, Mark

    2017-08-01

    With many thyroid nodules being incidentally detected, it is important to identify as many malignant nodules as possible while excluding those that are highly likely to be benign from fine needle aspiration (FNA) biopsies or surgeries. This paper presents a computer-aided diagnosis (CAD) system for classifying thyroid nodules in ultrasound images. We use deep learning approach to extract features from thyroid ultrasound images. Ultrasound images are pre-processed to calibrate their scale and remove the artifacts. A pre-trained GoogLeNet model is then fine-tuned using the pre-processed image samples which leads to superior feature extraction. The extracted features of the thyroid ultrasound images are sent to a Cost-sensitive Random Forest classifier to classify the images into "malignant" and "benign" cases. The experimental results show the proposed fine-tuned GoogLeNet model achieves excellent classification performance, attaining 98.29% classification accuracy, 99.10% sensitivity and 93.90% specificity for the images in an open access database (Pedraza et al. 16), while 96.34% classification accuracy, 86% sensitivity and 99% specificity for the images in our local health region database.

  20. Deep canyon and subalpine riparian and wetland plant associations of the Malheur, Umatilla, and Wallowa-Whitman National Forests.

    Treesearch

    Aaron F. Wells

    2006-01-01

    This guide presents a classification of the deep canyon and subalpine riparian and wetland vegetation types of the Malheur, Umatilla, and Wallowa-Whitman National Forests. A primary goal of the deep canyon and subalpine riparian and wetland classification was a seamless linkage with the midmontane northeastern Oregon riparian and wetland classification provided by...

  1. Classification and evaluation for forest sites on the Eastern Highland Rim and Pennyroyal.

    Treesearch

    Glendon W. Smalley

    1983-01-01

    Presents a comprehensive forest site classification system for the Eastern Highland Rim and Pennyroyal in north Alabama, east-central Tennessee, and central Kentucky. The system is based on physiography, geology, soils, topography, and vegetation.

  2. Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine

    NASA Astrophysics Data System (ADS)

    Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry

    2017-10-01

    Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha-1 and for live biomass of about 2 t ha-1 over the study area.

  3. A framework for mapping tree species combining hyperspectral and LiDAR data: Role of selected classifiers and sensor across three spatial scales

    NASA Astrophysics Data System (ADS)

    Ghosh, Aniruddha; Fassnacht, Fabian Ewald; Joshi, P. K.; Koch, Barbara

    2014-02-01

    Knowledge of tree species distribution is important worldwide for sustainable forest management and resource evaluation. The accuracy and information content of species maps produced using remote sensing images vary with scale, sensor (optical, microwave, LiDAR), classification algorithm, verification design and natural conditions like tree age, forest structure and density. Imaging spectroscopy reduces the inaccuracies making use of the detailed spectral response. However, the scale effect still has a strong influence and cannot be neglected. This study aims to bridge the knowledge gap in understanding the scale effect in imaging spectroscopy when moving from 4 to 30 m pixel size for tree species mapping, keeping in mind that most current and future hyperspectral satellite based sensors work with spatial resolution around 30 m or more. Two airborne (HyMAP) and one spaceborne (Hyperion) imaging spectroscopy dataset with pixel sizes of 4, 8 and 30 m, respectively were available to examine the effect of scale over a central European forest. The forest under examination is a typical managed forest with relatively homogenous stands featuring mostly two canopy layers. Normalized digital surface model (nDSM) derived from LiDAR data was used additionally to examine the effect of height information in tree species mapping. Six different sets of predictor variables (reflectance value of all bands, selected components of a Minimum Noise Fraction (MNF), Vegetation Indices (VI) and each of these sets combined with LiDAR derived height) were explored at each scale. Supervised kernel based (Support Vector Machines) and ensemble based (Random Forest) machine learning algorithms were applied on the dataset to investigate the effect of the classifier. Iterative bootstrap-validation with 100 iterations was performed for classification model building and testing for all the trials. For scale, analysis of overall classification accuracy and kappa values indicated that 8 m spatial resolution (reaching kappa values of over 0.83) slightly outperformed the results obtained from 4 m for the study area and five tree species under examination. The 30 m resolution Hyperion image produced sound results (kappa values of over 0.70), which in some areas of the test site were comparable with the higher spatial resolution imagery when qualitatively assessing the map outputs. Considering input predictor sets, MNF bands performed best at 4 and 8 m resolution. Optical bands were found to be best for 30 m spatial resolution. Classification with MNF as input predictors produced better visual appearance of tree species patches when compared with reference maps. Based on the analysis, it was concluded that there is no significant effect of height information on tree species classification accuracies for the present framework and study area. Furthermore, in the examined cases there was no single best choice among the two classifiers across scales and predictors. It can be concluded that tree species mapping from imaging spectroscopy for forest sites comparable to the one under investigation is possible with reliable accuracies not only from airborne but also from spaceborne imaging spectroscopy datasets.

  4. Automatic interpretation of ERTS data for forest management

    NASA Technical Reports Server (NTRS)

    Kirvida, L.; Johnson, G. R.

    1973-01-01

    Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wild life management, forest inventory and forest condition monitoring. Automatic procedures based on both multi-spectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.

  5. Forest cover type analysis of New England forests using innovative WorldView-2 imagery

    NASA Astrophysics Data System (ADS)

    Kovacs, Jenna M.

    For many years, remote sensing has been used to generate land cover type maps to create a visual representation of what is occurring on the ground. One significant use of remote sensing is the identification of forest cover types. New England forests are notorious for their especially complex forest structure and as a result have been, and continue to be, a challenge when classifying forest cover types. To most accurately depict forest cover types occurring on the ground, it is essential to utilize image data that have a suitable combination of both spectral and spatial resolution. The WorldView-2 (WV2) commercial satellite, launched in 2009, is the first of its kind, having both high spectral and spatial resolutions. WV2 records eight bands of multispectral imagery, four more than the usual high spatial resolution sensors, and has a pixel size of 1.85 meters at the nadir. These additional bands have the potential to improve classification detail and classification accuracy of forest cover type maps. For this reason, WV2 imagery was utilized on its own, and in combination with Landsat 5 TM (LS5) multispectral imagery, to evaluate whether these image data could more accurately classify forest cover types. In keeping with recent developments in image analysis, an Object-Based Image Analysis (OBIA) approach was used to segment images of Pawtuckaway State Park and nearby private lands, an area representative of the typical complex forest structure found in the New England region. A Classification and Regression Tree (CART) analysis was then used to classify image segments at two levels of classification detail. Accuracies for each forest cover type map produced were generated using traditional and area-based error matrices, and additional standard accuracy measures (i.e., KAPPA) were generated. The results from this study show that there is value in analyzing imagery with both high spectral and spatial resolutions, and that WV2's new and innovative bands can be useful for the classification of complex forest structures.

  6. Characteristics of Forests in Western Sayani Mountains, Siberia from SAR Data

    NASA Technical Reports Server (NTRS)

    Ranson, K. Jon; Sun, Guoqing; Kharuk, V. I.; Kovacs, Katalin

    1998-01-01

    This paper investigated the possibility of using spaceborne radar data to map forest types and logging in the mountainous Western Sayani area in Siberia. L and C band HH, HV, and VV polarized images from the Shuttle Imaging Radar-C instrument were used in the study. Techniques to reduce topographic effects in the radar images were investigated. These included radiometric correction using illumination angle inferred from a digital elevation model, and reducing apparent effects of topography through band ratios. Forest classification was performed after terrain correction utilizing typical supervised techniques and principal component analyses. An ancillary data set of local elevations was also used to improve the forest classification. Map accuracy for each technique was estimated for training sites based on Russian forestry maps, satellite imagery and field measurements. The results indicate that it is necessary to correct for topography when attempting to classify forests in mountainous terrain. Radiometric correction based on a DEM (Digital Elevation Model) improved classification results but required reducing the SAR (Synthetic Aperture Radar) resolution to match the DEM. Using ratios of SAR channels that include cross-polarization improved classification and

  7. Use and applicability of the vegetation component of the national site classification system. [Sumter National Forest, South Carolina

    NASA Technical Reports Server (NTRS)

    Clark, C. A. (Principal Investigator)

    1981-01-01

    Existing vegetation on a site in Sumter National Forest, South Carolina was classified using high altitude aerial optical bar color infrared photography in an effort to determine if the National Site Classification (NSC) system could be used in the heterogeneously forested southeastern United States where it had not previously been used. Results show that the revised UNESCO international classification and mapping of vegetation system, as incorporated into the NSCS, is general enough at the higher levels and specific enough at the lower levels to adequately accommodate densely forested, heterogeneous areas as well as the larger, more homogeneous regions of the Pacific Northwest. The major problem is of existing vegetation versus natural vegetation.

  8. Integration of adaptive guided filtering, deep feature learning, and edge-detection techniques for hyperspectral image classification

    NASA Astrophysics Data System (ADS)

    Wan, Xiaoqing; Zhao, Chunhui; Gao, Bing

    2017-11-01

    The integration of an edge-preserving filtering technique in the classification of a hyperspectral image (HSI) has been proven effective in enhancing classification performance. This paper proposes an ensemble strategy for HSI classification using an edge-preserving filter along with a deep learning model and edge detection. First, an adaptive guided filter is applied to the original HSI to reduce the noise in degraded images and to extract powerful spectral-spatial features. Second, the extracted features are fed as input to a stacked sparse autoencoder to adaptively exploit more invariant and deep feature representations; then, a random forest classifier is applied to fine-tune the entire pretrained network and determine the classification output. Third, a Prewitt compass operator is further performed on the HSI to extract the edges of the first principal component after dimension reduction. Moreover, the regional growth rule is applied to the resulting edge logical image to determine the local region for each unlabeled pixel. Finally, the categories of the corresponding neighborhood samples are determined in the original classification map; then, the major voting mechanism is implemented to generate the final output. Extensive experiments proved that the proposed method achieves competitive performance compared with several traditional approaches.

  9. Classification of Strawberry Fruit Shape by Machine Learning

    NASA Astrophysics Data System (ADS)

    Ishikawa, T.; Hayashi, A.; Nagamatsu, S.; Kyutoku, Y.; Dan, I.; Wada, T.; Oku, K.; Saeki, Y.; Uto, T.; Tanabata, T.; Isobe, S.; Kochi, N.

    2018-05-01

    Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sampler patterns of the nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning methods to other plant species.

  10. Fiber tractography using machine learning.

    PubMed

    Neher, Peter F; Côté, Marc-Alexandre; Houde, Jean-Christophe; Descoteaux, Maxime; Maier-Hein, Klaus H

    2017-09-01

    We present a fiber tractography approach based on a random forest classification and voting process, guiding each step of the streamline progression by directly processing raw diffusion-weighted signal intensities. For comparison to the state-of-the-art, i.e. tractography pipelines that rely on mathematical modeling, we performed a quantitative and qualitative evaluation with multiple phantom and in vivo experiments, including a comparison to the 96 submissions of the ISMRM tractography challenge 2015. The results demonstrate the vast potential of machine learning for fiber tractography. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    PubMed

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS), to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADs categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84 from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with the AUC 0.83. Lesion margin and orientation were optimum features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. All rights reserved.

  12. Classification of Hyperspectral Data Based on Guided Filtering and Random Forest

    NASA Astrophysics Data System (ADS)

    Ma, H.; Feng, W.; Cao, X.; Wang, L.

    2017-09-01

    Hyperspectral images usually consist of more than one hundred spectral bands, which have potentials to provide rich spatial and spectral information. However, the application of hyperspectral data is still challengeable due to "the curse of dimensionality". In this context, many techniques, which aim to make full use of both the spatial and spectral information, are investigated. In order to preserve the geometrical information, meanwhile, with less spectral bands, we propose a novel method, which combines principal components analysis (PCA), guided image filtering and the random forest classifier (RF). In detail, PCA is firstly employed to reduce the dimension of spectral bands. Secondly, the guided image filtering technique is introduced to smooth land object, meanwhile preserving the edge of objects. Finally, the features are fed into RF classifier. To illustrate the effectiveness of the method, we carry out experiments over the popular Indian Pines data set, which is collected by Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. By comparing the proposed method with the method of only using PCA or guided image filter, we find that effect of the proposed method is better.

  13. Phase I Forest Area Estimation Using Landsat TM and Iterative Guided Spectral Class Rejection: Assessment of Possible Training Data Protocols

    Treesearch

    John A. Scrivani; Randolph H. Wynne; Christine E. Blinn; Rebecca F. Musy

    2001-01-01

    Two methods of training data collection for automated image classification were tested in Virginia as part of a larger effort to develop an objective, repeatable, and low-cost method to provide forest area classification from satellite imagery. The derived forest area estimates were compared to estimates derived from a traditional photo-interpreted, double sample. One...

  14. Tree mortality based fire severity classification for forest inventories: A Pacific Northwest national forests example

    Treesearch

    Thomas R. Whittier; Andrew N. Gray

    2016-01-01

    Determining how the frequency, severity, and extent of forest fires are changing in response to changes in management and climate is a key concern in many regions where fire is an important natural disturbance. In the USA the only national-scale fire severity classification uses satellite image changedetection to produce maps for large (>400 ha) fires, and is...

  15. Comparison of Random Forest and Support Vector Machine classifiers using UAV remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Piragnolo, Marco; Masiero, Andrea; Pirotti, Francesco

    2017-04-01

    Since recent years surveying with unmanned aerial vehicles (UAV) is getting a great amount of attention due to decreasing costs, higher precision and flexibility of usage. UAVs have been applied for geomorphological investigations, forestry, precision agriculture, cultural heritage assessment and for archaeological purposes. It can be used for land use and land cover classification (LULC). In literature, there are two main types of approaches for classification of remote sensing imagery: pixel-based and object-based. On one hand, pixel-based approach mostly uses training areas to define classes and respective spectral signatures. On the other hand, object-based classification considers pixels, scale, spatial information and texture information for creating homogeneous objects. Machine learning methods have been applied successfully for classification, and their use is increasing due to the availability of faster computing capabilities. The methods learn and train the model from previous computation. Two machine learning methods which have given good results in previous investigations are Random Forest (RF) and Support Vector Machine (SVM). The goal of this work is to compare RF and SVM methods for classifying LULC using images collected with a fixed wing UAV. The processing chain regarding classification uses packages in R, an open source scripting language for data analysis, which provides all necessary algorithms. The imagery was acquired and processed in November 2015 with cameras providing information over the red, blue, green and near infrared wavelength reflectivity over a testing area in the campus of Agripolis, in Italy. Images were elaborated and ortho-rectified through Agisoft Photoscan. The ortho-rectified image is the full data set, and the test set is derived from partial sub-setting of the full data set. Different tests have been carried out, using a percentage from 2 % to 20 % of the total. Ten training sets and ten validation sets are obtained from each test set. The control dataset consist of an independent visual classification done by an expert over the whole area. The classes are (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, (vii) vegetable. The RF and SVM are applied to the test set. The performances of the methods are evaluated using the three following accuracy metrics: Kappa index, Classification accuracy and Classification Error. All three are calculated in three different ways: with K-fold cross validation, using the validation test set and using the full test set. The analysis indicates that SVM gets better results in terms of good scores using K-fold cross or validation test set. Using the full test set, RF achieves a better result in comparison to SVM. It also seems that SVM performs better with smaller training sets, whereas RF performs better as training sets get larger.

  16. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling

    PubMed Central

    Zhou, Fuqun; Zhang, Aining

    2016-01-01

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2–3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests’ features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data. PMID:27792152

  17. Organic cattle products: Authenticating production origin by analysis of serum mineral content.

    PubMed

    Rodríguez-Bermúdez, Ruth; Herrero-Latorre, Carlos; López-Alonso, Marta; Losada, David E; Iglesias, Roberto; Miranda, Marta

    2018-10-30

    An authentication procedure for differentiating between organic and non-organic cattle production on the basis of analysis of serum samples has been developed. For this purpose, the concentrations of fourteen mineral elements (As, Cd, Co, Cr, Cu, Fe, Hg, I, Mn, Mo, Ni, Pb, Se and Zn) in 522 serum samples from cows (341 from organic farms and 181 from non-organic farms), determined by inductively coupled plasma spectrometry, were used. The chemical information provided by serum analysis was employed to construct different pattern recognition classification models that predict the origin of each sample: organic or non-organic class. Among all classification procedures considered, the best results were obtained with the decision tree C5.0, Random Forest and AdaBoost neural networks, with hit levels close to 90% for both production types. The proposed method, involving analysis of serum samples, provided rapid, accurate in vivo classification of cattle according to organic and non-organic production type. Copyright © 2018 Elsevier Ltd. All rights reserved.

  18. Large and Small-Scale Cropland Classification on the Foothills of Mount Kenya Based on SPOT-5 Take-5 Data Time Series

    NASA Astrophysics Data System (ADS)

    Eckert, Sandra

    2016-08-01

    The SPOT-5 Take 5 campaign provided SPOT time series data of an unprecedented spatial and temporal resolution. We analysed 29 scenes acquired between May and September 2015 of a semi-arid region in the foothills of Mount Kenya, with two aims: first, to distinguish rainfed from irrigated cropland and cropland from natural vegetation covers, which show similar reflectance patterns; and second, to identify individual crop types. We tested several input data sets in different combinations: the spectral bands and the normalized difference vegetation index (NDVI) time series, principal components of NDVI time series, and selected NDVI time series statistics. For the classification we used random forests (RF). In the test differentiating rainfed cropland, irrigated cropland, and natural vegetation covers, the best classification accuracies were achieved using spectral bands. For the differentiation of crop types, we analysed the phenology of selected crop types based on NDVI time series. First results are promising.

  19. Computational Short-cutting the Big Data Classification Bottleneck: Using the MODIS Land Cover Product to Derive a Consistent 30 m Landsat Land Cover Product of the Conterminous United States

    NASA Astrophysics Data System (ADS)

    Zhang, H.; Roy, D. P.

    2016-12-01

    Classification is a fundamental process in remote sensing used to relate pixel values to land cover classes present on the surface. The state of the practice for large area land cover classification is to classify satellite time series metrics with a supervised (i.e., training data dependent) non-parametric classifier. Classification accuracy generally increases with training set size. However, training data collection is expensive and the optimal training distribution over large areas is unknown. The MODIS 500 m land cover product is available globally on an annual basis and so provides a potentially very large source of land cover training data. A novel methodology to classify large volume Landsat data using high quality training data derived automatically from the MODIS land cover product is demonstrated for all of the Conterminous United States (CONUS). The known misclassification accuracy of the MODIS land cover product and the scale difference between the 500 m MODIS and 30 m Landsat data are accommodated for by a novel MODIS product filtering, Landsat pixel selection, and iterative training approach to balance the proportion of local and CONUS training data used. Three years of global Web-enabled Landsat data (WELD) data for all of the CONUS are classified using a random forest classifier and the results assessed using random forest `out-of-bag' training samples. The global WELD data are corrected to surface nadir BRDF-Adjusted Reflectance and are defined in 158 × 158 km tiles in the same projection and nested to the MODIS land cover products. This reduces the need to pre-process the considerable Landsat data volume (more than 14,000 Landsat 5 and 7 scenes per year over the CONUS covering 11,000 million 30 m pixels). The methodology is implemented in a parallel manner on WELD tile by tile basis but provides a wall-to-wall seamless 30 m land cover product. Detailed tile and CONUS results are presented and the potential for global production using the recently available global WELD products are discussed.

  20. Mapping Successional Stages in a Wet Tropical Forest Using Landsat ETM+ and Forest Inventory Data

    NASA Technical Reports Server (NTRS)

    Goncalves, Fabio G.; Yatskov, Mikhail; dos Santos, Joao Roberto; Treuhaft, Robert N.; Law, Beverly E.

    2010-01-01

    In this study, we test whether an existing classification technique based on the integration of Landsat ETM+ and forest inventory data enables detailed characterization of successional stages in a wet tropical forest site. The specific objectives were: (1) to map forest age classes across the La Selva Biological Station in Costa Rica; and (2) to quantify uncertainties in the proposed approach in relation to field data and existing vegetation maps. Although significant relationships between vegetation height entropy (a surrogate for forest age) and ETM+ data were detected, the classification scheme tested in this study was not suitable for characterizing spatial variation in age at La Selva, as evidenced by the error matrix and the low Kappa coefficient (12.9%). Factors affecting the performance of the classification at this particular study site include the smooth transition in vegetation structure between intermediate and advanced successional stages, and the low sensitivity of NDVI to variations in vertical structure at high biomass levels.

  1. Multiple-factor classification of a human-modified forest landscape in the Hsuehshan Mountain Range, Taiwan.

    PubMed

    Berg, Kevan J; Icyeh, Lahuy; Lin, Yih-Ren; Janz, Arnold; Newmaster, Steven G

    2016-12-01

    Human actions drive landscape heterogeneity, yet most ecosystem classifications omit the role of human influence. This study explores land use history to inform a classification of forestland of the Tayal Mrqwang indigenous people of Taiwan. Our objectives were to determine the extent to which human action drives landscape heterogeneity. We used interviews, field sampling, and multivariate analysis to relate vegetation patterns to environmental gradients and human modification across 76 sites. We identified eleven forest classes. In total, around 70 % of plots were at lower elevations and had a history of shifting cultivation, terrace farming, and settlement that resulted in alder, laurel, oak, pine, and bamboo stands. Higher elevation mixed conifer forests were least disturbed. Arboriculture and selective harvesting were drivers of other conspicuous forest patterns. The findings show that past land uses play a key role in shaping forests, which is important to consider when setting targets to guide forest management.

  2. High-Resolution Regional Biomass Map of Siberia from Glas, Palsar L-Band Radar and Landsat Vcf Data

    NASA Astrophysics Data System (ADS)

    Sun, G.; Ranson, K.; Montesano, P.; Zhang, Z.; Kharuk, V.

    2015-12-01

    The Arctic-Boreal zone is known be warming at an accelerated rate relative to other biomes. The taiga or boreal forest covers over 16 x106 km2 of Arctic North America, Scandinavia, and Eurasia. A large part of the northern Boreal forests are in Russia's Siberia, as area with recent accelerated climate warming. During the last two decades we have been working on characterization of boreal forests in north-central Siberia using field and satellite measurements. We have published results of circumpolar biomass using field plots, airborne (PALS, ACTM) and spaceborne (GLAS) lidar data with ASTER DEM, LANDSAT and MODIS land cover classification, MODIS burned area and WWF's ecoregion map. Researchers from ESA and Russia have also been working on biomass (or growing stock) mapping in Siberia. For example, they developed a pan-boreal growing stock volume map at 1-kilometer scale using hyper-temporal ENVISAT ASAR ScanSAR backscatter data. Using the annual PALSAR mosaics from 2007 to 2010 growing stock volume maps were retrieved based on a supervised random forest regression approach. This method is being used in the ESA/Russia ZAPAS project for Central Siberia Biomass mapping. Spatially specific biomass maps of this region at higher resolution are desired for carbon cycle and climate change studies. In this study, our work focused on improving resolution ( 50 m) of a biomass map based on PALSAR L-band data and Landsat Vegetation Canopy Fraction products. GLAS data were carefully processed and screened using land cover classification, local slope, and acquisition dates. The biomass at remaining footprints was estimated using a model developed from field measurements at GLAS footprints. The GLAS biomass samples were then aggregated into 1 Mg/ha bins of biomass and mean VCF and PALSAR backscatter and textures were calculated for each of these biomass bins. The resulted biomass/signature data was used to train a random forest model for biomass mapping of entire region from 50oN to 75oN, and 80oE to 145oE. The spatial patterns of the new biomass map is much better than the previous maps due to spatially specific mapping in high resolution. The uncertainties of field/GLAS and GLAS/imagery models were investigated using bootstrap procedure, and the final biomass map was compared with previous maps.

  3. An Initial Analysis of LANDSAT-4 Thematic Mapper Data for the Discrimination of Agricultural, Forested Wetland, and Urban Land Covers

    NASA Technical Reports Server (NTRS)

    Quattrochi, D. A.

    1984-01-01

    An initial analysis of LANDSAT 4 Thematic Mapper (TM) data for the discrimination of agricultural, forested wetland, and urban land covers is conducted using a scene of data collected over Arkansas and Tennessee. A classification of agricultural lands derived from multitemporal LANDSAT Multispectral Scanner (MSS) data is compared with a classification of TM data for the same area. Results from this comparative analysis show that the multitemporal MSS classification produced an overall accuracy of 80.91% while the TM classification yields an overall classification accuracy of 97.06% correct.

  4. Determining successional stage of temperate coniferous forests with Landsat satellite data

    NASA Technical Reports Server (NTRS)

    Fiorella, Maria; Ripple, William J.

    1995-01-01

    Thematic Mapper (TM) digital imagery was used to map forest successional stages and to evaluate spectral differences between old-growth and mature forests in the central Cascade Range of Oregon. Relative sun incidence values were incorporated into the successional stage classification to compensate for topographic induced variation. Relative sun incidence improved the classification accuracy of young successional stages, but did not improve the classification accuracy of older, closed canopy forest classes or overall accuracy. TM bands 1, 2, and 4; the normalized difference vegetation index (NDVI); and TM 4/3, 4/5, and 4/7 band ratio values for old-growth forests were found to be significantly lower than the values of mature forests (P less than or equal to 0.010). Wetness and the TM 4/5 and 4/7 band ratios all had low correlations to relative sun incidence (r(exp 2) less than or equal to 0.16). The TM 4/5 band ratio was named the 'structural index' (SI) because of its ability to distinguish between mature and old-growth forests and its simplicity.

  5. Classification and evaluation for forest sites on the Western Highland Rim and Pennyroyal

    Treesearch

    Glendon W. Smalley

    1980-01-01

    Presents a comprehensive forest site classification system for the Western Highland Rim and Western Pennyroyal-Limestone area in northwest Alabama, west-central Tennessee, and western Kentucky. The system is based on physiography, geology, soils, topography, and vegetation.

  6. Classification of wines according to their production regions with the contained trace elements using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Tian, Ye; Yan, Chunhua; Zhang, Tianlong; Tang, Hongsheng; Li, Hua; Yu, Jialu; Bernard, Jérôme; Chen, Li; Martin, Serge; Delepine-Gilon, Nicole; Bocková, Jana; Veis, Pavel; Chen, Yanping; Yu, Jin

    2017-09-01

    Laser-induced breakdown spectroscopy (LIBS) has been applied to classify French wines according to their production regions. The use of the surface-assisted (or surface-enhanced) sample preparation method enabled a sub-ppm limit of detection (LOD), which led to the detection and identification of at least 22 metal and nonmetal elements in a typical wine sample including majors, minors and traces. An ensemble of 29 bottles of French wines, either red or white wines, from five production regions, Alsace, Bourgogne, Beaujolais, Bordeaux and Languedoc, was analyzed together with a wine from California, considered as an outlier. A non-supervised classification model based on principal component analysis (PCA) was first developed for the classification. The results showed a limited separation power of the model, which however allowed, in a step by step approach, to understand the physical reasons behind each step of sample separation and especially to observe the influence of the matrix effect in the sample classification. A supervised classification model was then developed based on random forest (RF), which is in addition a nonlinear algorithm. The obtained classification results were satisfactory with, when the parameters of the model were optimized, a classification accuracy of 100% for the tested samples. We especially discuss in the paper, the effect of spectrum normalization with an internal reference, the choice of input variables for the classification models and the optimization of parameters for the developed classification models.

  7. Automatic classification of protein structures using physicochemical parameters.

    PubMed

    Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam

    2014-09-01

    Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.

  8. Waveform classification and statistical analysis of seismic precursors to the July 2008 Vulcanian Eruption of Soufrière Hills Volcano, Montserrat

    NASA Astrophysics Data System (ADS)

    Rodgers, Mel; Smith, Patrick; Pyle, David; Mather, Tamsin

    2016-04-01

    Understanding the transition between quiescence and eruption at dome-forming volcanoes, such as Soufrière Hills Volcano (SHV), Montserrat, is important for monitoring volcanic activity during long-lived eruptions. Statistical analysis of seismic events (e.g. spectral analysis and identification of multiplets via cross-correlation) can be useful for characterising seismicity patterns and can be a powerful tool for analysing temporal changes in behaviour. Waveform classification is crucial for volcano monitoring, but consistent classification, both during real-time analysis and for retrospective analysis of previous volcanic activity, remains a challenge. Automated classification allows consistent re-classification of events. We present a machine learning (random forest) approach to rapidly classify waveforms that requires minimal training data. We analyse the seismic precursors to the July 2008 Vulcanian explosion at SHV and show systematic changes in frequency content and multiplet behaviour that had not previously been recognised. These precursory patterns of seismicity may be interpreted as changes in pressure conditions within the conduit during magma ascent and could be linked to magma flow rates. Frequency analysis of the different waveform classes supports the growing consensus that LP and Hybrid events should be considered end members of a continuum of low-frequency source processes. By using both supervised and unsupervised machine-learning methods we investigate the nature of waveform classification and assess current classification schemes.

  9. Description of a land classification system and its application to the management of Tennessee's state forests

    Treesearch

    Glendon W. Smalley; S. David Todd; K. Ward Tarkington

    2006-01-01

    The Tennessee Division of Forestry has adopted a land classification system developed by the senior author as the basic theme of information for the management of its 15 state forests (162,371 acres) with at least 1 in each of 8 physiographic provinces. This paper summarizes the application of the system to six forests on the Cumberland Plateau. Landtypes are the most...

  10. Forest vegetation of the Black Hills National Forest of South Dakota and Wyoming: A habitat type classification

    Treesearch

    George R. Hoffman; Robert R. Alexander

    1987-01-01

    A vegetation classification based on concepts and methods developed by Daubenmire was used to identify 12 forest habitat types and one shrub habitat type in the Black Hills. Included were two habitat types in the Quercus macrocarpa series, seven in the Pinus ponderosa series, one in the Populus tremuloides series, two in the Picea glaucci series, and one in the...

  11. A Comparative Object-Based Sugarcane Classification from Sentinel-2 Data Using Random Forests and Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Chen, C. R.; Chen, C. F.; Nguyen, S. T.; Lau, K.; Lay, J. G.

    2016-12-01

    Sugarcane mostly grown in tropical and subtropical regions is one of the important commercial crops worldwide, providing significant employment, foreign exchange earnings, and other social and environmental benefits. The sugar industry is a vital component of Belize's economy as it provides employment to 15% of the country's population and 60% of the national agricultural exports. Sugarcane mapping is thus an important task due to official initiatives to provide reliable information on sugarcane-growing areas in respect to improved accuracy in monitoring sugarcane production and yield estimates. Policymakers need such monitoring information to formulate timely plans to ensure sustainably socioeconomic development. Sugarcane monitoring in Belize is traditionally carried out through time-consuming and costly field surveys. Remote sensing is an indispensable tool for crop monitoring on national, regional and global scales. The use of high and low resolution satellites for sugarcane monitoring in Belize is often restricted due to cost limitations and mixed pixel problems because sugarcane fields are small and fragmental. With the launch of Sentinel-2 satellite, it is possible to collectively map small patches of sugarcane fields over a large region as the data are free of charge and have high spectral, spatial, and temporal resolutions. This study aims to develop an object-based classification approach to comparatively map sugarcane fields in Belize from Sentinel-2 data using random forests (RF) and support vector machines (SVM). The data were processed through four main steps: (1) data pre-processing, (2) image segmentation, (3) sugarcane classification, and (4) accuracy assessment. The mapping results compared with the ground reference data indicated satisfactory results. The overall accuracies and Kappa coefficients were generally higher than 80% and 0.7, in both cases. The RF produced slightly more accurate mapping results than SVM. This study demonstrates the realization of the potential application of Sentinel-2 data for sugarcane mapping in Belize with the aid of RF and SVM methods. The methods are thus proposed for monitoring purposes in the country.

  12. Beyond where to how: a machine learning approach for sensing mobility contexts using smartphone sensors.

    PubMed

    Guinness, Robert E

    2015-04-28

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.

  13. Automated classification of seismic sources in a large database: a comparison of Random Forests and Deep Neural Networks.

    NASA Astrophysics Data System (ADS)

    Hibert, Clement; Stumpf, André; Provost, Floriane; Malet, Jean-Philippe

    2017-04-01

    In the past decades, the increasing quality of seismic sensors and capability to transfer remotely large quantity of data led to a fast densification of local, regional and global seismic networks for near real-time monitoring of crustal and surface processes. This technological advance permits the use of seismology to document geological and natural/anthropogenic processes (volcanoes, ice-calving, landslides, snow and rock avalanches, geothermal fields), but also led to an ever-growing quantity of seismic data. This wealth of seismic data makes the construction of complete seismicity catalogs, which include earthquakes but also other sources of seismic waves, more challenging and very time-consuming as this critical pre-processing stage is classically done by human operators and because hundreds of thousands of seismic signals have to be processed. To overcome this issue, the development of automatic methods for the processing of continuous seismic data appears to be a necessity. The classification algorithm should satisfy the need of a method that is robust, precise and versatile enough to be deployed to monitor the seismicity in very different contexts. In this study, we evaluate the ability of machine learning algorithms for the analysis of seismic sources at the Piton de la Fournaise volcano being Random Forest and Deep Neural Network classifiers. We gather a catalog of more than 20,000 events, belonging to 8 classes of seismic sources. We define 60 attributes, based on the waveform, the frequency content and the polarization of the seismic waves, to parameterize the seismic signals recorded. We show that both algorithms provide similar positive classification rates, with values exceeding 90% of the events. When trained with a sufficient number of events, the rate of positive identification can reach 99%. These very high rates of positive identification open the perspective of an operational implementation of these algorithms for near-real time monitoring of mass movements and other environmental sources at the local, regional and even global scale.

  14. Gait phenotypes in paediatric hereditary spastic paraplegia revealed by dynamic time warping analysis and random forests

    PubMed Central

    Martín-Gonzalo, Juan Andrés; Rodríguez-Andonaegui, Irene; López-López, Javier; Pascual-Pascual, Samuel Ignacio

    2018-01-01

    The Hereditary Spastic Paraplegias (HSP) are a group of heterogeneous disorders with a wide spectrum of underlying neural pathology, and hence HSP patients express a variety of gait abnormalities. Classification of these phenotypes may help in monitoring disease progression and personalizing therapies. This is currently managed by measuring values of some kinematic and spatio-temporal parameters at certain moments during the gait cycle, either in the doctor´s surgery room or after very precise measurements produced by instrumental gait analysis (IGA). These methods, however, do not provide information about the whole structure of the gait cycle. Classification of the similarities among time series of IGA measured values of sagittal joint positions throughout the whole gait cycle can be achieved by hierarchical clustering analysis based on multivariate dynamic time warping (DTW). Random forests can estimate which are the most important isolated parameters to predict the classification revealed by DTW, since clinicians need to refer to them in their daily practice. We acquired time series of pelvic, hip, knee, ankle and forefoot sagittal angular positions from 26 HSP and 33 healthy children with an optokinetic IGA system. DTW revealed six gait patterns with different degrees of impairment of walking speed, cadence and gait cycle distribution and related with patient’s age, sex, GMFCS stage, concurrence of polyneuropathy and abnormal visual evoked potentials or corpus callosum. The most important parameters to differentiate patterns were mean pelvic tilt and hip flexion at initial contact. Longer time of support, decreased values of hip extension and increased knee flexion at initial contact can differentiate the mildest, near to normal HSP gait phenotype and the normal healthy one. Increased values of knee flexion at initial contact and delayed peak of knee flexion are important factors to distinguish GMFCS stages I from II-III and concurrence of polyneuropathy. PMID:29518090

  15. A random forest model based classification scheme for neonatal amplitude-integrated EEG.

    PubMed

    Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang

    2014-01-01

    Modern medical advances have greatly increased the survival rate of infants, while they remain in the higher risk group for neurological problems later in life. For the infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers a possibility to directly monitor the brain functional state of the newborns over hours, and has seen an increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set of aEEG and applies random forest (RF) method to classify aEEG tracings. To that end, a series of experiments were conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistic features and segmentation features were extracted from both the tracing as a whole and the segmented recordings, and then form a combined feature set. All the features were sent to a classifier afterwards. The significance of feature, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classifying, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression(LR), ML, and LDA. The combined feature set can better characterize aEEG signals, compared with basic features, statistic features and segmentation features respectively. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method got the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful to better detect the brain disorders in newborns.

  16. Beyond Where to How: A Machine Learning Approach for Sensing Mobility Contexts Using Smartphone Sensors †

    PubMed Central

    Guinness, Robert E.

    2015-01-01

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060

  17. Validation of optical codes based on 3D nanostructures

    NASA Astrophysics Data System (ADS)

    Carnicer, Artur; Javidi, Bahram

    2017-05-01

    Image information encoding using random phase masks produce speckle-like noise distributions when the sample is propagated in the Fresnel domain. As a result, information cannot be accessed by simple visual inspection. Phase masks can be easily implemented in practice by attaching cello-tape to the plain-text message. Conventional 2D-phase masks can be generalized to 3D by combining glass and diffusers resulting in a more complex, physical unclonable function. In this communication, we model the behavior of a 3D phase mask using a simple approach: light is propagated trough glass using the angular spectrum of plane waves whereas the diffusor is described as a random phase mask and a blurring effect on the amplitude of the propagated wave. Using different designs for the 3D phase mask and multiple samples, we demonstrate that classification is possible using the k-nearest neighbors and random forests machine learning algorithms.

  18. Classifying northern forests using Thematic Mapper Simulator data

    NASA Technical Reports Server (NTRS)

    Nelson, R. F.; Latty, R. S.; Mott, G.

    1984-01-01

    Thematic Mapper Simulator data were collected over a 23,200 hectare forested area near Baxter State Park in north-central Maine. Photointerpreted ground reference information was used to drive a stratified random sampling procedure for waveband discriminant analyses and to generate training statistics and test pixel accuracies. Stepwise discriminant analyses indicated that the following bands best differentiated the thirteen level II - III cover types (in order of entry): near infrared (0.77 to 0.90 micron), blue (0.46 0.52 micron), first middle infrared (1.53 to 1.73 microns), second middle infrared (2.06 to 2.33 microsn), red (0.63 to 0.69 micron), thermal (10.32 to 12.33 microns). Classification accuracies peaked at 58 percent for thirteen level II-III land-cover classes and at 65 percent for ten level II classes.

  19. Classifying bent radio galaxies from a mixture of point-like/extended images with Machine Learning.

    NASA Astrophysics Data System (ADS)

    Bastien, David; Oozeer, Nadeem; Somanah, Radhakrishna

    2017-05-01

    The hypothesis that bent radio sources are supposed to be found in rich, massive galaxy clusters and the avalibility of huge amount of data from radio surveys have fueled our motivation to use Machine Learning (ML) to identify bent radio sources and as such use them as tracers for galaxy clusters. The shapelet analysis allowed us to decompose radio images into 256 features that could be fed into the ML algorithm. Additionally, ideas from the field of neuro-psychology helped us to consider training the machine to identify bent galaxies at different orientations. From our analysis, we found that the Random Forest algorithm was the most effective with an accuracy rate of 92% for a classification of point and extended sources as well as an accuracy of 80% for bent and unbent classification.

  20. Hubble Tarantula Treasury Project - VI. Identification of Pre-Main-Sequence Stars using Machine Learning techniques

    NASA Astrophysics Data System (ADS)

    Ksoll, Victor F.; Gouliermis, Dimitrios A.; Klessen, Ralf S.; Grebel, Eva K.; Sabbi, Elena; Anderson, Jay; Lennon, Daniel J.; Cignoni, Michele; de Marchi, Guido; Smith, Linda J.; Tosi, Monica; van der Marel, Roeland P.

    2018-05-01

    The Hubble Tarantula Treasury Project (HTTP) has provided an unprecedented photometric coverage of the entire star-burst region of 30 Doradus down to the half Solar mass limit. We use the deep stellar catalogue of HTTP to identify all the pre-main-sequence (PMS) stars of the region, i.e., stars that have not started their lives on the main-sequence yet. The photometric distinction of these stars from the more evolved populations is not a trivial task due to several factors that alter their colour-magnitude diagram positions. The identification of PMS stars requires, thus, sophisticated statistical methods. We employ Machine Learning Classification techniques on the HTTP survey of more than 800,000 sources to identify the PMS stellar content of the observed field. Our methodology consists of 1) carefully selecting the most probable low-mass PMS stellar population of the star-forming cluster NGC2070, 2) using this sample to train classification algorithms to build a predictive model for PMS stars, and 3) applying this model in order to identify the most probable PMS content across the entire Tarantula Nebula. We employ Decision Tree, Random Forest and Support Vector Machine classifiers to categorise the stars as PMS and Non-PMS. The Random Forest and Support Vector Machine provided the most accurate models, predicting about 20,000 sources with a candidateship probability higher than 50 percent, and almost 10,000 PMS candidates with a probability higher than 95 percent. This is the richest and most accurate photometric catalogue of extragalactic PMS candidates across the extent of a whole star-forming complex.

  1. Policy Implications and Suggestions on Administrative Measures of Urban Flood

    NASA Astrophysics Data System (ADS)

    Lee, S. V.; Lee, M. J.; Lee, C.; Yoon, J. H.; Chae, S. H.

    2017-12-01

    The frequency and intensity of floods are increasing worldwide as recent climate change progresses gradually. Flood management should be policy-oriented in urban municipalities due to the characteristics of urban areas with a lot of damage. Therefore, the purpose of this study is to prepare a flood susceptibility map by using data mining model and make a policy suggestion on administrative measures of urban flood. Therefore, we constructed a spatial database by collecting relevant factors including the topography, geology, soil and land use data of the representative city, Seoul, the capital city of Korea. Flood susceptibility map was constructed by applying the data mining models of random forest and boosted tree model to input data and existing flooded area data in 2010. The susceptibility map has been validated using the 2011 flood area data which was not used for training. The predictor importance value of each factor to the results was calculated in this process. The distance from the water, DEM and geology showed a high predictor importance value which means to be a high priority for flood preparation policy. As a result of receiver operating characteristic (ROC), random forest model showed 78.78% and 79.18% accuracy of regression and classification and boosted tree model showed 77.55% and 77.26% accuracy of regression and classification, respectively. The results show that the flood susceptibility maps can be applied to flood prevention and management, and it also can help determine the priority areas for flood mitigation policy by providing useful information to policy makers.

  2. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    PubMed

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNBassociated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures though minimizing the error rate. After cross-validation logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients ( approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.

  3. Neighbourhood-Scale Urban Forest Ecosystem Classification

    Treesearch

    James W.N. Steenberg; Andrew A. Millward; Peter N. Duinker; David J. Nowak; Pamela J. Robinson

    2015-01-01

    Urban forests are now recognized as essential components of sustainable cities, but there remains uncertainty concerning how to stratify and classify urban landscapes into units of ecological significance at spatial scales appropriate for management. Ecosystem classification is an approach that entails quantifying the social and ecological processes that shape...

  4. Multivariate classification with random forests for gravitational wave searches of black hole binary coalescence

    NASA Astrophysics Data System (ADS)

    Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.

    2015-03-01

    Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.

  5. Spectral multi-energy CT texture analysis with machine learning for tissue classification: an investigation using classification of benign parotid tumours as a testing paradigm.

    PubMed

    Al Ajmi, Eiman; Forghani, Behzad; Reinhold, Caroline; Bayat, Maryam; Forghani, Reza

    2018-06-01

    There is a rich amount of quantitative information in spectral datasets generated from dual-energy CT (DECT). In this study, we compare the performance of texture analysis performed on multi-energy datasets to that of virtual monochromatic images (VMIs) at 65 keV only, using classification of the two most common benign parotid neoplasms as a testing paradigm. Forty-two patients with pathologically proven Warthin tumour (n = 25) or pleomorphic adenoma (n = 17) were evaluated. Texture analysis was performed on VMIs ranging from 40 to 140 keV in 5-keV increments (multi-energy analysis) or 65-keV VMIs only, which is typically considered equivalent to single-energy CT. Random forest (RF) models were constructed for outcome prediction using separate randomly selected training and testing sets or the entire patient set. Using multi-energy texture analysis, tumour classification in the independent testing set had accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 92%, 86%, 100%, 100%, and 83%, compared to 75%, 57%, 100%, 100%, and 63%, respectively, for single-energy analysis. Multi-energy texture analysis demonstrates superior performance compared to single-energy texture analysis of VMIs at 65 keV for classification of benign parotid tumours. • We present and validate a paradigm for texture analysis of DECT scans. • Multi-energy dataset texture analysis is superior to single-energy dataset texture analysis. • DECT texture analysis has high accura\\cy for diagnosis of benign parotid tumours. • DECT texture analysis with machine learning can enhance non-invasive diagnostic tumour evaluation.

  6. Spectral Band Selection for Urban Material Classification Using Hyperspectral Libraries

    NASA Astrophysics Data System (ADS)

    Le Bris, A.; Chehata, N.; Briottet, X.; Paparoditis, N.

    2016-06-01

    In urban areas, information concerning very high resolution land cover and especially material maps are necessary for several city modelling or monitoring applications. That is to say, knowledge concerning the roofing materials or the different kinds of ground areas is required. Airborne remote sensing techniques appear to be convenient for providing such information at a large scale. However, results obtained using most traditional processing methods based on usual red-green-blue-near infrared multispectral images remain limited for such applications. A possible way to improve classification results is to enhance the imagery spectral resolution using superspectral or hyperspectral sensors. In this study, it is intended to design a superspectral sensor dedicated to urban materials classification and this work particularly focused on the selection of the optimal spectral band subsets for such sensor. First, reflectance spectral signatures of urban materials were collected from 7 spectral libraires. Then, spectral optimization was performed using this data set. The band selection workflow included two steps, optimising first the number of spectral bands using an incremental method and then examining several possible optimised band subsets using a stochastic algorithm. The same wrapper relevance criterion relying on a confidence measure of Random Forests classifier was used at both steps. To cope with the limited number of available spectra for several classes, additional synthetic spectra were generated from the collection of reference spectra: intra-class variability was simulated by multiplying reference spectra by a random coefficient. At the end, selected band subsets were evaluated considering the classification quality reached using a rbf svm classifier. It was confirmed that a limited band subset was sufficient to classify common urban materials. The important contribution of bands from the Short Wave Infra-Red (SWIR) spectral domain (1000-2400 nm) to material classification was also shown.

  7. The vegetation of the Grand River/Cedar River, Sioux, and Ashland Districts of the Custer National Forest: a habitat type classification.

    Treesearch

    Paul L. Hansen; George R. Hoffman

    1988-01-01

    A vegetation classification was developed, using the methods and concepts of Daubenmire, on the Ashland, Sioux, and Grand River/Cedar River Districts of the Custer National Forest. Of the 26 habitat types delimited and described, eight were steppe, nine shrub-steppe, four woodland, and five forest. Two community types also were described. A key to the habitat types and...

  8. Physical Human Activity Recognition Using Wearable Sensors.

    PubMed

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-12-11

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.

  9. Deep convolutional neural network training enrichment using multi-view object-based analysis of Unmanned Aerial systems imagery for wetlands classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Abd-Elrahman, Amr

    2018-05-01

    Deep convolutional neural network (DCNN) requires massive training datasets to trigger its image classification power, while collecting training samples for remote sensing application is usually an expensive process. When DCNN is simply implemented with traditional object-based image analysis (OBIA) for classification of Unmanned Aerial systems (UAS) orthoimage, its power may be undermined if the number training samples is relatively small. This research aims to develop a novel OBIA classification approach that can take advantage of DCNN by enriching the training dataset automatically using multi-view data. Specifically, this study introduces a Multi-View Object-based classification using Deep convolutional neural network (MODe) method to process UAS images for land cover classification. MODe conducts the classification on multi-view UAS images instead of directly on the orthoimage, and gets the final results via a voting procedure. 10-fold cross validation results show the mean overall classification accuracy increasing substantially from 65.32%, when DCNN was applied on the orthoimage to 82.08% achieved when MODe was implemented. This study also compared the performances of the support vector machine (SVM) and random forest (RF) classifiers with DCNN under traditional OBIA and the proposed multi-view OBIA frameworks. The results indicate that the advantage of DCNN over traditional classifiers in terms of accuracy is more obvious when these classifiers were applied with the proposed multi-view OBIA framework than when these classifiers were applied within the traditional OBIA framework.

  10. Physical Human Activity Recognition Using Wearable Sensors

    PubMed Central

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-01-01

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450

  11. Accurate crop classification using hierarchical genetic fuzzy rule-based systems

    NASA Astrophysics Data System (ADS)

    Topaloglou, Charalampos A.; Mylonas, Stelios K.; Stavrakoudis, Dimitris G.; Mastorocostas, Paris A.; Theocharis, John B.

    2014-10-01

    This paper investigates the effectiveness of an advanced classification system for accurate crop classification using very high resolution (VHR) satellite imagery. Specifically, a recently proposed genetic fuzzy rule-based classification system (GFRBCS) is employed, namely, the Hierarchical Rule-based Linguistic Classifier (HiRLiC). HiRLiC's model comprises a small set of simple IF-THEN fuzzy rules, easily interpretable by humans. One of its most important attributes is that its learning algorithm requires minimum user interaction, since the most important learning parameters affecting the classification accuracy are determined by the learning algorithm automatically. HiRLiC is applied in a challenging crop classification task, using a SPOT5 satellite image over an intensively cultivated area in a lake-wetland ecosystem in northern Greece. A rich set of higher-order spectral and textural features is derived from the initial bands of the (pan-sharpened) image, resulting in an input space comprising 119 features. The experimental analysis proves that HiRLiC compares favorably to other interpretable classifiers of the literature, both in terms of structural complexity and classification accuracy. Its testing accuracy was very close to that obtained by complex state-of-the-art classification systems, such as the support vector machines (SVM) and random forest (RF) classifiers. Nevertheless, visual inspection of the derived classification maps shows that HiRLiC is characterized by higher generalization properties, providing more homogeneous classifications that the competitors. Moreover, the runtime requirements for producing the thematic map was orders of magnitude lower than the respective for the competitors.

  12. Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests

    DTIC Science & Technology

    2014-10-02

    Kogalur, Blackstone , & Lauer, 2008; Ishwaran & Kogalur, 2010). Random survival forest is a sur- vival analysis extension of Random Forests (Breiman, 2001...Statistics & probability letters, 80(13), 1056–1064. Ishwaran, H., Kogalur, U. B., Blackstone , E. H., & Lauer, M. S. (2008). Random survival forests. The

  13. Forest succession on four habitat types in western Montana

    Treesearch

    Stephen F. Arno; Dennis G. Simmerman; Robert E. Keane

    1985-01-01

    Presents classifications of successional community types on four major forest habitat types in western Montana. Classifications show the sequences of seral community types developing after stand-replacing wildfire and clearcutting with broadcast burning, mechanical scarification, or no followup treatment. Information is provided for associating vegetational response to...

  14. Site classification for northern forest species

    Treesearch

    Willard H. Carmean

    1977-01-01

    Summarizes the extensive literature for northern forest species covering site index curves, site index species comparisons, growth intercepts, soil-site studies, plant indicators, physiographic site classifications, and soil survey studies. The advantages and disadvantages of each are discussed, and suggestions are made for future research using each of these methods....

  15. Continuous Change Detection and Classification (CCDC) of Land Cover Using All Available Landsat Data

    NASA Astrophysics Data System (ADS)

    Zhu, Z.; Woodcock, C. E.

    2012-12-01

    A new algorithm for Continuous Change Detection and Classification (CCDC) of land cover using all available Landsat data is developed. This new algorithm is capable of detecting many kinds of land cover change as new images are collected and at the same time provide land cover maps for any given time. To better identify land cover change, a two step cloud, cloud shadow, and snow masking algorithm is used for eliminating "noisy" observations. Next, a time series model that has components of seasonality, trend, and break estimates the surface reflectance and temperature. The time series model is updated continuously with newly acquired observations. Due to the high variability in spectral response for different kinds of land cover change, the CCDC algorithm uses a data-driven threshold derived from all seven Landsat bands. When the difference between observed and predicted exceeds the thresholds three consecutive times, a pixel is identified as land cover change. Land cover classification is done after change detection. Coefficients from the time series models and the Root Mean Square Error (RMSE) from model fitting are used as classification inputs for the Random Forest Classifier (RFC). We applied this new algorithm for one Landsat scene (Path 12 Row 31) that includes all of Rhode Island as well as much of Eastern Massachusetts and parts of Connecticut. A total of 532 Landsat images acquired between 1982 and 2011 were processed. During this period, 619,924 pixels were detected to change once (91% of total changed pixels) and 60,199 pixels were detected to change twice (8% of total changed pixels). The most frequent land cover change category is from mixed forest to low density residential which occupies more than 8% of total land cover change pixels.

  16. Satellite and in situ monitoring data used for modeling of forest vegetation reflectance

    NASA Astrophysics Data System (ADS)

    Zoran, M. A.; Savastru, R. S.; Savastru, D. M.; Miclos, S. I.; Tautan, M. N.; Baschir, L.

    2010-10-01

    As climatic variability and anthropogenic stressors are growing up continuously, must be defined the proper criteria for forest vegetation assessment. In order to characterize current and future state of forest vegetation satellite imagery is a very useful tool. Vegetation can be distinguished using remote sensing data from most other (mainly inorganic) materials by virtue of its notable absorption in the red and blue segments of the visible spectrum, its higher green reflectance and, especially, its very strong reflectance in the near-IR. Vegetation reflectance has variations with sun zenith angle, view zenith angle, and terrain slope angle. To provide corrections of these effects, for visible and near-infrared light, was used a developed a simple physical model of vegetation reflectance, by assuming homogeneous and closed vegetation canopy with randomly oriented leaves. A simple physical model of forest vegetation reflectance was applied and validated for Cernica forested area, near Bucharest town through two ASTER satellite data , acquired within minutes from one another ,a nadir and off-nadir for band 3 lying in the near infra red, most radiance differences between the two scenes can be attributed to the BRDF effect. Other satellite data MODIS, Landsat TM and ETM as well as, IKONOS have been used for different NDVI and classification analysis.

  17. Landscape risk factors for Lyme disease in the eastern broadleaf forest province of the Hudson River valley and the effect of explanatory data classification resolution.

    PubMed

    Messier, Kyle P; Jackson, Laura E; White, Jennifer L; Hilborn, Elizabeth D

    2015-01-01

    This study assessed how landcover classification affects associations between landscape characteristics and Lyme disease rate. Landscape variables were derived from the National Land Cover Database (NLCD), including native classes (e.g., deciduous forest, developed low intensity) and aggregate classes (e.g., forest, developed). Percent of each landcover type, median income, and centroid coordinates were calculated by census tract. Regression results from individual and aggregate variable models were compared with the dispersion parameter-based R(2) (Rα(2)) and AIC. The maximum Rα(2) was 0.82 and 0.83 for the best aggregate and individual model, respectively. The AICs for the best models differed by less than 0.5%. The aggregate model variables included forest, developed, agriculture, agriculture-squared, y-coordinate, y-coordinate-squared, income and income-squared. The individual model variables included deciduous forest, deciduous forest-squared, developed low intensity, pasture, y-coordinate, y-coordinate-squared, income, and income-squared. Results indicate that regional landscape models for Lyme disease rate are robust to NLCD landcover classification resolution. Published by Elsevier Ltd.

  18. Remote sensing change detection methods to track deforestation and growth in threatened rainforests in Madre de Dios, Peru

    USGS Publications Warehouse

    Shermeyer, Jacob S.; Haack, Barry N.

    2015-01-01

    Two forestry-change detection methods are described, compared, and contrasted for estimating deforestation and growth in threatened forests in southern Peru from 2000 to 2010. The methods used in this study rely on freely available data, including atmospherically corrected Landsat 5 Thematic Mapper and Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation continuous fields (VCF). The two methods include a conventional supervised signature extraction method and a unique self-calibrating method called MODIS VCF guided forest/nonforest (FNF) masking. The process chain for each of these methods includes a threshold classification of MODIS VCF, training data or signature extraction, signature evaluation, k-nearest neighbor classification, analyst-guided reclassification, and postclassification image differencing to generate forest change maps. Comparisons of all methods were based on an accuracy assessment using 500 validation pixels. Results of this accuracy assessment indicate that FNF masking had a 5% higher overall accuracy and was superior to conventional supervised classification when estimating forest change. Both methods succeeded in classifying persistently forested and nonforested areas, and both had limitations when classifying forest change.

  19. A tale of two "forests": random forest machine learning AIDS tropical forest carbon mapping.

    PubMed

    Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana

    2014-01-01

    Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including--in the latter case--x, and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called "out-of-bag"), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best preforming run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha(-1) when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.

  20. First steps towards a novel European forest fuel classification systems and a European forest fuel map

    NASA Astrophysics Data System (ADS)

    Sebastián-López, Ana; Urbieta, Itziar R.; de La Fuente Blanco, David; García Mateo, Rubén.; Moreno Rodríguez, José Manuel; Eftichidis, George; Varela, Vassiliki; Cesari, Véronique; Mário Ribeiro, Luís.; Viegas, Domingos Xavier; Lanorte, Antonio; Lasaponara, Rosa; Camia, Andrea; San Miguel, Jesús

    2010-05-01

    Forest fires burn at the local scale, but their massive occurrence causes effects which have global dimensions. Furthermore climate change projections associate global warming to a significant increase in forest fire activity. Warmer and drier conditions are expected to increase the frequency, duration and intensity of fires, and greater amounts of fuel associated with forest areas in decline may cause more frequent and larger fires. These facts create the need for establishing strategies for harmonizing fire danger rating, fire risk assessment, and fire prevention policies at a supranational level. Albeit forest fires are a permanent threat for European ecosystems, particularly in the south, there is no commonly accepted fuel classification scheme adopted for operational use by the Member States of the EU. The European Commission (EC) DG Environment and JRC have launched a set of studies following a resolution of the European Parliament on the further development and enhancement of the European Forest Fire Information System (EFFIS), the EC focal point for information on forest fires in Europe. One of the studies that are being funded is the FUELMAP project. The objective of FUELMAP is to develop a novel fuel classification system and a new European fuel map that will be based on a comprehensive classification of fuel complexes representing the various vegetation types across EU27, plus Switzerland, Croatia and Turkey. The overall work plan is grounded on a throughout knowledge of European forest landscapes and the key features of fuel situations occurring in natural areas. The method makes extended use of existing databases available in the Member States and European Institutions. Specifically, our proposed classification combines relevant information on ecoregions, land cover and uses, potential and actual vegetation, and stand structure. GIS techniques are used in order to define the geographic extent of the classification units and for identifying the main driving factors that determine the spatial distribution of the resulting fuel complexes. Furthermore, relevant parameters influencing fire potential and effects such as fuel load, live/dead ratio, and fuels' size classes' distribution are considered. National- and local-scale datasets (vegetation maps, forest inventory plots, fuel maps...) will be also studied and compared. Local ground- truth data will be used to assess the accuracy of the classification and will contribute, along with literature values and experts' opinion, to characterize the fuels' physical properties. The resulting classification aims to support the characterization of the fire potential, serve as input in fire emissions models, and be used to assess the expected impact of fire in the European landscapes. The work plan includes the development of a GIS software tool to automatically update the fuel map from modified (up-to-date) input data layers. The fuel map of Europe is mainly intended to support the implementation of the EFFIS modules that can be enhanced by the use of improved information on forest fuel properties and spatial distribution, though it is also envisaged that the results of the project might be useful for other relevant applications at different spatial scales. To this purpose, the classification will be designed with a hierarchical and flexible structure for describing heterogeneous landscapes. The work is on-going and this presentation shows the first results towards the envisaged European fuel map.

  1. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered as one of the most promising in medical data classification and the performance of deceision tree classifier can be increased by the ensemble method as it is proven to be better than single classifiers. However, in a ensemble settings the performance depends on the selection of suitable base classifier. This research employed two prominent esemble s namely Adaboost and Bagging with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varries when different base classifiers are selected and even some places overfitting issue also been noted. The evidence shows that ensemble decision tree classfiers using Adaboost and Bagging improves the performance of selected medical data sets.

  2. Multi-Feature Classification of Multi-Sensor Satellite Imagery Based on Dual-Polarimetric Sentinel-1A, Landsat-8 OLI, and Hyperion Images for Urban Land-Cover Classification.

    PubMed

    Zhou, Tao; Li, Zhaofu; Pan, Jianjun

    2018-01-27

    This paper focuses on evaluating the ability and contribution of using backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification and comparing different multi-sensor land cover mapping methods to improve classification accuracy. Both Landsat-8 OLI and Hyperion images were also acquired, in combination with Sentinel-1A data, to explore the potential of different multi-sensor urban land cover mapping methods to improve classification accuracy. The classification was performed using a random forest (RF) method. The results showed that the optimal window size of the combination of all texture features was 9 × 9, and the optimal window size was different for each individual texture feature. For the four different feature types, the texture features contributed the most to the classification, followed by the coherence and backscatter intensity features; and the color features had the least impact on the urban land cover classification. Satisfactory classification results can be obtained using only the combination of texture and coherence features, with an overall accuracy up to 91.55% and a kappa coefficient up to 0.8935, respectively. Among all combinations of Sentinel-1A-derived features, the combination of the four features had the best classification result. Multi-sensor urban land cover mapping obtained higher classification accuracy. The combination of Sentinel-1A and Hyperion data achieved higher classification accuracy compared to the combination of Sentinel-1A and Landsat-8 OLI images, with an overall accuracy of up to 99.12% and a kappa coefficient up to 0.9889. When Sentinel-1A data was added to Hyperion images, the overall accuracy and kappa coefficient were increased by 4.01% and 0.0519, respectively.

  3. Automated classification of bone marrow cells in microscopic images for diagnosis of leukemia: a comparison of two classification schemes with respect to the segmentation quality

    NASA Astrophysics Data System (ADS)

    Krappe, Sebastian; Benz, Michaela; Wittenberg, Thomas; Haferlach, Torsten; Münzenmayer, Christian

    2015-03-01

    The morphological analysis of bone marrow smears is fundamental for the diagnosis of leukemia. Currently, the counting and classification of the different types of bone marrow cells is done manually with the use of bright field microscope. This is a time consuming, partly subjective and tedious process. Furthermore, repeated examinations of a slide yield intra- and inter-observer variances. For this reason an automation of morphological bone marrow analysis is pursued. This analysis comprises several steps: image acquisition and smear detection, cell localization and segmentation, feature extraction and cell classification. The automated classification of bone marrow cells is depending on the automated cell segmentation and the choice of adequate features extracted from different parts of the cell. In this work we focus on the evaluation of support vector machines (SVMs) and random forests (RFs) for the differentiation of bone marrow cells in 16 different classes, including immature and abnormal cell classes. Data sets of different segmentation quality are used to test the two approaches. Automated solutions for the morphological analysis for bone marrow smears could use such a classifier to pre-classify bone marrow cells and thereby shortening the examination duration.

  4. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography

    PubMed Central

    Umut, İlhan; Çentik, Güven

    2016-01-01

    The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. Also, it increases the risk of having troubles during recording process and increases the storage volume. In this study, it is intended to detect periodic leg movement (PLM) in sleep with the use of the channels except leg electromyography (EMG) by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with PLM disorder diagnosis were examined retrospectively. A novel software was developed for the analysis of PSG records. The software utilizes the machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of classified results showed that while K-nearest neighbour classification algorithm had higher average classification rate (91.87%) and lower average classification error value (RMSE = 0.2850), multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error value (RMSE = 0.3705). Results showed that PLM can be classified with high accuracy (91.87%) without leg EMG record being present. PMID:27213008

  5. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography.

    PubMed

    Umut, İlhan; Çentik, Güven

    2016-01-01

    The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. Also, it increases the risk of having troubles during recording process and increases the storage volume. In this study, it is intended to detect periodic leg movement (PLM) in sleep with the use of the channels except leg electromyography (EMG) by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with PLM disorder diagnosis were examined retrospectively. A novel software was developed for the analysis of PSG records. The software utilizes the machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of classified results showed that while K-nearest neighbour classification algorithm had higher average classification rate (91.87%) and lower average classification error value (RMSE = 0.2850), multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error value (RMSE = 0.3705). Results showed that PLM can be classified with high accuracy (91.87%) without leg EMG record being present.

  6. Spectral-spatial classification of hyperspectral data with mutual information based segmented stacked autoencoder approach

    NASA Astrophysics Data System (ADS)

    Paul, Subir; Nagesh Kumar, D.

    2018-04-01

    Hyperspectral (HS) data comprises of continuous spectral responses of hundreds of narrow spectral bands with very fine spectral resolution or bandwidth, which offer feature identification and classification with high accuracy. In the present study, Mutual Information (MI) based Segmented Stacked Autoencoder (S-SAE) approach for spectral-spatial classification of the HS data is proposed to reduce the complexity and computational time compared to Stacked Autoencoder (SAE) based feature extraction. A non-parametric dependency measure (MI) based spectral segmentation is proposed instead of linear and parametric dependency measure to take care of both linear and nonlinear inter-band dependency for spectral segmentation of the HS bands. Then morphological profiles are created corresponding to segmented spectral features to assimilate the spatial information in the spectral-spatial classification approach. Two non-parametric classifiers, Support Vector Machine (SVM) with Gaussian kernel and Random Forest (RF) are used for classification of the three most popularly used HS datasets. Results of the numerical experiments carried out in this study have shown that SVM with a Gaussian kernel is providing better results for the Pavia University and Botswana datasets whereas RF is performing better for Indian Pines dataset. The experiments performed with the proposed methodology provide encouraging results compared to numerous existing approaches.

  7. A Practical and Automated Approach to Large Area Forest Disturbance Mapping with Remote Sensing

    PubMed Central

    Ozdogan, Mutlu

    2014-01-01

    In this paper, I describe a set of procedures that automate forest disturbance mapping using a pair of Landsat images. The approach is built on the traditional pair-wise change detection method, but is designed to extract training data without user interaction and uses a robust classification algorithm capable of handling incorrectly labeled training data. The steps in this procedure include: i) creating masks for water, non-forested areas, clouds, and cloud shadows; ii) identifying training pixels whose value is above or below a threshold defined by the number of standard deviations from the mean value of the histograms generated from local windows in the short-wave infrared (SWIR) difference image; iii) filtering the original training data through a number of classification algorithms using an n-fold cross validation to eliminate mislabeled training samples; and finally, iv) mapping forest disturbance using a supervised classification algorithm. When applied to 17 Landsat footprints across the U.S. at five-year intervals between 1985 and 2010, the proposed approach produced forest disturbance maps with 80 to 95% overall accuracy, comparable to those obtained from traditional approaches to forest change detection. The primary sources of mis-classification errors included inaccurate identification of forests (errors of commission), issues related to the land/water mask, and clouds and cloud shadows missed during image screening. The approach requires images from the peak growing season, at least for the deciduous forest sites, and cannot readily distinguish forest harvest from natural disturbances or other types of land cover change. The accuracy of detecting forest disturbance diminishes with the number of years between the images that make up the image pair. Nevertheless, the relatively high accuracies, little or no user input needed for processing, speed of map production, and simplicity of the approach make the new method especially practical for forest cover change analysis over very large regions. PMID:24717283

  8. A practical and automated approach to large area forest disturbance mapping with remote sensing.

    PubMed

    Ozdogan, Mutlu

    2014-01-01

    In this paper, I describe a set of procedures that automate forest disturbance mapping using a pair of Landsat images. The approach is built on the traditional pair-wise change detection method, but is designed to extract training data without user interaction and uses a robust classification algorithm capable of handling incorrectly labeled training data. The steps in this procedure include: i) creating masks for water, non-forested areas, clouds, and cloud shadows; ii) identifying training pixels whose value is above or below a threshold defined by the number of standard deviations from the mean value of the histograms generated from local windows in the short-wave infrared (SWIR) difference image; iii) filtering the original training data through a number of classification algorithms using an n-fold cross validation to eliminate mislabeled training samples; and finally, iv) mapping forest disturbance using a supervised classification algorithm. When applied to 17 Landsat footprints across the U.S. at five-year intervals between 1985 and 2010, the proposed approach produced forest disturbance maps with 80 to 95% overall accuracy, comparable to those obtained from traditional approaches to forest change detection. The primary sources of mis-classification errors included inaccurate identification of forests (errors of commission), issues related to the land/water mask, and clouds and cloud shadows missed during image screening. The approach requires images from the peak growing season, at least for the deciduous forest sites, and cannot readily distinguish forest harvest from natural disturbances or other types of land cover change. The accuracy of detecting forest disturbance diminishes with the number of years between the images that make up the image pair. Nevertheless, the relatively high accuracies, little or no user input needed for processing, speed of map production, and simplicity of the approach make the new method especially practical for forest cover change analysis over very large regions.

  9. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery. Caribbean Journal of Science. 44(2):175-198.

    Treesearch

    E.H. Helmer; T.A. Kennaway; D.H. Pedreros; M.L. Clark; H. Marcano-Vega; L.L. Tieszen; S.R. Schill; C.M.S. Carrington

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius...

  10. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known as useful in Photogrammetry and Remote Sensing are missed. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and taylored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but we extended them by additional, mostly higher-level features. Applying these superior features, removing the redundant ones, as well as using more accurately acquired 3D data allowed to keep stable or even to reduce the misclassification error in a challenging dataset.

  11. An Evaluation of Feature Learning Methods for High Resolution Image Classification

    NASA Astrophysics Data System (ADS)

    Tokarczyk, P.; Montoya, J.; Schindler, K.

    2012-07-01

    Automatic image classification is one of the fundamental problems of remote sensing research. The classification problem is even more challenging in high-resolution images of urban areas, where the objects are small and heterogeneous. Two questions arise, namely which features to extract from the raw sensor data to capture the local radiometry and image structure at each pixel or segment, and which classification method to apply to the feature vectors. While classifiers are nowadays well understood, selecting the right features remains a largely empirical process. Here we concentrate on the features. Several methods are evaluated which allow one to learn suitable features from unlabelled image data by analysing the image statistics. In a comparative study, we evaluate unsupervised feature learning with different linear and non-linear learning methods, including principal component analysis (PCA) and deep belief networks (DBN). We also compare these automatically learned features with popular choices of ad-hoc features including raw intensity values, standard combinations like the NDVI, a few PCA channels, and texture filters. The comparison is done in a unified framework using the same images, the target classes, reference data and a Random Forest classifier.

  12. Classification of kidney and liver tissue using ultrasound backscatter data

    NASA Astrophysics Data System (ADS)

    Aalamifar, Fereshteh; Rivaz, Hassan; Cerrolaza, Juan J.; Jago, James; Safdar, Nabile; Boctor, Emad M.; Linguraru, Marius G.

    2015-03-01

    Ultrasound (US) tissue characterization provides valuable information for the initialization of automatic segmentation algorithms, and can further provide complementary information for diagnosis of pathologies. US tissue characterization is challenging due to the presence of various types of image artifacts and dependence on the sonographer's skills. One way of overcoming this challenge is by characterizing images based on the distribution of the backscatter data derived from the interaction between US waves and tissue. The goal of this work is to classify liver versus kidney tissue in 3D volumetric US data using the distribution of backscatter US data recovered from end-user displayed Bmode image available in clinical systems. To this end, we first propose the computation of a large set of features based on the homodyned-K distribution of the speckle as well as the correlation coefficients between small patches in 3D images. We then utilize the random forests framework to select the most important features for classification. Experiments on in-vivo 3D US data from nine pediatric patients with hydronephrosis showed an average accuracy of 94% for the classification of liver and kidney tissues showing a good potential of this work to assist in the classification and segmentation of abdominal soft tissue.

  13. Stacked sparse autoencoder in hyperspectral data classification using spectral-spatial, higher order statistics and multifractal spectrum features

    NASA Astrophysics Data System (ADS)

    Wan, Xiaoqing; Zhao, Chunhui; Wang, Yanchun; Liu, Wu

    2017-11-01

    This paper proposes a novel classification paradigm for hyperspectral image (HSI) using feature-level fusion and deep learning-based methodologies. Operation is carried out in three main steps. First, during a pre-processing stage, wave atoms are introduced into bilateral filter to smooth HSI, and this strategy can effectively attenuate noise and restore texture information. Meanwhile, high quality spectral-spatial features can be extracted from HSI by taking geometric closeness and photometric similarity among pixels into consideration simultaneously. Second, higher order statistics techniques are firstly introduced into hyperspectral data classification to characterize the phase correlations of spectral curves. Third, multifractal spectrum features are extracted to characterize the singularities and self-similarities of spectra shapes. To this end, a feature-level fusion is applied to the extracted spectral-spatial features along with higher order statistics and multifractal spectrum features. Finally, stacked sparse autoencoder is utilized to learn more abstract and invariant high-level features from the multiple feature sets, and then random forest classifier is employed to perform supervised fine-tuning and classification. Experimental results on two real hyperspectral data sets demonstrate that the proposed method outperforms some traditional alternatives.

  14. Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits

    PubMed Central

    Zhao, Jiangsan; Bodner, Gernot; Rewald, Boris

    2016-01-01

    Phenotyping local crop cultivars is becoming more and more important, as they are an important genetic source for breeding – especially in regard to inherent root system architectures. Machine learning algorithms are promising tools to assist in the analysis of complex data sets; novel approaches are need to apply them on root phenotyping data of mature plants. A greenhouse experiment was conducted in large, sand-filled columns to differentiate 16 European Pisum sativum cultivars based on 36 manually derived root traits. Through combining random forest and support vector machine models, machine learning algorithms were successfully used for unbiased identification of most distinguishing root traits and subsequent pairwise cultivar differentiation. Up to 86% of pea cultivar pairs could be distinguished based on top five important root traits (Timp5) – Timp5 differed widely between cultivar pairs. Selecting top important root traits (Timp) provided a significant improved classification compared to using all available traits or randomly selected trait sets. The most frequent Timp of mature pea cultivars was total surface area of lateral roots originating from tap root segments at 0–5 cm depth. The high classification rate implies that culturing did not lead to a major loss of variability in root system architecture in the studied pea cultivars. Our results illustrate the potential of machine learning approaches for unbiased (root) trait selection and cultivar classification based on rather small, complex phenotypic data sets derived from pot experiments. Powerful statistical approaches are essential to make use of the increasing amount of (root) phenotyping information, integrating the complex trait sets describing crop cultivars. PMID:27999587

  15. A Noise-Assisted Data Analysis Method for Automatic EOG-Based Sleep Stage Classification Using Ensemble Learning.

    PubMed

    Olesen, Alexander Neergaard; Christensen, Julie A E; Sorensen, Helge B D; Jennum, Poul J

    2016-08-01

    Reducing the number of recording modalities for sleep staging research can benefit both researchers and patients, under the condition that they provide as accurate results as conventional systems. This paper investigates the possibility of exploiting the multisource nature of the electrooculography (EOG) signals by presenting a method for automatic sleep staging using the complete ensemble empirical mode decomposition with adaptive noise algorithm, and a random forest classifier. It achieves a high overall accuracy of 82% and a Cohen's kappa of 0.74 indicating substantial agreement between automatic and manual scoring.

  16. The creation of digital thematic soil maps at the regional level (with the map of soil carbon pools in the Usa River basin as an example)

    NASA Astrophysics Data System (ADS)

    Pastukhov, A. V.; Kaverin, D. A.; Shchanov, V. M.

    2016-09-01

    A digital map of soil carbon pools was created for the forest-tundra ecotone in the Usa River basin with the use of ERDAS Imagine 2014 and ArcGIS 10.2 software. Supervised classification and thematic interpretation of satellite images and digital terrain models with the use of a georeferenced database on soil profiles were applied. Expert assessment of the natural diversity and representativeness of random samples for different soil groups was performed, and the minimal necessary size of the statistical sample was determined.

  17. A Tale of Two “Forests”: Random Forest Machine Learning Aids Tropical Forest Carbon Mapping

    PubMed Central

    Mascaro, Joseph; Asner, Gregory P.; Knapp, David E.; Kennedy-Bowdoin, Ty; Martin, Roberta E.; Anderson, Christopher; Higgins, Mark; Chadwick, K. Dana

    2014-01-01

    Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including—in the latter case—x, and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called “out-of-bag”), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best preforming run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha−1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation. PMID:24489686

  18. Producing landslide susceptibility maps by utilizing machine learning methods. The case of Finikas catchment basin, North Peloponnese, Greece.

    NASA Astrophysics Data System (ADS)

    Tsangaratos, Paraskevas; Ilia, Ioanna; Loupasakis, Constantinos; Papadakis, Michalis; Karimalis, Antonios

    2017-04-01

    The main objective of the present study was to apply two machine learning methods for the production of a landslide susceptibility map in the Finikas catchment basin, located in North Peloponnese, Greece and to compare their results. Specifically, Logistic Regression and Random Forest were utilized, based on a database of 40 sites classified into two categories, non-landslide and landslide areas that were separated into a training dataset (70% of the total data) and a validation dataset (remaining 30%). The identification of the areas was established by analyzing airborne imagery, extensive field investigation and the examination of previous research studies. Six landslide related variables were analyzed, namely: lithology, elevation, slope, aspect, distance to rivers and distance to faults. Within the Finikas catchment basin most of the reported landslides were located along the road network and within the residential complexes, classified as rotational and translational slides, and rockfalls, mainly caused due to the physical conditions and the general geotechnical behavior of the geological formation that cover the area. Each landslide susceptibility map was reclassified by applying the Geometric Interval classification technique into five classes, namely: very low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and very high susceptibility. The comparison and validation of the outcomes of each model were achieved using statistical evaluation measures, the receiving operating characteristic and the area under the success and predictive rate curves. The computation process was carried out using RStudio an integrated development environment for R language and ArcGIS 10.1 for compiling the data and producing the landslide susceptibility maps. From the outcomes of the Logistic Regression analysis it was induced that the highest b coefficient is allocated to lithology and slope, which was 2.8423 and 1.5841, respectively. From the estimation of the mean decrease in Gini coefficient performed during the application of Random Forest and the mean decrease in accuracy the most important variable is slope followed by lithology, aspect, elevation, distance from river network, and distance from faults, while the most used variables during the training phase were the variable aspect (21.45%), slope (20.53%) and lithology (19.84%). The outcomes of the analysis are consistent with previous studies concerning the area of research, which have indicated the high influence of lithology and slope in the manifestation of landslides. High percentage of landslide occurrence has been observed in Plio-Pleistocene sediments, flysch formations, and Cretaceous limestone. Also the presences of landslides have been associated with the degree of weathering and fragmentation, the orientation of the discontinuities surfaces and the intense morphological relief. The most accurate model was Random Forest which identified correctly 92.00% of the instances during the training phase, followed by the Logistic Regression 89.00%. The same pattern of accuracy was calculated during the validation phase, in which the Random Forest achieved a classification accuracy of 93.00%, while the Logistic Regression model achieved an accuracy of 91.00%. In conclusion, the outcomes of the study could be a useful cartographic product to local authorities and government agencies during the implementation of successful decision-making and land use planning strategies. Keywords: Landslide Susceptibility, Logistic Regression, Random Forest, GIS, Greece.

  19. Classification and evaluation for forest sites on the Southern Cumberland Plateau

    Treesearch

    Glendon W. Smalley

    1979-01-01

    This paper presents a comprehensive forest site classification system for the southern portion of the Cumberland Plateau in northern Alabama, northwest Georgia, and extreme south-central Tennessee. The system is based on physiography, geology, soils, topography, and vegetation. Twenty-one landtypes are described, and each landtype is evaluated in terms of productivity...

  20. Landscape Risk Factors for Lyme Disease in the Eastern Broadleaf Forest Province of the Hudson River Valley and the Effect of Explanatory Data Classification Resolution

    EPA Science Inventory

    This study assessed how landcover classification affects associations between landscape characteristics and Lyme disease rate. Landscape variables were derived from the National Land Cover Database (NLCD), including native classes (e.g., deciduous forest, developed low intensity)...

  1. A Quality Classification System for Young Hardwood Trees - The First Step in Predicting Future Products

    Treesearch

    David L. Sonderman; Robert L. Brisbin

    1978-01-01

    Forest managers have no objective way to determine the relative value of culturally treated forest stands in terms of product potential. This paper describes the first step in the development of a quality classification system based on the measurement of individual tree characteristics for young hardwood stands.

  2. Statistical inference for remote sensing-based estimates of net deforestation

    Treesearch

    Ronald E. McRoberts; Brian F. Walters

    2012-01-01

    Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...

  3. Plant community classification for alpine vegetation on the Beaverhead National Forest, Montana

    Treesearch

    Stephen V. Cooper; Peter Lesica; Deborah Page-Dumroese

    1997-01-01

    Vegetation of the alpine zone of eight mountain ranges in southwestern Montana was classified using IWINSPAN, DECORAN, and STRATA-algorithms embedded within the U.S. Forest Service Northern Region's ECADS (ecological classification and description system) program. Quantitative estimates of vegetation and soil attributes were sampled from 138 plots. Vegetation...

  4. Forest site classification for cultural plant harvest by tribal weavers can inform management

    Treesearch

    S. Hummel; F.K. Lake

    2015-01-01

    Do qualitative classifications of ecological conditions for harvesting culturally important forest plants correspond to quantitative differences among sites? To address this question, we blended scientific methods (SEK) and traditional ecological knowledge (TEK) to identify conditions on sites considered good, marginal, or poor for harvesting the leaves of a plant (...

  5. Forest statistics for Arkansas counties - 1988

    Treesearch

    F. Dee Hines; John S. Vissage

    1988-01-01

    The tables and figures in this report were derived from data obtained during a recent inventory of the five survey regions and 75 counties in the State of Arkansas (fig. I), Data on forest-nonforest classification using aerial photographs was accomplished for points representing approximately 230 acres. These photo classifications were adjusted based on ground...

  6. Using an Ecological Land Hierarchy to Predict Seasonal-Wetland Abundance in Upland Forests

    Treesearch

    Brian J. Palik; Richard Buech; Leanne Egeland

    2003-01-01

    Hierarchy theory, when applied to landscapes, predicts that broader-scale ecosystems constrain the development of finer-scale, nested ecosystems. This prediction finds application in hierarchical land classifications. Such classifications typically apply to physiognomically similar ecosystems, or ecological land units, e.g., a set of multi-scale forest ecosystems. We...

  7. Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

    PubMed

    Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

    2016-11-16

    The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.

  8. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.

    PubMed

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.

  9. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    PubMed Central

    Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159

  10. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    USGS Publications Warehouse

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.

  11. Mapping Mangrove Density from Rapideye Data in Central America

    NASA Astrophysics Data System (ADS)

    Son, Nguyen-Thanh; Chen, Chi-Farn; Chen, Cheng-Ru

    2017-06-01

    Mangrove forests provide a wide range of socioeconomic and ecological services for coastal communities. Extensive aquaculture development of mangrove waters in many developing countries has constantly ignored services of mangrove ecosystems, leading to unintended environmental consequences. Monitoring the current status and distribution of mangrove forests is deemed important for evaluating forest management strategies. This study aims to delineate the density distribution of mangrove forests in the Gulf of Fonseca, Central America with Rapideye data using the support vector machines (SVM). The data collected in 2012 for density classification of mangrove forests were processed based on four different band combination schemes: scheme-1 (bands 1-3, 5 excluding the red-edge band 4), scheme-2 (bands 1-5), scheme-3 (bands 1-3, 5 incorporating with the normalized difference vegetation index, NDVI), and scheme-4 (bands 1-3, 5 incorporating with the normalized difference red-edge index, NDRI). We also hypothesized if the obvious contribution of Rapideye red-edge band could improve the classification results. Three main steps of data processing were employed: (1), data pre-processing, (2) image classification, and (3) accuracy assessment to evaluate the contribution of red-edge band in terms of the accuracy of classification results across these four schemes. The classification maps compared with the ground reference data indicated the slightly higher accuracy level observed for schemes 2 and 4. The overall accuracies and Kappa coefficients were 97% and 0.95 for scheme-2 and 96.9% and 0.95 for scheme-4, respectively.

  12. Proceedings, 15th central hardwood forest conference

    Treesearch

    David S. Buckley; Wayne K. Clatterbuck; [Editors

    2007-01-01

    Proceedings of the 15th central hardwood forest conference held February 27–March 1, 2006, in Knoxville, TN. Includes 86 papers and 30 posters pertaining to forest health and protection, ecology and forest dynamics, natural and artificial regeneration, forest products, wildlife, site classification, management and forest resources, mensuration and models, soil and...

  13. Floating Forests: Validation of a Citizen Science Effort to Answer Global Ecological Questions

    NASA Astrophysics Data System (ADS)

    Rosenthal, I.; Byrnes, J.; Cavanaugh, K. C.; Haupt, A. J.; Trouille, L.; Bell, T. W.; Rassweiler, A.; Pérez-Matus, A.; Assis, J.

    2017-12-01

    Researchers undertaking long term, large-scale ecological analyses face significant challenges for data collection and processing. Crowdsourcing via citizen science can provide an efficient method for analyzing large data sets. However, many scientists have raised questions about the quality of data collected by citizen scientists. Here we use Floating-Forests (http://floatingforests.org), a citizen science platform for creating a global time series of giant kelp abundance, to show that ensemble classifications of satellite data can ensure data quality. Citizen scientists view satellite images of coastlines and classify kelp forests by tracing all visible patches of kelp. Each image is classified by fifteen citizen scientists before being retired. To validate citizen science results, all fifteen classifications are converted to a raster and overlaid on a calibration dataset generated from previous studies. Results show that ensemble classifications from citizen scientists are consistently accurate when compared to calibration data. Given that all source images were acquired by Landsat satellites, we expect this consistency to hold across all regions. At present, we have over 6000 web-based citizen scientists' classifications of almost 2.5 million images of kelp forests in California and Tasmania. These results are not only useful for remote sensing of kelp forests, but also for a wide array of applications that combine citizen science with remote sensing.

  14. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery

    USGS Publications Warehouse

    Helmer, E.H.; Kennaway, T.A.; Pedreros, D.H.; Clark, M.L.; Marcano-Vega, H.; Tieszen, L.L.; Ruzycki, T.R.; Schill, S.R.; Carrington, C.M.S.

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius, testing a more detailed classification than earlier work in the latter three islands. Secondly, we estimate the extents of land cover and protected forest by formation for five islands and ask how land cover has changed over the second half of the 20th century. The image interpretation approach combines image mosaics and ancillary geographic data, classifying the resulting set of raster data with decision tree software. Cloud-free image mosaics for one or two seasons were created by applying regression tree normalization to scene dates that could fill cloudy areas in a base scene. Such mosaics are also known as cloud-filled, cloud-minimized or cloud-cleared imagery, mosaics, or composites. The approach accurately distinguished several classes that more standard methods would confuse; the seamless mosaics aided reference data collection; and the multiseason imagery allowed us to separate drought deciduous forests and woodlands from semi-deciduous ones. Cultivated land areas declined 60 to 100 percent from about 1945 to 2000 on several islands. Meanwhile, forest cover has increased 50 to 950%. This trend will likely continue where sugar cane cultivation has dominated. Like the island of Puerto Rico, most higher-elevation forest formations are protected in formal or informal reserves. Also similarly, lowland forests, which are drier forest types on these islands, are not well represented in reserves. Former cultivated lands in lowland areas could provide lands for new reserves of drier forest types. The land-use history of these islands may provide insight for planners in countries currently considering lowland forest clearing for agriculture. Copyright 2008 College of Arts and Sciences.

  15. An assessment of commonly employed satellite-based remote sensors for mapping mangrove species in Mexico using an NDVI-based classification scheme.

    PubMed

    Valderrama-Landeros, L; Flores-de-Santiago, F; Kovacs, J M; Flores-Verdugo, F

    2017-12-14

    Optimizing the classification accuracy of a mangrove forest is of utmost importance for conservation practitioners. Mangrove forest mapping using satellite-based remote sensing techniques is by far the most common method of classification currently used given the logistical difficulties of field endeavors in these forested wetlands. However, there is now an abundance of options from which to choose in regards to satellite sensors, which has led to substantially different estimations of mangrove forest location and extent with particular concern for degraded systems. The objective of this study was to assess the accuracy of mangrove forest classification using different remotely sensed data sources (i.e., Landsat-8, SPOT-5, Sentinel-2, and WorldView-2) for a system located along the Pacific coast of Mexico. Specifically, we examined a stressed semiarid mangrove forest which offers a variety of conditions such as dead areas, degraded stands, healthy mangroves, and very dense mangrove island formations. The results indicated that Landsat-8 (30 m per pixel) had  the lowest overall accuracy at 64% and that WorldView-2 (1.6 m per pixel) had the highest at 93%. Moreover, the SPOT-5 and the Sentinel-2 classifications (10 m per pixel) were very similar having accuracies of 75 and 78%, respectively. In comparison to WorldView-2, the other sensors overestimated the extent of Laguncularia racemosa and underestimated the extent of Rhizophora mangle. When considering such type of sensors, the higher spatial resolution can be particularly important in mapping small mangrove islands that often occur in degraded mangrove systems.

  16. Deep learning approach to bacterial colony classification.

    PubMed

    Zieliński, Bartosz; Plichta, Anna; Misztal, Krzysztof; Spurek, Przemysław; Brzychczy-Włoch, Monika; Ochońska, Dorota

    2017-01-01

    In microbiology it is diagnostically useful to recognize various genera and species of bacteria. It can be achieved using computer-aided methods, which make the recognition processes more automatic and thus significantly reduce the time necessary for the classification. Moreover, in case of diagnostic uncertainty (the misleading similarity in shape or structure of bacterial cells), such methods can minimize the risk of incorrect recognition. In this article, we apply the state of the art method for texture analysis to classify genera and species of bacteria. This method uses deep Convolutional Neural Networks to obtain image descriptors, which are then encoded and classified with Support Vector Machine or Random Forest. To evaluate this approach and to make it comparable with other approaches, we provide a new dataset of images. DIBaS dataset (Digital Image of Bacterial Species) contains 660 images with 33 different genera and species of bacteria.

  17. Spot counting on fluorescence in situ hybridization in suspension images using Gaussian mixture model

    NASA Astrophysics Data System (ADS)

    Liu, Sijia; Sa, Ruhan; Maguire, Orla; Minderman, Hans; Chaudhary, Vipin

    2015-03-01

    Cytogenetic abnormalities are important diagnostic and prognostic criteria for acute myeloid leukemia (AML). A flow cytometry-based imaging approach for FISH in suspension (FISH-IS) was established that enables the automated analysis of several log-magnitude higher number of cells compared to the microscopy-based approaches. The rotational positioning can occur leading to discordance between spot count. As a solution of counting error from overlapping spots, in this study, a Gaussian Mixture Model based classification method is proposed. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) of GMM are used as global image features of this classification method. Via Random Forest classifier, the result shows that the proposed method is able to detect closely overlapping spots which cannot be separated by existing image segmentation based spot detection methods. The experiment results show that by the proposed method we can obtain a significant improvement in spot counting accuracy.

  18. Automatic segmentation and classification of mycobacterium tuberculosis with conventional light microscopy

    NASA Astrophysics Data System (ADS)

    Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui

    2015-12-01

    This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on the adaptive scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. Then the candidate objects are extracted integrally after region merging and contaminations elimination. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching and non-bacillus objects. We evaluated the logistic regression, random forest, and intersection kernel support vector machines classifiers in classifying the bacillus objects respectively. Experimental results demonstrate that the proposed method yields to high robustness and accuracy. The logistic regression classifier performs best with an accuracy of 91.68%.

  19. Using random forests for assistance in the curation of G-protein coupled receptor databases.

    PubMed

    Shkurin, Aleksei; Vellido, Alfredo

    2017-08-18

    Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.

  20. Unsupervised learning on scientific ocean drilling datasets from the South China Sea

    NASA Astrophysics Data System (ADS)

    Tse, Kevin C.; Chiu, Hon-Chim; Tsang, Man-Yin; Li, Yiliang; Lam, Edmund Y.

    2018-06-01

    Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.

  1. Retinal layer segmentation of macular OCT images using boundary classification

    PubMed Central

    Lang, Andrew; Carass, Aaron; Hauser, Matthew; Sotirchos, Elias S.; Calabresi, Peter A.; Ying, Howard S.; Prince, Jerry L.

    2013-01-01

    Optical coherence tomography (OCT) has proven to be an essential imaging modality for ophthalmology and is proving to be very important in neurology. OCT enables high resolution imaging of the retina, both at the optic nerve head and the macula. Macular retinal layer thicknesses provide useful diagnostic information and have been shown to correlate well with measures of disease severity in several diseases. Since manual segmentation of these layers is time consuming and prone to bias, automatic segmentation methods are critical for full utilization of this technology. In this work, we build a random forest classifier to segment eight retinal layers in macular cube images acquired by OCT. The random forest classifier learns the boundary pixels between layers, producing an accurate probability map for each boundary, which is then processed to finalize the boundaries. Using this algorithm, we can accurately segment the entire retina contained in the macular cube to an accuracy of at least 4.3 microns for any of the nine boundaries. Experiments were carried out on both healthy and multiple sclerosis subjects, with no difference in the accuracy of our algorithm found between the groups. PMID:23847738

  2. Detection of compensatory balance responses using wearable electromyography sensors for fall-risk assessment.

    PubMed

    Nouredanesh, Mina; Kukreja, Sunil L; Tung, James

    2016-08-01

    Loss of balance is prevalent in older adults and populations with gait and balance impairments. The present paper aims to develop a method to automatically distinguish compensatory balance responses (CBRs) from normal gait, based on activity patterns of muscles involved in maintaining balance. In this study, subjects were perturbed by lateral pushes while walking and surface electromyography (sEMG) signals were recorded from four muscles in their right leg. To extract sEMG time domain features, several filtering characteristics and segmentation approaches are examined. The performance of three classification methods, i.e., k-nearest neighbor, support vector machines, and random forests, were investigated for accurate detection of CBRs. Our results show that features extracted in the 50-200Hz band, segmented using peak sEMG amplitudes, and a random forest classifier detected CBRs with an accuracy of 92.35%. Moreover, our results support the important role of biceps femoris and rectus femoris muscles in stabilization and consequently discerning CBRs. This study contributes towards the development of wearable sensor systems to accurately and reliably monitor gait and balance control behavior in at-home settings (unsupervised conditions), over long periods of time, towards personalized fall risk assessment tools.

  3. Vertebral degenerative disc disease severity evaluation using random forest classification

    NASA Astrophysics Data System (ADS)

    Munoz, Hector E.; Yao, Jianhua; Burns, Joseph E.; Pham, Yasuyuki; Stieger, James; Summers, Ronald M.

    2014-03-01

    Degenerative disc disease (DDD) develops in the spine as vertebral discs degenerate and osseous excrescences or outgrowths naturally form to restabilize unstable segments of the spine. These osseous excrescences, or osteophytes, may progress or stabilize in size as the spine reaches a new equilibrium point. We have previously created a CAD system that detects DDD. This paper presents a new system to determine the severity of DDD of individual vertebral levels. This will be useful to monitor the progress of developing DDD, as rapid growth may indicate that there is a greater stabilization problem that should be addressed. The existing DDD CAD system extracts the spine from CT images and segments the cortical shell of individual levels with a dual-surface model. The cortical shell is unwrapped, and is analyzed to detect the hyperdense regions of DDD. Three radiologists scored the severity of DDD of each disc space of 46 CT scans. Radiologists' scores and features generated from CAD detections were used to train a random forest classifier. The classifier then assessed the severity of DDD at each vertebral disc level. The agreement between the computer severity score and the average radiologist's score had a quadratic weighted Cohen's kappa of 0.64.

  4. A machine learning system to improve heart failure patient assistance.

    PubMed

    Guidi, Gabriele; Pettenati, Maria Chiara; Melillo, Paolo; Iadanza, Ernesto

    2014-11-01

    In this paper, we present a clinical decision support system (CDSS) for the analysis of heart failure (HF) patients, providing various outputs such as an HF severity evaluation, HF-type prediction, as well as a management interface that compares the different patients' follow-ups. The whole system is composed of a part of intelligent core and of an HF special-purpose management tool also providing the function to act as interface for the artificial intelligence training and use. To implement the smart intelligent functions, we adopted a machine learning approach. In this paper, we compare the performance of a neural network (NN), a support vector machine, a system with fuzzy rules genetically produced, and a classification and regression tree and its direct evolution, which is the random forest, in analyzing our database. Best performances in both HF severity evaluation and HF-type prediction functions are obtained by using the random forest algorithm. The management tool allows the cardiologist to populate a "supervised database" suitable for machine learning during his or her regular outpatient consultations. The idea comes from the fact that in literature there are a few databases of this type, and they are not scalable to our case.

  5. Introducing two Random Forest based methods for cloud detection in remote sensing images

    NASA Astrophysics Data System (ADS)

    Ghasemian, Nafiseh; Akhoondzadeh, Mehdi

    2018-07-01

    Cloud detection is a necessary phase in satellite images processing to retrieve the atmospheric and lithospheric parameters. Currently, some cloud detection methods based on Random Forest (RF) model have been proposed but they do not consider both spectral and textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF) to incorporate visible, infrared (IR) and thermal spectral and textural features (FLFRF) including Gray Level Co-occurrence Matrix (GLCM) and Robust Extended Local Binary Pattern (RELBP_CI) or visible, IR and thermal classifiers (DLFRF) for highly accurate cloud detection on remote sensing images. FLFRF first fuses visible, IR and thermal features. Thereafter, it uses the RF model to classify pixels to cloud, snow/ice and background or thick cloud, thin cloud and background. DLFRF considers visible, IR and thermal features (both spectral and textural) separately and inserts each set of features to RF model. Then, it holds vote matrix of each run of the model. Finally, it fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used in this paper. Quantitative analyses are based on manually selected ground truth data. Results show that after adding RELBP_CI to input feature set cloud detection accuracy improves. Also, the average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are higher than other machine learning methods, Linear Discriminate Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than other traditional methods. The quantitative values on Landsat 8 images show similar trend. Consequently, while SVM and K-nearest neighbor show overestimation in predicting cloud and snow/ice pixels, our Random Forest (RF) based models can achieve higher cloud, snow/ice kappa values on MODIS and thin cloud, thick cloud and snow/ice kappa values on Landsat 8 images. Our algorithms predict both thin and thick cloud on Landsat 8 images while the existing cloud detection algorithm, Fmask cannot discriminate them. Compared to the state-of-the-art methods, our algorithms have acquired higher average cloud and snow/ice kappa values for different spatial resolutions.

  6. Regional Mapping of Plantation Extent Using Multisensor Imagery

    NASA Astrophysics Data System (ADS)

    Torbick, N.; Ledoux, L.; Hagen, S.; Salas, W.

    2016-12-01

    Industrial forest plantations are expanding rapidly across the tropics and monitoring extent is critical for understanding environmental and socioeconomic impacts. In this study, new, multisensor imagery were evaluated and integrated to extract the strengths of each sensor for mapping plantation extent at regional scales. Three distinctly different landscapes with multiple plantation types were chosen to consider scalability and transferability. These were Tanintharyi, Myanmar, West Kalimantan, Indonesia, and southern Ghana. Landsat-8 Operational Land Imager (OLI), Phased Array L-band Synthetic Aperture Radar-2 (PALSAR-2), and Sentinel-1A images were fused within a Classification and Regression Tree (CART) framework using random forest and high-resolution surveys. Multi-criteria evaluations showed both L-and C-band gamma nought γ° backscatter decibel (dB), Landsat reflectance ρλ, and texture indices were useful for distinguishing oil palm and rubber plantations from other land types. The classification approach identified 750,822 ha or 23% of the Taninathryi, Myanmar, and 216,086 ha or 25% of western West Kalimantan as plantation with very high cross validation accuracy. The mapping approach was scalable and transferred well across the different geographies and plantation types. As archives for Sentinel-1, Landsat-8, and PALSAR-2 continue to grow, mapping plantation extent and dynamics at moderate resolution over large regions should be feasible.

  7. Investigation of Latent Traces Using Infrared Reflectance Hyperspectral Imaging

    NASA Astrophysics Data System (ADS)

    Schubert, Till; Wenzel, Susanne; Roscher, Ribana; Stachniss, Cyrill

    2016-06-01

    The detection of traces is a main task of forensics. Hyperspectral imaging is a potential method from which we expect to capture more fluorescence effects than with common forensic light sources. This paper shows that the use of hyperspectral imaging is suited for the analysis of latent traces and extends the classical concept to the conservation of the crime scene for retrospective laboratory analysis. We examine specimen of blood, semen and saliva traces in several dilution steps, prepared on cardboard substrate. As our key result we successfully make latent traces visible up to dilution factor of 1:8000. We can attribute most of the detectability to interference of electromagnetic light with the water content of the traces in the shortwave infrared region of the spectrum. In a classification task we use several dimensionality reduction methods (PCA and LDA) in combination with a Maximum Likelihood classifier, assuming normally distributed data. Further, we use Random Forest as a competitive approach. The classifiers retrieve the exact positions of labelled trace preparation up to highest dilution and determine posterior probabilities. By modelling the classification task with a Markov Random Field we are able to integrate prior information about the spatial relation of neighboured pixel labels.

  8. Classification of tree species based on longwave hyperspectral data from leaves, a case study for a tropical dry forest

    NASA Astrophysics Data System (ADS)

    Harrison, D.; Rivard, B.; Sánchez-Azofeifa, A.

    2018-04-01

    Remote sensing of the environment has utilized the visible, near and short-wave infrared (IR) regions of the electromagnetic (EM) spectrum to characterize vegetation health, vigor and distribution. However, relatively little research has focused on the use of the longwave infrared (LWIR, 8.0-12.5 μm) region for studies of vegetation. In this study LWIR leaf reflectance spectra were collected in the wet seasons (May through December) of 2013 and 2014 from twenty-six tree species located in a high species diversity environment, a tropical dry forest in Costa Rica. A continuous wavelet transformation (CWT) was applied to all spectra to minimize noise and broad amplitude variations attributable to non-compositional effects. Species discrimination was then explored with Random Forest classification and accuracy improved was observed with preprocessing of reflectance spectra with continuous wavelet transformation. Species were found to share common spectral features that formed the basis for five spectral types that were corroborated with linear discriminate analysis. The source of most of the observed spectral features is attributed to cell wall or cuticle compounds (cellulose, cutin, matrix glycan, silica and oleanolic acid). Spectral types could be advantageous for the analysis of airborne hyperspectral data because cavity effects will lower the spectral contrast thus increasing the reliance of classification efforts on dominant spectral features. Spectral types specifically derived from leaf level data are expected to support the labeling of spectral classes derived from imagery. The results of this study and that of Ribeiro Da Luz (2006), Ribeiro Da Luz and Crowley (2007, 2010), Ullah et al. (2012) and Rock et al. (2016) have now illustrated success in tree species discrimination across a range of ecosystems using leaf-level spectral observations. With advances in LWIR sensors and concurrent improvements in their signal to noise, applications to large-scale species detection from airborne imagery appear feasible.

  9. Prediction of carbonate rock type from NMR responses using data mining techniques

    NASA Astrophysics Data System (ADS)

    Gonçalves, Eduardo Corrêa; da Silva, Pablo Nascimento; Silveira, Carla Semiramis; Carneiro, Giovanna; Domingues, Ana Beatriz; Moss, Adam; Pritchard, Tim; Plastino, Alexandre; Azeredo, Rodrigo Bagueira de Vasconcellos

    2017-05-01

    Recent studies have indicated that the accurate identification of carbonate rock types in a reservoir can be employed as a preliminary step to enhance the effectiveness of petrophysical property modeling. Furthermore, rock typing activity has been shown to be of key importance in several steps of formation evaluation, such as the study of sedimentary series, reservoir zonation and well-to-well correlation. In this paper, a methodology based exclusively on the analysis of 1H-NMR (Nuclear Magnetic Resonance) relaxation responses - using data mining algorithms - is evaluated to perform the automatic classification of carbonate samples according to their rock type. We analyze the effectiveness of six different classification algorithms (k-NN, Naïve Bayes, C4.5, Random Forest, SMO and Multilayer Perceptron) and two data preprocessing strategies (discretization and feature selection). The dataset used in this evaluation is formed by 78 1H-NMR T2 distributions of fully brine-saturated rock samples from six different rock type classes. The experiments reveal that the combination of preprocessing strategies with classification algorithms is able to achieve a prediction accuracy of 97.4%.

  10. Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes

    NASA Astrophysics Data System (ADS)

    Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie

    2017-03-01

    Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to the histologic subtypes over the past decade. In parallel, recent development of quantitative image biomarkers has recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies the adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADCs and 110 SqCCs patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers including Support Vector Machines with Radial Basis Function kernel (RBF-SVM), Random forest (RF), K-nearest neighbor (KNN), and RUSBoost algorithms. To improve the classifiers' performance, optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward eliminating algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both complete feature set (AUC=0.89) and optimal feature subset (AUC=0.91).

  11. Ecological type classification for California: the Forest Service approach

    Treesearch

    Barbara H. Allen

    1987-01-01

    National legislation has mandated the development and use of an ecological data base to improve resource decision making, while State and Federal agencies have agreed to cooperate in standardizing resource classification and inventory data. In the Pacific Southwest Region, which includes nearly 20 million acres (8.3 million ha) in California, the Forest Service, U.S....

  12. Landsat TM Classifications For SAFIS Using FIA Field Plots

    Treesearch

    William H. Cooke; Andrew J. Hartsell

    2001-01-01

    Wall-to-wall Landsat Thematic Mapper (TM) classification efforts in Georgia require field validation. We developed a new crown modeling procedure based on Forest Health Monitoring (FHM) data to test Forest Inventory and Analysis (FIA) data. These models simulate the proportion of tree crowns that reflect light on a FIA subplot basis. We averaged subplot crown...

  13. Comparing the performance of flat and hierarchical Habitat/Land-Cover classification models in a NATURA 2000 site

    NASA Astrophysics Data System (ADS)

    Gavish, Yoni; O'Connell, Jerome; Marsh, Charles J.; Tarantino, Cristina; Blonda, Palma; Tomaselli, Valeria; Kunin, William E.

    2018-02-01

    The increasing need for high quality Habitat/Land-Cover (H/LC) maps has triggered considerable research into novel machine-learning based classification models. In many cases, H/LC classes follow pre-defined hierarchical classification schemes (e.g., CORINE), in which fine H/LC categories are thematically nested within more general categories. However, none of the existing machine-learning algorithms account for this pre-defined hierarchical structure. Here we introduce a novel Random Forest (RF) based application of hierarchical classification, which fits a separate local classification model in every branching point of the thematic tree, and then integrates all the different local models to a single global prediction. We applied the hierarchal RF approach in a NATURA 2000 site in Italy, using two land-cover (CORINE, FAO-LCCS) and one habitat classification scheme (EUNIS) that differ from one another in the shape of the class hierarchy. For all 3 classification schemes, both the hierarchical model and a flat model alternative provided accurate predictions, with kappa values mostly above 0.9 (despite using only 2.2-3.2% of the study area as training cells). The flat approach slightly outperformed the hierarchical models when the hierarchy was relatively simple, while the hierarchical model worked better under more complex thematic hierarchies. Most misclassifications came from habitat pairs that are thematically distant yet spectrally similar. In 2 out of 3 classification schemes, the additional constraints of the hierarchical model resulted with fewer such serious misclassifications relative to the flat model. The hierarchical model also provided valuable information on variable importance which can shed light into "black-box" based machine learning algorithms like RF. We suggest various ways by which hierarchical classification models can increase the accuracy and interpretability of H/LC classification maps.

  14. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  15. Recent forest cover changes (2002-2015) in the Southern Carpathians: A case study of the Iezer Mountains, Romania.

    PubMed

    Mihai, Bogdan; Săvulescu, Ionuț; Rujoiu-Mare, Marina; Nistor, Constantin

    2017-12-01

    The paper explores the dynamics of the forest cover change in the Iezer Mountains, part of Southern Carpathians, in the context of the forest ownership recovery and deforestation processes, combined with the effects of biotic and abiotic disturbances. The aim of the study is to map and evaluate the typology and the spatial extension of changes in the montane forest cover between 700 and 2462m a.s.l., sampling all the representative Carpathian ecosystems, from the European beech zone up to the spruce-fir zone and the subalpine-alpine pastures. The methodology uses a change detection analysis of satellite imagery with Landsat ETM+/OLI and Sentinel-2 MSI data. The workflow started with a complete calibration of multispectral data from 2002, before the massive forest restitution to private owners, after the Law 247/2005 empowerment, and 2015, the intensification of deforestation process. For the data classification, a Maximum Likelihood supervised classification algorithm was utilized. The forest change map was developed after combining the classifications in a unitary formula using image difference. The principal outcome of the research identifies the type of forest cover change using a quantitative formula. This information can be integrated in the future decision-making strategies for forest stand management and sustainable development. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Mapping forest types in Worcester County, Maryland, using LANDSAT data

    NASA Technical Reports Server (NTRS)

    Burtis, J., Jr.; Witt, R. G.

    1981-01-01

    The feasibility of mapping Level 2 forest cover types for a county-sized area on Maryland's Eastern Shore was demonstrated. A Level 1 land use/land cover classification was carried out for all of Worcester County as well. A June 1978 LANDSAT scene was utilized in a classification which employed two software packages on different computers (IDIMS on an HP 3000 and ASTEP-II on a Univac 1108). A twelve category classification scheme was devised for the study area. Resulting products include black and white line printer maps, final color coded classification maps, digitally enhanced color imagery and tabulated acreage statistics for all land use and land cover types.

  17. Combined use of two supervised learning algorithms to model sea turtle behaviours from tri-axial acceleration data.

    PubMed

    Jeantet, L; Dell'Amico, F; Forin-Wiart, M-A; Coutant, M; Bonola, M; Etienne, D; Gresser, J; Regis, S; Lecerf, N; Lefebvre, F; de Thoisy, B; Le Maho, Y; Brucker, M; Châtelain, N; Laesser, R; Crenner, F; Handrich, Y; Wilson, R; Chevallier, D

    2018-05-23

    Accelerometers are becoming ever more important sensors in animal-attached technology, providing data that allow determination of body posture and movement and thereby helping to elucidate behaviour in animals that are difficult to observe. We sought to validate the identification of sea turtle behaviours from accelerometer signals by deploying tags on the carapace of a juvenile loggerhead ( Caretta caretta ), an adult hawksbill ( Eretmochelys imbricata ) and an adult green turtle ( Chelonia mydas ) at Aquarium La Rochelle, France. We recorded tri-axial acceleration at 50 Hz for each species for a full day while two fixed cameras recorded their behaviours. We identified behaviours from the acceleration data using two different supervised learning algorithms, Random Forest and Classification And Regression Tree (CART), treating the data from the adult animals as separate from the juvenile data. We achieved a global accuracy of 81.30% for the adult hawksbill and green turtle CART model and 71.63% for the juvenile loggerhead, identifying 10 and 12 different behaviours, respectively. Equivalent figures were 86.96% for the adult hawksbill and green turtle Random Forest model and 79.49% for the juvenile loggerhead, for the same behaviours. The use of Random Forest combined with CART algorithms allowed us to understand the decision rules implicated in behaviour discrimination, and thus remove or group together some 'confused' or under--represented behaviours in order to get the most accurate models. This study is the first to validate accelerometer data to identify turtle behaviours and the approach can now be tested on other captive sea turtle species. © 2018. Published by The Company of Biologists Ltd.

  18. A Hybrid Color Space for Skin Detection Using Genetic Algorithm Heuristic Search and Principal Component Analysis Technique

    PubMed Central

    2015-01-01

    Color is one of the most prominent features of an image and used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space in terms of skin and face classification performance which can address issues like illumination variations, various camera characteristics and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing the Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color in over seventeen existing color spaces. Genetic Algorithm heuristic is used to find the optimal color component combination setup in terms of skin detection accuracy while the Principal Component Analysis projects the optimal Genetic Algorithm solution to a less complex dimension. Pixel wise skin detection was used to evaluate the performance of the proposed color space. We have employed four classifiers including Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron in order to generate the human skin color predictive model. The proposed color space was compared to some existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that by using Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and False Positive Rate of 0.0482 which outperformed the existing color spaces in terms of pixel wise skin detection accuracy. The results also indicate that among the classifiers used in this study, Random Forest is the most suitable classifier for pixel wise skin detection applications. PMID:26267377

  19. Mediastinal lymph node detection and station mapping on chest CT using spatial priors and random forest

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn

    2016-07-15

    Purpose: To develop an automated system for mediastinal lymph node detection and station mapping for chest CT. Methods: The contextual organs, trachea, lungs, and spine are first automatically identified to locate the region of interest (ROI) (mediastinum). The authors employ shape features derived from Hessian analysis, local object scale, and circular transformation that are computed per voxel in the ROI. Eight more anatomical structures are simultaneously segmented by multiatlas label fusion. Spatial priors are defined as the relative multidimensional distance vectors corresponding to each structure. Intensity, shape, and spatial prior features are integrated and parsed by a random forest classifiermore » for lymph node detection. The detected candidates are then segmented by the following curve evolution process. Texture features are computed on the segmented lymph nodes and a support vector machine committee is used for final classification. For lymph node station labeling, based on the segmentation results of the above anatomical structures, the textual definitions of mediastinal lymph node map according to the International Association for the Study of Lung Cancer are converted into patient-specific color-coded CT image, where the lymph node station can be automatically assigned for each detected node. Results: The chest CT volumes from 70 patients with 316 enlarged mediastinal lymph nodes are used for validation. For lymph node detection, their system achieves 88% sensitivity at eight false positives per patient. For lymph node station labeling, 84.5% of lymph nodes are correctly assigned to their stations. Conclusions: Multiple-channel shape, intensity, and spatial prior features aggregated by a random forest classifier improve mediastinal lymph node detection on chest CT. Using the location information of segmented anatomic structures from the multiatlas formulation enables accurate identification of lymph node stations.« less

  20. Inside the black box: starting to uncover the underlying decision rules used in one-by-one expert assessment of occupational exposure in case-control studies

    PubMed Central

    Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Yu, Kai; Shortreed, Susan M.; Pronk, Anjoeka; Stewart, Patricia A.; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Silverman, Debra T.; Friesen, Melissa C.

    2014-01-01

    Objectives Evaluating occupational exposures in population-based case-control studies often requires exposure assessors to review each study participants' reported occupational information job-by-job to derive exposure estimates. Although such assessments likely have underlying decision rules, they usually lack transparency, are time-consuming and have uncertain reliability and validity. We aimed to identify the underlying rules to enable documentation, review, and future use of these expert-based exposure decisions. Methods Classification and regression trees (CART, predictions from a single tree) and random forests (predictions from many trees) were used to identify the underlying rules from the questionnaire responses and an expert's exposure assignments for occupational diesel exhaust exposure for several metrics: binary exposure probability and ordinal exposure probability, intensity, and frequency. Data were split into training (n=10,488 jobs), testing (n=2,247), and validation (n=2,248) data sets. Results The CART and random forest models' predictions agreed with 92–94% of the expert's binary probability assignments. For ordinal probability, intensity, and frequency metrics, the two models extracted decision rules more successfully for unexposed and highly exposed jobs (86–90% and 57–85%, respectively) than for low or medium exposed jobs (7–71%). Conclusions CART and random forest models extracted decision rules and accurately predicted an expert's exposure decisions for the majority of jobs and identified questionnaire response patterns that would require further expert review if the rules were applied to other jobs in the same or different study. This approach makes the exposure assessment process in case-control studies more transparent and creates a mechanism to efficiently replicate exposure decisions in future studies. PMID:23155187

  1. Field guide for forested plant associations of the Wenatchee National Forest.

    Treesearch

    T.R. Lillybridge; B.L. Kovalchik; C.K. Williams; B.G. Smith

    1995-01-01

    A classification of forest vegetation is presented for the Wenatchee National Forest (NF). It is based on potential vegetation, with the plant association as the basic unit. The sample includes about 570 intensive plots and 840 reconnaissance plots distributed across the Wenatchee National Forest and the southwest portion of the Okanogan National Forest from 1975...

  2. Abundance and characteristics of snags in western Montana forests

    Treesearch

    Richard B. Harris

    1999-01-01

    Plot data from the U.S. Forest Service's Forest Inventory and Analysis program was used to characterize the abundance and selected characteristics of snags from forests in western Montana. Plots were grouped by whether they had a history of timber harvest, and the U.S. Forest Service classifications of forest type, habitat type, and potential vegetation group were...

  3. Integrative image segmentation optimization and machine learning approach for high quality land-use and land-cover mapping using multisource remote sensing data

    NASA Astrophysics Data System (ADS)

    Gibril, Mohamed Barakat A.; Idrees, Mohammed Oludare; Yao, Kouame; Shafri, Helmi Zulhaidi Mohd

    2018-01-01

    The growing use of optimization for geographic object-based image analysis and the possibility to derive a wide range of information about the image in textual form makes machine learning (data mining) a versatile tool for information extraction from multiple data sources. This paper presents application of data mining for land-cover classification by fusing SPOT-6, RADARSAT-2, and derived dataset. First, the images and other derived indices (normalized difference vegetation index, normalized difference water index, and soil adjusted vegetation index) were combined and subjected to segmentation process with optimal segmentation parameters obtained using combination of spatial and Taguchi statistical optimization. The image objects, which carry all the attributes of the input datasets, were extracted and related to the target land-cover classes through data mining algorithms (decision tree) for classification. To evaluate the performance, the result was compared with two nonparametric classifiers: support vector machine (SVM) and random forest (RF). Furthermore, the decision tree classification result was evaluated against six unoptimized trials segmented using arbitrary parameter combinations. The result shows that the optimized process produces better land-use land-cover classification with overall classification accuracy of 91.79%, 87.25%, and 88.69% for SVM and RF, respectively, while the results of the six unoptimized classifications yield overall accuracy between 84.44% and 88.08%. Higher accuracy of the optimized data mining classification approach compared to the unoptimized results indicates that the optimization process has significant impact on the classification quality.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weimar, Mark R.; Daly, Don S.; Wood, Thomas W.

    Both nuclear power and nuclear weapons programs should have (related) economic signatures which are detectible at some scale. We evaluated this premise in a series of studies using national economic input/output (IO) data. Statistical discrimination models using economic IO tables predict with a high probability whether a country with an unknown predilection for nuclear weapons proliferation is in fact engaged in nuclear power development or nuclear weapons proliferation. We analyzed 93 IO tables, spanning the years 1993 to 2005 for 37 countries that are either members or associates of the Organization for Economic Cooperation and Development (OECD). The 2009 OECDmore » input/output tables featured 48 industrial sectors based on International Standard Industrial Classification (ISIC) Revision 3, and described the respective economies in current country-of-origin valued currency. We converted and transformed these reported values to US 2005 dollars using appropriate exchange rates and implicit price deflators, and addressed discrepancies in reported industrial sectors across tables. We then classified countries with Random Forest using either the adjusted or industry-normalized values. Random Forest, a classification tree technique, separates and categorizes countries using a very small, select subset of the 2304 individual cells in the IO table. A nation’s efforts in nuclear power, be it for electricity or nuclear weapons, are an enterprise with a large economic footprint -- an effort so large that it should discernibly perturb coarse country-level economics data such as that found in yearly input-output economic tables. The neoclassical economic input-output model describes a country’s or region’s economy in terms of the requirements of industries to produce the current level of economic output. An IO table row shows the distribution of an industry’s output to the industrial sectors while a table column shows the input required of each industrial sector by a given industry.« less

  5. Bush encroachment monitoring using multi-temporal Landsat data and random forests

    NASA Astrophysics Data System (ADS)

    Symeonakis, E.; Higginbottom, T.

    2014-11-01

    It is widely accepted that land degradation and desertification (LDD) are serious global threats to humans and the environment. Around a third of savannahs in Africa are affected by LDD processes that may lead to substantial declines in ecosystem functioning and services. Indirectly, LDD can be monitored using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted a lot of attention over the last decades and has been identified as a potential indicator of LDD. Mapping bush encroachment over large areas can only effectively be done using Earth Observation (EO) data and techniques. However, the accurate assessment of large-scale savannah degradation through bush encroachment with satellite imagery remains a formidable task due to the fact that on the satellite data vegetation variability in response to highly variable rainfall patterns might obscure the underlying degradation processes. Here, we present a methodological framework for the monitoring of bush encroachment-related land degradation in a savannah environment in the Northwest Province of South Africa. We utilise multi-temporal Landsat TM and ETM+ (SLC-on) data from 1989 until 2009, mostly from the dry-season, and ancillary data in a GIS environment. We then use the machine learning classification approach of random forests to identify the extent of encroachment over the 20-year period. The results show that in the area of study, bush encroachment is as alarming as permanent vegetation loss. The classification of the year 2009 is validated yielding low commission and omission errors and high k-statistic values for the grasses and woody vegetation classes. Our approach is a step towards a rigorous and effective savannah degradation assessment.

  6. Grey matter volume patterns in thalamic nuclei are associated with familial risk for schizophrenia.

    PubMed

    Pergola, Giulio; Trizio, Silvestro; Di Carlo, Pasquale; Taurisano, Paolo; Mancini, Marina; Amoroso, Nicola; Nettis, Maria Antonietta; Andriola, Ileana; Caforio, Grazia; Popolizio, Teresa; Rampino, Antonio; Di Giorgio, Annabella; Bertolino, Alessandro; Blasi, Giuseppe

    2017-02-01

    Previous evidence suggests reduced thalamic grey matter volume (GMV) in patients with schizophrenia (SCZ). However, it is not considered an intermediate phenotype for schizophrenia, possibly because previous studies did not assess the contribution of individual thalamic nuclei and employed univariate statistics. Here, we hypothesized that multivariate statistics would reveal an association of GMV in different thalamic nuclei with familial risk for schizophrenia. We also hypothesized that accounting for the heterogeneity of thalamic GMV in healthy controls would improve the detection of subjects at familial risk for the disorder. We acquired MRI scans for 96 clinically stable SCZ, 55 non-affected siblings of patients with schizophrenia (SIB), and 249 HC. The thalamus was parceled into seven regions of interest (ROIs). After a canonical univariate analysis, we used GMV estimates of thalamic ROIs, together with total thalamic GMV and premorbid intelligence, as features in Random Forests to classify HC, SIB, and SCZ. Then, we computed a Misclassification Index for each individual and tested the improvement in SIB detection after excluding a subsample of HC misclassified as patients. Random Forests discriminated SCZ from HC (accuracy=81%) and SIB from HC (accuracy=75%). Left anteromedial thalamic volumes were significantly associated with both multivariate classifications (p<0.05). Excluding HC misclassified as SCZ improved greatly HC vs. SIB classification (Cohen's d=1.39). These findings suggest that multivariate statistics identify a familial background associated with thalamic GMV reduction in SCZ. They also suggest the relevance of inter-individual variability of GMV patterns for the discrimination of individuals at familial risk for the disorder. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. A decision support system for automatic sleep staging from EEG signals using tunable Q-factor wavelet transform and spectral features.

    PubMed

    Hassan, Ahnaf Rashik; Bhuiyan, Mohammed Imamul Hassan

    2016-09-15

    Automatic sleep scoring is essential owing to the fact that conventionally a large volume of data have to be analyzed visually by the physicians which is onerous, time-consuming and error-prone. Therefore, there is a dire need of an automated sleep staging scheme. In this work, we decompose sleep-EEG signal segments using tunable-Q factor wavelet transform (TQWT). Various spectral features are then computed from TQWT sub-bands. The performance of spectral features in the TQWT domain has been determined by intuitive and graphical analyses, statistical validation, and Fisher criteria. Random forest is used to perform classification. Optimal choices and the effects of TQWT and random forest parameters have been determined and expounded. Experimental outcomes manifest the efficacy of our feature generation scheme in terms of p-values of ANOVA analysis and Fisher criteria. The proposed scheme yields 90.38%, 91.50%, 92.11%, 94.80%, 97.50% for 6-stage to 2-stage classification of sleep states on the benchmark Sleep-EDF data-set. In addition, its performance on DREAMS Subjects Data-set is also promising. The performance of the proposed method is significantly better than the existing ones in terms of accuracy and Cohen's kappa coefficient. Additionally, the proposed scheme gives high detection accuracy for sleep stages non-REM 1 and REM. Spectral features in the TQWT domain can discriminate sleep-EEG signals corresponding to various sleep states efficaciously. The proposed scheme will alleviate the burden of the physicians, speed-up sleep disorder diagnosis, and expedite sleep research. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Saliency-Guided Change Detection of Remotely Sensed Images Using Random Forest

    NASA Astrophysics Data System (ADS)

    Feng, W.; Sui, H.; Chen, X.

    2018-04-01

    Studies based on object-based image analysis (OBIA) representing the paradigm shift in change detection (CD) have achieved remarkable progress in the last decade. Their aim has been developing more intelligent interpretation analysis methods in the future. The prediction effect and performance stability of random forest (RF), as a new kind of machine learning algorithm, are better than many single predictors and integrated forecasting method. In this paper, we present a novel CD approach for high-resolution remote sensing images, which incorporates visual saliency and RF. First, highly homogeneous and compact image super-pixels are generated using super-pixel segmentation, and the optimal segmentation result is obtained through image superimposition and principal component analysis (PCA). Second, saliency detection is used to guide the search of interest regions in the initial difference image obtained via the improved robust change vector analysis (RCVA) algorithm. The salient regions within the difference image that correspond to the binarized saliency map are extracted, and the regions are subject to the fuzzy c-means (FCM) clustering to obtain the pixel-level pre-classification result, which can be used as a prerequisite for superpixel-based analysis. Third, on the basis of the optimal segmentation and pixel-level pre-classification results, different super-pixel change possibilities are calculated. Furthermore, the changed and unchanged super-pixels that serve as the training samples are automatically selected. The spectral features and Gabor features of each super-pixel are extracted. Finally, superpixel-based CD is implemented by applying RF based on these samples. Experimental results on Ziyuan 3 (ZY3) multi-spectral images show that the proposed method outperforms the compared methods in the accuracy of CD, and also confirm the feasibility and effectiveness of the proposed approach.

  9. Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry.

    PubMed

    Orlandi, Silvia; Reyes Garcia, Carlos Alberto; Bandini, Andrea; Donzelli, Gianpaolo; Manfredi, Claudia

    2016-11-01

    Scientific and clinical advances in perinatology and neonatology have enhanced the chances of survival of preterm and very low weight neonates. Infant cry analysis is a suitable noninvasive complementary tool to assess the neurologic state of infants particularly important in the case of preterm neonates. This article aims at exploiting differences between full-term and preterm infant cry with robust automatic acoustical analysis and data mining techniques. Twenty-two acoustical parameters are estimated in more than 3000 cry units from cry recordings of 28 full-term and 10 preterm newborns. Feature extraction is performed through the BioVoice dedicated software tool, developed at the Biomedical Engineering Lab, University of Firenze, Italy. Classification and pattern recognition is based on genetic algorithms for the selection of the best attributes. Training is performed comparing four classifiers: Logistic Curve, Multilayer Perceptron, Support Vector Machine, and Random Forest and three different testing options: full training set, 10-fold cross-validation, and 66% split. Results show that the best feature set is made up by 10 parameters capable to assess differences between preterm and full-term newborns with about 87% of accuracy. Best results are obtained with the Random Forest method (receiver operating characteristic area, 0.94). These 10 cry features might convey important additional information to assist the clinical specialist in the diagnosis and follow-up of possible delays or disorders in the neurologic development due to premature birth in this extremely vulnerable population of patients. The proposed approach is a first step toward an automatic infant cry recognition system for fast and proper identification of risk in preterm babies. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  10. Identification of immune correlates of protection in Shigella infection by application of machine learning.

    PubMed

    Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

    2017-10-01

    Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. ADMET Evaluation in Drug Discovery. 18. Reliable Prediction of Chemical-Induced Urinary Tract Toxicity by Boosting Machine Learning Approaches.

    PubMed

    Lei, Tailong; Sun, Huiyong; Kang, Yu; Zhu, Feng; Liu, Hui; Zhou, Wenfang; Wang, Zhe; Li, Dan; Li, Youyong; Hou, Tingjun

    2017-11-06

    Xenobiotic chemicals and their metabolites are mainly excreted out of our bodies by the urinary tract through the urine. Chemical-induced urinary tract toxicity is one of the main reasons that cause failure during drug development, and it is a common adverse event for medications, natural supplements, and environmental chemicals. Despite its importance, there are only a few in silico models for assessing urinary tract toxicity for a large number of compounds with diverse chemical structures. Here, we developed a series of qualitative and quantitative structure-activity relationship (QSAR) models for predicting urinary tract toxicity. In our study, the recursive feature elimination method incorporated with random forests (RFE-RF) was used for dimension reduction, and then eight machine learning approaches were used for QSAR modeling, i.e., relevance vector machine (RVM), support vector machine (SVM), regularized random forest (RRF), C5.0 trees, eXtreme gradient boosting (XGBoost), AdaBoost.M1, SVM boosting (SVMBoost), and RVM boosting (RVMBoost). For building classification models, the synthetic minority oversampling technique was used to handle the imbalance data set problem. Among all the machine learning approaches, SVMBoost based on the RBF kernel achieves both the best quantitative (q ext 2 = 0.845) and qualitative predictions for the test set (MCC of 0.787, AUC of 0.893, sensitivity of 89.6%, specificity of 94.1%, and global accuracy of 90.8%). The application domains were then analyzed, and all of the tested chemicals fall within the application domain coverage. We also examined the structure features of the chemicals with large prediction errors. In brief, both the regression and classification models developed by the SVMBoost approach have reliable prediction capability for assessing chemical-induced urinary tract toxicity.

  12. Spatiotemporal Change Detection Using Landsat Imagery: the Case Study of Karacabey Flooded Forest, Bursa, Turkey

    NASA Astrophysics Data System (ADS)

    Akay, A. E.; Gencal, B.; Taş, İ.

    2017-11-01

    This short paper aims to detect spatiotemporal detection of land use/land cover change within Karacabey Flooded Forest region. Change detection analysis applied to Landsat 5 TM images representing July 2000 and a Landsat 8 OLI representing June 2017. Various image processing tools were implemented using ERDAS 9.2, ArcGIS 10.4.1, and ENVI programs to conduct spatiotemporal change detection over these two images such as band selection, corrections, subset, classification, recoding, accuracy assessment, and change detection analysis. Image classification revealed that there are five significant land use/land cover types, including forest, flooded forest, swamp, water, and other lands (i.e. agriculture, sand, roads, settlement, and open areas). The results indicated that there was increase in flooded forest, water, and other lands, while the cover of forest and swamp decreased.

  13. Monitoring forest land from high altitude and from space

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The significant findings are reported for remote sensing of forest lands conducted during the period October 1, 1965 to December 31, 1972. Forest inventory research included the use of aircraft and space imagery for forest and nonforest land classification, and land use classification by automated procedures, multispectral scanning, and computerized mapping. Forest stress studies involved previsual detection of ponderosa pine under stress from insects and disease, bark bettle infestations in the Black Hills, and root disease impacts on forest stands. Standardization and calibration studies were made to develop a field test of an ERTS-matched four-channel spectrometer. Calibration of focal plane shutters and mathematical modeling of film characteristic curves were also studied. Documents published as a result of all forestry studies funded by NASA for the Earth Resources Survey Program from 1965 through 1972 are listed.

  14. Evaluation of Skylab (EREP) data for forest and rangeland surveys. [Georgia, South Dakota, Colorado, and California

    NASA Technical Reports Server (NTRS)

    Aldrich, R. C. (Principal Investigator); Dana, R. W.; Greentree, W. J.; Roberts, E. H.; Norick, N. X.; Waite, T. H.; Francis, R. E.; Driscoll, R. S.; Weber, F. P.

    1975-01-01

    The author has identified the following significant results. Four widely separated sites (near Augusta, Georgia; Lead, South Dakota; Manitou, Colorado; and Redding, California) were selected as typical sites for forest inventory, forest stress, rangeland inventory, and atmospheric and solar measurements, respectively. Results indicated that Skylab S190B color photography is good for classification of Level 1 forest and nonforest land (90 to 95 percent correct) and could be used as a data base for sampling by small and medium scale photography using regression techniques. The accuracy of Level 2 forest and nonforest classes, however, varied from fair to poor. Results of plant community classification tests indicate that both visual and microdensitometric techniques can separate deciduous, conifirous, and grassland classes to the region level in the Ecoclass hierarchical classification system. There was no consistency in classifying tree categories at the series level by visual photointerpretation. The relationship between ground measurements and large scale photo measurements of foliar cover had a correlation coefficient of greater than 0.75. Some of the relationships, however, were site dependent.

  15. Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image

    NASA Astrophysics Data System (ADS)

    Pirotti, F.; Sunar, F.; Piragnolo, M.

    2016-06-01

    Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values; kappa index ranging from 0.55 to 0.42 respectively with the most and least number pixels for training. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines - with default radial basis function kernel - methods follow closely with comparable performance.

  16. Potential of VIIRS Data for Regional Monitoring of Gypsy Moth Defoliation: Implications for Forest Threat Early Warning System

    NASA Technical Reports Server (NTRS)

    Spruce, Joseph P.; Ryan, Robert E.; Smoot, James C.; Prados, Donald; McKellip, Rodney; Sader. Steven A.; Gasser, Jerry; May, George; Hargrove, William

    2007-01-01

    A NASA RPC (Rapid Prototyping Capability) experiment was conducted to assess the potential of VIIRS (Visible/Infrared Imager/Radiometer Suite) data for monitoring non-native gypsy moth (Lymantria dispar) defoliation of forests. This experiment compares defoliation detection products computed from simulated VIIRS and from MODIS (Moderate Resolution Imaging Spectroradiometer) time series products as potential inputs to a forest threat EWS (Early Warning System) being developed for the USFS (USDA Forest Service). Gypsy moth causes extensive defoliation of broadleaved forests in the United States and is specifically identified in the Healthy Forest Restoration Act (HFRA) of 2003. The HFRA mandates development of a national forest threat EWS. This system is being built by the USFS and NASA is aiding integration of needed satellite data products into this system, including MODIS products. This RPC experiment enabled the MODIS follow-on, VIIRS, to be evaluated as a data source for EWS forest monitoring products. The experiment included 1) assessment of MODIS-simulated VIIRS NDVI products, and 2) evaluation of gypsy moth defoliation mapping products from MODIS-simulated VIIRS and from MODIS NDVI time series data. This experiment employed MODIS data collected over the approximately 15 million acre mid-Appalachian Highlands during the annual peak defoliation time frame (approximately June 10 through July 27) during 2000-2006. NASA Stennis Application Research Toolbox software was used to produce MODIS-simulated VIIRS data and NASA Stennis Time Series Product Tool software was employed to process MODIS and MODIS-simulated VIIRS time series data scaled to planetary reflectance. MODIS-simulated VIIRS data was assessed through comparison to Hyperion-simulated VIIRS data using data collected during gypsy moth defoliation. Hyperion-simulated MODIS data showed a high correlation with actual MODIS data (NDVI R2 of 0.877 and RMSE of 0.023). MODIS-simulated VIIRS data for the same date showed moderately high correlation with Hyperion-simulated VIIRS data (NDVI R2 of 0.62 and RMSE of 0.035), even though the datasets were collected about a half an hour apart during changing weather conditions. MODIS products (MOD02, MOD09, and MOD13) and MOD02-simulated VIIRS time series data were used to generate defoliation mapping products based on image classification and image differencing change detection techniques. Accuracy of final defoliation mapping products was assessed by image interpreting over 170 randomly sampled locations found on Landsat and ASTER data in conjunction with defoliation map data from the USFS. The MOD02-simulated VIIRS 400-meter NDVI classification produced a similar overall accuracy (87.28 percent with 0.72 Kappa) to the MOD02 250-meter NDVI classification (86.71 percent with 0.71 Kappa). In addition, the VIIRS 400-meter NDVI, MOD02 250-meter NDVI, and MOD02 500-meter NDVI showed good user and producer accuracies for the defoliated forest class (70 percent) and acceptable Kappa values (0.66). MOD02 and MOD02-simulated VIIRS data both showed promise as data sources for regional monitoring of forest disturbance due to insect defoliation.

  17. Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery.

    PubMed

    Li, Guiying; Lu, Dengsheng; Moran, Emilio; Hetrick, Scott

    2011-01-01

    This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms - maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes.

  18. Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery

    PubMed Central

    LI, GUIYING; LU, DENGSHENG; MORAN, EMILIO; HETRICK, SCOTT

    2011-01-01

    This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms – maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes. PMID:22368311

  19. Ecological modeling for forest management in the Shawnee National Forest

    Treesearch

    Richard G. Thurau; J.F. Fralish; S. Hupe; B. Fitch; A.D. Carver

    2008-01-01

    Land managers of the Shawnee National Forest in southern Illinois are challenged to meet the needs of a diverse populace of stakeholders. By classifying National Forest holdings into management units, U.S. Forest Service personnel can spatially allocate resources and services to meet local management objectives. Ecological Classification Systems predict ecological site...

  20. Phylogenetic classification of the world's tropical forests.

    PubMed

    Slik, J W Ferry; Franklin, Janet; Arroyo-Rodríguez, Víctor; Field, Richard; Aguilar, Salomon; Aguirre, Nikolay; Ahumada, Jorge; Aiba, Shin-Ichiro; Alves, Luciana F; K, Anitha; Avella, Andres; Mora, Francisco; Aymard C, Gerardo A; Báez, Selene; Balvanera, Patricia; Bastian, Meredith L; Bastin, Jean-François; Bellingham, Peter J; van den Berg, Eduardo; da Conceição Bispo, Polyanna; Boeckx, Pascal; Boehning-Gaese, Katrin; Bongers, Frans; Boyle, Brad; Brambach, Fabian; Brearley, Francis Q; Brown, Sandra; Chai, Shauna-Lee; Chazdon, Robin L; Chen, Shengbin; Chhang, Phourin; Chuyong, George; Ewango, Corneille; Coronado, Indiana M; Cristóbal-Azkarate, Jurgi; Culmsee, Heike; Damas, Kipiro; Dattaraja, H S; Davidar, Priya; DeWalt, Saara J; Din, Hazimah; Drake, Donald R; Duque, Alvaro; Durigan, Giselda; Eichhorn, Karl; Eler, Eduardo Schmidt; Enoki, Tsutomu; Ensslin, Andreas; Fandohan, Adandé Belarmain; Farwig, Nina; Feeley, Kenneth J; Fischer, Markus; Forshed, Olle; Garcia, Queila Souza; Garkoti, Satish Chandra; Gillespie, Thomas W; Gillet, Jean-Francois; Gonmadje, Christelle; Granzow-de la Cerda, Iñigo; Griffith, Daniel M; Grogan, James; Hakeem, Khalid Rehman; Harris, David J; Harrison, Rhett D; Hector, Andy; Hemp, Andreas; Homeier, Jürgen; Hussain, M Shah; Ibarra-Manríquez, Guillermo; Hanum, I Faridah; Imai, Nobuo; Jansen, Patrick A; Joly, Carlos Alfredo; Joseph, Shijo; Kartawinata, Kuswata; Kearsley, Elizabeth; Kelly, Daniel L; Kessler, Michael; Killeen, Timothy J; Kooyman, Robert M; Laumonier, Yves; Laurance, Susan G; Laurance, William F; Lawes, Michael J; Letcher, Susan G; Lindsell, Jeremy; Lovett, Jon; Lozada, Jose; Lu, Xinghui; Lykke, Anne Mette; Mahmud, Khairil Bin; Mahayani, Ni Putu Diana; Mansor, Asyraf; Marshall, Andrew R; Martin, Emanuel H; Calderado Leal Matos, Darley; Meave, Jorge A; Melo, Felipe P L; Mendoza, Zhofre Huberto Aguirre; Metali, Faizah; Medjibe, Vincent P; Metzger, Jean Paul; Metzker, Thiago; Mohandass, D; Munguía-Rosas, Miguel A; Muñoz, Rodrigo; Nurtjahy, Eddy; de Oliveira, Eddie Lenza; Onrizal; Parolin, Pia; Parren, Marc; Parthasarathy, N; Paudel, Ekananda; Perez, Rolando; Pérez-García, Eduardo A; Pommer, Ulf; Poorter, Lourens; Qie, Lan; Piedade, Maria Teresa F; Pinto, José Roberto Rodrigues; Poulsen, Axel Dalberg; Poulsen, John R; Powers, Jennifer S; Prasad, Rama Chandra; Puyravaud, Jean-Philippe; Rangel, Orlando; Reitsma, Jan; Rocha, Diogo S B; Rolim, Samir; Rovero, Francesco; Rozak, Andes; Ruokolainen, Kalle; Rutishauser, Ervan; Rutten, Gemma; Mohd Said, Mohd Nizam; Saiter, Felipe Z; Saner, Philippe; Santos, Braulio; Dos Santos, João Roberto; Sarker, Swapan Kumar; Schmitt, Christine B; Schoengart, Jochen; Schulze, Mark; Sheil, Douglas; Sist, Plinio; Souza, Alexandre F; Spironello, Wilson Roberto; Sposito, Tereza; Steinmetz, Robert; Stevart, Tariq; Suganuma, Marcio Seiji; Sukri, Rahayu; Sultana, Aisha; Sukumar, Raman; Sunderland, Terry; Supriyadi; Suresh, H S; Suzuki, Eizi; Tabarelli, Marcelo; Tang, Jianwei; Tanner, Ed V J; Targhetta, Natalia; Theilade, Ida; Thomas, Duncan; Timberlake, Jonathan; de Morisson Valeriano, Márcio; van Valkenburg, Johan; Van Do, Tran; Van Sam, Hoang; Vandermeer, John H; Verbeeck, Hans; Vetaas, Ole Reidar; Adekunle, Victor; Vieira, Simone A; Webb, Campbell O; Webb, Edward L; Whitfeld, Timothy; Wich, Serge; Williams, John; Wiser, Susan; Wittmann, Florian; Yang, Xiaobo; Adou Yao, C Yves; Yap, Sandra L; Zahawi, Rakan A; Zakaria, Rahmad; Zang, Runguo

    2018-02-20

    Knowledge about the biogeographic affinities of the world's tropical forests helps to better understand regional differences in forest structure, diversity, composition, and dynamics. Such understanding will enable anticipation of region-specific responses to global environmental change. Modern phylogenies, in combination with broad coverage of species inventory data, now allow for global biogeographic analyses that take species evolutionary distance into account. Here we present a classification of the world's tropical forests based on their phylogenetic similarity. We identify five principal floristic regions and their floristic relationships: ( i ) Indo-Pacific, ( ii ) Subtropical, ( iii ) African, ( iv ) American, and ( v ) Dry forests. Our results do not support the traditional neo- versus paleotropical forest division but instead separate the combined American and African forests from their Indo-Pacific counterparts. We also find indications for the existence of a global dry forest region, with representatives in America, Africa, Madagascar, and India. Additionally, a northern-hemisphere Subtropical forest region was identified with representatives in Asia and America, providing support for a link between Asian and American northern-hemisphere forests. Copyright © 2018 the Author(s). Published by PNAS.

  1. Phylogenetic classification of the world’s tropical forests

    PubMed Central

    Franklin, Janet; Arroyo-Rodríguez, Víctor; Field, Richard; Aguilar, Salomon; Aguirre, Nikolay; Ahumada, Jorge; Aiba, Shin-Ichiro; K, Anitha; Avella, Andres; Mora, Francisco; Aymard C., Gerardo A.; Báez, Selene; Balvanera, Patricia; Bastian, Meredith L.; Bastin, Jean-François; Bellingham, Peter J.; van den Berg, Eduardo; da Conceição Bispo, Polyanna; Boeckx, Pascal; Boehning-Gaese, Katrin; Bongers, Frans; Boyle, Brad; Brearley, Francis Q.; Brown, Sandra; Chai, Shauna-Lee; Chazdon, Robin L.; Chen, Shengbin; Chhang, Phourin; Chuyong, George; Ewango, Corneille; Coronado, Indiana M.; Cristóbal-Azkarate, Jurgi; Culmsee, Heike; Damas, Kipiro; Dattaraja, H. S.; Davidar, Priya; DeWalt, Saara J.; Din, Hazimah; Drake, Donald R.; Durigan, Giselda; Eichhorn, Karl; Eler, Eduardo Schmidt; Enoki, Tsutomu; Ensslin, Andreas; Fandohan, Adandé Belarmain; Farwig, Nina; Feeley, Kenneth J.; Fischer, Markus; Forshed, Olle; Garcia, Queila Souza; Garkoti, Satish Chandra; Gillespie, Thomas W.; Gillet, Jean-Francois; Gonmadje, Christelle; Granzow-de la Cerda, Iñigo; Griffith, Daniel M.; Grogan, James; Hakeem, Khalid Rehman; Harris, David J.; Harrison, Rhett D.; Hector, Andy; Hemp, Andreas; Hussain, M. Shah; Ibarra-Manríquez, Guillermo; Hanum, I. Faridah; Imai, Nobuo; Jansen, Patrick A.; Joly, Carlos Alfredo; Joseph, Shijo; Kartawinata, Kuswata; Kearsley, Elizabeth; Kelly, Daniel L.; Kessler, Michael; Killeen, Timothy J.; Kooyman, Robert M.; Laumonier, Yves; Laurance, William F.; Lawes, Michael J.; Letcher, Susan G.; Lovett, Jon; Lozada, Jose; Lu, Xinghui; Lykke, Anne Mette; Mahmud, Khairil Bin; Mahayani, Ni Putu Diana; Mansor, Asyraf; Marshall, Andrew R.; Martin, Emanuel H.; Calderado Leal Matos, Darley; Meave, Jorge A.; Melo, Felipe P. L.; Mendoza, Zhofre Huberto Aguirre; Metali, Faizah; Medjibe, Vincent P.; Metzger, Jean Paul; Metzker, Thiago; Mohandass, D.; Munguía-Rosas, Miguel A.; Muñoz, Rodrigo; Nurtjahy, Eddy; de Oliveira, Eddie Lenza; Onrizal; Parolin, Pia; Parren, Marc; Parthasarathy, N.; Paudel, Ekananda; Perez, Rolando; Pérez-García, Eduardo A.; Pommer, Ulf; Poorter, Lourens; Qie, Lan; Piedade, Maria Teresa F.; Pinto, José Roberto Rodrigues; Poulsen, Axel Dalberg; Poulsen, John R.; Powers, Jennifer S.; Prasad, Rama Chandra; Puyravaud, Jean-Philippe; Rangel, Orlando; Reitsma, Jan; Rocha, Diogo S. B.; Rolim, Samir; Rovero, Francesco; Ruokolainen, Kalle; Rutishauser, Ervan; Rutten, Gemma; Mohd. Said, Mohd. Nizam; Saiter, Felipe Z.; Saner, Philippe; Santos, Braulio; dos Santos, João Roberto; Sarker, Swapan Kumar; Schoengart, Jochen; Schulze, Mark; Sheil, Douglas; Sist, Plinio; Souza, Alexandre F.; Spironello, Wilson Roberto; Sposito, Tereza; Steinmetz, Robert; Stevart, Tariq; Suganuma, Marcio Seiji; Sukri, Rahayu; Sukumar, Raman; Sunderland, Terry; Supriyadi; Suresh, H. S.; Suzuki, Eizi; Tabarelli, Marcelo; Tang, Jianwei; Tanner, Ed V. J.; Targhetta, Natalia; Theilade, Ida; Thomas, Duncan; Timberlake, Jonathan; de Morisson Valeriano, Márcio; van Valkenburg, Johan; Van Do, Tran; Van Sam, Hoang; Vandermeer, John H.; Verbeeck, Hans; Vetaas, Ole Reidar; Adekunle, Victor; Vieira, Simone A.; Webb, Campbell O.; Webb, Edward L.; Whitfeld, Timothy; Wich, Serge; Williams, John; Wiser, Susan; Wittmann, Florian; Yang, Xiaobo; Adou Yao, C. Yves; Yap, Sandra L.; Zahawi, Rakan A.; Zakaria, Rahmad; Zang, Runguo

    2018-01-01

    Knowledge about the biogeographic affinities of the world’s tropical forests helps to better understand regional differences in forest structure, diversity, composition, and dynamics. Such understanding will enable anticipation of region-specific responses to global environmental change. Modern phylogenies, in combination with broad coverage of species inventory data, now allow for global biogeographic analyses that take species evolutionary distance into account. Here we present a classification of the world’s tropical forests based on their phylogenetic similarity. We identify five principal floristic regions and their floristic relationships: (i) Indo-Pacific, (ii) Subtropical, (iii) African, (iv) American, and (v) Dry forests. Our results do not support the traditional neo- versus paleotropical forest division but instead separate the combined American and African forests from their Indo-Pacific counterparts. We also find indications for the existence of a global dry forest region, with representatives in America, Africa, Madagascar, and India. Additionally, a northern-hemisphere Subtropical forest region was identified with representatives in Asia and America, providing support for a link between Asian and American northern-hemisphere forests. PMID:29432167

  2. Gaia eclipsing binary and multiple systems. Supervised classification and self-organizing maps

    NASA Astrophysics Data System (ADS)

    Süveges, M.; Barblan, F.; Lecoeur-Taïbi, I.; Prša, A.; Holl, B.; Eyer, L.; Kochoska, A.; Mowlavi, N.; Rimoldini, L.

    2017-07-01

    Context. Large surveys producing tera- and petabyte-scale databases require machine-learning and knowledge discovery methods to deal with the overwhelming quantity of data and the difficulties of extracting concise, meaningful information with reliable assessment of its uncertainty. This study investigates the potential of a few machine-learning methods for the automated analysis of eclipsing binaries in the data of such surveys. Aims: We aim to aid the extraction of samples of eclipsing binaries from such databases and to provide basic information about the objects. We intend to estimate class labels according to two different, well-known classification systems, one based on the light curve morphology (EA/EB/EW classes) and the other based on the physical characteristics of the binary system (system morphology classes; detached through overcontact systems). Furthermore, we explore low-dimensional surfaces along which the light curves of eclipsing binaries are concentrated, and consider their use in the characterization of the binary systems and in the exploration of biases of the full unknown Gaia data with respect to the training sets. Methods: We have explored the performance of principal component analysis (PCA), linear discriminant analysis (LDA), Random Forest classification and self-organizing maps (SOM) for the above aims. We pre-processed the photometric time series by combining a double Gaussian profile fit and a constrained smoothing spline, in order to de-noise and interpolate the observed light curves. We achieved further denoising, and selected the most important variability elements from the light curves using PCA. Supervised classification was performed using Random Forest and LDA based on the PC decomposition, while SOM gives a continuous 2-dimensional manifold of the light curves arranged by a few important features. We estimated the uncertainty of the supervised methods due to the specific finite training set using ensembles of models constructed on randomized training sets. Results: We obtain excellent results (about 5% global error rate) with classification into light curve morphology classes on the Hipparcos data. The classification into system morphology classes using the Catalog and Atlas of Eclipsing binaries (CALEB) has a higher error rate (about 10.5%), most importantly due to the (sometimes strong) similarity of the photometric light curves originating from physically different systems. When trained on CALEB and then applied to Kepler-detected eclipsing binaries subsampled according to Gaia observing times, LDA and SOM provide tractable, easy-to-visualize subspaces of the full (functional) space of light curves that summarize the most important phenomenological elements of the individual light curves. The sequence of light curves ordered by their first linear discriminant coefficient is compared to results obtained using local linear embedding. The SOM method proves able to find a 2-dimensional embedded surface in the space of the light curves which separates the system morphology classes in its different regions, and also identifies a few other phenomena, such as the asymmetry of the light curves due to spots, eccentric systems, and systems with a single eclipse. Furthermore, when data from other surveys are projected to the same SOM surface, the resulting map yields a good overview of the general biases and distortions due to differences in time sampling or population.

  3. Forest cover from Landsat Thematic Mapper data for use in the Catahoula anger District geographic information system.

    Treesearch

    David L. Evans

    1994-01-01

    A forest cover classification of the Kisatchie National Forest, Catahoula Ranger district, was performed with Landsat Thematic Mapper data. Data base retrievals and map products from this analysis demonstrated use of Landsat for forest management decisions.

  4. Georgia's forests, 1989

    Treesearch

    Raymond M. Sheffield; Tony G. Johnson

    1993-01-01

    This resource bulletin presents the principal findings of the sixth inventory of Georgia's forest resources. Data on the extent, condition, and classification of forest land and associated timber volumes, growth, removals, and mortality are described and interpreted. Whereas data on nontimber commodities associated with forests were also collected, evaluations of...

  5. The effect of using complete and partial forested FIA plot data on biomass and forested area classifications from MODIS satellite data

    Treesearch

    Dumitru Salajanu; Dennis M. Jacobs

    2006-01-01

    Authors’ objective was to determine at what level biomass and forest area obtained from partial and complete forested plot inventory data compares with forested area and biomass estimates from the national inventory data. A subset of 3819 inventory plots (100% forested, 100% non-forested, mixed-forest/non-forest) was used to classify the land cover and model the...

  6. A new climatic classification of afforestation in Three-North regions of China with multi-source remote sensing data

    NASA Astrophysics Data System (ADS)

    Zheng, Xiao; Zhu, Jiaojun

    2017-01-01

    Afforestation and reforestation activities achieve high attention at the policy agenda as measures for carbon sequestration in order to mitigate climate change. The Three-North Shelter Forest Program, the largest ecological afforestation program worldwide, was launched in 1978 and will last until 2050 in the Three-North regions (accounting for 42.4 % of China's territory). Shelter forests of the Three-North Shelter Forest Program have exhibited severe decline after planting in 1978 due to lack of detailed climatic classification. Besides, a comprehensive assessment of climate adaptation for the current shelter forests was lacking. In this study, the aridity index determined by precipitation and reference evapotranspiration was employed to classify climatic zones for the afforestation program. The precipitation and reference evapotranspiration with 1-km resolution were estimated based on data from the tropical rainfall measuring mission and moderate resolution imaging spectroradiometer, respectively. Then, the detailed climatic classification for the afforestation program was obtained based on the relationship between the different vegetation types and the aridity index. The shelter forests in 2008 were derived from Landsat TM in the Three-North regions. In addition, climatic zones and shelter forests were corrected by comparing with natural vegetation map and field surveys. By overlaying the shelter forests on the climatic zones, we found that 16.30 % coniferous forests, 8.21 % broadleaved forests, 2.03 % mixed conifer-broadleaved forests, and 10.86 % shrubs were not in strict accordance with the climate conditions. These results open new perspectives for potential use of remote sensing techniques for afforestation management.

  7. Integrating Human and Machine Intelligence in Galaxy Morphology Classification Tasks

    NASA Astrophysics Data System (ADS)

    Beck, Melanie Renee

    The large flood of data flowing from observatories presents significant challenges to astronomy and cosmology--challenges that will only be magnified by projects currently under development. Growth in both volume and velocity of astrophysics data is accelerating: whereas the Sloan Digital Sky Survey (SDSS) has produced 60 terabytes of data in the last decade, the upcoming Large Synoptic Survey Telescope (LSST) plans to register 30 terabytes per night starting in the year 2020. Additionally, the Euclid Mission will acquire imaging for 5 x 107 resolvable galaxies. The field of galaxy evolution faces a particularly challenging future as complete understanding often cannot be reached without analysis of detailed morphological galaxy features. Historically, morphological analysis has relied on visual classification by astronomers, accessing the human brains capacity for advanced pattern recognition. However, this accurate but inefficient method falters when confronted with many thousands (or millions) of images. In the SDSS era, efforts to automate morphological classifications of galaxies (e.g., Conselice et al., 2000; Lotz et al., 2004) are reasonably successful and can distinguish between elliptical and disk-dominated galaxies with accuracies of 80%. While this is statistically very useful, a key problem with these methods is that they often cannot say which 80% of their samples are accurate. Furthermore, when confronted with the more complex task of identifying key substructure within galaxies, automated classification algorithms begin to fail. The Galaxy Zoo project uses a highly innovative approach to solving the scalability problem of visual classification. Displaying images of SDSS galaxies to volunteers via a simple and engaging web interface, www.galaxyzoo.org asks people to classify images by eye. Within the first year hundreds of thousands of members of the general public had classified each of the 1 million SDSS galaxies an average of 40 times. Galaxy Zoo thus solved both the visual classification problem of time efficiency and improved accuracy by producing a distribution of independent classifications for each galaxy. While crowd-sourced galaxy classifications have proven their worth, challenges remain before establishing this method as a critical and standard component of the data processing pipelines for the next generation of surveys. In particular, though innovative, crowd-sourcing techniques do not have the capacity to handle the data volume and rates expected in the next generation of surveys. These algorithms will be delegated to handle the majority of the classification tasks, freeing citizen scientists to contribute their efforts on subtler and more complex assignments. This thesis presents a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme we increase the classification rate nearly 5-fold classifying 226,124 galaxies in 92 days of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7% accuracy. We next combine this with a Random Forest machine learning algorithm that learns on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides a factor of 11.4 increase in the classification rate, classifying 210,803 galaxies in just 32 days of GZ2 project time with 93.1% accuracy. As the Random Forest algorithm requires a minimal amount of computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.

  8. Impacts of Landscape Context on Patterns of Wind Downfall Damage in a Fragmented Amazonian Landscape

    NASA Astrophysics Data System (ADS)

    Schwartz, N.; Uriarte, M.; DeFries, R. S.; Gutierrez-Velez, V. H.; Fernandes, K.; Pinedo-Vasquez, M.

    2015-12-01

    Wind is a major disturbance in the Amazon and has both short-term impacts and lasting legacies in tropical forests. Observed patterns of damage across landscapes result from differences in wind exposure and stand characteristics, such as tree stature, species traits, successional age, and fragmentation. Wind disturbance has important consequences for biomass dynamics in Amazonian forests, and understanding the spatial distribution and size of impacts is necessary to quantify the effects on carbon dynamics. In November 2013, a mesoscale convective system was observed over the study area in Ucayali, Peru, a highly human modified and fragmented forest landscape. We mapped downfall damage associated with the storm in order to ask: how does the severity of damage vary within forest patches, and across forest patches of different sizes and successional ages? We applied spectral mixture analysis to Landsat images from 2013 and 2014 to calculate the change in non-photosynthetic vegetation fraction after the storm, and combined it with C-band SAR data from the Sentinel-1 satellite to predict downfall damage measured in 30 field plots using random forest regression. We then applied this model to map damage in forests across the study area. Using a land cover classification developed in a previous study, we mapped secondary and mature forest, and compared the severity of damage in the two. We found that damage was on average higher in secondary forests, but patterns varied spatially. This study demonstrates the utility of using multiple sources of satellite data for mapping wind disturbance, and adds to our understanding of the sources of variation in wind-related damage. Ultimately, an improved ability to map wind impacts and a better understanding of their spatial patterns can contribute to better quantification of carbon dynamics in Amazonian landscapes.

  9. Automatic photointerpretation for plant species and stress identification (ERTS-A1)

    NASA Technical Reports Server (NTRS)

    Swanlund, G. D. (Principal Investigator); Kirvida, L.; Johnson, G. R.

    1973-01-01

    The author has identified the following significant results. Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wildlife management, forest inventory, and forest condition monitoring. Automatic procedures based on both multispectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74 percent was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 90 percent was obtained.

  10. Basic forest cover mapping using digitized remote sensor data and automated data processing techniques

    NASA Technical Reports Server (NTRS)

    Coggeshall, M. E.; Hoffer, R. M.

    1973-01-01

    Remote sensing equipment and automatic data processing techniques were employed as aids in the institution of improved forest resource management methods. On the basis of automatically calculated statistics derived from manually selected training samples, the feature selection processor of LARSYS selected, upon consideration of various groups of the four available spectral regions, a series of channel combinations whose automatic classification performances (for six cover types, including both deciduous and coniferous forest) were tested, analyzed, and further compared with automatic classification results obtained from digitized color infrared photography.

  11. An enhanced forest classification scheme for modeling vegetation-climate interactions based on national forest inventory data

    NASA Astrophysics Data System (ADS)

    Majasalmi, Titta; Eisner, Stephanie; Astrup, Rasmus; Fridman, Jonas; Bright, Ryan M.

    2018-01-01

    Forest management affects the distribution of tree species and the age class of a forest, shaping its overall structure and functioning and in turn the surface-atmosphere exchanges of mass, energy, and momentum. In order to attribute climate effects to anthropogenic activities like forest management, good accounts of forest structure are necessary. Here, using Fennoscandia as a case study, we make use of Fennoscandic National Forest Inventory (NFI) data to systematically classify forest cover into groups of similar aboveground forest structure. An enhanced forest classification scheme and related lookup table (LUT) of key forest structural attributes (i.e., maximum growing season leaf area index (LAImax), basal-area-weighted mean tree height, tree crown length, and total stem volume) was developed, and the classification was applied for multisource NFI (MS-NFI) maps from Norway, Sweden, and Finland. To provide a complete surface representation, our product was integrated with the European Space Agency Climate Change Initiative Land Cover (ESA CCI LC) map of present day land cover (v.2.0.7). Comparison of the ESA LC and our enhanced LC products (https://doi.org/10.21350/7zZEy5w3) showed that forest extent notably (κ = 0.55, accuracy 0.64) differed between the two products. To demonstrate the potential of our enhanced LC product to improve the description of the maximum growing season LAI (LAImax) of managed forests in Fennoscandia, we compared our LAImax map with reference LAImax maps created using the ESA LC product (and related cross-walking table) and PFT-dependent LAImax values used in three leading land models. Comparison of the LAImax maps showed that our product provides a spatially more realistic description of LAImax in managed Fennoscandian forests compared to reference maps. This study presents an approach to account for the transient nature of forest structural attributes due to human intervention in different land models.

  12. Development Of Polarimetric Decomposition Techniques For Indian Forest Resource Assessment Using Radar Imaging Satellite (Risat-1) Images

    NASA Astrophysics Data System (ADS)

    Sridhar, J.

    2015-12-01

    The focus of this work is to examine polarimetric decomposition techniques primarily focussed on Pauli decomposition and Sphere Di-Plane Helix (SDH) decomposition for forest resource assessment. The data processing methods adopted are Pre-processing (Geometric correction and Radiometric calibration), Speckle Reduction, Image Decomposition and Image Classification. Initially to classify forest regions, unsupervised classification was applied to determine different unknown classes. It was observed K-means clustering method gave better results in comparison with ISO Data method.Using the algorithm developed for Radar Tools, the code for decomposition and classification techniques were applied in Interactive Data Language (IDL) and was applied to RISAT-1 image of Mysore-Mandya region of Karnataka, India. This region is chosen for studying forest vegetation and consists of agricultural lands, water and hilly regions. Polarimetric SAR data possess a high potential for classification of earth surface.After applying the decomposition techniques, classification was done by selecting region of interests andpost-classification the over-all accuracy was observed to be higher in the SDH decomposed image, as it operates on individual pixels on a coherent basis and utilises the complete intrinsic coherent nature of polarimetric SAR data. Thereby, making SDH decomposition particularly suited for analysis of high-resolution SAR data. The Pauli Decomposition represents all the polarimetric information in a single SAR image however interpretation of the resulting image is difficult. The SDH decomposition technique seems to produce better results and interpretation as compared to Pauli Decomposition however more quantification and further analysis are being done in this area of research. The comparison of Polarimetric decomposition techniques and evolutionary classification techniques will be the scope of this work.

  13. A minimum spanning forest based classification method for dedicated breast CT images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pike, Robert; Sechopoulos, Ioannis; Fei, Baowei, E-mail: bfei@emory.edu

    Purpose: To develop and test an automated algorithm to classify different types of tissue in dedicated breast CT images. Methods: Images of a single breast of five different patients were acquired with a dedicated breast CT clinical prototype. The breast CT images were processed by a multiscale bilateral filter to reduce noise while keeping edge information and were corrected to overcome cupping artifacts. As skin and glandular tissue have similar CT values on breast CT images, morphologic processing is used to identify the skin based on its position information. A support vector machine (SVM) is trained and the resulting modelmore » used to create a pixelwise classification map of fat and glandular tissue. By combining the results of the skin mask with the SVM results, the breast tissue is classified as skin, fat, and glandular tissue. This map is then used to identify markers for a minimum spanning forest that is grown to segment the image using spatial and intensity information. To evaluate the authors’ classification method, they use DICE overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on five patient images. Results: Comparison between the automatic and the manual segmentation shows that the minimum spanning forest based classification method was able to successfully classify dedicated breast CT image with average DICE ratios of 96.9%, 89.8%, and 89.5% for fat, glandular, and skin tissue, respectively. Conclusions: A 2D minimum spanning forest based classification method was proposed and evaluated for classifying the fat, skin, and glandular tissue in dedicated breast CT images. The classification method can be used for dense breast tissue quantification, radiation dose assessment, and other applications in breast imaging.« less

  14. Analysis of the Tanana River Basin using LANDSAT data

    NASA Technical Reports Server (NTRS)

    Morrissey, L. A.; Ambrosia, V. G.; Carson-Henry, C.

    1981-01-01

    Digital image classification techniques were used to classify land cover/resource information in the Tanana River Basin of Alaska. Portions of four scenes of LANDSAT digital data were analyzed using computer systems at Ames Research Center in an unsupervised approach to derive cluster statistics. The spectral classes were identified using the IDIMS display and color infrared photography. Classification errors were corrected using stratification procedures. The classification scheme resulted in the following eleven categories; sedimented/shallow water, clear/deep water, coniferous forest, mixed forest, deciduous forest, shrub and grass, bog, alpine tundra, barrens, snow and ice, and cultural features. Color coded maps and acreage summaries of the major land cover categories were generated for selected USGS quadrangles (1:250,000) which lie within the drainage basin. The project was completed within six months.

  15. Analysis of the 1996 Wisconsin forest statistics by habitat type.

    Treesearch

    John Kotar; Joseph A. Kovach; Gary Brand

    1999-01-01

    The fifth inventory of Wisconsin's forests is presented from the perspective of habitat type as a classification tool. Habitat type classifies forests based on the species composition of the understory plant community. Various forest attributes are summarized by habitat type and management implications are discussed.

  16. Comparison of Hybrid Classifiers for Crop Classification Using Normalized Difference Vegetation Index Time Series: A Case Study for Major Crops in North Xinjiang, China

    PubMed Central

    Hao, Pengyu; Wang, Li; Niu, Zheng

    2015-01-01

    A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multi-single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series were tested with different training sample sizes at both pixel and object levels, and two representative counties in north Xinjiang were selected as study area. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See 5 (C 5.0). The results indicated that classification performance improved (increased the mean overall accuracy by 5%~10%, and reduced standard deviation of overall accuracy by around 1%) substantially with the training sample number, and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers with higher mean overall accuracy (1%~2%). However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern. PMID:26360597

  17. Automated classification of periodic variable stars detected by the wide-field infrared survey explorer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Masci, Frank J.; Grillmair, Carl J.; Cutri, Roc M.

    2014-07-01

    We describe a methodology to classify periodic variable stars identified using photometric time-series measurements constructed from the Wide-field Infrared Survey Explorer (WISE) full-mission single-exposure Source Databases. This will assist in the future construction of a WISE Variable Source Database that assigns variables to specific science classes as constrained by the WISE observing cadence with statistically meaningful classification probabilities. We have analyzed the WISE light curves of 8273 variable stars identified in previous optical variability surveys (MACHO, GCVS, and ASAS) and show that Fourier decomposition techniques can be extended into the mid-IR to assist with their classification. Combined with other periodicmore » light-curve features, this sample is then used to train a machine-learned classifier based on the random forest (RF) method. Consistent with previous classification studies of variable stars in general, the RF machine-learned classifier is superior to other methods in terms of accuracy, robustness against outliers, and relative immunity to features that carry little or redundant class information. For the three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%, and 84.5% respectively using cross-validation analyses, with 95% confidence intervals of approximately ±2%. These accuracies are achieved at purity (or reliability) levels of 88.5%, 96.2%, and 87.8% respectively, similar to that achieved in previous automated classification studies of periodic variable stars.« less

  18. The edge-preservation multi-classifier relearning framework for the classification of high-resolution remotely sensed imagery

    NASA Astrophysics Data System (ADS)

    Han, Xiaopeng; Huang, Xin; Li, Jiayi; Li, Yansheng; Yang, Michael Ying; Gong, Jianya

    2018-04-01

    In recent years, the availability of high-resolution imagery has enabled more detailed observation of the Earth. However, it is imperative to simultaneously achieve accurate interpretation and preserve the spatial details for the classification of such high-resolution data. To this aim, we propose the edge-preservation multi-classifier relearning framework (EMRF). This multi-classifier framework is made up of support vector machine (SVM), random forest (RF), and sparse multinomial logistic regression via variable splitting and augmented Lagrangian (LORSAL) classifiers, considering their complementary characteristics. To better characterize complex scenes of remote sensing images, relearning based on landscape metrics is proposed, which iteratively quantizes both the landscape composition and spatial configuration by the use of the initial classification results. In addition, a novel tri-training strategy is proposed to solve the over-smoothing effect of relearning by means of automatic selection of training samples with low classification certainties, which always distribute in or near the edge areas. Finally, EMRF flexibly combines the strengths of relearning and tri-training via the classification certainties calculated by the probabilistic output of the respective classifiers. It should be noted that, in order to achieve an unbiased evaluation, we assessed the classification accuracy of the proposed framework using both edge and non-edge test samples. The experimental results obtained with four multispectral high-resolution images confirm the efficacy of the proposed framework, in terms of both edge and non-edge accuracy.

  19. Using ecological zones to increase the detail of Landsat classifications

    NASA Technical Reports Server (NTRS)

    Fox, L., III; Mayer, K. E.

    1981-01-01

    Changes in classification detail of forest species descriptions were made for Landsat data on 2.2 million acres in northwestern California. Because basic forest canopy structures may exhibit very similar E-M energy reflectance patterns in different environmental regions, classification labels based on Landsat spectral signatures alone become very generalized when mapping large heterogeneous ecological regions. By adding a seven ecological zone stratification, a 167% improvement in classification detail was made over the results achieved without it. The seven zone stratification is a less costly alternative to the inclusion of complex collateral information, such as terrain data and soil type, into the Landsat data base when making inventories of areas greater than 500,000 acres.

  20. Creation and implementation of a certification system for insurability and fire risk classification for forest plantations

    Treesearch

    Veronica Loewe M.; Victor Vargas; Juan Miguel Ruiz; Andrea Alvarez C.; Felipe Lobo Q.

    2015-01-01

    Currently, the Chilean insurance market sells forest fire insurance policies and agricultural weather risk policies. However, access to forest fire insurance is difficult for small and medium enterprises (SMEs), with a significant proportion (close to 50%) of forest plantations being without coverage. Indeed, the insurance market that sells forest fire insurance...

Top