Sample records for random forest support

  1. Novel solutions for an old disease: diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks.

    PubMed

    Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack

    2011-01-01

    Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.
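
The four diagnostic figures quoted for random forest (sensitivity, specificity, positive and negative predictive values) all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions. The function name `diagnostic_metrics` and the counts are hypothetical illustrations; the abstract does not report the test-set confusion matrix, so these numbers are not the study's.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # healthy patients correctly cleared
        "ppv": tp / (tp + fp),          # positive calls that are correct
        "npv": tn / (tn + fn),          # negative calls that are correct
    }


# Hypothetical counts for illustration only, NOT taken from the study.
metrics = diagnostic_metrics(tp=18, fp=2, tn=8, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Each ratio is undefined when its denominator is zero (for example, PPV when the model makes no positive calls), so a production version would need to guard against that case.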

  2. Subpixel urban land cover estimation: comparing cubist, random forests, and support vector regression

    Treesearch

    Jeffrey T. Walton

    2008-01-01

    Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...

  3. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

    PubMed

    Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

    2017-08-28

    The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.

  4. Predicting healthcare associated infections using patients' experiences

    NASA Astrophysics Data System (ADS)

    Pratt, Michael A.; Chu, Henry

    2016-05-01

    Healthcare associated infections (HAI) are a major threat to patient safety and are costly to health systems. Our goal is to predict the HAI performance of a hospital using the patients' experience responses as input. We use four classifiers, viz. random forest, naive Bayes, artificial feedforward neural networks, and the support vector machine, to perform the prediction of six types of HAI. The six types include blood stream, urinary tract, surgical site, and intestinal infections. Experiments show that the random forest and support vector machine perform well across the six types of HAI.

  5. CW-SSIM kernel based random forest for image classification

    NASA Astrophysics Data System (ADS)

    Fan, Guangzhe; Wang, Zhou; Wang, Jiheng

    2010-07-01

    The complex wavelet structural similarity (CW-SSIM) index has been proposed as a powerful image similarity metric that is robust to translation, scaling and rotation of images, but how to employ it in image classification applications has not been deeply investigated. In this paper, we incorporate CW-SSIM as a kernel function into a random forest learning algorithm. This leads to a novel image classification approach that does not require a feature extraction or dimension reduction stage at the front end. We use hand-written digit recognition as an example to demonstrate our algorithm. We compare the performance of the proposed approach with random forest learning based on other kernels, including the widely adopted Gaussian and inner product kernels. Empirical evidence shows that the proposed method is superior in its classification power. We also compared our proposed approach with the direct random forest method without a kernel and with the popular kernel-learning method, the support vector machine. Our test results based on both simulated and real-world data suggest that the proposed approach performs better than traditional methods without the feature selection procedure.

  6. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even lower than a median value of 0.5. Conclusions: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing. PMID:21849043

  7. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to the large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. An application of quantile random forests for predictive mapping of forest attributes

    Treesearch

    E.A. Freeman; G.G. Moisen

    2015-01-01

    Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...

  9. A spatially explicit decision support model for restoration of forest bird habitat

    USGS Publications Warehouse

    Twedt, D.J.; Uihlein, W.B.; Elliott, A.B.

    2006-01-01

    The historical area of bottomland hardwood forest in the Mississippi Alluvial Valley has been reduced by >75%. Agricultural production was the primary motivator for deforestation; hence, clearing deliberately targeted higher and drier sites. Remaining forests are highly fragmented and hydrologically altered, with larger forest fragments subject to greater inundation, which has negatively affected many forest bird populations. We developed a spatially explicit decision support model, based on a Partners in Flight plan for forest bird conservation, that prioritizes forest restoration to reduce forest fragmentation and increase the area of forest core (interior forest >1 km from 'hostile' edge). Our primary objective was to increase the number of forest patches that harbor >2000 ha of forest core, but we also sought to increase the number and area of forest cores >5000 ha. Concurrently, we targeted restoration within local (320 km2) landscapes to achieve >60% forest cover. Finally, we emphasized restoration of higher-elevation bottomland hardwood forests in areas where restoration would not increase forest fragmentation. Reforestation of 10% of restorable land in the Mississippi Alluvial Valley (approximately 880,000 ha) targeted at priorities established by this decision support model resulted in approximately 824,000 ha of new forest core. This is more than 32 times the amount of core forest added through reforestation of randomly located fields (approximately 25,000 ha). The total area of forest core (1.6 million ha) that resulted from targeted restoration exceeded habitat objectives identified in the Partners in Flight Bird Conservation Plan and approached the area of forest core present in the 1950s.

  10. Random forest models to predict aqueous solubility.

    PubMed

    Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O

    2007-01-01

    Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.

  11. Mathematical models application for mapping soils spatial distribution on the example of the farm from the North of Udmurt Republic of Russia

    NASA Astrophysics Data System (ADS)

    Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.

    2018-01-01

    A comparative analysis of geospatial soil modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machine algorithms was conducted. Visual interpretation of the resulting digital maps and their comparison with the existing map, together with a quantitative assessment of the overall detection accuracy for individual soil groups and of the models' kappa, showed that multiple logistic regression, support vector machine, and random forest models, applied to spatial prediction of the distribution of the conditional soil groups, can be reliably used for mapping the study area. Detection was most accurate for lightly and moderately eroded sod-podzolic soils (Phaeozems Albic). In second place, according to the mean overall accuracy of the prediction, were non-eroded and warp sod-podzolic soils, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavily eroded sod-podzolic soils and gray forest soils (Phaeozems Albic) were detected worst of all by the automatic classification methods.

  12. Integrating support vector machines and random forests to classify crops in time series of Worldview-2 images

    NASA Astrophysics Data System (ADS)

    Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

    2017-10-01

    Crop maps are essential inputs for the agricultural planning done at various governmental and agribusiness agencies. Remote sensing offers timely and cost-efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergistic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth data containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, an RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are 83, 82 and 83%, respectively. Adding vegetation indices to the analysis results in classification accuracies of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF, respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
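
The abstract does not say how the random forest kernel is constructed. One common definition, assumed here purely for illustration, is the fraction of trees in which two samples land in the same leaf. The sketch below computes that similarity from leaf indices; the function name `rf_kernel` and the leaf ids are hypothetical.

```python
def rf_kernel(leaves_a, leaves_b):
    """Kernel matrix K[i][j]: fraction of trees in which sample i of A and
    sample j of B fall in the same leaf.

    leaves_x[i][t] is the leaf id reached by sample i in tree t.
    """
    n_trees = len(leaves_a[0])
    return [
        [sum(la == lb for la, lb in zip(row_a, row_b)) / n_trees
         for row_b in leaves_b]
        for row_a in leaves_a
    ]


# Hypothetical leaf ids for 3 samples in a 4-tree forest (illustration only).
leaves = [
    [0, 2, 1, 3],
    [0, 2, 4, 3],
    [5, 1, 4, 0],
]
K = rf_kernel(leaves, leaves)
for row in K:
    print(row)
```

With scikit-learn, the leaf ids could come from `RandomForestClassifier.apply`, and the resulting matrix could be passed to `SVC(kernel="precomputed")`. Whether the authors proceeded this way is not stated in the abstract.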

  13. Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests

    DTIC Science & Technology

    2014-10-02

    Kogalur, Blackstone, & Lauer, 2008; Ishwaran & Kogalur, 2010). Random survival forest is a survival analysis extension of Random Forests (Breiman, 2001 ... Statistics & probability letters, 80(13), 1056–1064. Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The

  14. A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.

    PubMed

    Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana

    2014-01-01

    Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including (in the latter case) x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called "out-of-bag"), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha(-1) when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
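
The "60% improvement in explained variation" appears to be a relative gain. The abstract does not name the baseline, so the sketch below assumes it is the stratification approach (37% explained) compared against Random Forest with spatial context (59% explained). The helper `relative_improvement` is a hypothetical name.

```python
def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Figures quoted in the abstract; the choice of baseline is an assumption.
explained_stratification = 0.37
explained_rf_spatial = 0.59

gain = relative_improvement(explained_stratification, explained_rf_spatial)
print(f"{gain:.1%}")  # prints 59.5%
```

Under that assumption the gain is about 59.5%, which rounds to the reported 60%. Using Random Forest without spatial context (43%) as the baseline instead would give roughly 37%, so stratification is the more plausible reference.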

  15. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
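
The ensemble step described above (rank features on many bootstrap samples, then merge the rankings) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' ESVM-RFE. The ranker is a stand-in based on class-mean differences rather than SVM weights, the aggregation rule (mean rank position) is assumed, the backward-elimination loop is omitted, and all names and data are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def rank_features(rows):
    """Stand-in ranker (NOT an SVM): score each feature by the absolute
    difference between class means, and return indices best-first."""
    n_features = len(rows[0][0])
    scores = []
    for j in range(n_features):
        pos = [x[j] for x, y in rows if y == 1]
        neg = [x[j] for x, y in rows if y == 0]
        if not pos or not neg:  # bootstrap sample contains a single class
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def aggregate_rankings(rankings):
    """Merge rankings by mean position (lower mean position ranks first)."""
    n_features = len(rankings[0])
    mean_pos = [0.0] * n_features
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            mean_pos[feature] += position / len(rankings)
    return sorted(range(n_features), key=lambda j: mean_pos[j])


rng = random.Random(0)
# Toy data: feature 0 separates the classes, feature 1 is constant.
data = [((0.0, 0.5), 0), ((0.1, 0.5), 0), ((0.2, 0.5), 0),
        ((1.0, 0.5), 1), ((1.1, 0.5), 1), ((0.9, 0.5), 1)]
rankings = [rank_features(bootstrap_sample(data, rng)) for _ in range(25)]
consensus = aggregate_rankings(rankings)
print(consensus)
```

In a fuller version, the lowest-ranked features in `consensus` would be dropped and the procedure repeated on the reduced feature set, which is the backward-elimination idea behind RFE.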

  16. A Tale of Two “Forests”: Random Forest Machine Learning Aids Tropical Forest Carbon Mapping

    PubMed Central

    Mascaro, Joseph; Asner, Gregory P.; Knapp, David E.; Kennedy-Bowdoin, Ty; Martin, Roberta E.; Anderson, Christopher; Higgins, Mark; Chadwick, K. Dana

    2014-01-01

    Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including (in the latter case) x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called “out-of-bag”), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha−1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation. PMID:24489686

  17. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    PubMed

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
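
The "predictive range" comparison rests on observed readmission rates within deciles of predicted risk. A minimal sketch of that calculation follows. The function name `observed_rate_by_decile` and the toy data are hypothetical and unrelated to the trial data.

```python
def observed_rate_by_decile(predicted_risk, outcomes):
    """Sort patients by predicted risk, split them into 10 near-equal groups,
    and return the observed event rate in each group (lowest risk first)."""
    if len(predicted_risk) != len(outcomes) or len(outcomes) < 10:
        raise ValueError("need equal-length inputs with at least 10 patients")
    order = sorted(range(len(outcomes)), key=lambda i: predicted_risk[i])
    n = len(order)
    rates = []
    for d in range(10):
        group = order[d * n // 10:(d + 1) * n // 10]
        rates.append(sum(outcomes[i] for i in group) / len(group))
    return rates


# Toy data for illustration only: 20 patients, outcome 1 = readmitted.
risks = [i / 20 for i in range(20)]
outcomes = [0] * 14 + [1, 0, 1, 0, 1, 1]
rates = observed_rate_by_decile(risks, outcomes)
separation = rates[-1] - rates[0]
print(rates, separation)
```

A model with a wider gap between the top and bottom deciles separates low-risk from high-risk patients more clearly. This is the sense in which the abstract contrasts 7.8% and 26.2% for random forests with 14.2% and 16.4% for LR.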

  18. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.

  19. Combination of support vector machine, artificial neural network and random forest for improving the classification of convective and stratiform rain using spectral features of SEVIRI data

    NASA Astrophysics Data System (ADS)

    Lazri, Mourad; Ameur, Soltane

    2018-05-01

    A model combining three classifiers, namely Support vector machine, Artificial neural network and Random forest (SAR), is designed for improving the classification of convective and stratiform rain. This model (SAR model) has been trained and then tested on a dataset derived from MSG-SEVIRI (Meteosat Second Generation-Spinning Enhanced Visible and Infrared Imager). Well-classified, mid-classified and misclassified pixels are determined from the combination of the three classifiers. Mid-classified and misclassified pixels, which are considered unreliable, are reclassified by using a novel training of the developed scheme. In this novel training, only the input data corresponding to the pixels in question are used. This whole process is repeated a second time and applied to mid-classified and misclassified pixels separately. Learning and validation of the developed scheme are carried out against co-located data observed by ground radar. The developed scheme outperformed the different classifiers used separately and reached an overall classification accuracy of 97.40%.

  20. Benchmarking dairy herd health status using routinely recorded herd summary data.

    PubMed

    Parker Gaddis, K L; Cole, J B; Clay, J S; Maltecca, C

    2016-02-01

    Genetic improvement of dairy cattle health through the use of producer-recorded data has been determined to be feasible. Low estimated heritabilities indicate that genetic progress will be slow. Variation observed in lowly heritable traits can largely be attributed to nongenetic factors, such as the environment. More rapid improvement of dairy cattle health may be attainable if herd health programs incorporate environmental and managerial aspects. More than 1,100 herd characteristics are regularly recorded on farm test-days. We combined these data with producer-recorded health event data, and parametric and nonparametric models were used to benchmark herd and cow health status. Health events were grouped into 3 categories for analyses: mastitis, reproductive, and metabolic. Both herd incidence and individual incidence were used as dependent variables. Models implemented included stepwise logistic regression, support vector machines, and random forests. At both the herd and individual levels, random forest models attained the highest accuracy for predicting health status in all health event categories when evaluated with 10-fold cross-validation. Accuracy (SD) ranged from 0.61 (0.04) to 0.63 (0.04) when using random forest models at the herd level. Accuracy of prediction (SD) at the individual cow level ranged from 0.87 (0.06) to 0.93 (0.001) with random forest models. Highly significant variables and key words from logistic regression and random forest models were also investigated. All models identified several of the same key factors for each health event category, including movement out of the herd, size of the herd, and weather-related variables. We concluded that benchmarking health status using routinely collected herd data is feasible. Nonparametric models were better suited to handle this complex data with numerous variables. 
These data mining techniques were able to perform prediction of health status and could add evidence to personal experience in herd management. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
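
    The evaluation protocol described above (10-fold cross-validation, reporting accuracy with its SD) can be sketched in a few lines. This is only an illustration: the function names, the toy labels, and the majority-class stand-in model are assumptions and are not taken from the study.

```python
import random
import statistics

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, fit_predict, k=10, seed=0):
    """Return (mean, SD) of per-fold accuracy.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains on
    train_idx and returns one predicted label per entry of test_idx.
    """
    folds = kfold_indices(len(labels), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predicted = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[j] for p, j in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Toy data (made up): 70 "event" herds and 30 "no event" herds.
labels = [1] * 70 + [0] * 30

def majority_class(train_idx, test_idx):
    """Stand-in model: always predict the majority class of the training folds."""
    train_labels = [labels[j] for j in train_idx]
    guess = max(set(train_labels), key=train_labels.count)
    return [guess] * len(test_idx)

mean_acc, sd_acc = cross_validate(labels, majority_class)
```

    Any real classifier can be substituted for the stand-in as long as it follows the same callback shape.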

  1. Spatio-temporal Change Patterns of Tropical Forests from 2000 to 2014 Using MOD09A1 Dataset

    NASA Astrophysics Data System (ADS)

    Qin, Y.; Xiao, X.; Dong, J.

    2016-12-01

    Large-scale deforestation and forest degradation in the tropical region have resulted in extensive carbon emissions and biodiversity loss. However, restricted by the availability of good-quality observations, large uncertainty exists in mapping the spatial distribution of forests and their spatio-temporal changes. In this study, we proposed a pixel- and phenology-based algorithm to identify and map annual tropical forests from 2000 to 2014, using the 8-day, 500-m MOD09A1 (v005) product, under the support of Google cloud computing (Google Earth Engine). A temporal filter was applied to reduce the random noises and to identify the spatio-temporal changes of forests. We then built up a confusion matrix and assessed the accuracy of the annual forest maps based on the ground reference interpreted from high spatial resolution images in Google Earth. The resultant forest maps showed the consistent forest/non-forest, forest loss, and forest gain in the pan-tropical zone during 2000 - 2014. The proposed algorithm showed the potential for tropical forest mapping and the resultant forest maps are important for the estimation of carbon emission and biodiversity loss.

  2. Calibrating random forests for probability estimation.

    PubMed

    Dankowski, Theresa; Ziegler, Andreas

    2016-09-30

    Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
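
    As a rough illustration of updating predicted probabilities for a new center, the sketch below applies a standard base-rate (prior shift) correction via Bayes' rule. It assumes that only the event rate differs between the original and the new setting. The abstract does not spell out the formula, so treating this as the general-purpose update it refers to is an assumption, and the function name is hypothetical.

```python
def update_for_new_base_rate(p, old_rate, new_rate):
    """Re-scale a predicted probability p when the event rate changes.

    Assumes the class-conditional feature distributions are identical in both
    settings, so that only the prior (base rate) has shifted.
    """
    numerator = p * new_rate * (1 - old_rate)
    denominator = numerator + (1 - p) * old_rate * (1 - new_rate)
    return numerator / denominator

# A forest trained where half of the patients had the event predicts 0.5 for a
# patient; at a center where only 10% have the event this becomes about 0.1.
updated = update_for_new_base_rate(0.5, old_rate=0.5, new_rate=0.1)
```

    If the assumption of unchanged class-conditional distributions fails, a correction of this kind can be miscalibrated, which is the situation in which the abstract reports the logistic regression-based re-calibration doing better.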

  3. Studies of the DIII-D disruption database using Machine Learning algorithms

    NASA Astrophysics Data System (ADS)

    Rea, Cristina; Granetz, Robert; Meneghini, Orso

    2017-10-01

    A Random Forests Machine Learning algorithm, trained on a large database of both disruptive and non-disruptive DIII-D discharges, predicts disruptive behavior in DIII-D with about 90% accuracy. Several algorithms have been tested and Random Forests was found superior in performance for this particular task. Over 40 plasma parameters are included in the database, with data for each of the parameters taken from 500k time slices. We focused on a subset of non-dimensional plasma parameters, deemed to be good predictors based on physics considerations. Both binary (disruptive/non-disruptive) and multi-label (label based on the elapsed time before disruption) classification problems are investigated. The Random Forests algorithm provides insight on the available dataset by ranking the relative importance of the input features. It is found that q95 and Greenwald density fraction (n/nG) are the most relevant parameters for discriminating between DIII-D disruptive and non-disruptive discharges. A comparison with the Gradient Boosted Trees algorithm is shown and the first results coming from the application of regression algorithms are presented. Work supported by the US Department of Energy under DE-FC02-04ER54698, DE-SC0014264 and DE-FG02-95ER54309.

  4. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. 
Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.

  5. Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data

    PubMed Central

    Tran, Truyen; Luo, Wei; Phung, Dinh; Venkatesh, Svetha

    2016-01-01

    Background: Modeling patient flow is crucial in understanding resource demand and prioritization. We study patient outflow from an open ward in an Australian hospital, where currently bed allocation is carried out by a manager relying on past experiences and looking at demand. Automatic methods that provide a reasonable estimate of total next-day discharges can aid in efficient bed management. The challenges in building such methods lie in dealing with large amounts of discharge noise introduced by the nonlinear nature of hospital procedures, and the nonavailability of real-time clinical information in wards. Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014. 
Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. PMID:27444059
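
    The four forecast-quality measures named above can be computed as in the following sketch. The sign convention for the mean forecast error (forecast minus actual) and the particular sMAPE variant are assumptions, since several definitions are in use and the abstract does not say which one was applied. The daily counts are made up.

```python
import math

def forecast_errors(actual, forecast):
    """Return MFE, MAE, sMAPE (in %) and RMSE for paired daily discharge counts."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]  # forecast minus actual
    mfe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # sMAPE variant: |error| divided by the mean of |actual| and |forecast|.
    # Days where both are zero contribute nothing to the sum.
    smape = 100 / n * sum(
        2 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
        if a or f
    )
    return {"MFE": mfe, "MAE": mae, "sMAPE": smape, "RMSE": rmse}

def improvement(baseline_mae, model_mae):
    """Percentage reduction in MAE relative to a baseline forecaster."""
    return 100 * (baseline_mae - model_mae) / baseline_mae

metrics = forecast_errors(actual=[10, 12, 8, 10], forecast=[12, 10, 8, 11])
```

    An improvement figure such as the one reported for the random forest would correspond to `improvement(baseline_mae, model_mae)`, with the moving average model as the baseline.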

  6. Applying a weighted random forests method to extract karst sinkholes from LiDAR data

    NASA Astrophysics Data System (ADS)

    Zhu, Junfeng; Pierskalla, William P.

    2016-02-01

    Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface in high-resolution and high-density and has shown great potential to drastically improve locating and delineating sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of the depressions commonly generated from LiDAR data. In this study, we applied random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through a visual inspection and field verification process. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% for the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and make it more tractable to map sinkholes using LiDAR data for large areas. However, the random forests method cannot totally replace manual procedures, such as visual inspection and field verification.
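
    One common way to weight classes in an imbalanced problem like this one is to give each class a weight inversely proportional to its frequency and to let those weights count in each vote. The sketch below shows that heuristic only. The abstract does not state which weighting scheme the authors used, and the function names and counts are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}

def weighted_vote(leaf_labels, weights):
    """Pick the class with the largest total weight among samples in a leaf."""
    totals = Counter()
    for y in leaf_labels:
        totals[y] += weights[y]
    return max(totals, key=totals.get)

# Hypothetical training set: 90 ordinary depressions, 10 sinkholes.
training_labels = ["other"] * 90 + ["sinkhole"] * 10
weights = balanced_class_weights(training_labels)

# A leaf holding 6 ordinary depressions and 1 sinkhole: an unweighted vote says
# "other", while the weighted vote favours the rare class.
decision = weighted_vote(["other"] * 6 + ["sinkhole"], weights)
```

    Weighting of this kind trades some accuracy on the majority class for better detection of the rare class.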

  7. A Random Forest Approach to Predict the Spatial Distribution of Sediment Pollution in an Estuarine System

    EPA Science Inventory

    Modeling the magnitude and distribution of sediment-bound pollutants in estuaries is often limited by incomplete knowledge of the site and inadequate sample density. To address these modeling limitations, a decision-support tool framework was conceived that predicts sediment cont...

  8. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

    PubMed

    Ma, Li; Fan, Suohai

    2017-03-14

    The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
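
    For orientation, the interpolation step of the original SMOTE technique, which CURE-SMOTE builds on, can be sketched as follows. This is plain SMOTE only: the CURE clustering stage is not shown, and the function name, parameters, and sample points are illustrative assumptions.

```python
import math
import random

def smote_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points on segments between minority neighbours.

    minority is a list of at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest other minority points by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_samples(minority_points, n_new=5)
```

    Because every synthetic point lies between two existing minority samples, noisy or outlying minority samples can propagate into the training set, which is the weakness that clustering-based variants aim to reduce.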

  9. How random is the random forest? Random forest algorithm on the service of structural imaging biomarkers for Alzheimer's disease: from Alzheimer's disease neuroimaging initiative (ADNI) database.

    PubMed

    Dimitriadis, Stavros I; Liparas, Dimitris

    2018-06-01

    Neuroinformatics is a fascinating research field that applies computational models and analytical tools to high dimensional experimental neuroscience data for a better understanding of how the brain functions or dysfunctions in brain diseases. Neuroinformaticians work in the intersection of neuroscience and informatics supporting the integration of various sub-disciplines (behavioural neuroscience, genetics, cognitive psychology, etc.) working on brain research. Neuroinformaticians are the pathway of information exchange between informaticians and clinicians for a better understanding of the outcome of computational models and the clinical interpretation of the analysis. Machine learning is one of the most significant computational developments in the last decade giving tools to neuroinformaticians and finally to radiologists and clinicians for an automatic and early diagnosis-prognosis of a brain disease. The random forest (RF) algorithm has been successfully applied to high-dimensional neuroimaging data for feature reduction and also has been applied to classify the clinical label of a subject using single or multi-modal neuroimaging datasets. Our aim was to review the studies where RF was applied to correctly predict Alzheimer's disease (AD), the conversion from mild cognitive impairment (MCI) and its robustness to overfitting, outliers and handling of non-linear data. Finally, we described our RF-based model that gave us the 1st position in an international challenge for automated prediction of MCI from MRI data.

  10. Global patterns of tropical forest fragmentation

    NASA Astrophysics Data System (ADS)

    Taubert, Franziska; Fischer, Rico; Groeneveld, Jürgen; Lehmann, Sebastian; Müller, Michael S.; Rödig, Edna; Wiegand, Thorsten; Huth, Andreas

    2018-02-01

    Remote sensing enables the quantification of tropical deforestation with high spatial resolution. This in-depth mapping has led to substantial advances in the analysis of continent-wide fragmentation of tropical forests. Here we identified approximately 130 million forest fragments in three continents that show surprisingly similar power-law size and perimeter distributions as well as fractal dimensions. Power-law distributions have been observed in many natural phenomena such as wildfires, landslides and earthquakes. The principles of percolation theory provide one explanation for the observed patterns, and suggest that forest fragmentation is close to the critical point of percolation; simulation modelling also supports this hypothesis. The observed patterns emerge not only from random deforestation, which can be described by percolation theory, but also from a wide range of deforestation and forest-recovery regimes. Our models predict that additional forest loss will result in a large increase in the total number of forest fragments—at maximum by a factor of 33 over 50 years—as well as a decrease in their size, and that these consequences could be partly mitigated by reforestation and forest protection.

  11. Modelling Associations between Public Understanding, Engagement and Forest Conditions in the Inland Northwest, USA

    PubMed Central

    Hartter, Joel; Stevens, Forrest R.; Hamilton, Lawrence C.; Congalton, Russell G.; Ducey, Mark J.; Oester, Paul T.

    2015-01-01

    Opinions about public lands and the actions of private non-industrial forest owners in the western United States play important roles in forested landscape management as both public and private forests face increasing risks from large wildfires, pests and disease. This work presents the responses from two surveys, a random-sample telephone survey of more than 1500 residents and a mail survey targeting owners of parcels with 10 or more acres of forest. These surveys were conducted in three counties (Wallowa, Union, and Baker) in northeast Oregon, USA. We analyze these survey data using structural equation models in order to assess how individual characteristics and understanding of forest management issues affect perceptions about forest conditions and risks associated with declining forest health on public lands. We test whether forest understanding is informed by background, beliefs, and experiences, and whether as an intervening variable it is associated with views about forest conditions on publicly managed forests. Individual background characteristics such as age, gender and county of residence have significant direct or indirect effects on our measurement of understanding. Controlling for background factors, we found that forest owners with higher self-assessed understanding, and more education about forest management, tend to hold more pessimistic views about forest conditions. Based on our results we argue that self-assessed understanding, interest in learning, and willingness to engage in extension activities together have leverage to affect perceptions about the risks posed by declining forest conditions on public lands, influence land owner actions, and affect support for public policies. 
These results also have broader implications for management of forested landscapes on public and private lands amidst changing demographics in rural communities across the Inland Northwest where migration may significantly alter the composition of forest owner goals, understanding, and support for various management actions. PMID:25671619

  12. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    PubMed

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS), to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADs categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84 from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with the AUC 0.83. Lesion margin and orientation were optimum features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. All rights reserved.
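
    The area under the ROC curve used above to compare classifiers equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows. The function name and the scores are illustrative and are not data from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for positive (e.g. malignant) and 0 for negative (benign).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

perfect = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])   # every positive outranks every negative
chance = auc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0])    # half of the pairs are ordered correctly
```

    This pairwise form is quadratic in the number of cases, which is acceptable for a database of a few hundred lesions. Rank-based formulas give the same value faster on large sets.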

  13. Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble

    NASA Astrophysics Data System (ADS)

    Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher

    2012-10-01

    Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics were used as a feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as user's and producer's accuracy.
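
    A combination rule based on posterior probabilities and their entropy could look like the sketch below, which trusts whichever classifier is less uncertain for a given field and reports the entropy as a per-field uncertainty value. The abstract does not give the actual rules, so this particular rule, the function names, and the class names are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a posterior class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def combine(rf_probs, svm_probs, classes):
    """Use the label of the classifier with the lower posterior entropy.

    Returns (label, entropy of the chosen posterior) for one field.
    """
    chosen = rf_probs if entropy(rf_probs) <= entropy(svm_probs) else svm_probs
    best = max(range(len(chosen)), key=chosen.__getitem__)
    return classes[best], entropy(chosen)

crops = ["cotton", "rice", "wheat"]  # hypothetical class list
# RF is undecided while the SVM is confident, so the SVM label is used.
label, uncertainty = combine([0.40, 0.35, 0.25], [0.05, 0.90, 0.05], crops)
```

    Mapping the returned entropy per field gives a simple picture of where the classification is least reliable.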

  14. The Trail Making test: a study of its ability to predict falls in the acute neurological in-patient population.

    PubMed

    Mateen, Bilal Akhter; Bussas, Matthias; Doogan, Catherine; Waller, Denise; Saverino, Alessia; Király, Franz J; Playford, E Diane

    2018-05-01

    To determine whether tests of cognitive function and patient-reported outcome measures of motor function can be used to create a machine learning-based predictive tool for falls. Prospective cohort study. Tertiary neurological and neurosurgical center. In all, 337 in-patients receiving neurosurgical, neurological, or neurorehabilitation-based care. Binary (Y/N) for falling during the in-patient episode, the Trail Making Test (a measure of attention and executive function) and the Walk-12 (a patient-reported measure of physical function). The principal outcome was a fall during the in-patient stay (n = 54). The Trail test was identified as the best predictor of falls. Moreover, the addition of other variables did not improve the prediction (Wilcoxon signed-rank P < 0.001). Classical linear statistical modeling methods were then compared with more recent machine learning based strategies, for example, random forests, neural networks, support vector machines. The random forest was the best modeling strategy when utilizing just the Trail Making Test data (Wilcoxon signed-rank P < 0.001) with 68% (± 7.7) sensitivity, and 90% (± 2.3) specificity. This study identifies a simple yet powerful machine learning (Random Forest) based predictive model for an in-patient neurological population, utilizing a single neuropsychological test of cognitive function, the Trail Making test.

  15. Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques.

    PubMed

    Kebede, Mihiretu; Zegeye, Desalegn Tigabu; Zeleke, Berihun Megabiaw

    2017-12-01

    To monitor the progress of therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement is increasing as ART programs have expanded over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART. A cross-sectional study was conducted at the University of Gondar Hospital from 3,104 adult patients on ART with CD4 counts measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients' charts. Descriptive data were analyzed by SPSS version 20. Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed to undertake the study. WEKA version 3.8 was used to conduct predictive data mining. Before building the predictive data mining models, information gain values and correlation-based Feature Selection methods were used for attribute selection. Variables were ranked according to their relevance based on their information gain values. J48, Neural Network, and Random Forest algorithms were tested to assess model accuracies. The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2427 (78.2%) patients had their CD4 counts increased by at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin levels predicted CD4 count changes with predictive accuracies of J48, Neural Network, and Random Forest being 87.1%, 83.5%, and 99.8%, respectively. The Random Forest algorithm achieved higher accuracy than both J48 and the Artificial Neural Network. The precision, sensitivity and recall values of Random Forest were also more than 99%. Nearly accurate prediction results were obtained using the Random Forest algorithm. 
This algorithm could be used in a low-resource setting to build a web-based prediction model for CD4 count changes. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Non-random species loss in a forest herbaceous layer following nitrogen addition

    Treesearch

    Christopher A. Walter; Mary Beth Adams; Frank S. Gilliam; William T. Peterjohn

    2017-01-01

    Nitrogen (N) additions have decreased species richness (S) in hardwood forest herbaceous layers, yet the functional mechanisms for these decreases have not been explicitly evaluated. We tested two hypothesized mechanisms, random species loss (RSL) and non-random species loss (NRSL), in the hardwood forest herbaceous layer of a long-term, plot-scale...

  17. Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study

    PubMed Central

    Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-01-01

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914

  18. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

    PubMed

    Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-03-15

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.

  19. Approximating prediction uncertainty for random forest regression models

    Treesearch

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  20. Predicting temperate forest stand types using only structural profiles from discrete return airborne lidar

    NASA Astrophysics Data System (ADS)

    Fedrigo, Melissa; Newnham, Glenn J.; Coops, Nicholas C.; Culvenor, Darius S.; Bolton, Douglas K.; Nitschke, Craig R.

    2018-02-01

    Light detection and ranging (lidar) data have been increasingly used for forest classification due to their ability to penetrate the forest canopy and provide detail about the structure of the lower strata. In this study we demonstrate forest classification approaches using airborne lidar data as inputs to random forest and linear unmixing classification algorithms. Our results demonstrated that both random forest and linear unmixing models identified a distribution of rainforest and eucalypt stands that was comparable to existing ecological vegetation class (EVC) maps based primarily on manual interpretation of high resolution aerial imagery. Rainforest stands were also identified in the region that have not previously been identified in the EVC maps. The transition between stand types was better characterised by the random forest modelling approach. In contrast, the linear unmixing model placed greater emphasis on field plots selected as endmembers which may not have captured the variability in stand structure within a single stand type. The random forest model had the highest overall accuracy (84%) and Cohen's kappa coefficient (0.62). However, the classification accuracy was only marginally better than linear unmixing. The random forest model was applied to a region in the Central Highlands of south-eastern Australia to produce maps of stand type probability, including areas of transition (the 'ecotone') between rainforest and eucalypt forest. The resulting map provided a detailed delineation of forest classes, which specifically recognised the coalescing of stand types at the landscape scale. This represents a key step towards mapping the structural and spatial complexity of these ecosystems, which is important for both their management and conservation.
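
    Overall accuracy and Cohen's kappa, the two metrics reported above, can both be derived from a confusion matrix. The sketch below uses a made-up two-class matrix, not the study's data, to show why kappa is lower than accuracy: it discounts the agreement expected by chance.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # overall accuracy
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the true class.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical counts for two stand types (rows: true class, columns: predicted).
confusion = [[40, 10],
             [5, 45]]
acc, kappa = accuracy_and_kappa(confusion)
```

    For this illustrative matrix the overall accuracy is 0.85 and kappa is 0.70.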

  1. Social determinants of long lasting insecticidal hammock use among the Ra-glai ethnic minority in Vietnam: implications for forest malaria control.

    PubMed

    Grietens, Koen Peeters; Xuan, Xa Nguyen; Ribera, Joan; Duc, Thang Ngo; Bortel, Wim van; Ba, Nhat Truong; Van, Ky Pham; Xuan, Hung Le; D'Alessandro, Umberto; Erhart, Annette

    2012-01-01

    Long-lasting insecticidal hammocks (LLIHs) are being evaluated as an additional malaria prevention tool in settings where standard control strategies have a limited impact. This is the case among the Ra-glai ethnic minority communities of Ninh Thuan, one of the forested and mountainous provinces of Central Vietnam where malaria morbidity persists due to the sylvatic nature of the main malaria vector An. dirus and the dependence of the population on the forest for subsistence--as is the case for many impoverished ethnic minorities in Southeast Asia. A social science study was carried out ancillary to a community-based cluster randomized trial on the effectiveness of LLIHs to control forest malaria. The social science research strategy consisted of a mixed methods study triangulating qualitative data from focused ethnography and quantitative data collected during a malariometric cross-sectional survey on a random sample of 2,045 study participants. To meet work requirements during the labor intensive malaria transmission and rainy season, Ra-glai slash and burn farmers combine living in government supported villages along the road with a second home at their fields located in the forest. LLIH use was evaluated in both locations. During daytime, LLIH use at village level was reported by 69.3% of all respondents, and in forest fields this was 73.2%. In the evening, 54.1% used the LLIHs in the villages, while at the fields this was 20.7%. At night, LLIH use was minimal, regardless of the location (village 4.4%; forest 6.4%). Despite the free distribution of insecticide-treated nets (ITNs) and LLIHs, around half the local population remains largely unprotected when sleeping in their forest plot huts. In order to tackle forest malaria more effectively, control policies should explicitly target forest fields where ethnic minority farmers are more vulnerable to malaria.

  2. Social Determinants of Long Lasting Insecticidal Hammock-Use Among the Ra-Glai Ethnic Minority in Vietnam: Implications for Forest Malaria Control

    PubMed Central

    Muela Ribera, Joan; Ngo Duc, Thang; van Bortel, Wim; Truong Ba, Nhat; Van, Ky Pham; Le Xuan, Hung; D'Alessandro, Umberto; Erhart, Annette

    2012-01-01

    Background: Long-lasting insecticidal hammocks (LLIHs) are being evaluated as an additional malaria prevention tool in settings where standard control strategies have a limited impact. This is the case among the Ra-glai ethnic minority communities of Ninh Thuan, one of the forested and mountainous provinces of Central Vietnam where malaria morbidity persists due to the sylvatic nature of the main malaria vector An. dirus and the dependence of the population on the forest for subsistence - as is the case for many impoverished ethnic minorities in Southeast Asia. Methods: A social science study was carried out ancillary to a community-based cluster randomized trial on the effectiveness of LLIHs to control forest malaria. The social science research strategy consisted of a mixed methods study triangulating qualitative data from focused ethnography and quantitative data collected during a malariometric cross-sectional survey on a random sample of 2,045 study participants. Results: To meet work requirements during the labor intensive malaria transmission and rainy season, Ra-glai slash and burn farmers combine living in government supported villages along the road with a second home at their fields located in the forest. LLIH use was evaluated in both locations. During daytime, LLIH use at village level was reported by 69.3% of all respondents, and in forest fields this was 73.2%. In the evening, 54.1% used the LLIHs in the villages, while at the fields this was 20.7%. At night, LLIH use was minimal, regardless of the location (village 4.4%; forest 6.4%). Discussion: Despite the free distribution of insecticide-treated nets (ITNs) and LLIHs, around half the local population remains largely unprotected when sleeping in their forest plot huts. In order to tackle forest malaria more effectively, control policies should explicitly target forest fields where ethnic minority farmers are more vulnerable to malaria. PMID:22253852

  3. Edge-related loss of tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

    PubMed

    Santos, Bráulio A; Arroyo-Rodríguez, Víctor; Moreno, Claudia E; Tabarelli, Marcelo

    2010-09-08

    Deforestation and forest fragmentation are known major causes of nonrandom extinction, but there is no information about their impact on the phylogenetic diversity of the remaining species assemblages. Using a large vegetation dataset from an old hyper-fragmented landscape in the Brazilian Atlantic rainforest we assess whether the local extirpation of tree species and functional impoverishment of tree assemblages reduce the phylogenetic diversity of the remaining tree assemblages. We detected a significant loss of tree phylogenetic diversity in forest edges, but not in core areas of small (<80 ha) forest fragments. This was attributed to a reduction of 11% in the average phylogenetic distance between any two randomly chosen individuals from forest edges; an increase of 17% in the average phylogenetic distance to closest non-conspecific relative for each individual in forest edges; and to the potential manifestation of late edge effects in the core areas of small forest remnants. We found no evidence supporting fragmentation-induced phylogenetic clustering or evenness. This could be explained by the low phylogenetic conservatism of key life-history traits corresponding to vulnerable species. Edge effects must be reduced to effectively protect tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

  4. The experimental design of the Missouri Ozark Forest Ecosystem Project

    Treesearch

    Steven L. Sheriff; Shuoqiong He

    1997-01-01

    The Missouri Ozark Forest Ecosystem Project (MOFEP) is an experiment that examines the effects of three forest management practices on the forest community. MOFEP is designed as a randomized complete block design using nine sites divided into three blocks. Treatments of uneven-aged, even-aged, and no-harvest management were randomly assigned to sites within each block...
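
    A randomized complete block design of this shape (nine sites, three blocks, three treatments) can be sketched as follows. The site and block labels are hypothetical and the helper `assign_rcbd` is illustrative only; the record does not say how the actual randomization was carried out. Each treatment appears exactly once per block, and only the order within a block is random.

```python
import random

def assign_rcbd(blocks, treatments, seed=None):
    """Randomly assign every treatment once within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, sites in blocks.items():
        if len(sites) != len(treatments):
            raise ValueError(f"{block}: need one site per treatment")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # independent randomization per block
        for site, treatment in zip(sites, shuffled):
            assignment[site] = treatment
    return assignment

# Hypothetical labels: three blocks of three sites each.
blocks = {
    "block_1": ["site_1", "site_2", "site_3"],
    "block_2": ["site_4", "site_5", "site_6"],
    "block_3": ["site_7", "site_8", "site_9"],
}
treatments = ["uneven-aged", "even-aged", "no-harvest"]
plan = assign_rcbd(blocks, treatments, seed=42)
```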

  5. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    ERIC Educational Resources Information Center

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  6. Random Bits Forest: a Strong Classifier/Regressor for Big Data

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-07-01

    Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).

  7. Using Random Forest Models to Predict Organizational Violence

    NASA Technical Reports Server (NTRS)

    Levine, Burton; Bobashev, Georgly

    2012-01-01

    We present a methodology to assess the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minority at Risk Organizational Behavior (MAROS) dataset. The MAROS data are longitudinal, so individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for the violation of the independence assumption. We present the results of the model fit and an example of predicting violence for an organization; finally, we present a summary of the forest in a "meta-tree,"

  8. The Past, Present and Future of the Meteorological Phenomena Identification Near the Ground (mPING) Project

    NASA Astrophysics Data System (ADS)

    Elmore, K. L.

    2016-12-01

    The Meteorological Phenomena Identification Near the Ground (mPING) project is an example of a crowd-sourced, citizen science effort to gather data of sufficient quality and quantity needed by new post-processing methods that use machine learning. Transportation and infrastructure are particularly sensitive to precipitation type in winter weather. We extract attributes from operational numerical forecast models and use them in a random forest to generate forecast winter precipitation types. We find that random forests applied to forecast soundings are effective at generating skillful forecasts of surface ptype with considerably more skill than the current algorithms, especially for ice pellets and freezing rain. We also find that three very different forecast models yield similar overall results, showing that random forests are able to extract essentially equivalent information from different forecast models. We also show that the random forest for each model and each profile type is unique to the particular forecast model and that the random forests developed using a particular model suffer significant degradation when given attributes derived from a different model. This implies that no single algorithm can perform well across all forecast models. Clearly, random forests extract information unavailable to "physically based" methods because the physical information in the models does not appear as we expect. One interesting result is that results from the classic "warm nose" sounding profile are, by far, the most sensitive to the particular forecast model, but this profile is also the one for which random forests are most skillful. Finally, a method for calibrating probabilities for each different ptype using multinomial logistic regression is shown.

  9. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

    NASA Astrophysics Data System (ADS)

    Tesoriero, Anthony J.; Gronberg, Jo Ann; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

    2017-08-01

    Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.
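
    The partial dependence plots mentioned above average a model's predictions over the data while one explanatory variable is held at a fixed value. The sketch below shows that computation in isolation. The `toy_predict` function is a hypothetical stand-in for a fitted random forest, and the feature names and depths are invented for illustration, not taken from the study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)
            modified[feature] = value  # override one feature, keep the rest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve

def toy_predict(row):
    # Hypothetical model: higher probability of elevated iron when the well
    # bottom lies within 5 m of the glacial/bedrock interface.
    near_interface = abs(row["well_depth_m"] - row["bedrock_depth_m"]) < 5.0
    return 0.8 if near_interface else 0.2

rows = [
    {"well_depth_m": 10.0, "bedrock_depth_m": 30.0},
    {"well_depth_m": 50.0, "bedrock_depth_m": 30.0},
    {"well_depth_m": 28.0, "bedrock_depth_m": 60.0},
    {"well_depth_m": 70.0, "bedrock_depth_m": 60.0},
]
curve = partial_dependence(toy_predict, rows, "well_depth_m", [10.0, 30.0, 60.0])
```

    With these invented inputs the averaged probability rises from 0.2 at a depth of 10 m to 0.5 at 30 m and 60 m, the two depths that coincide with the interface in half of the rows.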

  10. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

    USGS Publications Warehouse

    Tesoriero, Anthony J.; Gronberg, Jo Ann M.; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

    2017-01-01

    Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.

  11. Developing reservoir monthly inflow forecasts using artificial intelligence and climate phenomenon information

    NASA Astrophysics Data System (ADS)

    Yang, Tiantian; Asanjan, Ata Akbari; Welles, Edwin; Gao, Xiaogang; Sorooshian, Soroosh; Liu, Xiaomang

    2017-04-01

    Reservoirs are fundamental human-built infrastructures that collect, store, and deliver fresh surface water in a timely manner for many purposes. Efficient reservoir operation requires policy makers and operators to understand how reservoir inflows are changing under different hydrological and climatic conditions to enable forecast-informed operations. Over the last decade, the uses of Artificial Intelligence and Data Mining (AI & DM) techniques in assisting reservoir streamflow subseasonal to seasonal forecasts have been increasing. In this study, Random Forest (RF), Artificial Neural Network (ANN), and Support Vector Regression (SVR) are employed and compared with respect to their capabilities for predicting 1 month-ahead reservoir inflows for two headwater reservoirs in USA and China. Both current and lagged hydrological information and 17 known climate phenomenon indices, e.g., PDO and ENSO, are selected as predictors for simulating reservoir inflows. Results show (1) all three methods are capable of providing monthly reservoir inflows with satisfactory statistics; (2) the results obtained by Random Forest have the best statistical performances compared with the other two methods; (3) another advantage of the Random Forest algorithm is its capability of interpreting raw model inputs; (4) climate phenomenon indices are useful in assisting monthly or seasonal forecasts of reservoir inflow; and (5) different climate conditions are autocorrelated for up to several months, and the climatic information and their lags are cross correlated with local hydrological conditions in our case studies.

  12. A Random Forest-based ensemble method for activity recognition.

    PubMed

    Feng, Zengtao; Mo, Lingfei; Li, Meng

    2015-01-01

    This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People), which is a standard, publicly available database, was utilized to train and test. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while the training is faster than others. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.

  13. Detection of compensatory balance responses using wearable electromyography sensors for fall-risk assessment.

    PubMed

    Nouredanesh, Mina; Kukreja, Sunil L; Tung, James

    2016-08-01

    Loss of balance is prevalent in older adults and populations with gait and balance impairments. The present paper aims to develop a method to automatically distinguish compensatory balance responses (CBRs) from normal gait, based on activity patterns of muscles involved in maintaining balance. In this study, subjects were perturbed by lateral pushes while walking and surface electromyography (sEMG) signals were recorded from four muscles in their right leg. To extract sEMG time domain features, several filtering characteristics and segmentation approaches are examined. The performance of three classification methods, i.e., k-nearest neighbor, support vector machines, and random forests, were investigated for accurate detection of CBRs. Our results show that features extracted in the 50-200Hz band, segmented using peak sEMG amplitudes, and a random forest classifier detected CBRs with an accuracy of 92.35%. Moreover, our results support the important role of biceps femoris and rectus femoris muscles in stabilization and consequently discerning CBRs. This study contributes towards the development of wearable sensor systems to accurately and reliably monitor gait and balance control behavior in at-home settings (unsupervised conditions), over long periods of time, towards personalized fall risk assessment tools.

  14. A machine learning system to improve heart failure patient assistance.

    PubMed

    Guidi, Gabriele; Pettenati, Maria Chiara; Melillo, Paolo; Iadanza, Ernesto

    2014-11-01

    In this paper, we present a clinical decision support system (CDSS) for the analysis of heart failure (HF) patients, providing various outputs such as an HF severity evaluation, HF-type prediction, as well as a management interface that compares the different patients' follow-ups. The whole system comprises an intelligent core and an HF special-purpose management tool, which also serves as the interface for training and using the artificial intelligence. To implement the intelligent functions, we adopted a machine learning approach. In this paper, we compare the performance of a neural network (NN), a support vector machine, a system with fuzzy rules genetically produced, and a classification and regression tree and its direct evolution, which is the random forest, in analyzing our database. Best performances in both HF severity evaluation and HF-type prediction functions are obtained by using the random forest algorithm. The management tool allows the cardiologist to populate a "supervised database" suitable for machine learning during his or her regular outpatient consultations. The idea comes from the fact that in the literature there are a few databases of this type, and they are not scalable to our case.

  15. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

  16. A comparison of the conditional inference survival forest model to random survival forests based on a simulation study as well as on two applications with time-to-event data.

    PubMed

    Nasejje, Justine B; Mwambi, Henry; Dheda, Keertan; Lesosky, Maia

    2017-07-28

    Random survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for the best covariate to split on from that of the best split point search for the selected covariate. In this study, we compare the random survival forest model to the conditional inference forest (CIF) model using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under five years of age in Uganda and it consists of categorical covariates with most of them having more than two levels (many split-points). The second dataset is based on the survival of patients with extremely drug resistant tuberculosis (XDR TB) which consists of mainly categorical covariates with two levels (few split-points). The study findings indicate that the conditional inference forest model is superior to random survival forest models in analysing time-to-event data that consist of covariates with many split-points, based on the values of the bootstrap cross-validated estimates for integrated Brier scores. However, conditional inference forests perform comparably to random survival forest models in analysing time-to-event data consisting of covariates with fewer split-points. Although survival forests are promising methods in analysing time-to-event data, it is important to identify the best forest model for analysis based on the nature of covariates of the dataset in question.

  17. SNP selection and classification of genome-wide SNP data using stratified sampling random forests.

    PubMed

    Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K

    2012-09-01

    For high dimensional genome-wide association (GWA) case-control data of complex disease, there is usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant to the disease. A simple random sampling method in random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces without informative SNPs. An exhaustive search for an optimal mtry is often required in order to include useful and relevant SNPs and discard the vast number of non-informative SNPs. However, it is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it ensures each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and that it can generate better random forests with higher accuracy and lower error bound than those produced by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.
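
    The stratified feature subspace selection described above (equal-width discretization of an informativeness score, then an equal draw from every group) might look roughly like the sketch below. The informativeness scores, SNP names, number of groups, and per-group draw size are all assumptions; the abstract does not state how informativeness is measured, so the scores here are simply given as input.

```python
import random

def equal_width_groups(scores, n_groups):
    """Bin features into n_groups equal-width intervals of their score."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / n_groups
    groups = [[] for _ in range(n_groups)]
    for name, score in scores.items():
        # Guard against width == 0 (all scores equal) and clamp the maximum
        # score into the last bin.
        idx = 0 if width == 0 else min(int((score - lo) / width), n_groups - 1)
        groups[idx].append(name)
    return groups

def stratified_subspace(groups, per_group, rng):
    """Draw the same number of features from every group (fewer if a group is small)."""
    subspace = []
    for members in groups:
        subspace.extend(rng.sample(members, min(per_group, len(members))))
    return subspace

# Hypothetical informativeness scores for nine SNPs.
scores = {"snp_a": 0.05, "snp_b": 0.10, "snp_c": 0.15, "snp_d": 0.20,
          "snp_e": 0.50, "snp_f": 0.55, "snp_g": 0.90, "snp_h": 0.95, "snp_i": 1.00}
groups = equal_width_groups(scores, n_groups=3)
subspace = stratified_subspace(groups, per_group=2, rng=random.Random(0))
```

    Each tree in the forest would receive its own `subspace` draw, so the highly informative group is always represented even though it is small, while the selection within each group stays random.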

  18. Spectroscopic diagnosis of laryngeal carcinoma using near-infrared Raman spectroscopy and random recursive partitioning ensemble techniques.

    PubMed

    Teh, Seng Khoon; Zheng, Wei; Lau, David P; Huang, Zhiwei

    2009-06-01

    In this work, we evaluated the diagnostic ability of near-infrared (NIR) Raman spectroscopy associated with the ensemble recursive partitioning algorithm based on random forests for identifying cancer from normal tissue in the larynx. A rapid-acquisition NIR Raman system was utilized for tissue Raman measurements at 785 nm excitation, and 50 human laryngeal tissue specimens (20 normal; 30 malignant tumors) were used for NIR Raman studies. The random forests method was introduced to develop effective diagnostic algorithms for classification of Raman spectra of different laryngeal tissues. High-quality Raman spectra in the range of 800-1800 cm⁻¹ can be acquired from laryngeal tissue within 5 seconds. Raman spectra differed significantly between normal and malignant laryngeal tissues. Classification results obtained from the random forests algorithm on tissue Raman spectra yielded a diagnostic sensitivity of 88.0% and specificity of 91.4% for laryngeal malignancy identification. The random forests technique also provided variable importance measures that facilitate correlation of significant Raman spectral features with cancer transformation. This study shows that NIR Raman spectroscopy in conjunction with the random forests algorithm has great potential for the rapid diagnosis and detection of malignant tumors in the larynx.

  19. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

    PubMed

    Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the above four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.

  20. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

    PubMed Central

    Mi, Chunrong; Huettmann, Falk; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled.
This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation. PMID:28097060
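
    The two evaluation ingredients named in this abstract, the averaged ensemble forecast and the true skill statistic, can be illustrated with a minimal Python sketch. It assumes binary presence/absence labels and a 0.5 probability cutoff for turning probabilities into presences; the function names, the cutoff and all numbers are hypothetical and are not taken from the study.

```python
def ensemble_average(model_probs):
    """Average the predicted probabilities of several models, site by site."""
    n_models = len(model_probs)
    return [sum(site) / n_models for site in zip(*model_probs)]


def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1 for binary labels (1 = presence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


# Hypothetical probabilities from four models at five sites (illustrative only).
probs = [
    [0.9, 0.2, 0.7, 0.4, 0.1],
    [0.8, 0.3, 0.6, 0.6, 0.2],
    [0.7, 0.1, 0.8, 0.3, 0.3],
    [0.6, 0.2, 0.9, 0.5, 0.2],
]
observed = [1, 0, 1, 1, 0]  # hypothetical independent test labels

ensemble = ensemble_average(probs)
predicted = [1 if p >= 0.5 else 0 for p in ensemble]  # assumed 0.5 cutoff
print(ensemble, true_skill_statistic(observed, predicted))
```

    TSS ranges from -1 to 1, with 0 meaning no better than random; unlike AUC it depends on the chosen cutoff.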

  1. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.

    PubMed

    Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S

    2018-04-10

    Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world's population was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that applying machine learning-based classifiers to such data sets for risk stratification leads to lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers, if replaced by a median configuration, will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where the features are extracted and optimized from six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers, when replaced by computed medians, will improve the risk stratification accuracy. The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing the missing values and outliers by group median and median values, respectively, and then combining random forest feature selection with random forest classification yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability.
    The RF-based model showed the best performance when outliers were replaced by median values.
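
    The median-replacement idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: missing values are encoded as None, and an outlier is taken to be any value outside 1.5 interquartile ranges of the quartiles. The abstract does not say how outliers were detected, so that rule, the function names and the sample values are all hypothetical.

```python
from statistics import median, quantiles


def impute_group_median(values, groups):
    """Replace None with the median of the observed values in the same group."""
    observed = {}
    for value, group in zip(values, groups):
        if value is not None:
            observed.setdefault(group, []).append(value)
    medians = {group: median(vals) for group, vals in observed.items()}
    return [medians[g] if v is None else v for v, g in zip(values, groups)]


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the overall median.

    The IQR rule is an assumption made for this sketch.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = median(values)
    return [med if (v < low or v > high) else v for v in values]


# Hypothetical feature column with two missing entries; groups are class labels.
feature = [100, None, 120, 150, None, 170]
labels = [0, 0, 0, 1, 1, 1]
print(impute_group_median(feature, labels))
print(replace_outliers_with_median([10, 12, 11, 13, 12, 100]))
```

    A group with no observed value would raise a KeyError here; a fuller implementation would need a fallback such as the overall median.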

  2. Adapting GNU random forest program for Unix and Windows

    NASA Astrophysics Data System (ADS)

    Jirina, Marcel; Krayem, M. Said; Jirina, Marcel, Jr.

    2013-10-01

    Random Forest is a well-known method, and also a program, for data clustering and classification. Unfortunately, the original Random Forest program is rather difficult to use. Here we describe a new version of this program, which was originally written in Fortran 77. The modified program in Fortran 95 needs to be compiled only once, and information for different tasks is passed with the help of arguments. The program was tested with 24 data sets from UCI MLR and results are available on the net.

  3. Mapping Deforestation area in North Korea Using Phenology-based Multi-Index and Random Forest

    NASA Astrophysics Data System (ADS)

    Jin, Y.; Sung, S.; Lee, D. K.; Jeong, S.

    2016-12-01

    Forest ecosystems provide ecological benefits to both humans and wildlife. Growing global demand for food and fiber is increasing the pressure that agriculture and logging place on forest ecosystems worldwide. Between 1990 and 2015, North Korea lost almost 40% of its forests to conversion into crop fields for food production and to cutting for fuelwood. This has increased the damage caused by natural disasters, and North Korea is known to be one of the most forest-degraded areas in the world. The forest landscape in North Korea is complex and heterogeneous; its major landscape types are hillside farm, unstocked forest, natural forest and plateau vegetation. Remote sensing can be used for forest degradation mapping of a dynamic landscape at a broad scale of detail and spatial distribution. Confusion mostly occurred between hillside farmland and unstocked forest, but also between unstocked forest and forest. Most previous forest degradation studies focused on the classification of broad types, such as deforested area and sand, from the perspective of land cover classification. The objective of this study is to use random forest to map degraded forest in North Korea with a phenology-based vegetation index derived from MODIS products, which capture various environmental factors such as vegetation, soil and water at a regional scale, to improve accuracy. The random forest model achieved an overall accuracy of 91.44%. The class user's accuracies of hillside farmland and unstocked forest, the classes that indicate degraded forest, were 97.2% and 84%, respectively. Unstocked forest had a relatively low user's accuracy due to misclassified hillside farmland and forest samples. The producer's accuracies of hillside farmland and unstocked forest were 85.2% and 93.3%, respectively. In this case hillside farmland had a lower producer's accuracy, mainly due to confusion with field, unstocked forest and forest.
Such a classification of degraded forest could supply essential information to decide the priority of forest management and restoration in degraded forest area.
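
    The overall, user's and producer's accuracies reported above are all read off a confusion matrix. The following Python sketch shows the arithmetic, assuming rows hold reference classes and columns hold mapped classes; the matrix values are invented for illustration and are not the study's data.

```python
def accuracy_report(matrix, labels):
    """Compute overall, producer's and user's accuracy.

    matrix[i][j] = number of samples of reference class i mapped as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"overall": correct / total}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])                 # row sum
        mapped_total = sum(matrix[r][i] for r in range(n))  # column sum
        report[label] = {
            "producer": matrix[i][i] / reference_total,  # 1 - omission error
            "user": matrix[i][i] / mapped_total,         # 1 - commission error
        }
    return report


# Hypothetical confusion matrix (illustrative counts only).
classes = ["hillside farmland", "unstocked forest", "forest"]
confusion = [
    [40, 8, 2],
    [2, 45, 3],
    [0, 5, 45],
]
report = accuracy_report(confusion, classes)
print(report)
```

    A class can combine a high user's accuracy with a lower producer's accuracy, as hillside farmland does in the abstract, when few other samples are mapped into it but many of its own samples are mapped elsewhere.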

  4. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadliest diseases; according to WHO data, there were some 8.8 million deaths caused by cancer in 2015, and this number will increase every year if the problem is not addressed. Microarray analysis has become one of the most popular approaches to cancer identification in the health field, since microarray data can be used to look at levels of gene expression in certain cell samples and so analyze thousands of genes simultaneously. Using data mining techniques, we can classify samples of microarray data and thus identify whether or not they indicate cancer. In this paper we discuss research that applies data mining techniques to microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and a simulation of the Random Forest algorithm with dimension reduction using Relief. The paper reports the performance measure (accuracy) of each classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forest). The results show that the accuracy of the Random Forest algorithm is higher than that of the other classification algorithms (SVM, ANN, Naive Bayes, kNN, and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost of each data mining classification technique on microarray data.

  5. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors.

    PubMed

    Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

    2016-01-14

    Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR system.

  6. Pseudo CT estimation from MRI using patch-based random forest

    NASA Astrophysics Data System (ADS)

    Yang, Xiaofeng; Lei, Yang; Shu, Hui-Kuo; Rossi, Peter; Mao, Hui; Shim, Hyunsuk; Curran, Walter J.; Liu, Tian

    2017-02-01

    MR simulators have recently gained popularity because they avoid the unnecessary radiation exposure of the CT simulators used in radiation therapy planning. We propose a method for pseudo CT estimation from MR images based on a patch-based random forest. Patient-specific anatomical features are extracted from the aligned training images and adopted as signatures for each voxel. The most robust and informative features are identified using feature selection to train the random forest. The well-trained random forest is used to predict the pseudo CT of a new patient. This prediction technique was tested with human brain images and the prediction accuracy was assessed using the original CT images. Peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) indexes were used to quantify the differences between the pseudo and original CT images. The experimental results showed the proposed method could accurately generate pseudo CT images from MR images. In summary, we have developed a new pseudo CT prediction method based on patch-based random forest, demonstrated its clinical feasibility, and validated its prediction accuracy. This pseudo CT prediction technique could be a useful tool for MRI-based radiation treatment planning and attenuation correction in a PET/MRI scanner.
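
    Of the two image-quality measures mentioned above, PSNR is simple enough to sketch in a few lines of Python. The sketch assumes the images are given as flat lists of voxel intensities and that the caller supplies the intensity range; the function name, the range and the sample values are hypothetical, and FSIM is not covered.

```python
import math


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of voxels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical intensities for an original and a predicted image.
original = [10, 20, 30, 40]
pseudo = [12, 18, 30, 40]
print(psnr(original, pseudo, data_range=255))
```

    Higher values indicate a closer match; the result is only comparable between studies that use the same intensity range.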

  7. Optimizing classification performance in an object-based very-high-resolution land use-land cover urban application

    NASA Astrophysics Data System (ADS)

    Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore

    2017-10-01

    This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF) classifiers. The results demonstrate that the accuracies of the SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for reducing model complexity and easing interpretation.

  8. Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques

    NASA Technical Reports Server (NTRS)

    Lee, Hanbong; Malik, Waqar; Jung, Yoon C.

    2016-01-01

    Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural network models are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.

  9. Toward Improving Electrocardiogram (ECG) Biometric Verification using Mobile Sensors: A Two-Stage Classifier Approach

    PubMed Central

    Tan, Robin; Perkowski, Marek

    2017-01-01

    Electrocardiogram (ECG) signals sensed from mobile devices hold potential for biometric identity recognition in remote access control systems where enhanced data security is in demand. In this study, we propose a new algorithm that consists of a two-stage classifier combining random forest and wavelet distance measure through a probabilistic threshold schema, to improve the effectiveness and robustness of a biometric recognition system using ECG data acquired from a biosensor integrated into mobile devices. The proposed algorithm is evaluated using a mixed dataset from 184 subjects under different health conditions. The proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from random forest alone and the 96.31% accuracy from the wavelet distance measure algorithm alone. These results demonstrate the superiority of the proposed algorithm for biometric identification, hence supporting its practicality in areas such as cloud data security, cyber-security or remote healthcare systems. PMID:28230745

  10. Toward Improving Electrocardiogram (ECG) Biometric Verification using Mobile Sensors: A Two-Stage Classifier Approach.

    PubMed

    Tan, Robin; Perkowski, Marek

    2017-02-20

    Electrocardiogram (ECG) signals sensed from mobile devices hold potential for biometric identity recognition in remote access control systems where enhanced data security is in demand. In this study, we propose a new algorithm that consists of a two-stage classifier combining random forest and wavelet distance measure through a probabilistic threshold schema, to improve the effectiveness and robustness of a biometric recognition system using ECG data acquired from a biosensor integrated into mobile devices. The proposed algorithm is evaluated using a mixed dataset from 184 subjects under different health conditions. The proposed two-stage classifier achieves a total of 99.52% subject verification accuracy, better than the 98.33% accuracy from random forest alone and the 96.31% accuracy from the wavelet distance measure algorithm alone. These results demonstrate the superiority of the proposed algorithm for biometric identification, hence supporting its practicality in areas such as cloud data security, cyber-security or remote healthcare systems.

  11. Ship Detection Based on Multiple Features in Random Forest Model for Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Li, N.; Ding, L.; Zhao, H.; Shi, J.; Wang, D.; Gong, X.

    2018-04-01

    A novel method for detecting ships that aims to make full use of both the spatial and spectral information from hyperspectral images is proposed. Firstly, a band with a high signal-to-noise ratio in the near infrared or short-wave infrared range is used to segment land and sea with the Otsu threshold segmentation method. Secondly, multiple features, including spectral and texture features, are extracted from the hyperspectral images. Principal components analysis (PCA) is used to extract spectral features, and the Grey Level Co-occurrence Matrix (GLCM) is used to extract texture features. Finally, a Random Forest (RF) model is introduced to detect ships based on the extracted features. To illustrate the effectiveness of the method, we carry out experiments on EO-1 data, comparing single features and different combinations of multiple features. Compared with the traditional single-feature method and the Support Vector Machine (SVM) model, the proposed method can stably detect ships against complex backgrounds and can effectively improve ship detection accuracy.
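
    The first step above, Otsu thresholding of a single band, can be sketched in plain Python. The sketch assumes integer pixel values in the range 0 to 255 supplied as a flat list and picks the threshold that maximizes the between-class variance; the function name and sample values are hypothetical, and the later PCA, GLCM and random forest steps are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing between-class variance; class 0 is pixels <= t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0 so far
    sum0 = 0    # sum of intensities in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break               # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to the variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical band: dark pixels (for example sea) and bright pixels (land).
band = [10, 12, 11, 9, 200, 210, 205, 198]
t = otsu_threshold(band)
mask = [1 if p > t else 0 for p in band]
print(t, mask)
```

    Which side of the threshold corresponds to sea depends on the band chosen, so the mask polarity here is only an assumption.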

  12. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms for the classification of large hyperspectral images, from a theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large hyperspectral images and compared to support vector machines. For the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited or sufficient training set. Moreover, other important issues such as the computational cost and robustness against noise are also discussed.

  13. Detecting targets hidden in random forests

    NASA Astrophysics Data System (ADS)

    Kouritzin, Michael A.; Luo, Dandan; Newton, Fraser; Wu, Biao

    2009-05-01

    Military tanks, cargo or troop carriers, missile carriers or rocket launchers often hide themselves from detection in the forests. This plagues the detection problem of locating these hidden targets. An electro-optic camera mounted on a surveillance aircraft or unmanned aerial vehicle is used to capture the images of the forests with possible hidden targets, e.g., rocket launchers. We consider random forests of longitudinal and latitudinal correlations. Specifically, foliage coverage is encoded with a binary representation (i.e., foliage or no foliage), and is correlated in adjacent regions. We address the detection problem of camouflaged targets hidden in random forests by building memory into the observations. In particular, we propose an efficient algorithm to generate random forests, ground, and camouflage of hidden targets with two dimensional correlations. The observations are a sequence of snapshots consisting of foliage-obscured ground or target. Theoretically, detection is possible because there are subtle differences in the correlations of the ground and camouflage of the rocket launcher. However, these differences are well beyond human perception. To detect the presence of hidden targets automatically, we develop a Markov representation for these sequences and modify the classical filtering equations to allow the Markov chain observation. Particle filters are used to estimate the position of the targets in combination with a novel random weighting technique. Furthermore, we give positive proof-of-concept simulations.

  14. Screening large-scale association study data: exploiting interactions using random forests.

    PubMed

    Lunetta, Kathryn L; Hayward, L Brooke; Segal, Jonathan; Van Eerdewegh, Paul

    2004-12-10

    Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction. Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact. 
In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.

  15. Application of lifting wavelet and random forest in compound fault diagnosis of gearbox

    NASA Astrophysics Data System (ADS)

    Chen, Tang; Cui, Yulian; Feng, Fuzhou; Wu, Chunzhi

    2018-03-01

    To address the weakness of the compound fault characteristic signals of an armored vehicle gearbox and the difficulty of identifying fault types, a fault diagnosis method based on the lifting wavelet and random forest is proposed. First, the method uses the lifting wavelet transform to decompose the original vibration signal into multiple layers and reconstructs the multi-layer low-frequency and high-frequency components obtained by the decomposition to get multiple component signals. Then time-domain feature parameters are obtained for each component signal to form multiple feature vectors, which are input into the random forest pattern recognition classifier to determine the compound fault type. Finally, the method is verified on a variety of compound fault data from a gearbox fault simulation test platform; the results show that the recognition accuracy of the fault diagnosis method combining the lifting wavelet and the random forest is up to 99.99%.
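
    The decomposition step above relies on the lifting scheme, which builds a wavelet transform from split, predict and update steps. The abstract does not name the wavelet, so the Python sketch below assumes the simplest case, a one-level Haar lifting step on an even-length signal; the function names and sample values are hypothetical.

```python
def haar_lifting_forward(signal):
    """One lifting level: split into even/odd samples, predict, then update."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for o, e in zip(odd, even)]         # predict: high-frequency part
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update: low-frequency part
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the update and predict steps, then merge even and odd samples."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


# Hypothetical vibration samples.
x = [2, 4, 6, 8]
low, high = haar_lifting_forward(x)
print(low, high, haar_lifting_inverse(low, high))
```

    Applying the forward step again to the low-frequency output gives the next decomposition level, and each step is exactly invertible.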

  16. Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

    NASA Astrophysics Data System (ADS)

    Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

    2018-03-01

    In the recent decade, analyzing remotely sensed imagery has become one of the most common and widely used procedures in environmental studies. In this context, supervised image classification techniques play a central role. Hence, using a high-resolution Worldview-3 image over a mixed urbanized landscape in Iran, three less applied image classification methods, namely Bagged CART, stochastic gradient boosting model and neural network with feature extraction, were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten times and three validation techniques were used to estimate the accuracy statistics: cross validation, independent validation and validation with the total training data. Moreover, using ANOVA and the Tukey test, the statistical significance of differences between the classification methods was assessed. In general, the results showed that random forest, with a marginal difference compared to Bagged CART and stochastic gradient boosting model, is the best performing method, whilst based on independent validation there was no significant difference between the performances of the classification methods. Finally, it should be noted that neural network with feature extraction and linear support vector machine had better processing speed than the others.

  17. Modeling and Prediction of Solvent Effect on Human Skin Permeability using Support Vector Regression and Random Forest.

    PubMed

    Baba, Hiromi; Takahara, Jun-ichi; Yamashita, Fumiyoshi; Hashida, Mitsuru

    2015-11-01

    The solvent effect on skin permeability is important for assessing the effectiveness and toxicological risk of new dermatological formulations in pharmaceuticals and cosmetics development. The solvent effect occurs by diverse mechanisms, which could be elucidated by efficient and reliable prediction models. However, such prediction models have been hampered by the small variety of permeants and mixture components archived in databases and by low predictive performance. Here, we propose a solution to both problems. We first compiled a novel large database of 412 samples from 261 structurally diverse permeants and 31 solvents reported in the literature. The data were carefully screened to ensure their collection under consistent experimental conditions. To construct a high-performance predictive model, we then applied support vector regression (SVR) and random forest (RF) with greedy stepwise descriptor selection to our database. The models were internally and externally validated. The SVR achieved higher performance statistics than RF. The (externally validated) determination coefficient, root mean square error, and mean absolute error of SVR were 0.899, 0.351, and 0.268, respectively. Moreover, because all descriptors are fully computational, our method can predict as-yet unsynthesized compounds. Our high-performance prediction model offers an attractive alternative to permeability experiments for pharmaceutical and cosmetic candidate screening and optimizing skin-permeable topical formulations.
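
    The three validation statistics quoted above can be computed as in the following Python sketch. It uses the common definition of the determination coefficient, one minus the ratio of residual to total sum of squares; the abstract does not state which variant was used for external validation, so that choice, the function name and the sample values are assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical observed and predicted values.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

    RMSE penalizes large errors more strongly than MAE, so RMSE is never smaller than MAE on the same data.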

  18. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

    PubMed

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-05

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
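
    The Matthews correlation coefficient and accuracy reported above are both functions of the four confusion-matrix counts. A minimal Python sketch, with hypothetical counts that are not the study's data:

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for a binary predictor.
print(matthews_corrcoef(45, 40, 10, 5), accuracy(45, 40, 10, 5))
```

    MCC ranges from -1 to 1 and stays informative on imbalanced data, which is why it is often reported alongside accuracy.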

  19. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

    PubMed Central

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-01

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743

  20. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. The running times differed between methods. The Extreme Learning Machine (ELM) was always the fastest algorithm. As the sample size increases, the Radial Basis Function (RBF) method requires a long imputation time. The tested methods can be an alternative for imputation of un-typed SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. 3D Semantic Labeling of ALS Data Based on Domain Adaption by Transferring and Fusing Random Forest Models

    NASA Astrophysics Data System (ADS)

    Wu, J.; Yao, W.; Zhang, J.; Li, Y.

    2018-04-01

    Labeling 3D point cloud data with traditional supervised learning methods requires a considerable number of labelled samples, the collection of which is costly and time-consuming. This work focuses on adopting the domain adaption concept to transfer existing trained random forest classifiers (based on the source domain) to new data scenes (the target domain), with the aim of reducing the dependence of accurate 3D semantic labeling in point clouds on training samples from the new data scene. Firstly, two random forest classifiers were trained with existing samples previously collected for other data. They differed from each other in using two different decision tree construction algorithms: C4.5 with information gain ratio and CART with Gini index. Secondly, four random forest classifiers adapted to the target domain were derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain were labelled by fusing the four newly derived random forest classifiers using a weights-of-evidence-based fusion model. To validate our method, experimental analysis was conducted using 3 datasets: one was used as the source domain data (Vaihingen data for 3D Semantic Labelling); the other two were used as the target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5 % and 83.3 % for 3D labelling were achieved for the Jinmen city and Dunhuang city data respectively, with only 1/3 newly labelled samples compared to the cases without domain adaption.
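
    The two split criteria named in this abstract, the Gini index (CART) and the information gain ratio (C4.5), can be compared on a single candidate split. This is a minimal sketch of the textbook definitions, not the authors' implementation; the class labels "ground" and "roof" and the function names are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_decrease(parent, children):
    """CART-style score: impurity of the parent minus weighted child impurity."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


def gain_ratio(parent, children):
    """C4.5-style score: information gain divided by the split information."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical labels: a split that separates the two classes perfectly.
parent = ["ground"] * 4 + ["roof"] * 4
children = [["ground"] * 4, ["roof"] * 4]
print("Gini decrease:", gini_decrease(parent, children))
print("Gain ratio:", gain_ratio(parent, children))
```

    Both criteria reward splits that produce purer child nodes; the gain ratio additionally divides by the split information, which penalizes splits into many small branches.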

  2. Predicting stem total and assortment volumes in an industrial Pinus taeda L. forest plantation using airborne laser scanning data and random forest

    Treesearch

    Carlos Alberto Silva; Carine Klauberg; Andrew Thomas Hudak; Lee Alexander Vierling; Wan Shafrina Wan Mohd Jaafar; Midhun Mohan; Mariano Garcia; Antonio Ferraz; Adrian Cardil; Sassan Saatchi

    2017-01-01

    Improvements in the management of pine plantations result in multiple industrial and environmental benefits. Remote sensing techniques can dramatically increase the efficiency of plantation management by reducing or replacing time-consuming field sampling. We tested the utility and accuracy of combining field and airborne lidar data with Random Forest, a supervised...

  3. Uncertainty in Random Forests: What does it mean in a spatial context?

    NASA Astrophysics Data System (ADS)

    Klump, Jens; Fouedjio, Francky

    2017-04-01

    Geochemical surveys are an important part of exploration for mineral resources and in environmental studies. The samples and chemical analyses are often laborious and difficult to obtain and therefore come at a high cost. As a consequence, these surveys are characterised by datasets with large numbers of variables but relatively few data points when compared to conventional big data problems. With more remote sensing platforms and sensor networks being deployed, large volumes of auxiliary data of the surveyed areas are becoming available. The use of these auxiliary data has the potential to improve the prediction of chemical element concentrations over the whole study area. Kriging is a well established geostatistical method for the prediction of spatial data but requires significant pre-processing and makes some basic assumptions about the underlying distribution of the data. Some machine learning algorithms, on the other hand, may require less data pre-processing and are non-parametric. In this study we used a dataset provided by Kirkwood et al. [1] to explore the potential use of Random Forest in geochemical mapping. We chose Random Forest because it is a well understood machine learning method and has the advantage that it provides us with a measure of uncertainty. By comparing Random Forest to Kriging we found that both methods produced comparable maps of estimated values for our variables of interest. Kriging outperformed Random Forest for variables of interest with relatively strong spatial correlation. The measure of uncertainty provided by Random Forest seems to be quite different to the measure of uncertainty provided by Kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. 
In conclusion, our preliminary results show that the model driven approach in geostatistics gives us more reliable estimates for our target variables than Random Forest for variables with relatively strong spatial correlation. However, in cases of weak spatial correlation Random Forest, as a nonparametric method, may give the better results once we have a better understanding of the meaning of its uncertainty measures in a spatial context. References [1] Kirkwood, C., M. Cave, D. Beamish, S. Grebby, and A. Ferreira (2016), A machine learning approach to geochemical mapping, Journal of Geochemical Exploration, 163, 28-40, doi:10.1016/j.gexplo.2016.05.003.

  4. Introducing two Random Forest based methods for cloud detection in remote sensing images

    NASA Astrophysics Data System (ADS)

    Ghasemian, Nafiseh; Akhoondzadeh, Mehdi

    2018-07-01

    Cloud detection is a necessary phase in satellite images processing to retrieve the atmospheric and lithospheric parameters. Currently, some cloud detection methods based on Random Forest (RF) model have been proposed but they do not consider both spectral and textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF) to incorporate visible, infrared (IR) and thermal spectral and textural features (FLFRF) including Gray Level Co-occurrence Matrix (GLCM) and Robust Extended Local Binary Pattern (RELBP_CI) or visible, IR and thermal classifiers (DLFRF) for highly accurate cloud detection on remote sensing images. FLFRF first fuses visible, IR and thermal features. Thereafter, it uses the RF model to classify pixels to cloud, snow/ice and background or thick cloud, thin cloud and background. DLFRF considers visible, IR and thermal features (both spectral and textural) separately and inserts each set of features to RF model. Then, it holds vote matrix of each run of the model. Finally, it fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used in this paper. Quantitative analyses are based on manually selected ground truth data. Results show that after adding RELBP_CI to input feature set cloud detection accuracy improves. Also, the average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are higher than other machine learning methods, Linear Discriminate Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than other traditional methods. 
The quantitative values on Landsat 8 images show similar trend. Consequently, while SVM and K-nearest neighbor show overestimation in predicting cloud and snow/ice pixels, our Random Forest (RF) based models can achieve higher cloud, snow/ice kappa values on MODIS and thin cloud, thick cloud and snow/ice kappa values on Landsat 8 images. Our algorithms predict both thin and thick cloud on Landsat 8 images while the existing cloud detection algorithm, Fmask cannot discriminate them. Compared to the state-of-the-art methods, our algorithms have acquired higher average cloud and snow/ice kappa values for different spatial resolutions.
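
    The decision-level fusion step described above (one classifier per feature group, combined by majority vote) can be illustrated as follows. This is a simplified sketch under stated assumptions: the per-pixel label lists stand in for the outputs of the visible, IR and thermal classifiers, the values are invented, and the vote-matrix bookkeeping of the actual DLFRF method is not reproduced.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Fuse per-pixel labels from several classifiers by majority vote.

    Each inner list holds one classifier's label for every pixel, in the
    same pixel order. Ties go to the label seen first among the votes
    (insertion order, Python 3.7+).
    """
    fused = []
    for votes in zip(*predictions_per_classifier):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical per-pixel outputs of three classifiers (illustrative only).
visible = ["cloud", "cloud", "background", "snow/ice"]
infrared = ["cloud", "background", "background", "cloud"]
thermal = ["cloud", "cloud", "snow/ice", "snow/ice"]

fused = majority_vote([visible, infrared, thermal])
print(fused)
```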

  5. Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Yadav, B.; Hatfield, K.

    2017-12-01

    We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. 
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
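
    The evaluation protocol described above (an 80/20 hold-out split, with k-fold cross-validation on the training portion for hyperparameter tuning) can be sketched with index bookkeeping alone. The sample count of 800, the value k = 5, the seed, and the function names are assumptions made for illustration; the abstract does not state k or the exact number of catchments.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and hold out a fraction as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for each of k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# Hypothetical: 800 catchments, 80% for training, 20% held out for testing.
train_idx, test_idx = train_test_split_indices(800)
print(len(train_idx), len(test_idx))

# A grid search would fit each candidate hyperparameter setting on `fit_idx`
# and score it on `val_idx`, keeping the setting with the best mean score.
for fit_idx, val_idx in k_fold_indices(train_idx, k=5):
    print(len(fit_idx), len(val_idx))
```

    The held-out test indices are never touched during tuning, which is what lets the test score serve as an estimate of generalization performance.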

  6. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    PubMed Central

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464

  7. A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes.

    PubMed

    Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza

    2018-04-24

    We aimed to identify the associated risk factors of type 2 diabetes mellitus (T2DM) using a data mining approach, with decision tree and random forest techniques, using data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. This was a cross-sectional study. The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, namely decision trees and random forests, were used for predicting T2DM when some other characteristics were observed in 9528 subjects recruited from the MASHAD database. This paper compares these two models in terms of accuracy, sensitivity, specificity and the area under the ROC curve. The prevalence rate of T2DM was 14% among these subjects. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. The random forest model, when used with demographic, clinical, anthropometric, and biochemical measurements, can provide a simple tool to identify associated risk factors for type 2 diabetes. Such identification can be of substantial use in managing health policy to reduce the number of subjects with T2DM.

  8. Applications of random forest feature selection for fine-scale genetic population assignment.

    PubMed

    Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G

    2018-02-01

    Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNP) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than F_ST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F_ST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.

  9. Do little interactions get lost in dark random forests?

    PubMed

    Wright, Marvin N; Ziegler, Andreas; König, Inke R

    2016-03-31

    Random forests have often been claimed to uncover interaction effects. However, if and how interaction effects can be differentiated from marginal effects remains unclear. In extensive simulation studies, we investigate whether random forest variable importance measures capture or detect gene-gene interactions. With capturing interactions, we define the ability to identify a variable that acts through an interaction with another one, while detection is the ability to identify an interaction effect as such. Of the single importance measures, the Gini importance captured interaction effects in most of the simulated scenarios, however, they were masked by marginal effects in other variables. With the permutation importance, the proportion of captured interactions was lower in all cases. Pairwise importance measures performed about equal, with a slight advantage for the joint variable importance method. However, the overall fraction of detected interactions was low. In almost all scenarios the detection fraction in a model with only marginal effects was larger than in a model with an interaction effect only. Random forests are generally capable of capturing gene-gene interactions, but current variable importance measures are unable to detect them as interactions. In most of the cases, interactions are masked by marginal effects and interactions cannot be differentiated from marginal effects. Consequently, caution is warranted when claiming that random forests uncover interactions.

  10. Application of random survival forests in understanding the determinants of under-five child mortality in Uganda in the presence of covariates that satisfy the proportional and non-proportional hazards assumption.

    PubMed

    Nasejje, Justine B; Mwambi, Henry

    2017-09-07

    Uganda just like any other Sub-Saharan African country, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to critically identify factors strongly associated with under-five child mortality rates. The Cox proportional hazards model has been a common choice in analysing data to understand factors strongly associated with high child mortality rates taking age as the time-to-event variable. However, due to its restrictive proportional hazards (PH) assumption, some covariates of interest which do not satisfy the assumption are often excluded in the analysis to avoid mis-specifying the model. Otherwise using covariates that clearly violate the assumption would mean invalid results. Survival trees and random survival forests are increasingly becoming popular in analysing survival data particularly in the case of large survey data and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests which have never been used in understanding factors affecting under-five child mortality rates in Uganda using Demographic and Health Survey data. Thus the first part of the analysis is based on the use of the classical Cox PH model and the second part of the analysis is based on the use of random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption. Random survival forests and the Cox proportional hazards model agree that the sex of the household head, sex of the child, number of births in the past 1 year are strongly associated to under-five child mortality in Uganda given all the three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates that were originally excluded from the earlier analysis due to violation of the PH assumption were important in explaining under-five child mortality rates. 
These covariates include the number of children under the age of five in a household, number of births in the past 5 years, wealth index, total number of children ever born and the child's birth order. The results further indicated that the predictive performance for random survival forests built using covariates including those that violate the PH assumption was higher than that for random survival forests built using only covariates that satisfy the PH assumption. Random survival forests are appealing methods in analysing public health data to understand factors strongly associated with under-five child mortality rates especially in the presence of covariates that violate the proportional hazards assumption.

  11. Movement trajectories and habitat partitioning of small mammals in logged and unlogged rain forests on Borneo.

    PubMed

    Wells, Konstans; Pfeiffer, Martin; Lakim, Maklarin B; Kalko, Elisabeth K V

    2006-09-01

    1. Non-volant animals in tropical rain forests differ in their ability to exploit the habitat above the forest floor and also in their response to habitat variability. It is predicted that specific movement trajectories are determined both by intrinsic factors such as ecological specialization, morphology and body size and by structural features of the surrounding habitat such as undergrowth and availability of supportive structures. 2. We applied spool-and-line tracking in order to describe movement trajectories and habitat segregation of eight species of small mammals from an assemblage of Muridae, Tupaiidae and Sciuridae in the rain forest of Borneo where we followed a total of 13,525 m path. We also analysed specific changes in the movement patterns of the small mammals in relation to habitat stratification between logged and unlogged forests. Variables related to climbing activity of the tracked species as well as the supportive structures of the vegetation and undergrowth density were measured along their tracks. 3. Movement patterns of the small mammals differed significantly between species. Most similarities were found in congeneric species that converged strongly in body size and morphology. All species were affected in their movement patterns by the altered forest structure in logged forests with most differences found in Leopoldamys sabanus. However, the large proportions of short step lengths found in all species for both forest types and similar path tortuosity suggest that the main movement strategies of the small mammals were not influenced by logging but comprised generally a response to the heterogeneous habitat as opposed to random movement strategies predicted for homogeneous environments. 4. Overall shifts in microhabitat use showed no coherent trend among species. Multivariate (principal component) analysis revealed contrasting trends for convergent species, in particular for Maxomys rajah and M. surifer as well as for Tupaia longipes and T. tana, suggesting that each species was uniquely affected in its movement trajectories by a multiple set of environmental and intrinsic features.

  12. Validation of a Remote Sensing Based Index of Forest Disturbance Using Streamwater Nitrogen Data

    NASA Technical Reports Server (NTRS)

    Eshleman, Keith N.; McNeil, Brenden E.; Townsend, Philip A.

    2008-01-01

    Vegetation disturbances are known to alter the functioning of forested ecosystems by contributing to export ('leakage') of dissolved nitrogen (N), typically nitrate-N, from watersheds that can contribute to acidification of acid-sensitive streams, leaching of base cations, and eutrophication of downstream receiving waters. Yet, at a landscape scale, direct evaluation of how disturbance is linked to spatial variability in N leakage is complicated by the fact that disturbances operate at different spatial scales, over different timescales, and at different intensities. In this paper we explore whether data from synoptic streamwater surveys conducted in an Appalachian oak-dominated forested river basin in western MD (USA) can be used to test and validate a scalable, synthetic, and integrative forest disturbance index (FDI) derived from Landsat imagery. In particular, we found support for the hypothesis that the interannual variation in spring baseflow total dissolved nitrogen (TDN) and nitrate-N concentrations measured at 35 randomly selected stream stations varied as a linear function of the change in FDI computed for the corresponding set of subwatersheds. Our results demonstrate that the combined effects of forest disturbances can be detected using synoptic water quality data. It appears that careful timing of the synoptic baseflow sampling under comparable phenological and hydrometeorological conditions increased our ability to identify a forest disturbance signal.

  13. Machine-Learning Techniques for the Determination of Attrition of Forces Due to Atmospheric Conditions

    DTIC Science & Technology

    2018-02-01

    the possibility of a correlation between aircraft incidents in the National Transportation Safety Board database and meteorological conditions. If a...strong correlation could be found, it could be used to derive a model to predict aircraft incidents and become part of a decision support tool for...techniques, primarily the random forest algorithm, were used to explore the possibility of a correlation between aircraft incidents in the National

  14. A Predictive Analysis of the Department of Defense Distribution System Utilizing Random Forests

    DTIC Science & Technology

    2016-06-01

    resources capable of meeting both customer and individual resource constraints and goals while also maximizing the global benefit to the supply...and probability rules to determine the optimal red wine distribution network for an Italian-based wine producer. The decision support model for...combinations of factors that will result in delivery of the highest quality wines . The model’s first stage inputs basic logistics information to look

  15. A Random Forest approach to predict the spatial distribution of sediment pollution in an estuarine system

    PubMed Central

    Kreakie, Betty J.; Cantwell, Mark G.; Nacci, Diane

    2017-01-01

    Modeling the magnitude and distribution of sediment-bound pollutants in estuaries is often limited by incomplete knowledge of the site and inadequate sample density. To address these modeling limitations, a decision-support tool framework was conceived that predicts sediment contamination from the sub-estuary to broader estuary extent. For this study, a Random Forest (RF) model was implemented to predict the distribution of a model contaminant, triclosan (5-chloro-2-(2,4-dichlorophenoxy)phenol) (TCS), in Narragansett Bay, Rhode Island, USA. TCS is an unregulated contaminant used in many personal care products. The RF explanatory variables were associated with TCS transport and fate (proxies) and direct and indirect environmental entry. The continuous RF TCS concentration predictions were discretized into three levels of contamination (low, medium, and high) for three different quantile thresholds. The RF model explained 63% of the variance with a minimum number of variables. Total organic carbon (TOC) (transport and fate proxy) was a strong predictor of TCS contamination causing a mean squared error increase of 59% when compared to permutations of randomized values of TOC. Additionally, combined sewer overflow discharge (environmental entry) and sand (transport and fate proxy) were strong predictors. The discretization models identified a TCS area of greatest concern in the northern reach of Narragansett Bay (Providence River sub-estuary), which was validated with independent test samples. This decision-support tool performed well at the sub-estuary extent and provided the means to identify areas of concern and prioritize bay-wide sampling. PMID:28738089
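
    The variable-importance statement above (an increase in mean squared error when TOC values are randomly permuted) follows the general idea of permutation importance. Below is a simplified sketch of that idea, not the exact measure used in the study: the toy data, the stand-in model, and the feature layout (column 0 as a TOC-like predictor, column 1 as an unused feature) are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, seed=0):
    """Increase in MSE after shuffling one feature column.

    A large increase means the model relies on that feature; an increase
    of zero means the feature does not affect the predictions.
    """
    baseline = mse(y, [predict(row) for row in X])
    shuffled = [row[column] for row in X]
    random.Random(seed).shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    permuted = mse(y, [predict(row) for row in X_perm])
    return permuted - baseline


# Hypothetical toy data: column 0 drives the response, column 1 is ignored.
X = [[float(i), float((i * 7) % 10)] for i in range(10)]
y = [3.0 * row[0] + 1.0 for row in X]


def predict(row):
    """Stand-in for a fitted model; it uses only column 0."""
    return 3.0 * row[0]


print("importance of column 0:", permutation_importance(predict, X, y, column=0))
print("importance of column 1:", permutation_importance(predict, X, y, column=1))
```

    Random forest implementations typically compute this on out-of-bag samples and average over trees; the sketch uses a single shuffle on the full toy data set for brevity.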

  16. The Effects of Forest Therapy on Coping with Chronic Widespread Pain: Physiological and Psychological Differences between Participants in a Forest Therapy Program and a Control Group.

    PubMed

    Han, Jin-Woo; Choi, Han; Jeon, Yo-Han; Yoon, Chong-Hyeon; Woo, Jong-Min; Kim, Won

    2016-02-24

    This study aimed to investigate the effects of a two-day forest therapy program on individuals with chronic widespread pain. Sixty-one employees of a public organization providing building and facilities management services within the Seoul Metropolitan area participated in the study. Participants were assigned to an experimental group (n = 33) who participated in a forest therapy program or a control group (n = 28) on a non-random basis. Pre- and post-measures of heart rate variability (HRV), Natural Killer cell (NK cell) activity, self-reported pain using the visual analog scale (VAS), depression level using the Beck Depression Inventory (BDI), and health-related quality of life measures using the EuroQol Visual Analog Scale (EQ-VAS) were collected in both groups. The results showed that participants in the forest therapy group, as compared to the control group, showed physiological improvement as indicated by a significant increase in some measures of HRV and an increase in immune competence as indicated by NK cell activity. Participants in the forest therapy group also reported significant decreases in pain and depression, and a significant improvement in health-related quality of life. These results support the hypothesis that forest therapy is an effective intervention to relieve pain and associated psychological and physiological symptoms in individuals with chronic widespread pain.

  17. Comparison of Models for the Prediction of Medical Costs of Spinal Fusion in Taiwan Diagnosis-Related Groups by Machine Learning Algorithms.

    PubMed

    Kuo, Ching-Yen; Yu, Liang-Chin; Chen, Hou-Chaung; Chan, Chien-Lung

    2018-01-01

    The aims of this study were to compare the performance of machine learning methods for the prediction of the medical costs associated with spinal fusion in terms of profit or loss in Taiwan Diagnosis-Related Groups (Tw-DRGs) and to apply these methods to explore the important factors associated with the medical costs of spinal fusion. A data set was obtained from a regional hospital in Taoyuan city in Taiwan, which contained data from 2010 to 2013 on patients of Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities). Naïve-Bayesian, support vector machines, logistic regression, C4.5 decision tree, and random forest methods were employed for prediction using WEKA 3.8.1. Five hundred thirty-two cases were categorized as belonging to the Tw-DRG49702 group. The mean medical cost was US $4,549.7, and the mean age of the patients was 62.4 years. The mean length of stay was 9.3 days. The length of stay was an important variable in terms of determining medical costs for patients undergoing spinal fusion. The random forest method had the best predictive performance in comparison to the other methods, achieving an accuracy of 84.30%, a sensitivity of 71.4%, a specificity of 92.2%, and an AUC of 0.904. Our study demonstrated that the random forest model can be employed to predict the medical costs of Tw-DRG49702, and could inform hospital strategy in terms of increasing the financial management efficiency of this operation.

  18. Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery

    NASA Astrophysics Data System (ADS)

    Wu, Chaofan; Shen, Huanhuan; Shen, Aihua; Deng, Jinsong; Gan, Muye; Zhu, Jinxia; Xu, Hongwei; Wang, Ke

    2016-07-01

    Biomass is one significant biophysical parameter of a forest ecosystem, and accurate biomass estimation on the regional scale provides important information for carbon-cycle investigation and sustainable forest management. In this study, Landsat satellite imagery data combined with field-based measurements were integrated through comparisons of five regression approaches [stepwise linear regression, K-nearest neighbor, support vector regression, random forest (RF), and stochastic gradient boosting] with two different candidate variable strategies to implement the optimal spatial above-ground biomass (AGB) estimation. The results suggested that RF algorithm exhibited the best performance by 10-fold cross-validation with respect to R2 (0.63) and root-mean-square error (26.44 ton/ha). Consequently, the map of estimated AGB was generated with a mean value of 89.34 ton/ha in northwestern Zhejiang Province, China, with a similar pattern to the distribution mode of local forest species. This research indicates that machine-learning approaches associated with Landsat imagery provide an economical way for biomass estimation. Moreover, ensemble methods using all candidate variables, especially for Landsat images, provide an alternative for regional biomass simulation.
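
    The two scores reported in the abstract above, R2 and root-mean-square error, can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and are not taken from the study; in a 10-fold cross-validation the same functions would presumably be applied to the held-out predictions.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical biomass values (ton/ha), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

score_r2 = r_squared(observed, predicted)
score_rmse = rmse(observed, predicted)
```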

  19. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  20. Comparing spatial regression to random forests for large environmental data sets

    EPA Science Inventory

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputatio...

  1. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    EPA Science Inventory

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  2. Characterizing stand-level forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the Random Forest algorithm

    NASA Astrophysics Data System (ADS)

    Ahmed, Oumer S.; Franklin, Steven E.; Wulder, Michael A.; White, Joanne C.

    2015-03-01

    Many forest management activities, including the development of forest inventories, require spatially detailed forest canopy cover and height data. Among the various remote sensing technologies, LiDAR (Light Detection and Ranging) offers the most accurate and consistent means for obtaining reliable canopy structure measurements. A potential solution to reduce the cost of LiDAR data is to integrate transects (samples) of LiDAR data with frequently acquired and spatially comprehensive optical remotely sensed data. Although multiple regression is commonly used for such modeling, often it does not fully capture the complex relationships between forest structure variables. This study investigates the potential of Random Forest (RF), a machine learning technique, to estimate LiDAR measured canopy structure using a time series of Landsat imagery. The study is implemented over a 2600 ha area of industrially managed coastal temperate forests on Vancouver Island, British Columbia, Canada. We implemented a trajectory-based approach to time series analysis that generates time since disturbance (TSD) and disturbance intensity information for each pixel and we used this information to stratify the forest land base into two strata: mature forests and young forests. Canopy cover and height for three forest classes (i.e., mature, young, and mature and young combined) were modeled separately using multiple regression and Random Forest (RF) techniques. For all forest classes, the RF models provided improved estimates relative to the multiple regression models. The lowest validation error was obtained for the mature forest stratum in an RF model (R2 = 0.88, RMSE = 2.39 m and bias = -0.16 for canopy height; R2 = 0.72, RMSE = 0.068% and bias = -0.0049 for canopy cover). This study demonstrates the value of using disturbance and successional history to inform estimates of canopy structure and obtain improved estimates of forest canopy cover and height using the RF algorithm.

  3. Prostate cancer prediction using the random forest algorithm that takes into account transrectal ultrasound findings, age, and serum levels of prostate-specific antigen.

    PubMed

    Xiao, Li-Hong; Chen, Pei-Ran; Gou, Zhong-Ping; Li, Yong-Zhong; Li, Mei; Xiang, Liang-Cheng; Feng, Ping

    2017-01-01

    The aim of this study is to evaluate the ability of the random forest algorithm that combines data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than either diagnostic indicator on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
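
    The accuracy, sensitivity, specificity, and predictive values quoted above all derive from the four cells of a confusion matrix. The following is a minimal sketch of those definitions; the function name and the counts are hypothetical and do not reproduce the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return standard diagnostic-test metrics from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative.
    tn/fp: non-diseased cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=40, fp=10, tn=90, fn=20)
```

    A pattern like the one reported, with high specificity and lower sensitivity, arises whenever false negatives are more common than false positives relative to their class sizes.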

  4. The Efficiency of Random Forest Method for Shoreline Extraction from LANDSAT-8 and GOKTURK-2 Imageries

    NASA Astrophysics Data System (ADS)

    Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.

    2017-11-01

    Coastal monitoring plays a vital role in environmental planning and hazard management related issues. Since shorelines are fundamental data for environment management, disaster management, coastal erosion studies, modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract shorelines. Random Forest is one of these techniques and is used in this study for shoreline extraction. This algorithm is a machine learning method based on decision trees. Decision trees analyse classes of training data and create rules for classification. In this study, the Terkos region was chosen for the proposed method within the scope of the TUBITAK project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". The Random Forest algorithm was implemented to extract the shoreline of the Black Sea near the lake from LANDSAT-8 and GOKTURK-2 satellite imagery taken in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method was applied to the NIR bands of the LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imagery. Each image was digitized manually to obtain shorelines for accuracy assessment. According to the accuracy assessment results, the Random Forest method is efficient for shoreline extraction from both medium- and high-resolution images.

  5. Identification by random forest method of HLA class I amino acid substitutions associated with lower survival at day 100 in unrelated donor hematopoietic cell transplantation.

    PubMed

    Marino, S R; Lin, S; Maiers, M; Haagenson, M; Spellman, S; Klein, J P; Binkowski, T A; Lee, S J; van Besien, K

    2012-02-01

    The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good or intermediate risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions in 33 positions that were associated with reduced 100 day survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all, 13 had been previously reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.

  6. A Random Forest Approach to Predict the Spatial Distribution ...

    EPA Pesticide Factsheets

    Modeling the magnitude and distribution of sediment-bound pollutants in estuaries is often limited by incomplete knowledge of the site and inadequate sample density. To address these modeling limitations, a decision-support tool framework was conceived that predicts sediment contamination from the sub-estuary to broader estuary extent. For this study, a Random Forest (RF) model was implemented to predict the distribution of a model contaminant, triclosan (5-chloro-2-(2,4-dichlorophenoxy)phenol) (TCS), in Narragansett Bay, Rhode Island, USA. TCS is an unregulated contaminant used in many personal care products. The RF explanatory variables were associated with TCS transport and fate (proxies) and direct and indirect environmental entry. The continuous RF TCS concentration predictions were discretized into three levels of contamination (low, medium, and high) for three different quantile thresholds. The RF model explained 63% of the variance with a minimum number of variables. Total organic carbon (TOC) (transport and fate proxy) was a strong predictor of TCS contamination causing a mean squared error increase of 59% when compared to permutations of randomized values of TOC. Additionally, combined sewer overflow discharge (environmental entry) and sand (transport and fate proxy) were strong predictors. The discretization models identified a TCS area of greatest concern in the northern reach of Narragansett Bay (Providence River sub-estuary), which was validated wi

  7. Missouri Ozark Forest Ecosystem Project: the experiment

    Treesearch

    Steven L. Sheriff

    2002-01-01

    Missouri Ozark Forest Ecosystem Project (MOFEP) is a unique experiment to learn about the impacts of management practices on a forest system. Three forest management practices (uneven-aged management, even-aged management, and no-harvest management) as practiced by the Missouri Department of Conservation were randomly assigned to nine forest management sites using a...

  8. MLACP: machine-learning-based prediction of anticancer peptides

    PubMed Central

    Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Choi, Sun; Kim, Myeong Ok; Lee, Gwang

    2017-01-01

    Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html. PMID:29100375

  9. Variable selection with random forest: Balancing stability, performance, and interpretation in ecological and environmental modeling

    EPA Science Inventory

    Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...

  10. Random Forests for Evaluating Pedagogy and Informing Personalized Learning

    ERIC Educational Resources Information Center

    Spoon, Kelly; Beemer, Joshua; Whitmer, John C.; Fan, Juanjuan; Frazee, James P.; Stronach, Jeanne; Bohonak, Andrew J.; Levine, Richard A.

    2016-01-01

    Random forests are presented as an analytics foundation for educational data mining tasks. The focus is on course- and program-level analytics including evaluating pedagogical approaches and interventions and identifying and characterizing at-risk students. As part of this development, the concept of individualized treatment effects (ITE) is…

  11. Employing canopy hyperspectral narrowband data and random forest algorithm to differentiate palmer amaranth from colored cotton

    USDA-ARS's Scientific Manuscript database

    Palmer amaranth (Amaranthus palmeri S. Wats.) invasion negatively impacts cotton (Gossypium hirsutum L.) production systems throughout the United States. The objective of this study was to evaluate canopy hyperspectral narrowband data as input into the random forest machine learning algorithm to dis...

  12. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.

    PubMed

    Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J

    2015-02-01

    The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine-learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) a LR model that assumed linearity and additivity (simple LR model) (2) a LR model incorporating restricted cubic splines and interactions (flexible LR model) (3) a support vector machine, (4) a random forest and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Predictors of occurrence of the aquatic macrophyte Podostemum ceratophyllum in a southern Appalachian River

    USGS Publications Warehouse

    Argentina, Jane E.; Freeman, Mary C.; Freeman, Byron J.

    2010-01-01

    The aquatic macrophyte Podostemum ceratophyllum (Hornleaf Riverweed) commonly provides habitat for invertebrates and fishes in flowing-water portions of Piedmont and Appalachian streams in the eastern US. We quantified variation in percent cover by P. ceratophyllum in a 39-km reach of the Conasauga River, TN and GA, to test the hypothesis that cover decreased with increasing non-forest land use. We estimated percent P. ceratophyllum cover in quadrats (0.09 m²) placed at random coordinates within 20 randomly selected shoals. We then used hierarchical logistic regression, in an information-theoretic framework, to evaluate relative support for models incorporating alternative combinations of microhabitat and shoal-level variables to predict the occurrence of high (≥50%) P. ceratophyllum cover. As expected, bed sediment size and measures of light availability (location in the center of the channel, canopy cover) were included in best-supported models and had similar estimated-effect sizes across models. Podostemum ceratophyllum cover declined with increasing watershed size (included in 8 of 13 models in the confidence set of models); however, this decrease in cover was not well predicted by variation in land use. Focused monitoring of temporal and spatial trends in the status of P. ceratophyllum is important due to its biotic importance in fast-flowing waters and its potential sensitivity to landscape-level changes, such as declines in forested land cover and homogenization of benthic habitats.

  14. Lawsuit lead time prediction: Comparison of data mining techniques based on categorical response variable.

    PubMed

    Gruginskie, Lúcia Adriana Dos Santos; Vaccaro, Guilherme Luís Roehe

    2018-01-01

    The quality of a country's judicial system can be assessed by the overall duration of lawsuits, or the lead time. When the lead time is excessive, a country's economy can be affected, leading to the adoption of measures such as the creation of the Saturn Center in Europe. Although there are performance indicators to measure the lead time of lawsuits, the analysis and fitting of prediction models remain underdeveloped themes in the literature. To contribute to this subject, this article compares different prediction models according to their accuracy, sensitivity, specificity, precision, and F1 measure. The database used was from TRF4 (the Tribunal Regional Federal da 4a Região), a federal court in southern Brazil, and corresponds to the 2nd Instance civil lawsuits completed in 2016. The models were fitted using support vector machine, naive Bayes, random forests, and neural network approaches with categorical predictor variables. The lead time of the 2nd Instance judgment, measured in days and categorized in bands, was selected as the response variable. The comparison among the models showed that the support vector machine and random forest approaches produced measurements that were superior to those of the other models. The models were evaluated using k-fold cross-validation similar to that applied to the test models.

  15. Public perceptions about climate change mitigation in British Columbia's forest sector

    PubMed Central

    Hagerman, Shannon; Kozak, Robert; Hoberg, George

    2018-01-01

    The role of forest management in mitigating climate change is a central concern for the Canadian province of British Columbia. The successful implementation of forest management activities to achieve climate change mitigation in British Columbia will be strongly influenced by public support or opposition. While we now have increasingly clear ideas of the management opportunities associated with forest mitigation and some insight into public support for climate change mitigation in the context of sustainable forest management, very little is known with respect to the levels and basis of public support for potential forest management strategies to mitigate climate change. This paper, by describing the results of a web-based survey, documents levels of public support for the implementation of eight forest carbon mitigation strategies in British Columbia’s forest sector, and examines and quantifies the influence of the factors that shape this support. Overall, respondents ascribed a high level of importance to forest carbon mitigation and supported all of the eight proposed strategies, indicating that the British Columbia public is inclined to consider alternative practices in managing forests and wood products to mitigate climate change. That said, we found differences in levels of support for the mitigation strategies. In general, we found greater levels of support for a rehabilitation strategy (e.g. reforestation of unproductive forest land), and to a lesser extent for conservation strategies (e.g. old growth conservation, reduced harvest) over enhanced forest management strategies (e.g. improved harvesting and silvicultural techniques). 
    We also highlighted multiple variables within the British Columbia population that appear to play a role in predicting levels of support for conservation and/or enhanced forest management strategies, including environmental values, risk perception, trust in groups of actors, prioritized objectives of forest management and socio-demographic factors. PMID:29684041

  16. Public perceptions about climate change mitigation in British Columbia's forest sector.

    PubMed

    Peterson St-Laurent, Guillaume; Hagerman, Shannon; Kozak, Robert; Hoberg, George

    2018-01-01

    The role of forest management in mitigating climate change is a central concern for the Canadian province of British Columbia. The successful implementation of forest management activities to achieve climate change mitigation in British Columbia will be strongly influenced by public support or opposition. While we now have increasingly clear ideas of the management opportunities associated with forest mitigation and some insight into public support for climate change mitigation in the context of sustainable forest management, very little is known with respect to the levels and basis of public support for potential forest management strategies to mitigate climate change. This paper, by describing the results of a web-based survey, documents levels of public support for the implementation of eight forest carbon mitigation strategies in British Columbia's forest sector, and examines and quantifies the influence of the factors that shape this support. Overall, respondents ascribed a high level of importance to forest carbon mitigation and supported all of the eight proposed strategies, indicating that the British Columbia public is inclined to consider alternative practices in managing forests and wood products to mitigate climate change. That said, we found differences in levels of support for the mitigation strategies. In general, we found greater levels of support for a rehabilitation strategy (e.g. reforestation of unproductive forest land), and to a lesser extent for conservation strategies (e.g. old growth conservation, reduced harvest) over enhanced forest management strategies (e.g. improved harvesting and silvicultural techniques). 
    We also highlighted multiple variables within the British Columbia population that appear to play a role in predicting levels of support for conservation and/or enhanced forest management strategies, including environmental values, risk perception, trust in groups of actors, prioritized objectives of forest management and socio-demographic factors.

  17. Old-growth and mature forests near spotted owl nests in western Oregon

    NASA Technical Reports Server (NTRS)

    Ripple, William J.; Johnson, David H.; Hershey, K. T.; Meslow, E. Charles

    1995-01-01

    We investigated how the amount of old-growth and mature forest influences the selection of nest sites by northern spotted owls (Strix occidentalis caurina) in the Central Cascade Mountains of Oregon. We used 7 different plot sizes to compare the proportion of mature and old-growth forest between 30 nest sites and 30 random sites. The proportion of old-growth and mature forest was significantly greater at nest sites than at random sites for all plot sizes (P ≤ 0.01). Thus, management of the spotted owl might require setting the percentage of old-growth and mature forest retained from harvesting at least 1 standard deviation above the mean for the 30 nest sites we examined.

  18. Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study

    NASA Astrophysics Data System (ADS)

    Polan, Daniel F.; Brady, Samuel L.; Kaufman, Robert A.

    2016-09-01

    There is a need for robust, fully automated whole body organ segmentation for diagnostic CT. This study investigates and optimizes a Random Forest algorithm for automated organ segmentation; explores the limitations of a Random Forest algorithm applied to the CT environment; and demonstrates segmentation accuracy in a feasibility study of pediatric and adult patients. To the best of our knowledge, this is the first study to investigate a trainable Weka segmentation (TWS) implementation using Random Forest machine-learning as a means to develop a fully automated tissue segmentation tool developed specifically for pediatric and adult examinations in a diagnostic CT environment. Current innovation in computed tomography (CT) is focused on radiomics, patient-specific radiation dose calculation, and image quality improvement using iterative reconstruction, all of which require specific knowledge of tissue and organ systems within a CT image. The purpose of this study was to develop a fully automated Random Forest classifier algorithm for segmentation of neck-chest-abdomen-pelvis CT examinations based on pediatric and adult CT protocols. Seven materials were classified: background, lung/internal air or gas, fat, muscle, solid organ parenchyma, blood/contrast enhanced fluid, and bone tissue using Matlab and the TWS plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance evaluated over a voxel radius of 2^n (n from 0 to 4), along with noise reduction and edge preserving filters: Gaussian, bilateral, Kuwahara, and anisotropic diffusion. The Random Forest algorithm used 200 trees with 2 features randomly selected per node. The optimized auto-segmentation algorithm resulted in 16 image features including features derived from maximum, mean, variance Gaussian and Kuwahara filters.
    Dice similarity coefficient (DSC) calculations between manually segmented and Random Forest algorithm segmented images from 21 patient image sections were analyzed. The automated algorithm produced segmentation of seven material classes with a median DSC of 0.86 ± 0.03 for pediatric patient protocols, and 0.85 ± 0.04 for adult patient protocols. Additionally, 100 randomly selected patient examinations were segmented and analyzed, and a mean sensitivity of 0.91 (range: 0.82-0.98), specificity of 0.89 (range: 0.70-0.98), and accuracy of 0.90 (range: 0.76-0.98) were demonstrated. In this study, we demonstrate that this fully automated segmentation tool was able to produce fast and accurate segmentation of the neck and trunk of the body over a wide range of patient habitus and scan parameters.
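
    For readers unfamiliar with the Dice similarity coefficient used above, it is twice the overlap of two segmentations divided by the sum of their sizes. The sketch below computes it for two flattened binary masks; the function name, the masks, and the convention for two empty masks are assumptions made for illustration, not details from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / (size_a + size_b)


# Hypothetical flattened masks for one tissue class (1 = voxel in class).
manual_mask = [0, 1, 1, 1, 0, 0]
forest_mask = [0, 1, 1, 0, 1, 0]

dsc = dice_coefficient(manual_mask, forest_mask)
```

    A DSC of 1 means the two segmentations coincide exactly and 0 means they do not overlap at all.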

  19. Temporal changes in randomness of bird communities across Central Europe.

    PubMed

    Renner, Swen C; Gossner, Martin M; Kahl, Tiemo; Kalko, Elisabeth K V; Weisser, Wolfgang W; Fischer, Markus; Allan, Eric

    2014-01-01

    Many studies have examined whether communities are structured by random or deterministic processes, and both are likely to play a role, but relatively few studies have attempted to quantify the degree of randomness in species composition. We quantified, for the first time, the degree of randomness in forest bird communities based on an analysis of spatial autocorrelation in three regions of Germany. The compositional dissimilarity between pairs of forest patches was regressed against the distance between them. We then calculated the y-intercept of the curve, i.e. the 'nugget', which represents the compositional dissimilarity at zero spatial distance. We therefore assume, following similar work on plant communities, that this represents the degree of randomness in species composition. We then analysed how the degree of randomness in community composition varied over time and with forest management intensity, which we expected to reduce the importance of random processes by increasing the strength of environmental drivers. We found that a high portion of the bird community composition could be explained by chance (overall mean of 0.63), implying that most of the variation in local bird community composition is driven by stochastic processes. Forest management intensity did not consistently affect the mean degree of randomness in community composition, perhaps because the bird communities were relatively insensitive to management intensity. We found a high temporal variation in the degree of randomness, which may indicate temporal variation in assembly processes and in the importance of key environmental drivers. We conclude that the degree of randomness in community composition should be considered in bird community studies, and the high values we find may indicate that bird community composition is relatively hard to predict at the regional scale.
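
    The "nugget" described above is the fitted dissimilarity at zero distance. As a minimal sketch, assuming a straight-line fit by ordinary least squares (the abstract does not state the functional form of the curve), the intercept can be obtained as follows. The function name and the data points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical pairwise distances (km) and compositional dissimilarities.
distances = [1.0, 2.0, 3.0, 4.0]
dissimilarities = [0.62, 0.64, 0.66, 0.68]

# The intercept is the estimated dissimilarity at zero distance (the nugget).
nugget, slope = fit_line(distances, dissimilarities)
```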

  20. Security authentication with a three-dimensional optical phase code using random forest classifier: an overview

    NASA Astrophysics Data System (ADS)

    Markman, Adam; Carnicer, Artur; Javidi, Bahram

    2017-05-01

    We overview our recent work [1] on utilizing three-dimensional (3D) optical phase codes for object authentication using the random forest classifier. A simple 3D optical phase code (OPC) is generated by combining multiple diffusers and glass slides. This tag is then placed on a quick-response (QR) code, which is a barcode capable of storing information and can be scanned under non-uniform illumination conditions, rotation, and slight degradation. A coherent light source illuminates the OPC and the transmitted light is captured by a CCD to record the unique signature. Feature extraction on the signature is performed and inputted into a pre-trained random-forest classifier for authentication.

  1. Fast image interpolation via random forests.

    PubMed

    Huang, Jun-Jie; Siu, Wan-Chi; Liu, Tian-Rui

    2015-10-01

    This paper proposes a two-stage framework for fast image interpolation via random forests (FIRF). The proposed FIRF method achieves high accuracy while requiring little computation. The underlying idea is to apply random forests to classify the natural image patch space into numerous subspaces and to learn a linear regression model for each subspace that maps a low-resolution image patch to a high-resolution image patch. The FIRF framework consists of two stages. Stage 1 removes most of the ringing and aliasing artifacts in the initial bicubic interpolated image, while Stage 2 further refines the Stage 1 interpolated image. By varying the number of decision trees in the random forests and the number of stages applied, the proposed FIRF method can realize computationally scalable image interpolation. Extensive experimental results show that the proposed FIRF(3, 2) method achieves more than 0.3 dB improvement in peak signal-to-noise ratio over the state-of-the-art nonlocal autoregressive modeling (NARM) method. Moreover, the proposed FIRF(1, 1) obtains results similar to or better than NARM while taking only 0.3% of its computational time.

  2. Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

    PubMed Central

    2013-01-01

    Motivation: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarities measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. Results: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure.
    PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. Availability: The Java code is freely available at http://www2.imperial.ac.uk/~gmontana. PMID:24564704

  3. Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes.

    PubMed

    Wang, Yue; Goh, Wilson; Wong, Limsoon; Montana, Giovanni

    2013-01-01

    Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarities measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. 
    PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. The Java code is freely available at http://www2.imperial.ac.uk/~gmontana.

  4. 7 CFR 1.620 - What supporting information must the Forest Service provide with its preliminary conditions?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 1 2010-01-01 2010-01-01 false What supporting information must the Forest Service... § 1.620 What supporting information must the Forest Service provide with its preliminary conditions? (a) Supporting information. (1) When the Forest Service files preliminary conditions with FERC, it...

  5. Random forests as cumulative effects models: A case study of lakes and rivers in Muskoka, Canada.

    PubMed

    Jones, F Chris; Plewes, Rachel; Murison, Lorna; MacDougall, Mark J; Sinclair, Sarah; Davies, Christie; Bailey, John L; Richardson, Murray; Gunn, John

    2017-10-01

    Cumulative effects assessment (CEA) - a type of environmental appraisal - lacks effective methods for modeling cumulative effects, evaluating indicators of ecosystem condition, and exploring the likely outcomes of development scenarios. Random forests are an extension of classification and regression trees, which model response variables by recursive partitioning. Random forests were used to model a series of candidate ecological indicators that described lakes and rivers from a case study watershed (The Muskoka River Watershed, Canada). Suitability of the candidate indicators for use in cumulative effects assessment and watershed monitoring was assessed according to how well they could be predicted from natural habitat features and how sensitive they were to human land-use. The best models explained 75% of the variation in a multivariate descriptor of lake benthic-macroinvertebrate community structure, and 76% of the variation in the conductivity of river water. Similar results were obtained by cross-validation. Several candidate indicators detected a simulated doubling of urban land-use in their catchments, and a few were able to detect a simulated doubling of agricultural land-use. The paper demonstrates that random forests can be used to describe the combined and singular effects of multiple stressors and natural environmental factors, and furthermore, that random forests can be used to evaluate the performance of monitoring indicators. The numerical methods presented are applicable to any ecosystem and indicator type, and therefore represent a step forward for CEA. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.

  6. Improved high-dimensional prediction with Random Forests by the use of co-data.

    PubMed

    Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A

    2017-12-28

    Prediction in high dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables by co-data moderated sampling probabilities. Co-data here are defined as any type information that is available on the variables of the primary data, but does not use its response labels. These moderated sampling probabilities are, inspired by empirical Bayes, learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
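
    The core idea above, drawing candidate variables with non-uniform probabilities instead of uniformly, can be sketched as follows. This is a minimal illustration, not the CoRF implementation: the weights are hypothetical stand-ins for the co-data moderated probabilities (which the paper learns from the data), and the helper name `draw_candidates` and drawing without replacement are assumptions.

```python
import random


def draw_candidates(weights, mtry, rng):
    """Draw `mtry` distinct variable indices with probability proportional to weight."""
    if mtry > sum(1 for w in weights if w > 0):
        raise ValueError("mtry exceeds the number of variables with positive weight")
    remaining = list(range(len(weights)))
    remaining_weights = list(weights)
    chosen = []
    for _ in range(mtry):
        # Weighted draw of one position among the variables not yet chosen.
        pos = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        remaining_weights.pop(pos)
    return chosen


# Hypothetical weights for six variables: a standard forest would use the
# uniform vector, while co-data would up-weight some variables and
# down-weight (here: exclude) others.
uniform_weights = [1.0] * 6
moderated_weights = [0.0, 0.0, 5.0, 1.0, 1.0, 0.0]

candidates = draw_candidates(moderated_weights, mtry=3, rng=random.Random(0))
```

    With the moderated weights only variables 2, 3 and 4 can ever be drawn, and variable 2 is the most likely to be picked first; with the uniform weights every variable is equally likely, which corresponds to the standard Random Forest behaviour.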

  7. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.

    PubMed

    Lenselink, Eelke B; Ten Dijke, Niels; Bongers, Brandon; Papadatos, George; van Vlijmen, Herman W T; Kowalczyk, Wojtek; IJzerman, Adriaan P; van Westen, Gerard J P

    2017-08-14

    The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better at almost one standard deviation higher than the mean performance. Furthermore, Multi-task and PCM implementations were shown to improve performance over single task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized 'DNN_PCM'). 
    Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols.
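
    One of the two metrics named above, the Matthews Correlation Coefficient, can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the function name `mcc`, the convention of returning 0.0 when the denominator is zero, and the example counts are assumptions for illustration and are not taken from the benchmark.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for an active/inactive classifier.
example_mcc = mcc(tp=40, fp=10, tn=45, fn=5)
```

    The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), and unlike plain accuracy it uses all four cells of the confusion matrix.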

  8. Development of machine learning models for diagnosis of glaucoma.

    PubMed

    Kim, Seong Jae; Cho, Kyong Jin; Oh, Sejong

    2017-01-01

    The study aimed to develop machine learning models that have strong prediction power and interpretability for diagnosis of glaucoma based on retinal nerve fiber layer (RNFL) thickness and visual field (VF). We collected various candidate features from the examination of retinal nerve fiber layer (RNFL) thickness and visual field (VF). We also developed synthesized features from original features. We then selected the best features proper for classification (diagnosis) through feature evaluation. We used 100 cases of data as a test dataset and 399 cases of data as a training and validation dataset. To develop the glaucoma prediction model, we considered four machine learning algorithms: C5.0, random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). We repeatedly composed a learning model using the training dataset and evaluated it by using the validation dataset. Finally, we got the best learning model that produces the highest validation accuracy. We analyzed quality of the models using several measures. The random forest model shows best performance and C5.0, SVM, and KNN models show similar accuracy. In the random forest model, the classification accuracy is 0.98, sensitivity is 0.983, specificity is 0.975, and AUC is 0.979. The developed prediction models show high accuracy, sensitivity, specificity, and AUC in classifying among glaucoma and healthy eyes. It will be used for predicting glaucoma against unknown examination records. Clinicians may reference the prediction results and be able to make better decisions. We may combine multiple learning models to increase prediction accuracy. The C5.0 model includes decision rules for prediction. It can be used to explain the reasons for specific predictions.

  9. Towards large-scale FAME-based bacterial species identification using machine learning techniques.

    PubMed

    Slabbinck, Bram; De Baets, Bernard; Dawyndt, Peter; De Vos, Paul

    2009-05-01

    In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values, respectively, 0.847, 0.901 and 0.708. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. 
    Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.

  10. A Hybrid Color Space for Skin Detection Using Genetic Algorithm Heuristic Search and Principal Component Analysis Technique

    PubMed Central

    2015-01-01

    Color is one of the most prominent features of an image and used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space in terms of skin and face classification performance which can address issues like illumination variations, various camera characteristics and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing the Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color in over seventeen existing color spaces. Genetic Algorithm heuristic is used to find the optimal color component combination setup in terms of skin detection accuracy while the Principal Component Analysis projects the optimal Genetic Algorithm solution to a less complex dimension. Pixel wise skin detection was used to evaluate the performance of the proposed color space. We have employed four classifiers including Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron in order to generate the human skin color predictive model. The proposed color space was compared to some existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that by using Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and False Positive Rate of 0.0482 which outperformed the existing color spaces in terms of pixel wise skin detection accuracy. The results also indicate that among the classifiers used in this study, Random Forest is the most suitable classifier for pixel wise skin detection applications. PMID:26267377

  11. Mediastinal lymph node detection and station mapping on chest CT using spatial priors and random forest

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn

    2016-07-15

    Purpose: To develop an automated system for mediastinal lymph node detection and station mapping for chest CT. Methods: The contextual organs, trachea, lungs, and spine are first automatically identified to locate the region of interest (ROI) (mediastinum). The authors employ shape features derived from Hessian analysis, local object scale, and circular transformation that are computed per voxel in the ROI. Eight more anatomical structures are simultaneously segmented by multiatlas label fusion. Spatial priors are defined as the relative multidimensional distance vectors corresponding to each structure. Intensity, shape, and spatial prior features are integrated and parsed by a random forest classifier for lymph node detection. The detected candidates are then segmented by the following curve evolution process. Texture features are computed on the segmented lymph nodes and a support vector machine committee is used for final classification. For lymph node station labeling, based on the segmentation results of the above anatomical structures, the textual definitions of the mediastinal lymph node map according to the International Association for the Study of Lung Cancer are converted into a patient-specific color-coded CT image, where the lymph node station can be automatically assigned for each detected node. Results: The chest CT volumes from 70 patients with 316 enlarged mediastinal lymph nodes are used for validation. For lymph node detection, their system achieves 88% sensitivity at eight false positives per patient. For lymph node station labeling, 84.5% of lymph nodes are correctly assigned to their stations. Conclusions: Multiple-channel shape, intensity, and spatial prior features aggregated by a random forest classifier improve mediastinal lymph node detection on chest CT. Using the location information of segmented anatomic structures from the multiatlas formulation enables accurate identification of lymph node stations.

  12. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

    PubMed

    Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

    2017-09-15

    Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. 
    Availability: Code available at http://insilico.utulsa.edu/software/privateEC. Contact: brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.

  13. What does it take to get family forest owners to enroll in a forest stewardship-type program?

    Treesearch

    Michael A. Kilgore; Stephanie A. Snyder; Joseph Schertz; Steven J. Taff

    2008-01-01

    We estimated the probability of enrollment and factors influencing participation in a forest stewardship-type program, Minnesota's Sustainable Forest Incentives Act, using data from a mail survey of over 1000 randomly-selected Minnesota family forest owners. Of the 15 variables tested, only five were significant predictors of a landowner's interest in...

  14. Mapping forest vegetation for the western United States using modified random forests imputation of FIA forest plots

    Treesearch

    Karin Riley; Isaac C. Grenfell; Mark A. Finney

    2016-01-01

    Maps of the number, size, and species of trees in forests across the western United States are desirable for many applications such as estimating terrestrial carbon resources, predicting tree mortality following wildfires, and for forest inventory. However, detailed mapping of trees for large areas is not feasible with current technologies, but statistical...

  15. Random forest regression modelling for forest aboveground biomass estimation using RISAT-1 PolSAR and terrestrial LiDAR data

    NASA Astrophysics Data System (ADS)

    Mangla, Rohit; Kumar, Shashi; Nandy, Subrata

    2016-05-01

    SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. SAR sensor in its fully polarimetric mode has an advantage to retrieve scattering property of different component of forest structure and LiDAR has the capability to measure structural information with very high accuracy. This study was focused on retrieval of forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) based point clouds and scattering property of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data was acquired for 14 plots of Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees and random sampling with plot size of 0.1 ha (31.62m*31.62m) was adopted for TLS and field data collection. RISAT-1 data was processed to retrieve SAR data based variables and TLS point clouds based 3D imaging was done to retrieve LiDAR based variables. Surface scattering, double-bounce scattering, volume scattering, helix and wire scattering were the SAR based variables retrieved from polarimetric decomposition. Tree heights and stem diameters were used as LiDAR based variables retrieved from single tree vertical height and least square circle fit methods respectively. All the variables obtained for forest plots were used as an input in a machine learning based Random Forest Regression Model, which was developed in this study for forest AGB estimation. Modelled output for forest AGB showed reliable accuracy (RMSE = 27.68 t/ha) and a good coefficient of determination (0.63) was obtained through the linear regression between modelled AGB and field-estimated AGB. The sensitivity analysis showed that the model was more sensitive for the major contributed variables (stem diameter and volume scattering) and these variables were measured from two different remote sensing techniques. 
    This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
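
    The accuracy figures quoted above (RMSE and a coefficient of determination from a linear regression of modelled on field-estimated AGB) and the plot geometry can be reproduced in form with a few lines. This is a sketch under stated assumptions: the function names are illustrative, the coefficient of determination is taken as the squared Pearson correlation (which equals R² for a simple linear regression), and the AGB values are made up rather than taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared_linear(predicted, observed):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    n = len(observed)
    p_bar = sum(predicted) / n
    o_bar = sum(observed) / n
    spo = sum((p - p_bar) * (o - o_bar) for p, o in zip(predicted, observed))
    spp = sum((p - p_bar) ** 2 for p in predicted)
    soo = sum((o - o_bar) ** 2 for o in observed)
    return spo * spo / (spp * soo)


# Side length of a square 0.1 ha plot (1 ha = 10,000 m^2).
plot_side_m = math.sqrt(0.1 * 10_000)

# Hypothetical modelled vs. field-estimated AGB values in t/ha.
modelled_agb = [210.0, 250.0, 300.0, 330.0]
field_agb = [200.0, 270.0, 280.0, 350.0]
example_rmse = rmse(modelled_agb, field_agb)
example_r2 = r_squared_linear(modelled_agb, field_agb)
```

    The plot side evaluates to about 31.62 m, matching the 31.62 m by 31.62 m plots mentioned in the abstract.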

  16. Predicting Health Care Utilization After Behavioral Health Referral Using Natural Language Processing and Machine Learning.

    PubMed

    Roysden, Nathaniel; Wright, Adam

    2015-01-01

    Mental health problems are an independent predictor of increased healthcare utilization. We created random forest classifiers for predicting two outcomes following a patient's first behavioral health encounter: decreased utilization by any amount (AUROC 0.74) and ultra-high absolute utilization (AUROC 0.88). These models may be used for clinical decision support by referring providers, to automatically detect patients who may benefit from referral, for cost management, or for risk/protection factor analysis.

  17. Advances in SCA and RF-DNA Fingerprinting Through Enhanced Linear Regression Attacks and Application of Random Forest Classifiers

    DTIC Science & Technology

    2014-09-18

    Converter AES Advance Encryption Standard ANN Artificial Neural Network APS Application Support AUC Area Under the Curve CPA Correlation Power Analysis ...Importance WGN White Gaussian Noise WPAN Wireless Personal Area Networks XEnv Cross-Environment XRx Cross-Receiver xxi ADVANCES IN SCA AND RF-DNA...based tool called KillerBee was released in 2009 that increases the exposure of ZigBee and other IEEE 802.15.4-based Wireless Personal Area Networks

  18. Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

    USGS Publications Warehouse

    Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

    2013-01-01

    In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.

  19. The Random Forests Statistical Technique: An Examination of Its Value for the Study of Reading

    ERIC Educational Resources Information Center

    Matsuki, Kazunaga; Kuperman, Victor; Van Dyke, Julie A.

    2016-01-01

    Studies investigating individual differences in reading ability often involve data sets containing a large number of collinear predictors and a small number of observations. In this article, we discuss the method of Random Forests and demonstrate its suitability for addressing the statistical concerns raised by such data sets. The method is…

  20. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  1. Random location of fuel treatments in wildland community interfaces: a percolation approach

    Treesearch

    Michael Bevers; Philip N. Omi; John G. Hof

    2004-01-01

    We explore the use of spatially correlated random treatments to reduce fuels in landscape patterns that appear somewhat natural while forming fully connected fuelbreaks between wildland forests and developed protection zones. From treatment zone maps partitioned into grids of hexagonal forest cells representing potential treatment sites, we selected cells to be treated...

  2. Road Network State Estimation Using Random Forest Ensemble Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hou, Yi; Edara, Praveen; Chang, Yohan

    Network-scale travel time prediction not only enables traffic management centers (TMC) to proactively implement traffic management strategies, but also allows travelers to make informed decisions about route choices between various origins and destinations. In this paper, a random forest estimator was proposed to predict travel time in a network. The estimator was trained using two years of historical travel time data for a case study network in St. Louis, Missouri. Both temporal and spatial effects were considered in the modeling process. The random forest models predicted travel times accurately during both congested and uncongested traffic conditions. The computational times for the models were low, thus useful for real-time traffic management and traveler information applications.

  3. Adaptive economic and ecological forest management under risk

    Treesearch

    Joseph Buongiorno; Mo Zhou

    2015-01-01

    Background: Forest managers must deal with inherently stochastic ecological and economic processes. The future growth of trees is uncertain, and so is their value. The randomness of low-impact, high frequency or rare catastrophic shocks in forest growth has significant implications in shaping the mix of tree species and the forest landscape...

  4. Machine Learning Techniques for Prediction of Early Childhood Obesity.

    PubMed

    Dugan, T M; Mukhopadhyay, S; Carroll, A; Downs, S

    2015-01-01

    This paper aims to predict childhood obesity after age two, using only data collected prior to the second birthday by a clinical decision support system called CHICA. Analyses of six different machine learning methods: RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes trained on CHICA data show that an accurate, sensitive model can be created. Of the methods analyzed, the ID3 model trained on the CHICA dataset proved the best overall performance with accuracy of 85% and sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors seen in the ID3 modeling of the CHICA dataset have been independently validated in the literature as correlated with obesity, thereby supporting the validity of the model. This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.
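
    The four figures quoted in this abstract (accuracy, sensitivity, positive and negative predictive value) all derive from one binary confusion matrix. The following is a minimal sketch of that arithmetic; the `ConfusionCounts` container, the `summarize` helper, and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def summarize(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, PPV and NPV as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative counts only; they are not the study's data.
example = summarize(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))
```

    The sketch does not guard against empty rows or columns (for example, no predicted positives), which would raise a division error.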

  5. Seeing the forest for the trees: utilizing modified random forests imputation of forest plot data for landscape-level analyses

    Treesearch

    Karin L. Riley; Isaac C. Grenfell; Mark A. Finney

    2015-01-01

    Mapping the number, size, and species of trees in forests across the western United States has utility for a number of research endeavors, ranging from estimation of terrestrial carbon resources to tree mortality following wildfires. For landscape fire and forest simulations that use the Forest Vegetation Simulator (FVS), a tree-level dataset, or “tree list”, is a...

  6. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies.

    PubMed

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1-98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.

  7. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

    PubMed Central

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681

  8. Canopy Height and Vertical Structure from Multibaseline Polarimetric InSAR: First Results of the 2016 NASA/ESA AfriSAR Campaign

    NASA Astrophysics Data System (ADS)

    Lavalle, M.; Hensley, S.; Lou, Y.; Saatchi, S. S.; Pinto, N.; Simard, M.; Fatoyinbo, T. E.; Duncanson, L.; Dubayah, R.; Hofton, M. A.; Blair, J. B.; Armston, J.

    2016-12-01

    In this paper we explore the derivation of canopy height and vertical structure from polarimetric-interferometric SAR (PolInSAR) data collected during the 2016 AfriSAR campaign in Gabon. AfriSAR is a joint effort between NASA and ESA to acquire multi-baseline L- and P-band radar data, lidar data and field data over tropical forests and savannah sites to support calibration, validation and algorithm development in preparation for the NISAR, GEDI and BIOMASS missions. Here we focus on the L-band UAVSAR dataset acquired over the Lope National Park in Central Gabon to demonstrate mapping of canopy height and vertical structure using PolInSAR and tomographic techniques. The Lope site features a natural gradient of forest biomass from the forest-savanna boundary (< 100 Mg/ha) to dense undisturbed humid tropical forests (> 400 Mg/ha). Our dataset includes 9 long-baseline, full-polarimetric UAVSAR acquisitions along with field and lidar data from the Laser Vegetation Ice Sensor (LVIS). We first present a brief theoretical background of the PolInSAR and tomographic techniques. We then show the results of our PolInSAR algorithms to create maps of canopy height generated via inversion of the random-volume-over-ground (RVoG) and random-motion-over-ground (RMoG) models. In our approach multiple interferometric baselines are merged incoherently to maximize the interferometric sensitivity over a broad range of tree heights. Finally we show how traditional tomographic algorithms are used for the retrieval of the full vertical canopy profile. We compare our results from the different PolInSAR/tomographic algorithms to validation data derived from lidar and field data.

  9. [Distribution patterns of canopy and understory tree species at local scale in a Tierra Firme forest, the Colombian Amazonia].

    PubMed

    Barreto-Silva, Juan Sebastian; López, Dairon Cárdenas; Montoya, Alvaro Javier Duque

    2014-03-01

    The effect of environmental variation on the structure of tree communities in tropical forests is still under debate. There is evidence that in landscapes like Tierra Firme forest, where the environmental gradient decreases at a local level, the effect of soil on the distribution patterns of plant species is minimal, happens to be random or is due to biological processes. In contrast, in studies with different kinds of plants from tropical forests, a greater effect on floristic composition of varying soil and topography has been reported. To assess this, the current study was carried out in a permanent plot of ten hectares in the Amacayacu National Park, Colombian Amazonia. To run the analysis, floristic and environmental variations were obtained according to tree species abundance categories and growth forms. In order to quantify the role played by both environmental filtering and dispersal limitation, the variation of the spatial configuration was included. We used Detrended Correspondence Analysis and Canonical Correspondence Analysis, followed by a variation partitioning, to analyze the species distribution patterns. The spatial template was evaluated using the Principal Coordinates of Neighbor Matrix method. We recorded 14 074 individuals from 1 053 species and 80 families. The most abundant families were Myristicaceae, Moraceae, Meliaceae, Arecaceae and Lecythidaceae, coinciding with other studies from Northwest Amazonia. Beta diversity was relatively low within the plot. Soils were very poor, had high aluminum concentration and were predominantly clayey. The floristic differences explained along the ten hectares plot were mainly associated to biological processes, such as dispersal limitation. The largest proportion of community variation in our dataset was unexplained by either environmental or spatial data. 
    In conclusion, these results support random processes as the major drivers of the spatial variation of tree species at a local scale on Tierra Firme forests of Amacayacu National Park, and suggest the reserve's size as a key element to ensure the conservation of plant diversity at both regional and local levels.

  10. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. 
    Then, the clustering measure value depends on the threshold which helps to understand the time pattern of the studied events. Our findings detected the presence of overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered as a multifractal process with a degree of time-clustering of the events.
    Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process.
    Acknowledgements: This work was partly supported by the SNFS Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions".
    References:
    - Kanevski M. (Editor). 2008. Advanced Mapping of Environmental Data: Geostatistics, Machine Learning and Bayesian Maximum Entropy. London / Hoboken: iSTE / Wiley.
    - Telesca L. and Pereira M.G. 2010. Time-clustering investigation of fire temporal fluctuations in Portugal, Nat. Hazards Earth Syst. Sci., vol. 10(4): 661-666.
    - Vega Orozco C., Tonini M., Conedera M., Kanevski M. 2012. Cluster recognition in spatial-temporal sequences: the case of forest fires, Geoinformatica, vol. 16(4): 653-673.
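
    One of the measures named in this abstract, the Allan Factor, compares event counts in adjacent time windows. The sketch below assumes the commonly used definition AF(tau) = mean((N[k+1] - N[k])^2) / (2 * mean(N[k])), where N[k] is the number of events in the k-th window of length tau; under that definition a Poisson process gives values near 1 and clustered sequences give larger values. The function name, arguments, and event times are hypothetical, and the abstract does not state which exact variant the authors used.

```python
def allan_factor(event_times, tau, t_start, t_end):
    """Allan Factor of a point process for non-overlapping windows of length tau."""
    n_windows = int((t_end - t_start) // tau)
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // tau)  # index of the window containing t
        if 0 <= k < n_windows:
            counts[k] += 1
    # Squared differences between counts in adjacent windows.
    diffs_sq = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    mean_diff_sq = sum(diffs_sq) / len(diffs_sq)
    mean_count = sum(counts) / n_windows
    return mean_diff_sq / (2 * mean_count)


# Made-up event times: one perfectly regular and one clustered sequence.
regular = allan_factor([i + 0.5 for i in range(10)], 1.0, 0.0, 10.0)
clustered = allan_factor([0.1, 0.2, 0.3, 2.1, 2.2, 2.3], 1.0, 0.0, 4.0)
```

    The regular sequence puts exactly one event in every window, so its value is 0; the clustered sequence alternates between full and empty windows and yields a value above 1.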

  11. Multiple filters affect tree species assembly in mid-latitude forest communities.

    PubMed

    Kubota, Y; Kusumoto, B; Shiono, T; Ulrich, W

    2018-05-01

    Species assembly patterns of local communities are shaped by the balance between multiple abiotic/biotic filters and dispersal that both select individuals from species pools at the regional scale. Knowledge regarding functional assembly can provide insight into the relative importance of the deterministic and stochastic processes that shape species assembly. We evaluated the hierarchical roles of the α niche and β niches by analyzing the influence of environmental filtering relative to functional traits on geographical patterns of tree species assembly in mid-latitude forests. Using forest plot datasets, we examined the α niche traits (leaf and wood traits) and β niche properties (cold/drought tolerance) of tree species, and tested non-randomness (clustering/over-dispersion) of trait assembly based on null models that assumed two types of species pools related to biogeographical regions. For most plots, species assembly patterns fell within the range of random expectation. However, particularly for cold/drought tolerance-related β niche properties, deviation from randomness was frequently found; non-random clustering was predominant in higher latitudes with harsh climates. Our findings demonstrate that both randomness and non-randomness in trait assembly emerged as a result of the α and β niches, although we suggest the potential role of dispersal processes and/or species equalization through trait similarities in generating the prevalence of randomness. Clustering of β niche traits along latitudinal climatic gradients provides clear evidence of species sorting by filtering particular traits. Our results reveal that multiple filters through functional niches and stochastic processes jointly shape geographical patterns of species assembly across mid-latitude forests.

  12. Hand pose estimation in depth image using CNN and random forest

    NASA Astrophysics Data System (ADS)

    Chen, Xi; Cao, Zhiguo; Xiao, Yang; Fang, Zhiwen

    2018-03-01

    Thanks to the availability of low-cost depth cameras such as the Microsoft Kinect, 3D hand pose estimation has attracted special research attention in recent years. Due to the large variations in the hand's viewpoint and the high dimensionality of hand motion, 3D hand pose estimation is still challenging. In this paper we propose a two-stage framework that combines a CNN with a Random Forest to boost the performance of hand pose estimation. First, we use a standard Convolutional Neural Network (CNN) to regress the hand joints' locations. Second, we use a Random Forest to refine the joints from the first stage. In the second stage, we propose a pyramid feature which merges the information flow of the CNN. Specifically, we take the rough joint locations from the first stage, then rotate the convolutional feature maps (and image). After this, for each joint, we first map its location to each feature map (and image), then crop features from each feature map (and image) around that location, and finally pass the extracted features to the Random Forest for refinement. Experimentally, we evaluate our proposed method on the ICVL dataset and obtain a mean error of about 11 mm; our method also runs in real time on a desktop.

  13. Recent drought conditions in the Conterminous United States

    Treesearch

    Frank H. Koch; William D. Smith; John W. Coulston

    2013-01-01

    Droughts are common in virtually all U.S. forests, but their frequency and intensity vary widely both between and within forest ecosystems (Hanson and Weltzin 2000). Forests in the Western United States generally exhibit a pattern of annual seasonal droughts. Forests in the Eastern United States tend to exhibit one of two prevailing patterns: random occasional droughts...

  14. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  15. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

  16. Discrimination of raw and processed Dipsacus asperoides by near infrared spectroscopy combined with least squares-support vector machine and random forests

    NASA Astrophysics Data System (ADS)

    Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin

    2012-04-01

    Most herbal medicines could be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including linear kernel, polynomial kernel and radial basis function kernel (RBF), were checked for optimization of LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building an LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the model's efficiency. The performance of the LS-SVM with RBF kernel (RBF LS-SVM) was better than the other two kernels. The RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using LS-SVM with RBF kernel, while RF was fast in the training and making predictions.
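
    The evaluation protocol in this abstract, 50 runs of 10-fold cross validation with the error rate averaged over all runs, can be sketched as follows. This is a plain, unstratified split under assumed names (`repeated_kfold`, `mean_error_rate`, and the `error_of_split` callback are hypothetical); the abstract does not say whether the authors stratified folds by class.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_runs=50, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for run in range(n_runs):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every run
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield run, fold, train_idx, test_idx


def mean_error_rate(n_samples, error_of_split, **kwargs):
    """Average a user-supplied per-split error over all runs and folds."""
    errors = [error_of_split(train, test)
              for _, _, train, test in repeated_kfold(n_samples, **kwargs)]
    return sum(errors) / len(errors)


# 80 samples, matching the 40 raw plus 40 processed herbs in the abstract.
splits = list(repeated_kfold(80))
```

    With 80 samples this produces 500 train/test splits of 72 and 8 samples each; a real `error_of_split` would fit a classifier on the training indices and return its misclassification rate on the test indices.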

  17. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features

    PubMed Central

    Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-01-01

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCR of 75% and 70%, respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin. PMID:29596375
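
    The Kappa coefficient reported alongside the correct classification rate corrects observed agreement for the agreement expected by chance. Below is a minimal sketch of Cohen's kappa for a square confusion matrix; the function name and the matrix are hypothetical and the counts are not the study's data.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: truth, cols: prediction)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up two-class matrix with 80 samples per class.
kappa = cohens_kappa([[66, 14], [14, 66]])
```

    In this made-up balanced case, 132 of 160 samples are classified correctly (82.5%), chance agreement is 0.5, and kappa is 0.65.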

  18. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features.

    PubMed

    Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-03-29

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCR of 75% and 70%, respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.

  19. Machine learning to predict the occurrence of bisphosphonate-related osteonecrosis of the jaw associated with dental extraction: A preliminary report.

    PubMed

    Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho

    2018-04-23

    The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of BRONJ associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. A total of 125 patients, consisting of 41 cases and 84 controls, were selected for the study. Five machine learning prediction algorithms including multivariable logistic regression model, decision tree, support vector machine, artificial neural network, and random forest were implemented. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies. Copyright © 2017. Published by Elsevier Inc.
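
    The AUC used here to rank the models equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison, which is adequate for small samples; the function name and the scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up model scores for three cases and two controls.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

    Five of the six case/control pairs are ordered correctly in this example, giving an AUC of 5/6.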

  20. Stratifying to reduce bias caused by high nonresponse rates: A case study from New Mexico’s forest inventory

    Treesearch

    Sara A. Goeking; Paul L. Patterson

    2013-01-01

    The USDA Forest Service’s Forest Inventory and Analysis (FIA) Program applies specific sampling and analysis procedures to estimate a variety of forest attributes. FIA’s Interior West region uses post-stratification, where strata consist of forest/nonforest polygons based on MODIS imagery, and assumes that nonresponse plots are distributed at random across each stratum...

  1. RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.

    PubMed

    Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B

    2016-01-01

    Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.

  2. Prediction of Return-to-original-work after an Industrial Accident Using Machine Learning and Comparison of Techniques

    PubMed Central

    2018-01-01

    Background: Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods: An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results: The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion: It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160

  3. Estimation of sleep status in sleep apnea patients using a novel head actigraphy technique.

    PubMed

    Hummel, Richard; Bradley, T Douglas; Fernie, Geoff R; Chang, S J Isaac; Alshaer, Hisham

    2015-01-01

    Polysomnography is a comprehensive modality for diagnosing sleep apnea (SA), but it is expensive and not widely available. Several technologies have been developed for portable diagnosis of SA in the home, most of which lack the ability to detect sleep status. Wrist actigraphy (accelerometry) has been adopted to cover this limitation. However, head actigraphy has not been systematically evaluated for this purpose. Therefore, the aim of this study was to evaluate the ability of head actigraphy to detect sleep/wake status. We obtained full overnight 3-axis head accelerometry data from 75 sleep apnea patient recordings. These were split into training and validation groups (2:1). Data were preprocessed and 5 features were extracted. Different feature combinations were fed into 3 different classifiers, namely support vector machine, logistic regression, and random forests, each of which was trained and validated on a different subgroup. The random forest algorithm yielded the highest performance, with an area under the receiver operating characteristic (ROC) curve of 0.81 for detection of sleep status. This shows that this technique has a very good performance in detecting sleep status in SA patients despite the specificities in this population, such as respiration related movements.

  4. Automatic detection of atrial fibrillation in cardiac vibration signals.

    PubMed

    Brueser, C; Diesel, J; Zink, M D H; Winter, S; Schauerte, P; Leonhardt, S

    2013-01-01

    We present a study on the feasibility of the automatic detection of atrial fibrillation (AF) from cardiac vibration signals (ballistocardiograms/BCGs) recorded by unobtrusive bed-mounted sensors. The proposed system is intended as a screening and monitoring tool in home-healthcare applications and not as a replacement for ECG-based methods used in clinical environments. Based on BCG data recorded in a study with 10 AF patients, we evaluate and rank seven popular machine learning algorithms (naive Bayes, linear and quadratic discriminant analysis, support vector machines, random forests as well as bagged and boosted trees) for their performance in separating 30 s long BCG epochs into one of three classes: sinus rhythm, atrial fibrillation, and artifact. For each algorithm, feature subsets of a set of statistical time-frequency-domain and time-domain features were selected based on the mutual information between features and class labels as well as first- and second-order interactions among features. The classifiers were evaluated on a set of 856 epochs by means of 10-fold cross-validation. The best algorithm (random forests) achieved a Matthews correlation coefficient, mean sensitivity, and mean specificity of 0.921, 0.938, and 0.982, respectively.
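
    The three figures of merit quoted above can all be derived from confusion-matrix counts. The sketch below covers only the two-class case as a simplification; the study itself used three classes and averaged sensitivity and specificity, and its exact multi-class procedure is not given here. The function name and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from binary confusion counts.

    Assumes at least one actual positive and one actual negative example.
    """
    sensitivity = tp / (tp + fn)  # share of actual positives that were detected
    specificity = tn / (tn + fp)  # share of actual negatives that were rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Invented counts for an "AF" versus "not AF" split of 200 epochs.
sens, spec, mcc = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

    Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance.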

  5. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
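
    The voting step described above, in which each sampled voxel predicts a displacement to the landmark and the predictions are pooled in a common map, can be illustrated with a small accumulator. This is a hedged sketch under simplifying assumptions: displacements are given as integers rather than predicted by a trained regression forest, every vote has equal weight, and the name `vote_landmark` is invented.

```python
from collections import Counter


def vote_landmark(sample_points, predicted_offsets):
    """Cast one vote per sample point at (point + offset); return the peak voxel."""
    votes = Counter()
    for (x, y, z), (dx, dy, dz) in zip(sample_points, predicted_offsets):
        votes[(x + dx, y + dy, z + dz)] += 1
    peak, count = votes.most_common(1)[0]
    return peak, count, votes


# Four hypothetical sample voxels; three of them agree on the same target.
points = [(4, 5, 5), (5, 6, 5), (5, 5, 3), (9, 9, 9)]
offsets = [(1, 0, 0), (0, -1, 0), (0, 0, 2), (-1, 0, 0)]
peak, count, votes = vote_landmark(points, offsets)  # peak (5, 5, 5), 3 votes
```

    Because the estimate is the mode of the voting map, a single outlying prediction such as the fourth point does not move the result.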

  6. Artificial Intelligence Procedures for Tree Taper Estimation within a Complex Vegetation Mosaic in Brazil

    PubMed Central

    Nunes, Matheus Henrique

    2016-01-01

    Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects. PMID:27187074

  7. Electromagnetic wave extinction within a forested canopy

    NASA Technical Reports Server (NTRS)

    Karam, M. A.; Fung, A. K.

    1989-01-01

    A forested canopy is modeled by a collection of randomly oriented finite-length cylinders shaded by randomly oriented and distributed disk- or needle-shaped leaves. For a plane wave exciting the forested canopy, the extinction coefficient is formulated in terms of the extinction cross sections (ECSs) in the local frame of each forest component and the Eulerian angles of orientation (used to describe the orientation of each component). The ECSs in the local frame for the finite-length cylinders used to model the branches are obtained by using the forward-scattering theorem. ECSs in the local frame for the disk- and needle-shaped leaves are obtained by the summation of the absorption and scattering cross-sections. The behavior of the extinction coefficients with the incidence angle is investigated numerically for both deciduous and coniferous forest. The dependencies of the extinction coefficients on the orientation of the leaves are illustrated numerically.

  8. Artificial Intelligence Procedures for Tree Taper Estimation within a Complex Vegetation Mosaic in Brazil.

    PubMed

    Nunes, Matheus Henrique; Görgens, Eric Bastos

    2016-01-01

    Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects.

  9. Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

    Treesearch

    E. Freeman; G. Moisen; J. Coulston; B. Wilson

    2014-01-01

    Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...

  10. Relationship of field and LiDAR estimates of forest canopy cover with snow accumulation and melt

    Treesearch

    Mariana Dobre; William J. Elliot; Joan Q. Wu; Timothy E. Link; Brandon Glaza; Theresa B. Jain; Andrew T. Hudak

    2012-01-01

    At the Priest River Experimental Forest in northern Idaho, USA, snow water equivalent (SWE) was recorded over a period of six years on random, equally-spaced plots in ~4.5 ha small watersheds (n=10). Two watersheds were selected as controls and eight as treatments, with two watersheds randomly assigned per treatment as follows: harvest (2007) followed by mastication (...

  11. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Treesearch

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  12. Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

    Treesearch

    Elizabeth A. Freeman; Gretchen G. Moisen; John W. Coulston; Barry T. (Ty) Wilson

    2015-01-01

    As part of the development of the 2011 National Land Cover Database (NLCD) tree canopy cover layer, a pilot project was launched to test the use of high-resolution photography coupled with extensive ancillary data to map the distribution of tree canopy cover over four study regions in the conterminous US. Two stochastic modeling techniques, random forests (RF...

  13. Chapter 4 - Drought patterns in the conterminous United States and Hawaii.

    Treesearch

    Frank H. Koch; William D. Smith; John W. Coulston

    2014-01-01

    Droughts are common in virtually all U.S. forests, but their frequency and intensity vary widely both between and within forest ecosystems (Hanson and Weltzin 2000). Forests in the Western United States generally exhibit a pattern of annual seasonal droughts. Forests in the Eastern United States tend to exhibit one of two prevailing patterns: random occasional droughts...

  14. A Prospectus on Restoring Late Successional Forest Structure to Eastside Pine Ecosystems Through Large-Scale, Interdisciplinary Research

    Treesearch

    Steve Zack; William F. Laudenslayer; Luke George; Carl Skinner; William Oliver

    1999-01-01

    At two different locations in northeast California, an interdisciplinary team of scientists is initiating long-term studies to quantify the effects of forest manipulations intended to accelerate and/or enhance late-successional structure of eastside pine forest ecosystems. One study, at Blacks Mountain Experimental Forest, uses a split-plot, factorial, randomized block...

  15. Probabilistic risk models for multiple disturbances: an example of forest insects and wildfires

    Treesearch

    Haiganoush K. Preisler; Alan A. Ager; Jane L. Hayes

    2010-01-01

    Building probabilistic risk models for highly random forest disturbances like wildfire and forest insect outbreaks is challenging. Modeling the interactions among natural disturbances is even more difficult. In the case of wildfire and forest insects, we looked at the probability of a large fire given an insect outbreak and also the incidence of insect outbreaks...

  16. Utilizing random forests imputation of forest plot data for landscape-level wildfire analyses

    Treesearch

    Karin L. Riley; Isaac C. Grenfell; Mark A. Finney; Nicholas L. Crookston

    2014-01-01

    Maps of the number, size, and species of trees in forests across the United States are desirable for a number of applications. For landscape-level fire and forest simulations that use the Forest Vegetation Simulator (FVS), a spatial tree-level dataset, or “tree list”, is a necessity. FVS is widely used at the stand level for simulating fire effects on tree mortality,...

  17. Alternative methods to evaluate trial level surrogacy.

    PubMed

    Abrahantes, Josè Cortiñas; Shkedy, Ziv; Molenberghs, Geert

    2008-01-01

    The evaluation and validation of surrogate endpoints have been extensively studied in the last decade. Prentice [1] and Freedman, Graubard and Schatzkin [2] laid the foundations for the evaluation of surrogate endpoints in randomized clinical trials. Later, Buyse et al. [5] proposed a meta-analytic methodology, producing different methods for different settings, which was further studied by Alonso and Molenberghs [9], in their unifying approach based on information theory. In this article, we focus our attention on the trial-level surrogacy and propose alternative procedures to evaluate such a surrogacy measure, which do not pre-specify the type of association. A promising correction based on cross-validation is investigated, as well as the construction of confidence intervals for this measure. In order to avoid making assumptions about the type of relationship between the treatment effects and its distribution, a collection of alternative methods, based on regression trees, bagging, random forests, and support vector machines, combined with bootstrap-based confidence interval and, should one wish, in conjunction with a cross-validation based correction, will be proposed and applied. We apply the various strategies to data from three clinical studies: in ophthalmology, in advanced colorectal cancer, and in schizophrenia. The results obtained for the three case studies are compared; they indicate that using random forest or bagging models produces larger estimated values for the surrogacy measure, which are in general more stable, with narrower confidence intervals, than those from linear regression and support vector regression. For the advanced colorectal cancer studies, we even found the trial-level surrogacy is considerably different from what has been reported. In general the alternative methods are more computationally demanding, and especially the calculation of the confidence intervals requires more computational time than the delta-method counterpart.
First, more flexible modeling techniques can be used, allowing for other types of association. Second, when no cross-validation-based correction is applied, overly optimistic trial-level surrogacy estimates will be found; thus cross-validation is highly recommended. Third, the use of the delta method to calculate confidence intervals is not recommended since it makes assumptions valid only in very large samples. It may also produce range-violating limits. We therefore recommend alternatives: bootstrap methods in general. Also, the information-theoretic approach produces comparable results with the bagging and random forest approaches, when cross-validation correction is applied. It is also important to observe that, even for the case in which the linear model might be a good option, bagging methods also perform well, and their confidence intervals were narrower.
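
    The abstract recommends bootstrap confidence intervals over the delta method, partly because delta-method limits can fall outside the valid range of the measure. A percentile bootstrap avoids this for statistics that themselves stay in range, since every limit is a value the statistic actually took on a resample. The sketch below is a generic percentile bootstrap, assuming independent observations and an arbitrary statistic; it is not the authors' resampling scheme, and the function name, data, and defaults are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` computed on `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_boot)
    )
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Invented per-trial estimates, used only to exercise the function.
values = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]
lo, hi = bootstrap_ci(values)
```

    For the mean, both limits are averages of observed values and therefore cannot leave the observed range.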

  18. A comparison of rule-based and machine learning approaches for classifying patient portal messages.

    PubMed

    Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Rosenbloom, S Trent; Jackson, Gretchen Purcell

    2017-09-01

    Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). 
The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening' and Idea_or_Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean'). This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering. Copyright © 2017 Elsevier B.V. All rights reserved.
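
    Because a single message can carry several communication types, the Jaccard Index compares the set of predicted types with the set of true types for each message. The sketch below shows that set-based definition; averaging the per-message values and scoring two empty sets as 1.0 are assumptions, since the abstract does not say how the reported indices were aggregated.

```python
def jaccard_index(true_labels, predicted_labels):
    """|A ∩ B| / |A ∪ B| for two collections of labels."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    union = true_set | pred_set
    if not union:
        return 1.0  # assumption: two empty label sets count as a perfect match
    return len(true_set & pred_set) / len(union)


def mean_jaccard(pairs):
    """Average the per-message Jaccard Index over (true, predicted) pairs."""
    return sum(jaccard_index(t, p) for t, p in pairs) / len(pairs)


# Two hypothetical messages with their true and predicted communication types.
pairs = [
    ({"medical", "logistical"}, {"medical"}),  # one of two types found: 0.5
    ({"social"}, {"social"}),                  # exact match: 1.0
]
score = mean_jaccard(pairs)  # 0.75
```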

  19. Random Forest Application for NEXRAD Radar Data Quality Control

    NASA Astrophysics Data System (ADS)

    Keem, M.; Seo, B. C.; Krajewski, W. F.

    2017-12-01

    Identification and elimination of non-meteorological radar echoes (e.g., returns from ground, wind turbines, and biological targets) are the basic data quality control steps before radar data use in quantitative applications (e.g., precipitation estimation). Although WSR-88Ds' recent upgrade to dual-polarization has enhanced this quality control and echo classification, there are still challenges to detect some non-meteorological echoes that show precipitation-like characteristics (e.g., wind turbine or anomalous propagation clutter embedded in rain). With this in mind, a new quality control method using Random Forest is proposed in this study. This classification algorithm is known to produce reliable results with less uncertainty. The method introduces randomness into sampling and feature selections and integrates consequent multiple decision trees. The multidimensional structure of the trees can characterize the statistical interactions of involved multiple features in complex situations. The authors explore the performance of the Random Forest method for NEXRAD radar data quality control. Training datasets are selected using several clear cases of precipitation and non-precipitation (but with some non-meteorological echoes). The model is structured using available candidate features (from the NEXRAD data) such as horizontal reflectivity, differential reflectivity, differential phase shift, copolar correlation coefficient, and their horizontal textures (e.g., local standard deviation). The influence of each feature on classification results is quantified by variable importance measures that are automatically estimated by the Random Forest algorithm. Therefore, the number and types of features in the final forest can be examined based on the classification accuracy.
The authors demonstrate the capability of the proposed approach using several cases ranging from distinct to complex rain/no-rain events and compare the performance with the existing algorithms (e.g., MRMS). They also discuss operational feasibility based on the observed strength and weakness of the method.
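
    The two sources of randomness mentioned above, bootstrap sampling of training rows and random selection of features, together with the final vote over many trees, can be shown in a few lines. This is a toy sketch, not the authors' model: each "tree" is a one-split decision stump, features are drawn once per tree rather than at every node, and the feature values and the "rain"/"clutter" labels are invented.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        feats = rng.sample(range(d), n_features)    # random subset of features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(ll if row[f] <= t else rl for f, t, ll, rl in forest)
    return votes.most_common(1)[0][0]


# Invented two-feature echoes (say, scaled reflectivity and correlation).
rows = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25),
        (0.8, 0.9), (0.9, 0.8), (0.7, 0.7), (0.85, 0.75)]
labels = ["clutter"] * 4 + ["rain"] * 4
forest = fit_forest(rows, labels)
```

    Calling `predict(forest, (0.95, 0.95))` then returns the label that most stumps vote for. A production implementation would grow full trees and would also report the variable importance measures that the abstract relies on.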

  20. Fire, climate change, and forest resilience in interior Alaska

    Treesearch

    Jill F. Johnstone; F. Stuart Chapin; Teresa N. Hollingsworth; Michelle C. Mack; Vladimir Romanovsky; Merritt Turetsky

    2010-01-01

    In the boreal forests of interior Alaska, feedbacks that link forest soils, fire characteristics, and plant traits have supported stable cycles of forest succession for the past 6000 years. This high resilience of forest stands to fire disturbance is supported by two interrelated feedback cycles: (i) interactions among disturbance regime and plant-soil-microbial...

  1. Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure

    PubMed Central

    Park, Wookje; Jung, Sikhang

    2014-01-01

    A fault detection algorithm was developed using a similarity measure and the random forest algorithm. The resulting algorithm was applied to an unmanned aircraft vehicle (UAV) prepared by the authors. The similarity measure was designed with the help of distance information, and its usefulness was also verified by proof. The fault decision was made by calculating a weighted similarity measure. Twelve available coefficients among the healthy and faulty status data groups were used to determine the decision. The similarity measure weights were obtained through the random forest algorithm (RFA); RF provides data priority. In order to get a fast decision response, a limited number of coefficients was also considered. The relation between detection rate and amount of feature data was analyzed and illustrated. By repeated trials of the similarity calculation, the useful data amount was obtained. PMID:25057508

  2. A primer on stand and forest inventory designs

    Treesearch

    H. Gyde Lund; Charles E. Thomas

    1989-01-01

    Covers designs for the inventory of stands and forests in detail and with worked-out examples. For stands, random sampling, line transects, ricochet plot, systematic sampling, single plot, cluster, subjective sampling and complete enumeration are discussed. For forest inventory, the main categories are subjective sampling, inventories without prior stand mapping,...

  3. Implementing watershed investment programs to restore fire-adapted forests for watershed services

    NASA Astrophysics Data System (ADS)

    Springer, A. E.

    2013-12-01

    Payments for ecosystem services and watershed investment programs have created new solutions for restoring upland fire-adapted forests to support downstream surface-water and groundwater uses. Water from upland forests supports not only a significant percentage of the public water supplies in the U.S., but also extensive riparian, aquatic, and groundwater-dependent ecosystems. Many rare, endemic, threatened, and endangered species are supported by the surface-water and groundwater generated from the forested uplands. In the Ponderosa pine forests of the Southwestern U.S., post Euro-American settlement forest management practices, coupled with climate change, have significantly impacted watershed functionality by increasing vegetation cover and associated evapotranspiration and decreasing runoff and groundwater recharge. A large Collaborative Forest Landscape Restoration Program project known as the Four Forests Restoration Initiative is developing landscape-scale processes to make the forests connected to these watersheds more resilient. However, there are challenges in financing the initial forest treatments and subsequent maintenance treatments while garnering supportive public opinion for forest thinning projects. A solution called the Flagstaff Watershed Protection Project is utilizing City tax dollars collected through a public bond to finance forest treatments. Exit polling from the bond election documented the reasons for the 73% affirmative vote on the bond measure. These forest treatments have included in their actions restoration of associated ephemeral stream channels and spring ecosystems, but resources still need to be identified for these actions. A statewide strategy for developing additional forest restoration resources outside of the federal financing is being explored by state and local business and governmental leaders.
Coordination, synthesis, and modeling supported by a NSF Water Sustainability and Climate project has been instrumental in facilitating the forest restoration and watershed health decision making processes.

  4. Forest community classification of the Porcupine River drainage, interior Alaska, and its application to forest management.

    Treesearch

    John Yarie

    1983-01-01

    The forest vegetation of 3,600,000 hectares in northeast interior Alaska was classified. A total of 365 plots located in a stratified random design were run through the ordination programs SIMORD and TWINSPAN. A total of 40 forest communities were described vegetatively and, to a limited extent, environmentally. The area covered by each community was similar, ranging...

  5. Experimental Design Considerations for Establishing an Off-Road, Habitat-Specific Bird Monitoring Program Using Point Counts

    Treesearch

    JoAnn M. Hanowski; Gerald J. Niemi

    1995-01-01

    We established bird monitoring programs in two regions of Minnesota: the Chippewa National Forest and the Superior National Forest. The experimental design defined forest cover types as strata in which samples of forest stands were randomly selected. Subsamples (3 point counts) were placed in each stand to maximize field effort and to assess within-stand and between-...

  6. Predicting live and dead tree basal area of bark beetle affected forests from discrete-return lidar

    Treesearch

    Benjamin C. Bright; Andrew T. Hudak; Robert McGaughey; Hans-Erik Andersen; Jose Negron

    2013-01-01

    Bark beetle outbreaks have killed large numbers of trees across North America in recent years. Lidar remote sensing can be used to effectively estimate forest biomass, but prediction of both live and dead standing biomass in beetle-affected forests using lidar alone has not been demonstrated. We developed Random Forest (RF) models predicting total, live, dead, and...

  7. Valuing the Recreational Benefits from the Creation of Nature Reserves in Irish Forests

    Treesearch

    Riccardo Scarpa; Susan M. Chilton; W. George Hutchinson; Joseph Buongiorno

    2000-01-01

    Data from a large-scale contingent valuation study are used to investigate the effects of forest attributes on willingness to pay for forest recreation in Ireland. In particular, the presence of a nature reserve in the forest is found to significantly increase the visitors' willingness to pay. A random utility model is used to estimate the welfare change associated...

  8. Evaluating effectiveness of down-sampling for stratified designs and unbalanced prevalence in Random Forest models of tree species distributions in Nevada

    Treesearch

    Elizabeth A. Freeman; Gretchen G. Moisen; Tracy S. Frescino

    2012-01-01

    Random Forests is frequently used to model species distributions over large geographic areas. Complications arise when data used to train the models have been collected in stratified designs that involve different sampling intensity per stratum. The modeling process is further complicated if some of the target species are relatively rare on the landscape leading to an...

  9. Unbiased feature selection in learning random forests for high-dimensional data.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.

  10. A random forest learning assisted "divide and conquer" approach for peptide conformation search.

    PubMed

    Chen, Xin; Yang, Bing; Lin, Zijing

    2018-06-11

    Computational determination of peptide conformations is challenging as it is a problem of finding minima in a high-dimensional space. The "divide and conquer" approach is promising for reliably reducing the search space size. A random forest learning model is proposed here to expand the scope of applicability of the "divide and conquer" approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units ("words"). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units ("grammar"). It is found that amino acid residues may be grouped as equivalent "words", while the φ-ψ combinations in low-energy peptide conformations follow a distinct "grammar". The finding of equivalent words empowers the "divide and conquer" method with the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the "divide and conquer" method by removing unfavorable φ-ψ combinations without the need of dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG through assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.

  11. Chapter 6: Creating a basis for watershed management in high elevation forests

    Treesearch

    Gerald J. Gottfried; Leonard F. DeBano; Peter F. Ffolliott

    1999-01-01

    Higher mountains and plateaus in the Central Arizona Highlands generally support southwestern mixed conifer forests, associated aspen and spruce-fir forests, and a small acreage of grasslands interspersed among the forested areas. Most of the major rivers in the region originate on headwater watersheds that support mixed conifer forests where annual precipitation,...

  12. Optimal Symmetric Multimodal Templates and Concatenated Random Forests for Supervised Brain Tumor Segmentation (Simplified) with ANTsR.

    PubMed

    Tustison, Nicholas J; Shrinidhi, K L; Wintermark, Max; Durst, Christopher R; Kandel, Benjamin M; Gee, James C; Grossman, Murray C; Avants, Brian B

    2015-04-01

    Segmenting and quantifying gliomas from MRI is an important task for diagnosis, planning intervention, and for tracking tumor changes over time. However, this task is complicated by the lack of prior knowledge concerning tumor location, spatial extent, shape, possible displacement of normal tissue, and intensity signature. To accommodate such complications, we introduce a framework for supervised segmentation based on multiple modality intensity, geometry, and asymmetry feature sets. These features drive a supervised whole-brain and tumor segmentation approach based on random forest-derived probabilities. The asymmetry-related features (based on optimal symmetric multimodal templates) demonstrate excellent discriminative properties within this framework. We also gain performance by generating probability maps from random forest models and using these maps for a refining Markov random field regularized probabilistic segmentation. This strategy allows us to interface the supervised learning capabilities of the random forest model with regularized probabilistic segmentation using the recently developed ANTsR package--a comprehensive statistical and visualization interface between the popular Advanced Normalization Tools (ANTs) and the R statistical project. The reported algorithmic framework was the top-performing entry in the MICCAI 2013 Multimodal Brain Tumor Segmentation challenge. The challenge data were widely varying consisting of both high-grade and low-grade glioma tumor four-modality MRI from five different institutions. Average Dice overlap measures for the final algorithmic assessment were 0.87, 0.78, and 0.74 for "complete", "core", and "enhanced" tumor components, respectively.
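
    The Dice overlap reported above measures agreement between a predicted segmentation and a reference one as twice the size of their intersection divided by the sum of their sizes. The sketch below applies that definition to sets of voxel coordinates; the function name, the toy voxels, and the choice to score two empty masks as 1.0 are assumptions for illustration and are not taken from the challenge's evaluation code.

```python
def dice_overlap(predicted, reference):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two collections of voxel coordinates."""
    predicted, reference = set(predicted), set(reference)
    total = len(predicted) + len(reference)
    if total == 0:
        return 1.0  # assumption: two empty masks agree perfectly
    return 2 * len(predicted & reference) / total


# Hypothetical 4-voxel masks that share 3 voxels.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
dice = dice_overlap(predicted, reference)  # 2 * 3 / 8 = 0.75
```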

  13. High resolution satellite remote sensing used in a stratified random sampling scheme to quantify the constituent land cover components of the shifting cultivation mosaic of the Democratic Republic of Congo

    NASA Astrophysics Data System (ADS)

    Molinario, G.; Hansen, M.; Potapov, P.

    2016-12-01

    High resolution satellite imagery obtained from the National Geospatial Intelligence Agency through NASA was used to photo-interpret sample areas within the DRC. The area sampled is a stratification of the forest cover loss from circa 2014 that occurred either completely within the previously mapped homogenous area of the Rural Complex, at its interface with primary forest, or in isolated forest perforations. Previous research resulted in a map of these areas that contextualizes forest loss depending on where it occurs and with what spatial density, leading to a better understanding of the real impacts of livelihood shifting cultivation on forest degradation. The stratified random sampling approach of these areas allows the characterization of the constituent land cover types within these areas, and their variability throughout the DRC. Shifting cultivation has a variable forest degradation footprint in the DRC depending on many factors that drive it, but its role in forest degradation and deforestation had been disputed, leading us to investigate and quantify the clearing and reuse rates within the strata throughout the country.
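
    Stratified random sampling, as used in this record, draws an independent random sample inside each stratum. The sketch below illustrates the idea under stated assumptions: the patch IDs, the per-stratum sample size, and the function name are hypothetical, and only the three stratum labels come from the abstract.

```python
import random


def stratified_random_sample(units_by_stratum, n_per_stratum, seed=0):
    """Draw up to n_per_stratum units, without replacement, from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        stratum: rng.sample(units, min(n_per_stratum, len(units)))
        for stratum, units in units_by_stratum.items()
    }


# Hypothetical forest-loss patch IDs grouped by the three strata named above.
patches = {
    "rural complex": list(range(0, 50)),
    "primary forest interface": list(range(50, 80)),
    "isolated perforation": list(range(80, 90)),
}
sample = stratified_random_sample(patches, n_per_stratum=5)
for stratum, chosen in sample.items():
    print(stratum, sorted(chosen))
```

    Sampling each stratum separately guarantees that small strata, such as isolated perforations, are represented even when they cover little of the total area.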

  14. Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

    NASA Astrophysics Data System (ADS)

    Shamsoddini, A.; Aboodi, M. R.; Karami, J.

    2017-09-01

    Air pollution, as one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide, nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by the Random Forest feature selection method performed more accurately than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than that of the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

  15. Research on electricity consumption forecast based on mutual information and random forests algorithm

    NASA Astrophysics Data System (ADS)

    Shi, Jing; Shi, Yunli; Tan, Jian; Zhu, Lei; Li, Hu

    2018-02-01

    Traditional power forecasting models can neither efficiently take various factors into account nor identify the related factors. In this paper, mutual information from information theory and the artificial intelligence random forests algorithm are introduced into medium- and long-term electricity demand prediction. Mutual information can identify the highly related factors based on the value of the average mutual information between a variety of variables and electricity demand; different industries may be highly associated with different variables. The random forests algorithm was used to build forecasting models for the different industries according to their different correlation factors. The data of electricity consumption in Jiangsu Province are taken as a practical example, and the above methods are compared with methods that disregard mutual information and the industries. The simulation results show that the above method is scientific and effective, and can provide higher prediction accuracy.
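
    The mutual-information screening step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the variables have already been discretized into categories, and the variable names and values are made up.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Made-up, already binned observations for one industry.
demand = ["high", "high", "low", "low"]
candidates = {
    "output_trend": ["up", "up", "down", "down"],   # tracks demand exactly
    "temperature": ["hot", "cold", "hot", "cold"],  # unrelated to demand
}
scores = {name: mutual_information(values, demand) for name, values in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

    Variables with the highest scores would then be passed to the forecasting model for that industry; the threshold for "highly related" is a modeling choice that the abstract does not specify.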

  16. Hubble Tarantula Treasury Project - VI. Identification of Pre-Main-Sequence Stars using Machine Learning techniques

    NASA Astrophysics Data System (ADS)

    Ksoll, Victor F.; Gouliermis, Dimitrios A.; Klessen, Ralf S.; Grebel, Eva K.; Sabbi, Elena; Anderson, Jay; Lennon, Daniel J.; Cignoni, Michele; de Marchi, Guido; Smith, Linda J.; Tosi, Monica; van der Marel, Roeland P.

    2018-05-01

    The Hubble Tarantula Treasury Project (HTTP) has provided an unprecedented photometric coverage of the entire star-burst region of 30 Doradus down to the half Solar mass limit. We use the deep stellar catalogue of HTTP to identify all the pre-main-sequence (PMS) stars of the region, i.e., stars that have not started their lives on the main-sequence yet. The photometric distinction of these stars from the more evolved populations is not a trivial task due to several factors that alter their colour-magnitude diagram positions. The identification of PMS stars requires, thus, sophisticated statistical methods. We employ Machine Learning Classification techniques on the HTTP survey of more than 800,000 sources to identify the PMS stellar content of the observed field. Our methodology consists of 1) carefully selecting the most probable low-mass PMS stellar population of the star-forming cluster NGC2070, 2) using this sample to train classification algorithms to build a predictive model for PMS stars, and 3) applying this model in order to identify the most probable PMS content across the entire Tarantula Nebula. We employ Decision Tree, Random Forest and Support Vector Machine classifiers to categorise the stars as PMS and Non-PMS. The Random Forest and Support Vector Machine provided the most accurate models, predicting about 20,000 sources with a candidateship probability higher than 50 percent, and almost 10,000 PMS candidates with a probability higher than 95 percent. This is the richest and most accurate photometric catalogue of extragalactic PMS candidates across the extent of a whole star-forming complex.

  17. Machine-learning-based classification of real-time tissue elastography for hepatic fibrosis in patients with chronic hepatitis B.

    PubMed

    Chen, Yang; Luo, Yan; Huang, Wei; Hu, Die; Zheng, Rong-Qin; Cong, Shu-Zhen; Meng, Fan-Kun; Yang, Hong; Lin, Hong-Jun; Sun, Yan; Wang, Xiu-Yan; Wu, Tao; Ren, Jie; Pei, Shu-Fang; Zheng, Ying; He, Yun; Hu, Yu; Yang, Na; Yan, Hongmei

    2017-10-01

    Hepatic fibrosis is a common middle stage of the pathological processes of chronic liver diseases. Clinical intervention during the early stages of hepatic fibrosis can slow the development of liver cirrhosis and reduce the risk of developing liver cancer. Performing a liver biopsy, the gold standard for viral liver disease management, has drawbacks such as invasiveness and a relatively high sampling error rate. Real-time tissue elastography (RTE), one of the most recently developed technologies, might be a promising imaging technology because it is both noninvasive and provides accurate assessments of hepatic fibrosis. However, determining the stage of liver fibrosis from RTE images in a clinic is a challenging task. In this study, in contrast to the previous liver fibrosis index (LFI) method, which predicts the stage of diagnosis using RTE images and multiple regression analysis, we employed four classical classifiers (i.e., Support Vector Machine, Naïve Bayes, Random Forest and K-Nearest Neighbor) to build a decision-support system to improve the hepatitis B stage diagnosis performance. Eleven RTE image features were obtained from 513 subjects who underwent liver biopsies in this multicenter collaborative research. The experimental results showed that the adopted classifiers significantly outperformed the LFI method and that the Random Forest (RF) classifier provided the highest average accuracy among the four machine-learning algorithms. This result suggests that sophisticated machine-learning methods can be powerful tools for evaluating the stage of hepatic fibrosis and show promise for clinical applications. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    PubMed

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures though minimizing the error rate. After cross-validation, logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
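
    The three reported quantities are arithmetically linked under one reading, which is an assumption inferred from the numbers rather than a definition stated in the abstract: if SNB reduction is the share of all patients predicted node negative, and the error rate is the share of all patients who are predicted negative but are actually node positive, then error rate = SNB reduction x (1 - NPV). The sketch below checks that reading against the figures quoted above.

```python
# (NPV, SNB reduction, reported error rate) as quoted in the abstract.
reported = {
    "logistic regression": (0.936, 0.275, 0.018),
    "classification tree": (0.940, 0.298, 0.018),
    "random forest": (0.971, 0.182, 0.005),
    "support vector machine": (0.930, 0.301, 0.021),
}


def implied_error_rate(npv, reduction):
    """Share of all patients predicted negative who are in fact node positive.

    Assumes 'reduction' is the fraction of patients predicted negative.
    """
    return reduction * (1.0 - npv)


for model, (npv, reduction, error) in reported.items():
    implied = implied_error_rate(npv, reduction)
    print(f"{model}: implied {implied:.4f}, reported {error:.3f}")
```

    Under this reading the implied values agree with the reported error rates to within rounding, which also shows why random forest pairs the lowest error with the smallest SNB reduction.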

  19. Combining macula clinical signs and patient characteristics for age-related macular degeneration diagnosis: a machine learning approach.

    PubMed

    Fraccaro, Paolo; Nicolo, Massimo; Bonetto, Monica; Giacomini, Mauro; Weller, Peter; Traverso, Carlo Enrico; Prosperi, Mattia; O'Sullivan, Dympna

    2015-01-27

    To investigate machine learning methods, ranging from simpler interpretable techniques to complex (non-linear) "black-box" approaches, for automated diagnosis of Age-related Macular Degeneration (AMD). Data from healthy subjects and patients diagnosed with AMD or other retinal diseases were collected during routine visits via an Electronic Health Record (EHR) system. Patients' attributes included demographics and, for each eye, presence/absence of major AMD-related clinical signs (soft drusen, retinal pigment epithelium, defects/pigment mottling, depigmentation area, subretinal haemorrhage, subretinal fluid, macula thickness, macular scar, subretinal fibrosis). Interpretable techniques known as white box methods, including logistic regression and decision trees, as well as less interpretable techniques known as black box methods, such as support vector machines (SVM), random forests and AdaBoost, were used to develop models (trained and validated on unseen data) to diagnose AMD. The gold standard was confirmed diagnosis of AMD by physicians. Sensitivity, specificity and area under the receiver operating characteristic curve (AUC) were used to assess performance. Study population included 487 patients (912 eyes). In terms of AUC, random forests, logistic regression and AdaBoost showed a mean performance of 0.92, followed by SVM and decision trees (0.90). All machine learning models identified soft drusen and age as the most discriminating variables in clinicians' decision pathways to diagnose AMD. Both black-box and white box methods performed well in identifying diagnoses of AMD and their decision pathways. Machine learning models developed through the proposed approach, relying on clinical signs identified by retinal specialists, could be embedded into EHR to provide physicians with real time (interpretable) support.

  20. Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

    PubMed

    Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D

    2014-03-25

    A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.

  1. Effectiveness of Strict vs. Multiple Use Protected Areas in Reducing Tropical Forest Fires: A Global Analysis Using Matching Methods

    PubMed Central

    Nelson, Andrew; Chomitz, Kenneth M.

    2011-01-01

    Protected areas (PAs) cover a quarter of the tropical forest estate. Yet there is debate over the effectiveness of PAs in reducing deforestation, especially when local people have rights to use the forest. A key analytic problem is the likely placement of PAs on marginal lands with low pressure for deforestation, biasing comparisons between protected and unprotected areas. Using matching techniques to control for this bias, this paper analyzes the global tropical forest biome using forest fires as a high resolution proxy for deforestation; disaggregates impacts by remoteness, a proxy for deforestation pressure; and compares strictly protected vs. multiple use PAs vs indigenous areas. Fire activity was overlaid on a 1 km map of tropical forest extent in 2000; land use change was inferred for any point experiencing one or more fires. Sampled points in pre-2000 PAs were matched with randomly selected never-protected points in the same country. Matching criteria included distance to road network, distance to major cities, elevation and slope, and rainfall. In Latin America and Asia, strict PAs substantially reduced fire incidence, but multi-use PAs were even more effective. In Latin America, where there is data on indigenous areas, these areas reduce forest fire incidence by 16 percentage points, over two and a half times as much as naïve (unmatched) comparison with unprotected areas would suggest. In Africa, more recently established strict PAs appear to be effective, but multi-use tropical forest protected areas yield few sample points, and their impacts are not robustly estimated. These results suggest that forest protection can contribute both to biodiversity conservation and CO2 mitigation goals, with particular relevance to the REDD agenda. Encouragingly, indigenous areas and multi-use protected areas can help to accomplish these goals, suggesting some compatibility between global environmental goals and support for local livelihoods. PMID:21857950

  2. VT0005 In Action: National Forest Biomass Inventory Using Airborne Lidar Sampling

    NASA Astrophysics Data System (ADS)

    Saatchi, S. S.; Xu, L.; Meyer, V.; Ferraz, A.; Yang, Y.; Shapiro, A.; Bastin, J. F.

    2016-12-01

    Tropical countries are required to produce robust and verifiable estimates of forest carbon stocks for successful implementation of climate change mitigation. Lack of systematic national inventory data due to access, cost, and infrastructure, has impacted the capacity of most tropical countries to accurately report the GHG emissions to the international community. Here, we report on the development of the aboveground forest carbon (AGC) map of Democratic Republic of Congo (DRC) by using the VCS (Verified Carbon Standard) methodology developed by Sassan Saatchi (VT0005) using high-resolution airborne LiDAR samples. The methodology provides the distribution of the carbon stocks in aboveground live trees of more than 150 million ha of forests at 1-ha spatial resolution in DRC using more than 430,000 ha of systematic random airborne Lidar inventory samples of forest structure. We developed a LIDAR aboveground biomass allometry using more than 100 1-ha plots across forest types and power-law model with LIDAR height metrics and average landscape scale wood density. The methodology provided estimates of forest biomass over the entire country using two approaches: 1) mean, variance, and total carbon estimates for each forest type present in DRC using inventory statistical techniques, and 2) a wall-to-wall map of the forest biomass extrapolated using satellite radar (ALOS PALSAR), surface topography from SRTM, and spectral information from Landsat (TM) and machine learning algorithms. We present the methodology, the estimates of carbon stocks and the spatial uncertainty over the entire country.
Acknowledgements: The theoretical research was carried out partially at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration, and the design and implementation in the Democratic Republic of Congo was carried out at the Institute of Environment and Sustainability at University of California Los Angeles through the support of the International Climate Initiative of the German Ministry of Environment, Conservation and Nuclear Security, and the KFW Development Bank.

  3. Modeling the Emergent Impacts of Harvesting Acadian Forests over 100+ Years

    NASA Astrophysics Data System (ADS)

    Luus, K. A.; Plug, L. J.

    2007-12-01

    Harvesting strategies and policies for Acadian forest in Nova Scotia, Canada, presently are set using Decision Support Models (DSMs) that aim to maximize the long-term (>100y) value of forests through decisions implemented over short time horizons (5-80 years). However, DSMs typically are aspatial, lack ecological processes and do not treat erosion, so the long-term (>100y) emergent impacts of the prescribed forestry decisions on erosion and vegetation in Acadian forests remain poorly known. To better understand these impacts, we created an equation-based model that simulates the evolution of a ≥4 km2 forest in time steps of 1 y and at a spatial resolution of 3 m2, the footprint of a single mature tree. The model combines 1) ecological processes of recruitment, competition, and mortality; 2) geomorphic processes of hillslope erosion; 3) anthropic processes of tree harvesting, replanting, and road construction under constraints imposed by regulations and cost/benefit ratio. The model uses digital elevation models, parameters (where available), and calibration (where measurements are not available) for conditions presently found in central Cape Breton, Nova Scotia. The model is unique because it 1) deals with the impacts of harvesting on an Acadian forest; and 2) vegetation and erosion are coupled. The model was tested by comparing the species-specific biomass of long-term (40 y) forest plot data to simulated results. At the spatial scale of individual 1 ha plots, model predictions presently account for approximately 50% of observed biomass changes through time, but predictions are hampered by the effects of serendipitous "random" events such as single tree windfall. Harvesting increases the cumulative erosion over 3000 years by 240% when compared to an old growth forest and significantly suppresses the growth of Balsam Fir and Sugar Maple. 
We discuss further tests of the model, and how it might be used to investigate the long-term sustainability of the recommendations made by DSMs and to better understand the relationship between vegetation, erosion, and forest management strategies.

  4. Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

    PubMed

    Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact in improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI dataset that consists of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset, which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Woody plant phylogenetic diversity mediates bottom-up control of arthropod biomass in species-rich forests.

    PubMed

    Schuldt, Andreas; Baruffol, Martin; Bruelheide, Helge; Chen, Simon; Chi, Xiulian; Wall, Marcus; Assmann, Thorsten

    2014-09-01

    Global change is predicted to cause non-random species loss in plant communities, with consequences for ecosystem functioning. However, beyond the simple effects of plant species richness, little is known about how plant diversity and its loss influence higher trophic levels, which are crucial to the functioning of many species-rich ecosystems. We analyzed to what extent woody plant phylogenetic diversity and species richness contribute to explaining the biomass and abundance of herbivorous and predatory arthropods in a species-rich forest in subtropical China. The biomass and abundance of leaf-chewing herbivores, and the biomass dispersion of herbivores within plots, increased with woody plant phylogenetic diversity. Woody plant species richness had much weaker effects on arthropods, but interacted with plant phylogenetic diversity to negatively affect the ratio of predator to herbivore biomass. Overall, our results point to a strong bottom-up control of functionally important herbivores mediated particularly by plant phylogenetic diversity, but do not support the general expectation that top-down predator effects increase with plant diversity. The observed effects appear to be driven primarily by increasing resource diversity rather than diversity-dependent primary productivity, as the latter did not affect arthropods. The strong effects of plant phylogenetic diversity and the overall weaker effects of plant species richness show that the diversity-dependence of ecosystem processes and interactions across trophic levels can depend fundamentally on non-random species associations. This has important implications for the regulation of ecosystem functions via trophic interaction pathways and for the way species loss may impact these pathways in species-rich forests.

  6. Where to nest? Ecological determinants of chimpanzee nest abundance and distribution at the habitat and tree species scale.

    PubMed

    Carvalho, Joana S; Meyer, Christoph F J; Vicente, Luis; Marques, Tiago A

    2015-02-01

    Conversion of forests to anthropogenic land-uses increasingly subjects chimpanzee populations to habitat changes and concomitant alterations in the plant resources available to them for nesting and feeding. Based on nest count surveys conducted during the dry season, we investigated nest tree species selection and the effect of vegetation attributes on nest abundance of the western chimpanzee, Pan troglodytes verus, at Lagoas de Cufada Natural Park (LCNP), Guinea-Bissau, a forest-savannah mosaic widely disturbed by humans. Further, we assessed patterns of nest height distribution to determine support for the anti-predator hypothesis. A zero-altered generalized linear mixed model showed that nest abundance was negatively related to floristic diversity (exponential form of the Shannon index) and positively with the availability of smaller-sized trees, reflecting characteristics of dense-canopy forest. A positive correlation between nest abundance and floristic richness (number of plant species) and composition indicated that species-rich open habitats are also important in nest site selection. Restricting this analysis to feeding trees, nest abundance was again positively associated with the availability of smaller-sized trees, further supporting the preference for nesting in food tree species from dense forest. Nest tree species selection was non-random, and oil palms were used at a much lower proportion (10%) than previously reported from other study sites in forest-savannah mosaics. While this study suggests that human disturbance may underlie the exclusive arboreal nesting at LCNP, better quantitative data are needed to determine to what extent the construction of elevated nests is in fact a response to predators able to climb trees. Given the importance of LCNP as refuge for Pan t. verus our findings can improve conservation decisions for the management of this important umbrella species as well as its remaining suitable habitats. 
© 2014 Wiley Periodicals, Inc.
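
    The "exponential form of the Shannon index" used as the floristic diversity measure above can be computed as in the sketch below. The species counts are made up for illustration, and the function names are not from the study.

```python
from math import exp, log


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def exponential_shannon(counts):
    """exp(H): the number of equally common species giving the same H."""
    return exp(shannon_index(counts))


even_plot = [10, 10, 10, 10]    # four equally abundant tree species (made up)
dominated_plot = [37, 1, 1, 1]  # same richness, one dominant species (made up)
print(round(exponential_shannon(even_plot), 3))
print(round(exponential_shannon(dominated_plot), 3))
```

    Both plots have the same species richness, yet the exponential Shannon value is much lower for the dominated plot, which is why the abstract can report different relationships for floristic diversity and floristic richness.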

  7. Nitrogen spatial heterogeneity influences diversity following restoration in a ponderosa pine forest, Montana.

    PubMed

    Gundale, Michael J; Metlen, Kerry L; Fiedler, Carl E; DeLuca, Thomas H

    2006-04-01

    The resource heterogeneity hypothesis (RHH) is frequently cited in the ecological literature as an important mechanism for maintaining species diversity. The RHH has rarely been evaluated in the context of restoration ecology in which a commonly cited goal is to restore diversity. In this study we focused on the spatial heterogeneity of total inorganic nitrogen (TIN) following restoration treatments in a ponderosa pine (Pinus ponderosa)/Douglas-fir (Pseudotsuga menziesii) forest in western Montana, USA. Our objective was to evaluate relationships between understory species richness and TIN heterogeneity following mechanical thinning (thin-only), prescribed burning (burn-only), and mechanical thinning with prescribed burning (thin/burn) to discern the ecological and management implications of these restoration approaches. We employed a randomized block design, with three 9-ha replicates of each treatment and an untreated control. Within each treatment, we randomly established a 20 x 50 m (1000 m2) plot in which we measured species richness across the entire plot and in 12 1-m2 quadrats randomly placed within each larger plot. Additionally, we measured TIN from a grid consisting of 112 soil samples (0-5 cm) in each plot and computed standard deviations as a measure of heterogeneity. We found a correlation between the net increase in species richness and the TIN standard deviations one and two years following restoration treatments, supporting RHH. Using nonmetric multidimensional scaling ordination and chi-squared analysis, we found that high and low TIN quadrats contained different understory communities in 2003 and 2004, further supporting RHH. A comparison of restoration treatments demonstrated that thin/burn and burn-only treatments created higher N heterogeneity relative to the control. We also found that within prescribed burn treatments, TIN heterogeneity was positively correlated with fine-fuel consumption, a variable reflecting burn severity.
These findings may lead to more informed restoration decisions that consider treatment effects on understory diversity in ponderosa pine/Douglas-fir ecosystems.

  8. Effects of the amount and composition of the forest floor on emergence and early establishment of loblolly pine seedlings

    Treesearch

    Michael G. Shelton

    1995-01-01

    Five forest floor weights (0, 10, 20, 30, and 40 Mg/ha), three forest floor compositions (pine, pine-hardwood, and hardwood), and two seed placements (forest floor and soil surface) were tested in a three-factorial, split-plot design with four incomplete, randomized blocks. The experiment was conducted in a nursery setting and used wooden frames to define 0.145-m

  9. Extrapolating intensified forest inventory data to the surrounding landscape using landsat

    Treesearch

    Evan B. Brooks; John W. Coulston; Valerie A. Thomas; Randolph H. Wynne

    2015-01-01

    In 2011, a collection of spatially intensified plots was established on three of the Experimental Forests and Ranges (EFRs) sites with the intent of facilitating FIA program objectives for regional extrapolation. Characteristic coefficients from harmonic regression (HR) analysis of associated Landsat stacks are used as inputs into a conditional random forests model to...

  10. Forest-floor disturbance reduces chipmunk (Tamias spp.) abundance two years after variable-retention harvest of Pacific Northwestern forests

    Treesearch

    Randall J. Wilk; Timothy B. Harrington; Robert A. Gitzen; Chris C. Maguire

    2015-01-01

    We evaluated the two-year effects of variable-retention harvest on chipmunk (Tamias spp.) abundance (N^) and habitat in mature coniferous forests in western Oregon and Washington because wildlife responses to density/pattern of retained trees remain largely unknown. In a randomized complete-block design, six...

  11. Highlights of the national evaluation of the Forest Stewardship Planning Program

    Treesearch

    R.J. Moulton; J.D. Esseks

    2001-01-01

    In 1998 and 1999, a nationwide random sample of 1238 nonindustrial private forest (NIPF) landowners with approved multiple resource Forest Stewardship Plans were interviewed to determine if this program is meeting its Congressional mandate of promoting sustainable management of forest resources on NIPF ownerships. It was found that two-thirds of program participants had never...

  12. Ownership and ecosystem as sources of spatial heterogeneity in a forested landscape, Wisconsin, USA

    Treesearch

    Thomas R. Crow; George E. Host; David J. Mladenoff

    1999-01-01

    The interaction between physical environment and land ownership in creating spatial heterogeneity was studied in largely forested landscapes of northern Wisconsin, USA. A stratified random approach was used in which 2500-ha plots representing two ownerships (National Forest and private non-industrial) were located within two regional ecosystems (extremely well-drained...

  13. Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different Machine Learning Algorithms in the Kurdistan Region, NE Iraq

    NASA Astrophysics Data System (ADS)

    Othman, Arsalan A.; Gloaguen, Richard

    2017-09-01

    Lithological mapping in mountainous regions is often impeded by limited accessibility due to relief. This study aims to evaluate (1) the performance of different supervised classification approaches using remote sensing data and (2) the use of additional information such as geomorphology. We exemplify the methodology in the Bardi-Zard area in NE Iraq, a part of the Zagros Fold-Thrust Belt, known for its chromite deposits. We highlight the improvement of remote sensing geological classification obtained by integrating geomorphic features and spatial information in the classification scheme. We applied a Maximum Likelihood (ML) classification method alongside two Machine Learning Algorithms (MLA), Support Vector Machine (SVM) and Random Forest (RF), to allow the joint use of geomorphic features, Band Ratio (BR), Principal Component Analysis (PCA), spatial information (spatial coordinates) and multispectral data of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) satellite. The RF algorithm showed reliable results and discriminated serpentinite, talus and terrace deposits, red argillites with conglomerates and limestone, limy conglomerates and limestone conglomerates, tuffites interbedded with basic lavas, limestone and metamorphosed limestone, and reddish green shales. The best overall accuracy (∼80%) was achieved by the RF algorithm in the majority of the sixteen tested combination datasets.

  14. Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

    PubMed

    Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

    2017-09-01

    To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
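
    The leave-one-subject-out splitting and weighted majority vote named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the subject identifiers, and the classifier weights are all hypothetical, and the abstract does not say how the weights were chosen.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per subject.

    subject_ids holds one subject identifier per sample, so every sample
    from the held-out subject lands in the test fold.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical values,
    for example each classifier's validation accuracy).
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken by label order so the result is deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


if __name__ == "__main__":
    for subject, train, test in leave_one_subject_out(["a", "a", "b", "c", "b"]):
        print(subject, train, test)
    print(weighted_majority_vote(["walk", "run", "walk", "run"], [0.6, 0.9, 0.2, 0.1]))
```

    Splitting by subject rather than by sample keeps windows from the same person out of both the training and test folds, which is the point of the leave-one-subject-out design.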

  15. Predicting losing and gaining river reaches in lowland New Zealand based on a statistical methodology

    NASA Astrophysics Data System (ADS)

    Yang, Jing; Zammit, Christian; Dudley, Bruce

    2017-04-01

    The phenomenon of losing and gaining in rivers normally takes place in lowland areas, where there are often various, sometimes conflicting uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) information surveys with surface water and groundwater experts from regional government, 2) a collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build the statistical relationship between river reach status (gain and loss) and river/watershed characteristics, and then to predict for river reaches at Strahler order one without prior losing and gaining information. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research and water allocation decisions in lowland New Zealand.

  16. Predicting human liver microsomal stability with machine learning techniques.

    PubMed

    Sakiyama, Yojiro; Yuki, Hitomi; Moriya, Takashi; Hattori, Kazunari; Suzuki, Misaki; Shimada, Kaoru; Honma, Teruki

    2008-02-01

    To ensure a continuing pipeline in pharmaceutical research, lead candidates must possess appropriate metabolic stability in the drug discovery process. In vitro ADMET (absorption, distribution, metabolism, elimination, and toxicity) screening provides us with useful information regarding the metabolic stability of compounds. However, before the synthesis stage, an efficient process is required in order to deal with the vast quantity of data from large compound libraries and high-throughput screening. Here we have derived a relationship between the chemical structure and its metabolic stability for a data set of in-house compounds by means of various in silico machine learning such as random forest, support vector machine (SVM), logistic regression, and recursive partitioning. For model building, 1952 proprietary compounds comprising two classes (stable/unstable) were used with 193 descriptors calculated by Molecular Operating Environment. The results using test compounds have demonstrated that all classifiers yielded satisfactory results (accuracy > 0.8, sensitivity > 0.9, specificity > 0.6, and precision > 0.8). Above all, classification by random forest as well as SVM yielded kappa values of approximately 0.7 in an independent validation set, slightly higher than other classification tools. These results suggest that nonlinear/ensemble-based classification methods might prove useful in the area of in silico ADME modeling.
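
    The metrics quoted in this abstract (accuracy, sensitivity, specificity, precision, and Cohen's kappa) all follow from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name and the counts in the usage example are made up for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion counts.

    tp, fp, fn, tn are the true positive, false positive, false negative,
    and true negative counts. Assumes every denominator is non-zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)     # positive predictive value

    # Cohen's kappa compares observed agreement (accuracy) with the
    # agreement expected by chance from the row and column totals.
    p_chance_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_chance_neg = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_chance_pos + p_chance_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.3f}")
```

    Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance when the two classes are unbalanced.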

  17. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors.

    PubMed

    Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  18. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
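
    Youden's index, one of the evaluation measures listed in this abstract, is sensitivity plus specificity minus one. A minimal sketch of how it can be used to pick a decision threshold follows; the helper names and the example scores are hypothetical, and the abstract does not state that the authors selected thresholds this way.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: 0 means chance-level, 1 means perfect."""
    return sensitivity + specificity - 1.0


def best_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's index.

    A case is predicted positive when its score is >= threshold.
    labels must contain both classes (1 = event, 0 = no event).
    """
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        j = youden_index(tp / (tp + fn), tn / (tn + fp))
        # Keep the lowest threshold when several share the same J.
        if best is None or j > best[1]:
            best = (threshold, j)
    return best


if __name__ == "__main__":
    # Illustrative predicted risks and outcomes, not registry data.
    scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
    labels = [0, 0, 0, 1, 0, 1]
    print(best_threshold(scores, labels))
```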

  19. Advanced Subspace Techniques for Modeling Channel and Session Variability in a Speaker Recognition System

    DTIC Science & Technology

    2012-03-01

    with each SVM discriminating between a pair of the N total speakers in the data set. The (N(N + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote, and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random

  20. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
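
    One way to read the probability-machine idea is that each tree returns the proportion of positive training cases in the leaf where a new case falls, and the forest averages those proportions. The sketch below illustrates only that averaging step under this assumption; the stump "trees", their thresholds, and their leaf proportions are hypothetical stand-ins for fitted trees, and this is not the R code the authors provide.

```python
from statistics import mean


def make_stump(feature, threshold, p_left, p_right):
    """Build a hypothetical one-split tree.

    p_left and p_right stand for the share of positive training cases in
    each leaf, which a fitted regression tree on a 0/1 response would store.
    """
    def stump(x):
        return p_left if x[feature] <= threshold else p_right
    return stump


def forest_probability(trees, x):
    """Estimate P(y = 1 | x) as the average of the per-tree leaf estimates."""
    return mean(tree(x) for tree in trees)


if __name__ == "__main__":
    # Three made-up stumps standing in for a fitted forest.
    trees = [
        make_stump(0, 5.0, 0.1, 0.8),
        make_stump(1, 2.0, 0.3, 0.9),
        make_stump(0, 7.0, 0.2, 0.7),
    ]
    print(forest_probability(trees, (6.0, 1.0)))
```

    Averaging leaf proportions gives a graded risk estimate, whereas majority voting on hard class labels would only give a binary classification.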

  1. A random forest algorithm for nowcasting of intense precipitation events

    NASA Astrophysics Data System (ADS)

    Das, Saurabh; Chakraborty, Rohit; Maitra, Animesh

    2017-09-01

    Automatic nowcasting of convective initiation and thunderstorms has potential applications in several sectors including aviation planning and disaster management. In this paper, a random forest-based machine learning algorithm is tested for nowcasting of convective rain with a ground-based radiometer. Brightness temperatures measured at 14 frequencies (7 frequencies in the 22-31 GHz band and 7 frequencies in the 51-58 GHz band) are utilized as the inputs of the model. The lower frequency band is associated with water vapor absorption whereas the upper frequency band relates to oxygen absorption; together they provide information on the temperature and humidity of the atmosphere. The synthetic minority over-sampling technique is used to balance the data set and 10-fold cross validation is used to assess the performance of the model. Results indicate that the random forest algorithm with fixed alarm generation times of 30 min and 60 min performs quite well (probability of detection of all types of weather condition ∼90%) with low false alarms. It is, however, also observed that reducing the alarm generation time improves the threat score significantly and also decreases false alarms. The proposed model is found to be very sensitive to the boundary layer instability as indicated by the variable importance measure. The study shows the suitability of a random forest algorithm for nowcasting applications utilizing a large number of input parameters from diverse sources; the approach can be utilized in other forecasting problems.
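
    The verification measures mentioned in this abstract (probability of detection, false alarms, and threat score) have standard definitions in terms of hits, misses, and false alarms. The sketch below uses those textbook definitions; the function name and the event counts are illustrative assumptions, and the false alarm measure shown is the false alarm ratio, which may differ from the exact quantity the authors report.

```python
def nowcast_scores(hits, misses, false_alarms):
    """Return (POD, FAR, threat score) from categorical forecast counts.

    hits: events that were forecast and occurred.
    misses: events that occurred but were not forecast.
    false_alarms: events that were forecast but did not occur.
    """
    pod = hits / (hits + misses)                    # probability of detection
    far = false_alarms / (hits + false_alarms)      # false alarm ratio
    threat = hits / (hits + misses + false_alarms)  # critical success index
    return pod, far, threat


if __name__ == "__main__":
    # Made-up counts chosen only to show the arithmetic.
    pod, far, threat = nowcast_scores(hits=18, misses=2, false_alarms=5)
    print(f"POD={pod:.2f} FAR={far:.2f} threat score={threat:.2f}")
```

    The threat score penalizes both misses and false alarms, so it can improve when false alarms fall even if the probability of detection stays the same.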

  2. Learning accurate and interpretable models based on regularized random forests regression

    PubMed Central

    2014-01-01

    Background: Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods: In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationships in data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results: We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion: The approach demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120

  3. The contribution of competition to tree mortality in old-growth coniferous forests

    USGS Publications Warehouse

    Das, A.; Battles, J.; Stephenson, N.L.; van Mantgem, P.J.

    2011-01-01

    Competition is a well-documented contributor to tree mortality in temperate forests, with numerous studies documenting a relationship between tree death and the competitive environment. Models frequently rely on competition as the only non-random mechanism affecting tree mortality. However, for mature forests, competition may cease to be the primary driver of mortality. We use a large, long-term dataset to study the importance of competition in determining tree mortality in old-growth forests on the western slope of the Sierra Nevada of California, U.S.A. We make use of the comparative spatial configuration of dead and live trees, changes in tree spatial pattern through time, and field assessments of contributors to an individual tree's death to quantify competitive effects. Competition was apparently a significant contributor to tree mortality in these forests. Trees that died tended to be in more competitive environments than trees that survived, and suppression frequently appeared as a factor contributing to mortality. On the other hand, based on spatial pattern analyses, only three of 14 plots demonstrated compelling evidence that competition was dominating mortality. Most of the rest of the plots fell within the expectation for random mortality, and three fit neither the random nor the competition model. These results suggest that while competition is often playing a significant role in tree mortality processes in these forests it only infrequently governs those processes. In addition, the field assessments indicated a substantial presence of biotic mortality agents in trees that died. While competition is almost certainly important, demographics in these forests cannot accurately be characterized without a better grasp of other mortality processes. In particular, we likely need a better understanding of biotic agents and their interactions with one another and with competition. © 2011.

  4. Riparian Ficus Tree Communities: The Distribution and Abundance of Riparian Fig Trees in Northern Thailand

    PubMed Central

    Pothasin, Pornwiwan; Compton, Stephen G.; Wangpakapattanawong, Prasit

    2014-01-01

    Fig trees (Ficus) are often ecologically significant keystone species because they sustain populations of the many seed-dispersing animals that feed on their fruits. They are prominent components of riparian zones where they may also contribute to bank stability as well as supporting associated animals. The diversity and distributions of riparian fig trees in deciduous and evergreen forests in Chiang Mai Province, Northern Thailand were investigated in 2010–2012. To record the diversity and abundance of riparian fig trees, we (1) calculated stem density, species richness, and diversity indices in 20×50 m randomly selected quadrats along four streams and (2) measured the distances of individual trees from four streams to determine if species exhibit distinct distribution patterns within riparian zones. A total of 1169 individuals (from c. 4 ha) were recorded in the quadrats, representing 33 Ficus species (13 monoecious and 20 dioecious) from six sub-genera and about 70% of all the species recorded from northern Thailand. All 33 species had at least some stems in close proximity to the streams, but they varied in their typical proximity, with F. squamosa Roxb. and F. ischnopoda Miq. the most strictly stream-side species. The riparian forests in Northern Thailand support a rich diversity and high density of Ficus species and our results emphasise the importance of fig trees within the broader priorities of riparian area conservation. Plans to maintain or restore properly functioning riparian forests need to take into account their significance. PMID:25310189

  5. Defaunation affects Astrocaryum gratum (Arecales: Arecaceae) seed survivorship in a sub-montane tropical forest.

    PubMed

    Aliaga-Rossel, Enzo; Manuel Fragoso, Jos

    2015-03-01

    Animal-plant interactions in Neotropical forests are complex processes. Within these processes, mid- to large-sized mammals consume fruits and seeds from several species; however, because of their size these mammals are overhunted, resulting in defaunated forests. Our objective was to evaluate and compare seed removal and survivorship in a forest with no hunting, a forest with moderate or reduced hunting, and a forest with higher hunting pressure. We examined the interaction between Astrocaryum gratum and white lipped peccary (Tayassu pecari) to tease apart the defaunation process. To isolate and evaluate mammal seed removal rates and to identify the causes of mortality of A. gratum under the three hunting pressures, we used exclosures in each forest. In four different forest patches for each forest, we positioned a block-treatment consisting of three exclosures (total exclusion, peccary exclusion, and control), randomly distributed 5 m apart, with the block-treatments spaced 50-75 m apart from one another. We established 15 treatments in total for each patch (5 blocks per patch). There were 20 blocks within each forest type. For total exclusion, all vertebrates were excluded using galvanized wire mesh exclosures. The second, the peccary exclusion, was designed to stop peccaries from entering treatment units, providing access only to small vertebrates; larger mammals were able to access the treatment unit by reaching over the sides and the open top; finally, the control allowed full access for all mammals. Fresh A. gratum fruits were collected from the forest floor under different adult trees throughout the study area. In each exclosure treatment, twenty A. gratum seeds were placed, and their removal was recorded. In total, 3 600 seeds were analyzed. Seed survival was lower in unhunted forest compared to areas with moderate hunting and forest with a higher hunting pressure, supporting the hypothesis of the importance of mammals in seed removal.
From the initial 400 seeds left for each control exclosure in each type of forest, there was a significant difference between the seed removal; 1.75% seeds in the unhunted forest remained; 43.5% in the moderately hunted forest, and 48.5% in hunted forest. The main cause of seed mortality was white lipped peccaries; while in the forests without them, the main removal was caused by rodents and a higher insect infection was observed in the heavily hunted forest. Our results indicated that defaunation affects seed survivorship.

  6. Landscape variability of vegetation change across the forest to tundra transition of central Canada

    NASA Astrophysics Data System (ADS)

    Bonney, Mitchell Thurston

    Widespread vegetation productivity increases in tundra ecosystems and stagnation, or even productivity decreases, in boreal forest ecosystems have been detected from coarse-scale remote sensing observations over the last few decades. However, finer-scale Landsat studies have shown that these changes are heterogeneous and may be related to landscape and regional variability in climate, land cover, topography and moisture. In this study, a Landsat Normalized Difference Vegetation Index (NDVI) time-series (1984-2016) was examined for a study area spanning the entirety of the sub-Arctic boreal forest to Low Arctic tundra transition of central Canada (i.e., Yellowknife to the Arctic Ocean). NDVI trend analysis indicated that 27% of un-masked pixels in the study area exhibited a significant (p < 0.05) trend and virtually all (99.3%) of those pixels were greening. Greening pixels were most common in the northern tundra zone and the southern forest-tundra ecotone zone. NDVI trends were positive throughout the study area, but were smallest in the forest zone and largest in the northern tundra zone. These results were supported by ground validation, which found a strong relationship (R2 = 0.81) between bulk vegetation volume (BVV) and NDVI for non-tree functional groups in the North Slave region of Northwest Territories. Field observations indicate that alder (Alnus spp.) shrublands and open woodland sites with shrubby understories were most likely to exhibit greening in that area. Random Forest (RF) modelling of the relationship between NDVI trends and environmental variables found that the magnitude and direction of trends differed across the forest to tundra transition. Increased summer temperatures, shrubland and forest land cover, closer proximity to major drainage systems, longer distances from major lakes and lower elevations were generally more important and associated with larger positive NDVI trends. 
These findings indicate that the largest positive NDVI trends were primarily associated with the increased productivity of shrubby environments, especially at, and north of the forest-tundra ecotone in areas with more favorable growing conditions. Smaller and less significant NDVI trends in boreal forest environments south of the forest-tundra ecotone were likely associated with long-term recovery from fire disturbance rather than the variables analyzed here.
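
    NDVI is computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red), and a per-pixel trend is the slope of NDVI against year. The sketch below pairs that formula with an ordinary least-squares slope. The least-squares estimator is an assumption made here for illustration, since the abstract does not name the trend method, and the reflectance and NDVI values are invented.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def linear_trend_slope(years, values):
    """Ordinary least-squares slope of values against years.

    Assumed estimator: the source does not say which trend test was used.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    variance = sum((x - mean_x) ** 2 for x in years)
    return covariance / variance


if __name__ == "__main__":
    print(ndvi(nir=0.5, red=0.1))
    # Invented annual NDVI values for one pixel; a positive slope is "greening".
    print(linear_trend_slope([1984, 1985, 1986], [0.2, 0.3, 0.4]))
```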

  7. Applying genetic algorithms to set the optimal combination of forest fire related variables and model forest fire susceptibility based on data mining models. The case of Dayu County, China.

    PubMed

    Hong, Haoyuan; Tsangaratos, Paraskevas; Ilia, Ioanna; Liu, Junzhi; Zhu, A-Xing; Xu, Chong

    2018-07-15

    The main objective of the present study was to utilize Genetic Algorithms (GA) in order to obtain the optimal combination of forest fire related variables and apply data mining methods for constructing a forest fire susceptibility map. In the proposed approach, a Random Forest (RF) and a Support Vector Machine (SVM) were used to produce a forest fire susceptibility map for Dayu County, which is located in the southwest of Jiangxi Province, China. For this purpose, historic forest fires and thirteen forest fire related variables were analyzed, namely: elevation, slope angle, aspect, curvature, land use, soil cover, heat load index, normalized difference vegetation index, mean annual temperature, mean annual wind speed, mean annual rainfall, distance to river network and distance to road network. The Natural Break and the Certainty Factor method were used to classify and weight the thirteen variables, while a multicollinearity analysis was performed to determine the correlation among the variables and decide about their usability. The optimal set of variables determined by the GA reduced the number of variables to eight, excluding aspect, land use, heat load index, distance to river network and mean annual rainfall from the analysis. The performance of the forest fire models was evaluated by using the area under the Receiver Operating Characteristic curve (ROC-AUC) based on the validation dataset. Overall, the RF models gave higher AUC values. The results also showed that the proposed optimized models outperform the original models. Specifically, the optimized RF model gave the best results (0.8495), followed by the original RF (0.8169), while the optimized SVM gave lower values (0.7456) than the RF models but higher than the original SVM (0.7148) model. The study highlights the significance of feature selection techniques in forest fire susceptibility modeling and indicates that data mining methods can be considered a valid approach for forest fire susceptibility modeling.
Copyright © 2018 Elsevier B.V. All rights reserved.
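
    The ROC-AUC used above to compare the models equals the probability that a randomly chosen positive case (here, a fire location) receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name and the example susceptibility scores are hypothetical, and the quadratic loop is meant for clarity rather than for large validation sets.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive cases, 0 for negative cases; both must occur.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented susceptibility scores for fire (1) and non-fire (0) cells.
    print(roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```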

  8. Random forest learning of ultrasonic statistical physics and object spaces for lesion detection in 2D sonomammography

    NASA Astrophysics Data System (ADS)

    Sheet, Debdoot; Karamalis, Athanasios; Kraft, Silvan; Noël, Peter B.; Vag, Tibor; Sadhu, Anup; Katouzian, Amin; Navab, Nassir; Chatterjee, Jyotirmoy; Ray, Ajoy K.

    2013-03-01

    Breast cancer is the most common form of cancer in women. Early diagnosis can significantly improve life expectancy and allow different treatment options. Clinicians favor 2D ultrasonography for breast tissue abnormality screening due to high sensitivity and specificity compared to competing technologies. However, inter- and intra-observer variability in visual assessment and reporting of lesions often handicaps its performance. Existing Computer Assisted Diagnosis (CAD) systems, though able to detect solid lesions, are often restricted in performance. These restrictions are the inability to (1) detect lesions of multiple sizes and shapes, and (2) differentiate hypo-echoic lesions from their posterior acoustic shadowing. In this work we present a completely automatic system for detection and segmentation of breast lesions in 2D ultrasound images. We employ random forests for learning of tissue specific primal to discriminate breast lesions from surrounding normal tissues. This enables it to detect lesions of multiple shapes and sizes, as well as discriminate hypo-echoic lesions from associated posterior acoustic shadowing. The primal comprises (i) multiscale estimated ultrasonic statistical physics and (ii) scale-space characteristics. The random forest learns lesion vs. background primal from a database of 2D ultrasound images with labeled lesions. For segmentation, the posterior probabilities of lesion pixels estimated by the learnt random forest are hard thresholded to provide a random walks segmentation stage with starting seeds. Our method achieves detection with 99.19% accuracy and segmentation with mean contour-to-contour error < 3 pixels on a set of 40 images with 49 lesions.

  9. Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.

    PubMed

    Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi

    2014-01-01

    In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.

  10. Polarimetric signatures of a coniferous forest canopy based on vector radiative transfer theory

    NASA Technical Reports Server (NTRS)

    Karam, M. A.; Fung, A. K.; Amar, F.; Mougin, E.; Lopes, A.; Beaudoin, A.

    1992-01-01

    Complete polarization signatures of a coniferous forest canopy are studied by the iterative solution of the vector radiative transfer equations up to the second order. The forest canopy constituents (leaves, branches, stems, and trunk) are embedded in a multi-layered medium over a rough interface. The branches, stems and trunk scatterers are modeled as finite randomly oriented cylinders. The leaves are modeled as randomly oriented needles. For a plane wave exciting the canopy, the average Mueller matrix is formulated in terms of the iterative solution of the radiative transfer solution and used to determine the linearly polarized backscattering coefficients, the co-polarized and cross-polarized power returns, and the phase difference statistics. Numerical results are presented to investigate the effect of transmitting and receiving antenna configurations on the polarimetric signature of a pine forest. Comparison is made with measurements.

  11. Field evaluation of a random forest activity classifier for wrist-worn accelerometer data.

    PubMed

    Pavey, Toby G; Gilson, Nicholas D; Gomersall, Sjaan R; Clark, Bronwyn; Trost, Stewart G

    2017-01-01

    Wrist-worn accelerometers are convenient to wear and associated with greater wear-time compliance. Previous work has generally relied on choreographed activity trials to train and test classification models. However, validity in free-living contexts is starting to emerge. Study aims were: (1) train and test a random forest activity classifier for wrist accelerometer data; and (2) determine if models trained on laboratory data perform well under free-living conditions. Twenty-one participants (mean age=27.6±6.2) completed seven lab-based activity trials and a 24h free-living trial (N=16). Participants wore a GENEActiv monitor on the non-dominant wrist. Classification models recognising four activity classes (sedentary, stationary+, walking, and running) were trained using time and frequency domain features extracted from 10-s non-overlapping windows. Model performance was evaluated using leave-one-out-cross-validation. Models were implemented using the randomForest package within R. Classifier accuracy during the 24h free living trial was evaluated by calculating agreement with concurrently worn activPAL monitors. Overall classification accuracy for the random forest algorithm was 92.7%. Recognition accuracy for sedentary, stationary+, walking, and running was 80.1%, 95.7%, 91.7%, and 93.7%, respectively for the laboratory protocol. Agreement with the activPAL data (stepping vs. non-stepping) during the 24h free-living trial was excellent and, on average, exceeded 90%. The ICC for stepping time was 0.92 (95% CI=0.75-0.97). However, sensitivity and positive predictive values were modest. Mean bias was 10.3min/d (95% LOA=-46.0 to 25.4min/d). The random forest classifier for wrist accelerometer data yielded accurate group-level predictions under controlled conditions, but was less accurate at identifying stepping versus non-stepping behaviour in free-living conditions. Future studies should conduct more rigorous field-based evaluations using observation as a criterion measure. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
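
    The windowing step mentioned in this abstract (features from 10-s non-overlapping windows) can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' R code: the function name window_features, the sampling-rate argument, and the choice of four time-domain summaries are assumptions made here, and the frequency-domain features are omitted.

```python
from statistics import mean, pstdev


def window_features(signal, sample_rate_hz, window_s=10):
    """Split a 1-D signal into non-overlapping windows and summarise each one.

    Hypothetical helper: the abstract does not list the exact features,
    so only simple time-domain summaries are computed here.
    """
    size = int(round(sample_rate_hz * window_s))  # samples per window
    features = []
    # Step by a full window so consecutive windows never overlap;
    # a trailing partial window is dropped.
    for start in range(0, len(signal) - size + 1, size):
        window = signal[start:start + size]
        features.append({
            "mean": mean(window),
            "sd": pstdev(window),
            "min": min(window),
            "max": max(window),
        })
    return features
```

    Each returned dictionary would correspond to one row of a feature table that a classifier could be trained on; dropping the trailing partial window is one possible design choice, not something the abstract specifies.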

  12. Assessment of various supervised learning algorithms using different performance metrics

    NASA Astrophysics Data System (ADS)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work presents a comparison of the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms considered in this work are Support Vector Machine (SVM), Decision Tree (DT), K Nearest Neighbour (KNN), Naïve Bayes (NB) and Random Forest (RF). This paper mostly focuses on comparing the performance of the above-mentioned algorithms on one binary classification task by analysing metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, and Prevalence.
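
    The metrics named in this abstract can all be derived from the four cells of a binary confusion matrix. The Python sketch below is illustrative only and is not taken from the paper; the function name binary_metrics is hypothetical, and G-Measure is assumed here to mean the geometric mean of precision and recall (other definitions, such as the geometric mean of sensitivity and specificity, are also in use).

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # true positive rate / sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": precision,
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "f_measure": 2 * precision * recall / (precision + recall),
        # Assumption: G-Measure as geometric mean of precision and recall.
        "g_measure": math.sqrt(precision * recall),
    }
```

    Because misclassification rate is 1 minus accuracy and false positive rate is 1 minus specificity, only some of these values carry independent information.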

  13. An assessment of educational needs in the Alaskan forest products industry.

    Treesearch

    Jon Thomas; Eric Hansen; Allen M. Brackley

    2005-01-01

    Major changes in federal forest policy in Alaska have resulted in a dramatic downsizing of the state's forest industry. These changes have driven efforts for economic restructuring and improved support for Alaskan communities. The University of Alaska Sitka Forest Products program at the University of Alaska Southeast is one example of efforts to better support...

  14. An assessment of educational needs in the Alaskan forest products industry

    Treesearch

    J. Thomas; E. Hansen; A. Brackley

    2005-01-01

    Major changes in federal forest policy in Alaska have resulted in a dramatic downsizing of the state's forest industry. These changes have driven efforts for economic restructuring and improved support for Alaskan communities. The University of Alaska Sitka Forest Products program at the University of Alaska Southeast is one example of efforts to better support...

  15. Metastability for discontinuous dynamical systems under Lévy noise: Case study on Amazonian Vegetation.

    PubMed

    Serdukova, Larissa; Zheng, Yayun; Duan, Jinqiao; Kurths, Jürgen

    2017-08-24

    For the tipping elements in the Earth's climate system, the most important issue to address is how stable is the desirable state against random perturbations. Extreme biotic and climatic events pose severe hazards to tropical rainforests. Their local effects are extremely stochastic and difficult to measure. Moreover, the direction and intensity of the response of forest trees to such perturbations are unknown, especially given the lack of efficient dynamical vegetation models to evaluate forest tree cover changes over time. In this study, we consider randomness in the mathematical modelling of forest trees by incorporating uncertainty through a stochastic differential equation. According to field-based evidence, the interactions between fires and droughts are a more direct mechanism that may describe sudden forest degradation in the south-eastern Amazon. In modeling the Amazonian vegetation system, we include symmetric α-stable Lévy perturbations. We report results of stability analysis of the metastable fertile forest state. We conclude that even a very slight threat to the forest state stability represents Lévy noise with large jumps of low intensity, that can be interpreted as a fire occurring in a non-drought year. During years of severe drought, high-intensity fires significantly accelerate the transition between a forest and savanna state.

  16. Stochastic assembly in a subtropical forest chronosequence: evidence from contrasting changes of species, phylogenetic and functional dissimilarity over succession.

    PubMed

    Mi, Xiangcheng; Swenson, Nathan G; Jia, Qi; Rao, Mide; Feng, Gang; Ren, Haibao; Bebber, Daniel P; Ma, Keping

    2016-09-07

    Deterministic and stochastic processes jointly determine the community dynamics of forest succession. However, it has been widely held in previous studies that deterministic processes dominate forest succession. Furthermore, inference of mechanisms for community assembly may be misleading if based on a single axis of diversity alone. In this study, we evaluated the relative roles of deterministic and stochastic processes along a disturbance gradient by integrating species, functional, and phylogenetic beta diversity in a subtropical forest chronosequence in Southeastern China. We found a general pattern of increasing species turnover, but little-to-no change in phylogenetic and functional turnover over succession at two spatial scales. Meanwhile, the phylogenetic and functional beta diversity were not significantly different from random expectation. This result suggested a dominance of stochastic assembly, contrary to the general expectation that deterministic processes dominate forest succession. On the other hand, we found significant interactions of environment and disturbance and limited evidence for significant deviations of phylogenetic or functional turnover from random expectations for different size classes. This result provided weak evidence of deterministic processes over succession. Stochastic assembly of forest succession suggests that post-disturbance restoration may be largely unpredictable and difficult to control in subtropical forests.

  17. Correspondence between sound propagation in discrete and continuous random media with application to forest acoustics.

    PubMed

    Ostashev, Vladimir E; Wilson, D Keith; Muhlestein, Michael B; Attenborough, Keith

    2018-02-01

    Although sound propagation in a forest is important in several applications, there are currently no rigorous yet computationally tractable prediction methods. Due to the complexity of sound scattering in a forest, it is natural to formulate the problem stochastically. In this paper, it is demonstrated that the equations for the statistical moments of the sound field propagating in a forest have the same form as those for sound propagation in a turbulent atmosphere if the scattering properties of the two media are expressed in terms of the differential scattering and total cross sections. Using the existing theories for sound propagation in a turbulent atmosphere, this analogy enables the derivation of several results for predicting forest acoustics. In particular, the second-moment parabolic equation is formulated for the spatial correlation function of the sound field propagating above an impedance ground in a forest with micrometeorology. Effective numerical techniques for solving this equation have been developed in atmospheric acoustics. In another example, formulas are obtained that describe the effect of a forest on the interference between the direct and ground-reflected waves. The formulated correspondence between wave propagation in discrete and continuous random media can also be used in other fields of physics.

  18. First direct landscape-scale measurement of tropical rain forest Leaf Area Index, a key driver of global primary productivity

    Treesearch

    David B. Clark; Paulo C. Olivas; Steven F. Oberbauer; Deborah A. Clark; Michael G. Ryan

    2008-01-01

    Leaf Area Index (leaf area per unit ground area, LAI) is a key driver of forest productivity but has never previously been measured directly at the landscape scale in tropical rain forest (TRF). We used a modular tower and stratified random sampling to harvest all foliage from forest floor to canopy top in 55 vertical transects (4.6 m2) across 500 ha of old growth in...

  19. Dynamics of Tree Species Diversity in Unlogged and Selectively Logged Malaysian Forests.

    PubMed

    Shima, Ken; Yamada, Toshihiro; Okuda, Toshinori; Fletcher, Christine; Kassim, Abdul Rahman

    2018-01-18

    Selective logging that is commonly conducted in tropical forests may change tree species diversity. In rarely disturbed tropical forests, locally rare species exhibit higher survival rates. If this non-random process occurs in a logged forest, the forest will rapidly recover its tree species diversity. Here we determined whether a forest in the Pasoh Forest Reserve, Malaysia, which was selectively logged 40 years ago, recovered its original species diversity (species richness and composition). To explore this, we compared the dynamics of species diversity between an unlogged forest plot (18.6 ha) and a logged forest plot (5.4 ha). We found that 40 years are not sufficient to recover species diversity after logging. Unlike unlogged forests, tree deaths and recruitments did not contribute to increased diversity in the selectively logged forests. Our results predict that selectively logged forests require a longer time than our observation period (40 years) to regain their diversity.

  20. Assessing change in large-scale forest area by visually interpreting Landsat images

    Treesearch

    Jerry D. Greer; Frederick P. Weber; Raymond L. Czaplewski

    2000-01-01

    As part of the Forest Resources Assessment 1990, the Food and Agriculture Organization of the United Nations visually interpreted a stratified random sample of 117 Landsat scenes to estimate global status and change in tropical forest area. Images from 1980 and 1990 were interpreted by a group of widely experienced technical people in many different tropical countries...

  1. A ground-based method of assessing urban forest structure and ecosystem services

    Treesearch

    David J. Nowak; Daniel E. Crane; Jack C. Stevens; Robert E. Hoehn; Jeffrey T. Walton; Jerry Bond

    2008-01-01

    To properly manage urban forests, it is essential to have data on this important resource. An efficient means to obtain this information is to randomly sample urban areas. To help assess the urban forest structure (e.g., number of trees, species composition, tree sizes, health) and several functions (e.g., air pollution removal, carbon storage and sequestration), the...

  2. Spatially random mortality in old-growth red pine forests of northern Minnesota

    Treesearch

    Tuomas Aakala; Shawn Fraver; Brian J. Palik; Anthony W. D'Amato

    2012-01-01

    Characterizing the spatial distribution of tree mortality is critical to understanding forest dynamics, but empirical studies on these patterns under old-growth conditions are rare. This rarity is due in part to low mortality rates in old-growth forests, the study of which necessitates long observation periods, and the confounding influence of tree in-growth during...

  3. Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

    PubMed

    Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A

    2018-04-29

    Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.

  4. Geographical traceability of Marsdenia tenacissima by Fourier transform infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Li, Chao; Yang, Sheng-Chao; Guo, Qiao-Sheng; Zheng, Kai-Yan; Wang, Ping-Li; Meng, Zhen-Gui

    2016-01-01

    A combination of Fourier transform infrared spectroscopy with chemometrics tools provided an approach for studying Marsdenia tenacissima according to its geographical origin. A total of 128 M. tenacissima samples from four provinces in China were analyzed with FTIR spectroscopy. Six pattern recognition methods were used to construct the discrimination models: support vector machine-genetic algorithms, support vector machine-particle swarm optimization, K-nearest neighbors, radial basis function neural network, random forest and support vector machine-grid search. Experimental results showed that K-nearest neighbors was superior to other mathematical algorithms after data were preprocessed with wavelet de-noising, with a discrimination rate of 100% in both the training and prediction sets. This study demonstrated that FTIR spectroscopy coupled with K-nearest neighbors could be successfully applied to determine the geographical origins of M. tenacissima samples, thereby providing reliable authentication in a rapid, cheap and noninvasive way.

  5. Measuring and explaining the willingness to pay for forest conservation: evidence from a survey experiment in Brazil

    NASA Astrophysics Data System (ADS)

    Bakaki, Zorzeta; Bernauer, Thomas

    2016-11-01

    Recent research suggests that there is substantial public support (including willingness to pay) for forest conservation. Based on a nationwide survey experiment in Brazil (N = 2500) the largest and richest of the world’s tropical developing countries, we shed new light on this issue. To what extent does the public in fact support forest conservation and what factors are influencing support levels? Unlike previous studies, our results show that the willingness to pay for tropical forest conservation in Brazil is rather low. Moreover, framing forest conservation in terms of biodiversity protection, which tends to create more local benefits, does not induce more support than framing conservation in terms of mitigating global climate change. The results also show that low levels of trust in public institutions have a strong negative impact on the public’s willingness to pay for forest conservation, individually and/or via government spending. What could other (richer) countries do, in this context, to encourage forest conservation in Brazil and other tropical developing countries? One key issue is whether prospects of foreign funding for forest conservation are likely to crowd out or, conversely, enhance the motivation for domestic level conservation efforts. We find that prospects of foreign funding have no significant effect on willingness to pay for forest conservation. These findings have at least three policy implications, namely, that the Brazilian public’s willingness to pay for forest conservation is very limited, that large-scale international funding is probably needed, and that such funding is unlikely to encourage more domestic effort, but is also unlikely to crowd out domestic efforts. Restoring public trust in the Brazilian government is key to increasing public support for forest conservation in Brazil.

  6. Toward geodesign for watershed restoration on the Fremont-Winema National Forest, Pacific Northwest, USA

    Treesearch

    Keith Reynolds; Philip Murphy; Steven Paplanus

    2017-01-01

    Spatial decision support systems for forest management have steadily evolved over the past 20+ years in order to better address the complexities of contemporary forest management issues such as the sustainability and resilience of ecosystems on forested landscapes. In this paper, we describe and illustrate new features of the Ecosystem Management Decision Support (EMDS...

  7. Monitoring and Modeling Carbon Dynamics at a Network of Intensive Sites in the USA and Mexico

    NASA Astrophysics Data System (ADS)

    Birdsey, R.; Wayson, C.; Johnson, K. D.; Pan, Y.; Angeles, G.; De Jong, B. H.; Andrade, J. L.; Dai, Z.

    2013-05-01

    The Forest Services of the USA and Mexico, supported by NASA and USAID, have begun to establish a network of intensive forest carbon monitoring sites. These sites are used for research and teaching, developing forest management practices, and forging links to the needs of communities. Several of the sites have installed eddy flux towers to provide basic meteorology data and daily estimates of forest carbon uptake and release, the processes that determine forest growth. Field sampling locations at each site provide estimates of forest biomass and carbon stocks, and monitor forest dynamic processes such as growth and mortality rates. Remote sensing facilitates scaling up to the surrounding landscapes. The sites support information requirements for implementing programs such as Reducing Emissions from Deforestation and Forest Degradation (REDD+), enabling communities to receive payments for ecosystem services such as reduced carbon emissions or improved forest management. In addition to providing benchmark data for REDD+ projects, the sites are valuable for validating state and national estimates from satellite remote sensing and the national forest inventory. Data from the sites provide parameters for forest models that support strategic management analysis, and support student training and graduate projects. The intensive monitoring sites may be a model for other countries in Latin America. Coordination among sites in the USA, Mexico and other Latin American countries can ensure harmonization of approaches and data, and share experiences and knowledge among countries with emerging opportunities for implementing REDD+ and other conservation programs.

  8. CRF: detection of CRISPR arrays using random forest.

    PubMed

    Wang, Kai; Liang, Chun

    2017-01-01

    CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Different from other CRISPR detection tools, a random forest classifier was used in CRF to filter out invalid CRISPR arrays from all putative candidates and accordingly enhanced detection accuracy. In CRF, particularly, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.

  9. Do bioclimate variables improve performance of climate envelope models?

    USGS Publications Warehouse

    Watling, James I.; Romañach, Stephanie S.; Bucklin, David N.; Speroterra, Carolina; Brandt, Laura A.; Pearlstine, Leonard G.; Mazzotti, Frank J.

    2012-01-01

    Climate envelope models are widely used to forecast potential effects of climate change on species distributions. A key issue in climate envelope modeling is the selection of predictor variables that most directly influence species. To determine whether model performance and spatial predictions were related to the selection of predictor variables, we compared models using bioclimate variables with models constructed from monthly climate data for twelve terrestrial vertebrate species in the southeastern USA using two different algorithms (random forests or generalized linear models), and two model selection techniques (using uncorrelated predictors or a subset of user-defined biologically relevant predictor variables). There were no differences in performance between models created with bioclimate or monthly variables, but one metric of model performance was significantly greater using the random forest algorithm compared with generalized linear models. Spatial predictions between maps using bioclimate and monthly variables were very consistent using the random forest algorithm with uncorrelated predictors, whereas we observed greater variability in predictions using generalized linear models.

  10. Clustering Single-Cell Expression Data Using Random Forest Graphs.

    PubMed

    Pouyan, Maziyar Baran; Nourani, Mehrdad

    2017-07-01

    Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue of which those cells are a part. In this paper, we propose the use of random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique utilizes random forests to capture cell marker dependences and model the cellular populations using the cell network concept. This cellular network helps us discover what cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.

  11. Visible and near infrared spectroscopy coupled to random forest to quantify some soil quality parameters

    NASA Astrophysics Data System (ADS)

    de Santana, Felipe Bachion; de Souza, André Marcelo; Poppi, Ronei Jesus

    2018-02-01

    This study evaluates the use of visible and near infrared spectroscopy (Vis-NIRS) combined with multivariate regression based on random forest to quantify some soil quality parameters. The parameters analyzed were soil cation exchange capacity (CEC), sum of exchange bases (SB), organic matter (OM), clay and sand present in the soils of several regions of Brazil. Current methods for evaluating these parameters are laborious and time-consuming, and require various wet analytical methods that are not adequate for use in precision agriculture, where faster and automatic responses are required. The random forest regression models were statistically better than PLS regression models for CEC, OM, clay and sand, demonstrating resistance to overfitting, attenuating the effect of outlier samples and indicating the most important variables for the model. The methodology demonstrates the potential of Vis-NIR as an alternative for determination of CEC, SB, OM, sand and clay, making it possible to develop a fast and automatic analytical procedure.

  12. Comparative analysis of used car price evaluation models

    NASA Astrophysics Data System (ADS)

    Chen, Chuancan; Hao, Lulu; Xu, Cong

    2017-05-01

    An accurate used car price evaluation is a catalyst for the healthy development of the used car market. Data mining has been applied to predict used car prices in several articles. However, little work has compared different algorithms for used car price estimation. This paper collects more than 100,000 used car dealing records throughout China for an empirical comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car prices in three different models: a model for a certain car make, a model for a certain car series, and a universal model. Results show that random forest has a stable but not ideal effect in the price evaluation model for a certain car make, but it shows a great advantage over linear regression in the universal model. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with fewer variables.

  13. Predicting Coastal Flood Severity using Random Forest Algorithm

    NASA Astrophysics Data System (ADS)

    Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

    2017-12-01

    Coastal floods have become more common recently and are predicted to further increase in frequency and severity due to sea level rise. Predicting floods in coastal cities can be difficult due to the number of environmental and geographic factors which can influence flooding events. Built stormwater infrastructure and irregular urban landscapes add further complexity. This paper demonstrates the use of machine learning algorithms in predicting street flood occurrence in an urban coastal setting. The model is trained and evaluated using data from Norfolk, Virginia USA from September 2010 - October 2016. Rainfall, tide levels, water table levels, and wind conditions are used as input variables. Street flooding reports made by city workers after named and unnamed storm events, ranging from 1-159 reports per event, are the model output. Results show that Random Forest provides predictive power in estimating the number of flood occurrences given a set of environmental conditions with an out-of-bag root mean squared error of 4.3 flood reports and a mean absolute error of 0.82 flood reports. The Random Forest algorithm performed much better than Poisson regression. From the Random Forest model, total daily rainfall was by far the most important factor in flood occurrence prediction, followed by daily low tide and daily higher high tide. The model demonstrated here could be used to predict flood severity based on forecast rainfall and tide conditions and could be further enhanced using more complete street flooding data for model training.
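
    The two error measures quoted in this abstract (root mean squared error and mean absolute error) weight large misses differently, which is one way an RMSE of 4.3 can sit alongside an MAE of 0.82. The Python sketch below is illustrative only; the function names and the toy numbers are assumptions made here and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: squaring makes large misses dominate."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: every miss counts in proportion to its size."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Toy example (made-up counts): two events predicted exactly, one missed by 3.
observed = [0, 2, 4]
predicted = [0, 2, 1]
print(mae(observed, predicted))   # 1.0
print(rmse(observed, predicted))  # sqrt(3), about 1.73
```

    When most predictions are close but a few events are badly missed, RMSE rises well above MAE, as in the toy example.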

  14. Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification

    NASA Astrophysics Data System (ADS)

    Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua

    2016-03-01

    There are many diseases that affect the distribution of muscles, including Duchenne and facioscapulohumeral dystrophy among other myopathies. In these disease cases, it is important to quantify both the muscle and fat volumes to track the disease progression. There has also been evidence that abnormal signal intensity on the MR images, which often is an indication of edema or inflammation, can be a good predictor for muscle deterioration. We present a fully-automated method that examines magnetic resonance (MR) images of the thigh and identifies the fat, muscle, and edema using a random forest classifier. First the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts are corrected using the N3 technique. The T1 and STIR (short tau inverse recovery) images are then aligned using landmark based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. This method was evaluated on MR images of 9 patients. The precision values are 0.91+/-0.06, 0.98+/-0.01 and 0.50+/-0.29 for muscle, fat, and edema, respectively. The recall values are 0.95+/-0.02, 0.96+/-0.03 and 0.43+/-0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of utilizing information from multiple MR sequences for the accurate quantification of fat, muscle and edema.

  15. AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au

    In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.

  16. Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach.

    PubMed

    Hettige, Nuwan C; Nguyen, Thai Binh; Yuan, Chen; Rajakulendran, Thanara; Baddour, Jermeen; Bhagwat, Nikhil; Bani-Fatemi, Ali; Voineskos, Aristotle N; Mallar Chakravarty, M; De Luca, Vincenzo

    2017-07-01

    Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric interventions. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool to estimate the probability that a patient with schizophrenia will attempt suicide. We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed four classification algorithms using regularized logistic regression, random forest, elastic net, and support vector machine models, with sociocultural and clinical variables as features to train the models. All classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 67% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 66% accuracy and an AUC of 0.67. The support vector classifier (SVC) model demonstrated an accuracy of 67% and an AUC of 0.70, and the elastic net model demonstrated an accuracy of 65% and an AUC of 0.71. Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    New data mining algorithms are continuously proposed as related disciplines develop. The applicable scopes and performances of these algorithms differ. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, the sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to the least class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression models, are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
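
    Two of the dataset characteristics listed in this abstract, class entropy and the largest-to-smallest class ratio, can be sketched as follows. The abstract does not state the logarithm base, so base-2 Shannon entropy is an assumption here, as are the function names and the toy labels.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (base 2, an assumption) of the class label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical label vectors (assumption, for illustration only).
balanced = ["a", "b"] * 4
skewed = ["a"] * 6 + ["b"] * 2

print(class_entropy(balanced), imbalance_ratio(balanced))
print(class_entropy(skewed), imbalance_ratio(skewed))
```

    Entropy is highest when classes are equally frequent and falls as one class dominates, so the two measures describe the same imbalance from different angles.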

  18. Diagnostic Value of the Impairment of Olfaction in Parkinson's Disease

    PubMed Central

    Casjens, Swaantje; Eckert, Angelika; Woitalla, Dirk; Ellrichmann, Gisa; Turewicz, Michael; Stephan, Christian; Eisenacher, Martin; May, Caroline; Meyer, Helmut E.; Brüning, Thomas; Pesch, Beate

    2013-01-01

    Background Olfactory impairment is increasingly recognized as an early symptom in the development of Parkinson's disease. Testing olfactory function is a non-invasive method but can be time-consuming which restricts its application in clinical settings and epidemiological studies. Here, we investigate odor identification as a supportive diagnostic tool for Parkinson's disease and estimate the performance of odor subsets to allow a more rapid testing of olfactory impairment. Methodology/Principal Findings Odor identification was assessed with 16 Sniffin' sticks in 148 Parkinson patients and 148 healthy controls. Risks of olfactory impairment were estimated with proportional odds models. Random forests were applied to classify Parkinson and non-Parkinson patients. Parkinson patients were rarely normosmic (identification of more than 12 odors; 16.8%) and identified on average seven odors whereas the reference group identified 12 odors and showed a higher prevalence of normosmy (31.1%). Parkinson patients with rigidity dominance had a twofold greater prevalence of olfactory impairment. Disease severity was associated with impairment of odor identification (per score point of the Hoehn and Yahr rating OR 1.87, 95% CI 1.26–2.77). Age-related impairment of olfaction showed a steeper gradient in Parkinson patients. Coffee, peppermint, and anise showed the largest difference in odor identification between Parkinson patients and controls. Random forests estimated a misclassification rate of 22.4% when comparing Parkinson patients with healthy controls using all 16 odors. A similar rate (23.8%) was observed when only the three aforementioned odors were applied. Conclusions/Significance Our findings indicate that testing odor identification can be a supportive diagnostic tool for Parkinson's disease. The application of only three odors performed well in discriminating Parkinson patients from controls, which can facilitate a wider application of this method as a point-of-care test. PMID:23696904

  19. Data-driven mapping of the potential mountain permafrost distribution.

    PubMed

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2017-07-15

    Existing mountain permafrost distribution models generally offer a good overview of the potential extent of this phenomenon at a regional scale. They are however not always able to reproduce the high spatial discontinuity of permafrost at the micro-scale (scale of a specific landform; ten to several hundreds of meters). To overcome this limitation, we tested an alternative modelling approach using three classification algorithms belonging to statistics and machine learning: Logistic regression, Support Vector Machines and Random forests. These supervised learning techniques infer a classification function from labelled training data (pixels of permafrost absence and presence) with the aim of predicting the permafrost occurrence where it is unknown. The research was carried out in a 588 km² area of the Western Swiss Alps. Permafrost evidence was mapped from ortho-image interpretation (rock glacier inventorying) and field data (mainly geoelectrical and thermal data). The relationship between the selected permafrost evidence and permafrost controlling factors was computed with the mentioned techniques. Classification performances, assessed with AUROC, were 0.81 for Logistic regression, 0.85 for Support Vector Machines and 0.88 for Random forests. The adopted machine learning algorithms proved efficient for permafrost distribution modelling, giving results consistent with the field reality. The high resolution of the input dataset (10 m) allows elaborating maps at the micro-scale with a modelled permafrost spatial distribution less optimistic than classic spatial models. Moreover, the probability output of the adopted algorithms offers a more precise overview of the potential distribution of mountain permafrost than simple indexes of permafrost favorability. These encouraging results also open the way to new possibilities of permafrost data analysis and mapping. Copyright © 2017 Elsevier B.V. All rights reserved.
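
    The AUROC used above to compare the three classifiers can be computed directly from predicted probabilities. The sketch below uses the pairwise (Mann-Whitney) formulation: the fraction of presence/absence pairs in which the presence pixel receives the higher score, with ties counted as one half. The labels and scores are hypothetical, and the function name auroc is an assumption; this is not the authors' evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical pixels: 1 = permafrost present, 0 = absent (assumption).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auroc(labels, scores), 3))
```

    The quadratic loop is fine for a small illustration; for rasters with many pixels a rank-based implementation would be the more practical choice.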

  20. Mapping SOC (Soil Organic Carbon) using LiDAR-derived vegetation indices in a random forest regression model

    NASA Astrophysics Data System (ADS)

    Will, R. M.; Glenn, N. F.; Benner, S. G.; Pierce, J. L.; Spaete, L.; Li, A.

    2015-12-01

    Quantifying SOC (Soil Organic Carbon) storage in complex terrain is challenging due to high spatial variability. Generally, the challenge is met by transforming point data to the entire landscape using surrogate, spatially-distributed, variables like elevation or precipitation. In many ecosystems, remotely sensed information on above-ground vegetation (e.g. NDVI) is a good predictor of below-ground carbon stocks. In this project, we are attempting to improve this predictive method by incorporating LiDAR-derived vegetation indices. LiDAR provides a mechanism for improved characterization of aboveground vegetation by providing structural parameters such as vegetation height and biomass. In this study, a random forest model is used to predict SOC using a suite of LiDAR-derived vegetation indices as predictor variables. The Reynolds Creek Experimental Watershed (RCEW) is an ideal location for a study of this type since it encompasses a strong elevation/precipitation gradient that supports lower biomass sagebrush ecosystems at low elevations and forests with more biomass at higher elevations. Sagebrush ecosystems composed of Wyoming, Low and Mountain Sagebrush have SOC values ranging from 0.4 to 1% (top 30 cm), while higher biomass ecosystems composed of aspen, juniper and fir have SOC values approaching 4% (top 30 cm). Large differences in SOC have been observed between canopy and interspace locations and high resolution vegetation information is likely to explain plot scale variability in SOC. Mapping of the SOC reservoir will help identify underlying controls on SOC distribution and provide insight into which processes are most important in determining SOC in semi-arid mountainous regions. In addition, airborne LiDAR has the potential to characterize vegetation communities at a high resolution and could be a tool for improving estimates of SOC at larger scales.

  1. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach

    PubMed Central

    Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network with 16 different similarity coefficients as activation functions in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms, i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets, the Maximum Unbiased Validation Dataset, which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6. PMID:29652912

  2. GIS based Cadastral level Forest Information System using World View-II data in Bir Hisar (Haryana)

    NASA Astrophysics Data System (ADS)

    Mothi Kumar, K. E.; Singh, S.; Attri, P.; Kumar, R.; Kumar, A.; Sarika; Hooda, R. S.; Sapra, R. K.; Garg, V.; Kumar, V.; Nivedita

    2014-11-01

    Identification and demarcation of Forest lands on the ground remains a major challenge in Forest administration and management. Cadastral forest mapping deals with forestlands boundary delineation and their associated characterization (forest/non forest). The present study is an application of high resolution World View-II data for digitization of Protected Forest boundary at cadastral level with integration of Records of Right (ROR) data. Cadastral vector data was generated by digitization of spatial data using scanned mussavies in ArcGIS environment. Ortho-images were created from World View-II digital stereo data with Universal Transverse Mercator coordinate system with WGS 84 datum. Cadastral vector data of Bir Hisar (Hisar district, Haryana) and adjacent villages was spatially adjusted over ortho-image using ArcGIS software. Edge matching of village boundaries was done with respect to khasra boundaries of individual village. The notified forest grids were identified on ortho-image and grid vector data was extracted from georeferenced cadastral data. Cadastral forest boundary vectors were digitized from ortho-images. Accuracy of cadastral data was checked by comparison of randomly selected geo-coordinates points, tie lines and boundary measurements of randomly selected parcels generated from image data set with that of actual field measurements. Area comparison was done between cadastral map area, the image map area and ROR area. The area covered under Protected Forest was compared with ROR data, and a deviation of less than 1% from the ROR area was accepted. The methodology presented in this paper is useful to update the cadastral forest maps. The produced GIS databases and large-scale Forest Maps may serve as a data foundation towards a land register of forests. The study introduces the use of very high resolution satellite data to develop a method for cadastral surveying through on-screen digitization in less time than the old-fashioned cadastral parcel boundary surveying method.

  3. 25 CFR 163.36 - Tribal forestry program financial support.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... services to carry out forest land management activities and shall be based on levels of funding assistance... carrying out forest land management activities. Such financial support shall be made available through the... of carrying out forest land management activities may apply and qualify for tribal forestry program...

  4. 25 CFR 163.36 - Tribal forestry program financial support.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... services to carry out forest land management activities and shall be based on levels of funding assistance... carrying out forest land management activities. Such financial support shall be made available through the... of carrying out forest land management activities may apply and qualify for tribal forestry program...

  5. Aggregating pixel-level basal area predictions derived from LiDAR data to industrial forest stands in North-Central Idaho

    Treesearch

    Andrew T. Hudak; Jeffrey S. Evans; Nicholas L. Crookston; Michael J. Falkowski; Brant K. Steigers; Rob Taylor; Halli Hemingway

    2008-01-01

    Stand exams are the principal means by which timber companies monitor and manage their forested lands. Airborne LiDAR surveys sample forest stands at much finer spatial resolution and broader spatial extent than is practical on the ground. In this paper, we developed models that leverage spatially intensive and extensive LiDAR data and a stratified random sample of...

  6. Assessing the accuracy of respondents reports of the location of their home relative to a national forest boundary and forest cover

    Treesearch

    John D. Baldridge; James T. Sylvester; William T. Borrie

    2005-01-01

    Local, state, and national agencies charged with managing wildlands in the United States are now seeking to learn more about the public's preferences for managing forests. For this reason agency wildland managers are making use of survey research to supplement their public input processes. Agency managers often choose random-digit dial telephone surveys because of...

  7. Effect of the federal estate tax on nonindustrial private forest holdings

    Treesearch

    John L. Greene; Steven H. Bullard; Tamara L. Cushing; Theodore Beauvais

    2006-01-01

    Data for this study were collected using a questionnaire mailed to randomly selected members of two forest owner organizations. Among the key findings is that 38% of forest estates owed federal estate tax, a rate many times higher than US estates in general. In 28% of the cases where estate tax was due, timber or land was sold because other assets were not adequate. In...

  8. Acorn Production on the Missouri Ozark Forest Ecosystem Project Study Sites: Pre-treatment Data

    Treesearch

    Larry D. Vangilder

    1997-01-01

    In the pre-treatment phase of a study to determine if even- and uneven-aged forest management affects the production of acorns on the Missouri Ozark Forest Ecosystem Project (MOFEP) study sites, acorn production was measured on the nine study sites by randomly placing from 2 to 6 plots in each of four ecological land type (ELT) groupings (N=130 plots). A split-plot...

  9. Low temperature growth of ultra-high mass density carbon nanotube forests on conductive supports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sugime, Hisashi; Esconjauregui, Santiago; Yang, Junwei

    2013-08-12

    We grow ultra-high mass density carbon nanotube forests at 450 °C on Ti-coated Cu supports using Co-Mo co-catalyst. X-ray photoelectron spectroscopy shows Mo strongly interacts with Ti and Co, suppressing both aggregation and lifting off of Co particles and, thus, promoting the root growth mechanism. The forests average a height of 0.38 μm and a mass density of 1.6 g cm⁻³. This mass density is the highest reported so far, even at higher temperatures or on insulators. The forests and Cu supports show ohmic conductivity (lowest resistance ∼22 kΩ), suggesting Co-Mo is useful for applications requiring forest growth on conductors.

  10. Community turnover of wood-inhabiting fungi across hierarchical spatial scales.

    PubMed

    Abrego, Nerea; García-Baquero, Gonzalo; Halme, Panu; Ovaskainen, Otso; Salcedo, Isabel

    2014-01-01

    For efficient use of conservation resources it is important to determine how species diversity changes across spatial scales. In many poorly known species groups little is known about at which spatial scales the conservation efforts should be focused. Here we examined how the community turnover of wood-inhabiting fungi is realised at three hierarchical levels, and how much of community variation is explained by variation in resource composition and spatial proximity. The hierarchical study design consisted of management type (fixed factor), forest site (random factor, nested within management type) and study plots (randomly placed plots within each study site). To examine how species richness varied across the three hierarchical scales, randomized species accumulation curves and additive partitioning of species richness were applied. To analyse variation in wood-inhabiting species and dead wood composition at each scale, linear and Permanova modelling approaches were used. Wood-inhabiting fungal communities were dominated by rare and infrequent species. The similarity of fungal communities was higher within sites and within management categories than among sites or between the two management categories, and it decreased with increasing distance among the sampling plots and with decreasing similarity of dead wood resources. However, only a small part of community variation could be explained by these factors. The species present in managed forests were to a large extent a subset of those species present in natural forests. Our results suggest that in particular the protection of rare species requires a large total area. As managed forests have only little additional value complementing the diversity of natural forests, the conservation of natural forests is the key to ecologically effective conservation. As the dissimilarity of fungal communities increases with distance, the conserved natural forest sites should be broadly distributed in space, yet the individual conserved areas should be large enough to ensure local persistence.

  11. Community Turnover of Wood-Inhabiting Fungi across Hierarchical Spatial Scales

    PubMed Central

    Abrego, Nerea; García-Baquero, Gonzalo; Halme, Panu; Ovaskainen, Otso; Salcedo, Isabel

    2014-01-01

    For efficient use of conservation resources it is important to determine how species diversity changes across spatial scales. In many poorly known species groups little is known about at which spatial scales the conservation efforts should be focused. Here we examined how the community turnover of wood-inhabiting fungi is realised at three hierarchical levels, and how much of community variation is explained by variation in resource composition and spatial proximity. The hierarchical study design consisted of management type (fixed factor), forest site (random factor, nested within management type) and study plots (randomly placed plots within each study site). To examine how species richness varied across the three hierarchical scales, randomized species accumulation curves and additive partitioning of species richness were applied. To analyse variation in wood-inhabiting species and dead wood composition at each scale, linear and Permanova modelling approaches were used. Wood-inhabiting fungal communities were dominated by rare and infrequent species. The similarity of fungal communities was higher within sites and within management categories than among sites or between the two management categories, and it decreased with increasing distance among the sampling plots and with decreasing similarity of dead wood resources. However, only a small part of community variation could be explained by these factors. The species present in managed forests were to a large extent a subset of those species present in natural forests. Our results suggest that in particular the protection of rare species requires a large total area. As managed forests have only little additional value complementing the diversity of natural forests, the conservation of natural forests is the key to ecologically effective conservation. As the dissimilarity of fungal communities increases with distance, the conserved natural forest sites should be broadly distributed in space, yet the individual conserved areas should be large enough to ensure local persistence. PMID:25058128

  12. Linking Attitudes, Policy, and Forest Cover Change in Buffer Zone Communities of Chitwan National Park, Nepal

    NASA Astrophysics Data System (ADS)

    Stapp, Jared R.; Lilieholm, Robert J.; Leahy, Jessica; Upadhaya, Suraj

    2016-06-01

    Deforestation in Nepal threatens the functioning of complex social-ecological systems, including rural populations that depend on forests for subsistence, as well as Nepal's biodiversity and other ecosystem services. Nepal's forests are particularly important to the nation's poorest inhabitants, as many depend upon them for daily survival. Two-thirds of Nepal's population relies on forests for sustenance, and these pressures are likely to increase in the future. This, coupled with high population densities and growth rates, highlights the importance of studying the relationship between human communities, forest cover trends through time, and forest management institutions. Here, we used surveys to explore how household attitudes associated with conservation-related behaviors in two rural communities—one that has experienced significant forest loss, and the other forest gain—compare with forest cover trends as indicated by satellite-derived forest-loss and -regeneration estimates between 2005 and 2013. Results found a significant difference in attitudes in the two areas, perhaps contributing to and reacting from current forest conditions. In both study sites, participation in community forestry strengthened support for conservation, forest conservation-related attitudes aligned with forest cover trends, and a negative relationship was found between economic status and having supportive forest conservation-related attitudes. In addition, on average, respondents were not satisfied with their district forest officers and did not feel that the current political climate in Nepal supported sustainable forestry. These findings are important as Nepal's Master Plan for the Forestry Sector has expired and the country is in the process of structuring a new Forestry Sector Strategy.

  13. Linking Attitudes, Policy, and Forest Cover Change in Buffer Zone Communities of Chitwan National Park, Nepal.

    PubMed

    Stapp, Jared R; Lilieholm, Robert J; Leahy, Jessica; Upadhaya, Suraj

    2016-06-01

    Deforestation in Nepal threatens the functioning of complex social-ecological systems, including rural populations that depend on forests for subsistence, as well as Nepal's biodiversity and other ecosystem services. Nepal's forests are particularly important to the nation's poorest inhabitants, as many depend upon them for daily survival. Two-thirds of Nepal's population relies on forests for sustenance, and these pressures are likely to increase in the future. This, coupled with high population densities and growth rates, highlights the importance of studying the relationship between human communities, forest cover trends through time, and forest management institutions. Here, we used surveys to explore how household attitudes associated with conservation-related behaviors in two rural communities-one that has experienced significant forest loss, and the other forest gain-compare with forest cover trends as indicated by satellite-derived forest-loss and -regeneration estimates between 2005 and 2013. Results found a significant difference in attitudes in the two areas, perhaps contributing to and reacting from current forest conditions. In both study sites, participation in community forestry strengthened support for conservation, forest conservation-related attitudes aligned with forest cover trends, and a negative relationship was found between economic status and having supportive forest conservation-related attitudes. In addition, on average, respondents were not satisfied with their district forest officers and did not feel that the current political climate in Nepal supported sustainable forestry. These findings are important as Nepal's Master Plan for the Forestry Sector has expired and the country is in the process of structuring a new Forestry Sector Strategy.

  14. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    NASA Astrophysics Data System (ADS)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product that facilitates cut-out magnetograms of solar active regions (AR) in the Sun in near-realtime (NRT), taken over a five-year interval (2012 - 2016). Our approach utilizes a set of thirteen predictors, which are not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude > M1 and > C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for > M1 flare prediction with probability threshold 15% and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for > C1 flare prediction with probability threshold 35%.
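
    The three scores reported in this abstract (ACC, TSS, HSS) all derive from a 2x2 contingency table obtained after thresholding the forecast probability. The sketch below uses the standard textbook definitions; the abstract does not spell out its formulas, so treat the exact HSS variant as an assumption. The counts are hypothetical and the function name skill_scores is illustrative.

```python
def skill_scores(tp, fn, fp, tn):
    """Return (ACC, TSS, HSS) from contingency-table counts.

    tp: flares correctly forecast, fn: flares missed,
    fp: false alarms,              tn: correct rejections.
    """
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    # True skill statistic: hit rate minus false alarm rate.
    tss = tp / (tp + fn) - fp / (fp + tn)
    # Heidke skill score: improvement over the accuracy expected by chance.
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return acc, tss, hss


# Hypothetical counts (assumption, for illustration only).
acc, tss, hss = skill_scores(tp=40, fn=10, fp=20, tn=130)
print(f"ACC={acc:.3f} TSS={tss:.3f} HSS={hss:.3f}")
```

    TSS does not depend on the ratio of flaring to non-flaring samples while ACC and HSS do, which is one reason a forecast of rare events can pair a high ACC with a much lower HSS.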

  15. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, decision tree learners are powerful and easy to interpret. A decision tree employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
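
    The two sources of diversity named in this abstract can be sketched in a few lines. The article itself works in R; the Python below is only an illustration of the idea, under the common convention that rows are bootstrapped once per tree and a fresh candidate variable subset is drawn at each split. The function names, feature names, and sizes are all hypothetical.

```python
import random


def bootstrap_rows(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample per tree)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(feature_names, n_try, rng):
    """Draw the restricted set of variables considered at a single split."""
    return rng.sample(feature_names, n_try)


rng = random.Random(0)  # seeded so that repeated runs draw the same samples
features = ["age", "sex", "lactate", "heart_rate"]  # hypothetical predictors
n_rows = 10

for tree in range(3):
    rows = bootstrap_rows(n_rows, rng)
    # Rows never drawn are "out of bag" and can serve as a built-in test set.
    out_of_bag = sorted(set(range(n_rows)) - set(rows))
    split_vars = candidate_features(features, 2, rng)
    print(tree, sorted(rows), out_of_bag, split_vars)
```

    Each tree therefore sees a different sample and a different menu of variables, and averaging many such trees damps the instability of any single one.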

  16. VizieR Online Data Catalog: Gamma-ray AGN type determination (Hassan+, 2013)

    NASA Astrophysics Data System (ADS)

    Hassan, T.; Mirabal, N.; Contreras, J. L.; Oya, I.

    2013-11-01

    In this paper, we employ Support Vector Machines (SVMs) and Random Forest (RF) that embody two of the most robust supervised learning algorithms available today. We are interested in building classifiers that can distinguish between two AGN classes: BL Lacs and FSRQs. In the 2FGL, there is a total set of 1074 identified/associated AGN objects with the following labels: 'bzb' (BL Lacs), 'bzq' (FSRQs), 'agn' (other non-blazar AGN) and 'agu' (active galaxies of uncertain type). From this global set, we group the identified/associated blazars ('bzb' and 'bzq' labels) as the training/testing set of our algorithms. (2 data files).

  17. Application of XGBoost algorithm in hourly PM2.5 concentration prediction

    NASA Astrophysics Data System (ADS)

    Pan, Bingyue

    2018-02-01

    In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost (Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The monitoring data of air quality in Tianjin city was analyzed by using the XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.

  18. Estimating the impact of mineral aerosols on crop yields in food insecure regions using statistical crop models

    NASA Astrophysics Data System (ADS)

    Hoffman, A.; Forest, C. E.; Kemanian, A.

    2016-12-01

    A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields.
This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.

  19. Characterizing channel change along a multithread gravel-bed river using random forest image classification

    NASA Astrophysics Data System (ADS)

    Overstreet, B. T.; Legleiter, C. J.

    2012-12-01

    The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impact river dynamics; and 3) how the Snake River will respond to future human and climate driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest derived land cover maps with LiDAR derived topography, bathymetry, and canopy height to explore physical drivers contributing to observed channel changes on the Snake River. In conclusion we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change.
Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and a 2011 WorldView2 satellite image. This pair of maps thus captures changes that occurred during the 2011 runoff

  20. Using DCOM to support interoperability in forest ecosystem management decision support systems

    Treesearch

    W.D. Potter; S. Liu; X. Deng; H.M. Rauscher

    2000-01-01

    Forest ecosystems exhibit complex dynamics over time and space. Management of forest ecosystems involves the need to forecast future states of complex systems that are often undergoing structural changes. This in turn requires integration of quantitative science and engineering components with sociopolitical, regulatory, and economic considerations. The amount of data...

  1. Pigmented skin lesion detection using random forest and wavelet-based texture

    NASA Astrophysics Data System (ADS)

    Hu, Ping; Yang, Tie-jun

    2016-10-01

    The incidence of cutaneous malignant melanoma, a disease of worldwide distribution and the deadliest form of skin cancer, has been rapidly increasing over the last few decades. Because advanced cutaneous melanoma is still incurable, early detection is an important step toward a reduction in mortality. Dermoscopy photographs are commonly used in melanoma diagnosis and can capture detailed features of a lesion. A great variability exists in the visual appearance of pigmented skin lesions. Therefore, in order to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, an automatic detection approach is required. The objectives of this paper were to propose a hybrid method using random forest and Gabor wavelet transformation to accurately differentiate the lesion area from the non-lesion area in dermoscopy photographs and to analyze segmentation accuracy. A random forest classifier consisting of a set of decision trees was used for classification. The Gabor wavelet transformation is a mathematical model of the visual cortical cells of the mammalian brain, and an image can be decomposed into multiple scales and multiple orientations by using it. The Gabor function has been recognized as a very useful tool in texture analysis, due to its optimal localization properties in both the spatial and frequency domains. Texture features based on the Gabor wavelet transformation are computed from the Gabor-filtered image. Experimental results indicate the following: (1) the proposed algorithm based on random forest outperformed the state of the art in pigmented skin lesion detection, and (2) the inclusion of Gabor wavelet transformation based texture features improved segmentation accuracy significantly.

  2. Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and Random Forest

    NASA Astrophysics Data System (ADS)

    Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

    2018-04-01

    Sea level rise has already caused more frequent and severe coastal flooding and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems, however, do not always lend themselves easily to physically-based flood prediction approaches. This paper presents a method for using a data-driven approach to estimate flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia, USA, from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. From the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.

  3. Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

    PubMed

    Hu, Chen; Steingrimsson, Jon Arni

    2018-01-01

    A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.

  4. Application of Remote Sensing for Forest Management in Nepal

    NASA Astrophysics Data System (ADS)

    Bajracharya, B.; Matin, M. A.

    2016-12-01

    A large area of the Hindu Kush Himalayan (HKH) region is covered by forest, which plays a vital role in addressing the challenges of climate change and in providing livelihood options for a growing population. Effective management of forest cover requires a regular forest monitoring system. Supporting REDD assessment requires a reliable baseline assessment of forest biomass and its monitoring at multiple scales. Adaptation of forests to climate change requires understanding the vulnerability of forests and the dependence of local communities on them. We present here different forest monitoring products developed under the SERVIR-Himalaya programme to address these issues. Landsat 30 meter images were used for decadal land cover change assessment and annual forest change hotspot monitoring. A methodology was developed for biomass estimation at national and sub-national levels. A decision support system was developed for analysis of forest vulnerability and dependence and for selection of adaptation options based on resource availability. These products form the basis for development of an integrated system that will be very useful for comprehensive forest monitoring and long-term strategy development for sustainable forest management.

  5. Patch forest: a hybrid framework of random forest and patch-based segmentation

    NASA Astrophysics Data System (ADS)

    Xie, Zhongliu; Gillies, Duncan

    2016-03-01

    The development of an accurate, robust and fast segmentation algorithm has long been a research focus in medical computer vision. State-of-the-art practices often involve non-rigidly registering a target image with a set of training atlases for label propagation over the target space to perform segmentation, a.k.a. multi-atlas label propagation (MALP). In recent years, the patch-based segmentation (PBS) framework has gained wide attention due to its advantage of relaxing the strict voxel-to-voxel correspondence to a series of pair-wise patch comparisons for contextual pattern matching. Despite a high accuracy reported in many scenarios, computational efficiency has consistently been a major obstacle for both approaches. Inspired by recent work on random forest, in this paper we propose a patch forest approach, which by equipping the conventional PBS with a fast patch search engine, is able to boost segmentation speed significantly while retaining an equal level of accuracy. In addition, a fast forest training mechanism is also proposed, with the use of a dynamic grid framework to efficiently approximate data compactness computation and a 3D integral image technique for fast box feature retrieval.
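
    The "3D integral image technique for fast box feature retrieval" mentioned above rests on a standard idea: precompute cumulative sums once, then read any axis-aligned box sum from eight table lookups. The sketch below shows that general technique in plain Python; it is not the authors' implementation, and the function names and half-open box convention are assumptions made for illustration.

```python
def integral_image_3d(vol):
    """Return table S where S[z][y][x] is the sum of vol[:z][:y][:x].

    S is padded by one zero layer on each axis so lookups need no bounds checks.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    S = [[[0] * (nx + 1) for _ in range(ny + 1)] for _ in range(nz + 1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                # Inclusion-exclusion over the seven previously filled neighbours.
                S[z + 1][y + 1][x + 1] = (
                    vol[z][y][x]
                    + S[z][y + 1][x + 1] + S[z + 1][y][x + 1] + S[z + 1][y + 1][x]
                    - S[z][y][x + 1] - S[z][y + 1][x] - S[z + 1][y][x]
                    + S[z][y][x]
                )
    return S


def box_sum(S, z0, y0, x0, z1, y1, x1):
    """Sum of the box [z0, z1) x [y0, y1) x [x0, x1) using eight lookups."""
    return (
        S[z1][y1][x1]
        - S[z0][y1][x1] - S[z1][y0][x1] - S[z1][y1][x0]
        + S[z0][y0][x1] + S[z0][y1][x0] + S[z1][y0][x0]
        - S[z0][y0][x0]
    )


# Tiny demonstration volume of ones (2 x 2 x 2); the full box sums to 8.
ones = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
print(box_sum(integral_image_3d(ones), 0, 0, 0, 2, 2, 2))
```

    Building the table costs one pass over the volume; afterwards each box sum is constant-time regardless of box size, which is the property that makes box features cheap to evaluate.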

  6. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

    PubMed

    Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

    2017-06-21

    Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design.
We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k-NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k-NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend of better performance by the SVM and Random Forest classifiers was observed.

  7. Fuzzy association rule mining and classification for the prediction of malaria in South Korea.

    PubMed

    Buczak, Anna L; Baugher, Benjamin; Guven, Erhan; Ramac-Thomas, Liane C; Elbert, Yevgeniy; Babin, Steven M; Lewis, Sheri H

    2015-06-18

    Malaria is the world's most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality. We describe an application of a method for creating prediction models utilizing Fuzzy Association Rule Mining to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships are in the form of rules, from which the best set of rules is automatically chosen and forms a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as Low, Medium or High, where these classes are defined as a total of 0-2, 3-16, and above 17 cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, HIGH is considered an outbreak. Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7-8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the HIGH classes. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the HIGH classes. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the Medium class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3. A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures. The South Korea malaria prediction models predict Low, Medium or High cases 7-8 weeks in the future. 
This paper demonstrates that our data driven approach can be used for the prediction of different diseases.
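
    The F0.5 and F3 scores quoted above follow from the reported PPV and Sensitivity through the standard F-beta formula, F_beta = (1 + beta^2) * PPV * Sensitivity / (beta^2 * PPV + Sensitivity). The short Python check below is an illustration of that formula, not code from the paper; the function name is hypothetical.

```python
def f_beta(ppv, sensitivity, beta):
    """Weighted harmonic mean of PPV (precision) and sensitivity (recall).

    beta < 1 weights PPV more heavily; beta > 1 weights sensitivity more heavily.
    """
    b2 = beta * beta
    return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)


# Values reported in the abstract for the HIGH class.
ppv, sens = 0.842, 0.681

print(round(f_beta(ppv, sens, 0.5), 3))  # 0.804, matching the reported F0.5
print(round(f_beta(ppv, sens, 3.0), 3))  # 0.694, matching the reported F3
```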

  8. Deep learning approach for classifying, detecting and predicting photometric redshifts of quasars in the Sloan Digital Sky Survey stripe 82

    NASA Astrophysics Data System (ADS)

    Pasquet-Itam, J.; Pasquet, J.

    2018-04-01

    We have applied a convolutional neural network (CNN) to classify and detect quasars in the Sloan Digital Sky Survey Stripe 82 and also to predict the photometric redshifts of quasars. The network takes the variability of objects into account by converting light curves into images. The width of the images, noted w, corresponds to the five magnitudes ugriz and the height of the images, noted h, represents the date of the observation. The CNN provides good results since its precision is 0.988 for a recall of 0.90, compared to a precision of 0.985 for the same recall with a random forest classifier. Moreover 175 new quasar candidates are found with the CNN considering a fixed recall of 0.97. The combination of probabilities given by the CNN and the random forest makes good performance even better with a precision of 0.99 for a recall of 0.90. For the redshift predictions, the CNN presents excellent results which are better than those obtained with a feature extraction step and different classifiers (a K-nearest-neighbors, a support vector machine, a random forest and a Gaussian process classifier). Indeed, the accuracy of the CNN within |Δz| < 0.1 can reach 78.09%, within |Δz| < 0.2 reaches 86.15%, within |Δz| < 0.3 reaches 91.2% and the value of root mean square (rms) is 0.359. The performance of the KNN is lower for the three |Δz| regions: its accuracy within |Δz| < 0.1, |Δz| < 0.2, and |Δz| < 0.3 is 73.72%, 82.46%, and 90.09%, respectively, and its rms value amounts to 0.395. So the CNN successfully reduces the dispersion and the catastrophic redshifts of quasars. This new method is very promising for the future of big databases such as the Large Synoptic Survey Telescope. A table of the candidates is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/611/A97
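
    The redshift metrics quoted above can be illustrated with a small sketch. It assumes that Δz is the plain difference between predicted and true redshift and that rms is the root mean square of that difference; the abstract does not spell out either definition, and the redshift values below are invented for illustration.

```python
import math


def fraction_within(z_true, z_pred, threshold):
    """Fraction of objects whose |predicted - true| redshift is below threshold."""
    hits = sum(1 for t, p in zip(z_true, z_pred) if abs(p - t) < threshold)
    return hits / len(z_true)


def rms_error(z_true, z_pred):
    """Root mean square of the redshift residuals."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(z_true, z_pred)) / len(z_true))


# Hypothetical redshifts, not data from the paper.
z_true = [0.50, 1.20, 2.00, 3.10]
z_pred = [0.55, 1.45, 2.05, 2.50]

for threshold in (0.1, 0.2, 0.3):
    print(threshold, fraction_within(z_true, z_pred, threshold))
print(rms_error(z_true, z_pred))
```

    A single badly wrong prediction (the last object here) barely moves the within-threshold fractions but inflates the rms, which is why the abstract reports both kinds of measure.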

  9. Machine learning models in breast cancer survival prediction.

    PubMed

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With early diagnosis, breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of breast cancer survival. We use a dataset with eight attributes that includes the records of 900 patients, of whom 876 (97.3%) were female and 24 (2.7%) were male. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-fold cross-validation were used with the proposed model for the prediction of breast cancer survival. The performance of the machine learning techniques was evaluated with accuracy, precision, sensitivity, specificity, and area under the ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, the Trees Random Forest (TRF) technique showed better results in comparison to the other techniques (NB, 1NN, AD, SVM, RBFN, and MLP). The accuracy, sensitivity and area under the ROC curve of TRF are 96%, 96%, and 93%, respectively. However, the 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under the ROC curve 78%).
This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of accuracy. Therefore, this model is recommended as a useful tool for breast cancer survival prediction as well as medical decision making.

  10. Multi-label spacecraft electrical signal classification method based on DBN and random forest

    PubMed Central

    Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng

    2017-01-01

    In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm. The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then classification is applied. First, we use wavelet denoising to pre-process the data. Second, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we use the random forest algorithm to classify the data and compare it with other algorithms. The experimental results show that, compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data. PMID:28486479

  11. Multi-label spacecraft electrical signal classification method based on DBN and random forest.

    PubMed

    Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng

    2017-01-01

    In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm. The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then classification is applied. First, we use wavelet denoising to pre-process the data. Second, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we use the random forest algorithm to classify the data and compare it with other algorithms. The experimental results show that, compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data.

  12. Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest

    PubMed Central

    Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

    2018-01-01

    Mechanical faults of high-voltage circuit breakers (HVCBs) always happen over long-term operation, so extracting the fault features and identifying the fault type have become a key issue for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variable and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method with the origin feature space reached 93.33% and reached up to 95.56% with optimized input feature vector of classifier. This indicates that feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548

  13. Spectral Analysis of Ultrasound Radiofrequency Backscatter for the Detection of Intercostal Blood Vessels.

    PubMed

    Klingensmith, Jon D; Haggard, Asher; Fedewa, Russell J; Qiang, Beidi; Cummings, Kenneth; DeGrande, Sean; Vince, D Geoffrey; Elsharkawy, Hesham

    2018-04-19

    Spectral analysis of ultrasound radiofrequency backscatter has the potential to identify intercostal blood vessels during ultrasound-guided placement of paravertebral nerve blocks and intercostal nerve blocks. Autoregressive models were used for spectral estimation, and bandwidth, autoregressive order and region-of-interest size were evaluated. Eight spectral parameters were calculated and used to create random forests. An autoregressive order of 10, bandwidth of 6 dB and region-of-interest size of 1.0 mm resulted in the minimum out-of-bag error. An additional random forest, using these chosen values, was created from 70% of the data and evaluated independently from the remaining 30% of data. The random forest achieved a predictive accuracy of 92% and Youden's index of 0.85. These results suggest that spectral analysis of ultrasound radiofrequency backscatter has the potential to identify intercostal blood vessels. Copyright © 2018 World Federation for Ultrasound in Medicine and Biology. Published by Elsevier Inc. All rights reserved.
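
    Youden's index, reported above alongside accuracy, is defined as sensitivity + specificity - 1. The Python sketch below computes it from confusion-matrix counts; the counts are hypothetical and are not the study's data.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


# Hypothetical counts: 50 vessel regions and 50 non-vessel regions.
j = youden_index(tp=40, fn=10, tn=45, fp=5)
print(j)
```

    J ranges from -1 to 1, with 0 corresponding to a classifier no better than chance, so unlike raw accuracy it is not inflated by an imbalanced test set.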

  14. RandomForest4Life: a Random Forest for predicting ALS disease progression.

    PubMed

    Hothorn, Torsten; Jung, Hans H

    2014-09-01

    We describe a method for predicting disease progression in amyotrophic lateral sclerosis (ALS) patients. The method was developed as a submission to the DREAM Phil Bowen ALS Prediction Prize4Life Challenge of summer 2012. Based on repeated patient examinations over a three-month period, we used a random forest algorithm to predict future disease progression. The procedure was set up and internally evaluated using data from 1197 ALS patients. External validation by an expert jury was based on undisclosed information of an additional 625 patients; all patient data were obtained from the PRO-ACT database. In terms of prediction accuracy, the approach described here ranked third best. Our interpretation of the prediction model confirmed previous reports suggesting that past disease progression is a strong predictor of future disease progression measured on the ALS functional rating scale (ALSFRS). We also found that larger variability in initial ALSFRS scores is linked to faster future disease progression. The results reported here furthermore suggested that approaches taking the multidimensionality of the ALSFRS into account promise some potential for improved ALS disease prediction.

  15. RAQ–A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems

    PubMed Central

    Yu, Ruiyun; Yang, Yu; Yang, Leyou; Han, Guangjie; Move, Oguti Ann

    2016-01-01

    Air quality information such as the concentration of PM2.5 is of great significance for human health and city management. It affects the way of traveling, urban planning, government policies and so on. However, in major cities there is typically only a limited number of air quality monitoring stations. In the meantime, air quality varies in the urban areas and there can be large differences, even between closely neighboring regions. In this paper, a random forest approach for predicting air quality (RAQ) is proposed for urban sensing systems. The data generated by urban sensing includes meteorology data, road information, real-time traffic status and point of interest (POI) distribution. The random forest algorithm is exploited for data training and prediction. The performance of RAQ is evaluated with real city data. Compared with three other algorithms, this approach achieves better prediction precision. The experiments yield exciting results: air quality can be inferred with remarkably high accuracy from data obtained from urban sensing. PMID:26761008

  16. PET-CT image fusion using random forest and à-trous wavelet transform.

    PubMed

    Seal, Ayan; Bhattacharjee, Debotosh; Nasipuri, Mita; Rodríguez-Esparragón, Dionisio; Menasalvas, Ernestina; Gonzalo-Martin, Consuelo

    2018-03-01

    New image fusion rules for multimodal medical images are proposed in this work. Image fusion rules are defined by random forest learning algorithm and a translation-invariant à-trous wavelet transform (AWT). The proposed method is threefold. First, source images are decomposed into approximation and detail coefficients using AWT. Second, random forest is used to choose pixels from the approximation and detail coefficients for forming the approximation and detail coefficients of the fused image. Lastly, inverse AWT is applied to reconstruct fused image. All experiments have been performed on 198 slices of both computed tomography and positron emission tomography images of a patient. A traditional fusion method based on Mallat wavelet transform has also been implemented on these slices. A new image fusion performance measure along with 4 existing measures has been presented, which helps to compare the performance of 2 pixel level fusion methods. The experimental results clearly indicate that the proposed method outperforms the traditional method in terms of visual and quantitative qualities and the new measure is meaningful. Copyright © 2017 John Wiley & Sons, Ltd.
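
    The decomposition and reconstruction steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on a 1-D signal rather than 2-D image slices, it assumes the commonly used B3-spline smoothing kernel and mirrored boundaries, and the function names (atrous_decompose, atrous_reconstruct) are hypothetical. The random forest pixel-selection step is omitted.

```python
# Undecimated ("a trous") wavelet transform on a 1-D signal.
# Assumptions: B3-spline kernel, mirrored boundaries, hypothetical names.

B3_KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def _reflect(index, length):
    """Mirror an out-of-range index back into [0, length)."""
    if length == 1:
        return 0
    period = 2 * (length - 1)
    index %= period  # Python's modulo is non-negative for a positive period
    return index if index < length else period - index


def atrous_decompose(signal, levels):
    """Return (final approximation, list of detail planes).

    At level j the kernel taps are spaced 2**j samples apart (the "holes"),
    and nothing is downsampled, so every plane keeps the input length.
    """
    approx = [float(v) for v in signal]
    n = len(approx)
    details = []
    for level in range(levels):
        step = 2 ** level
        smoothed = [
            sum(
                weight * approx[_reflect(i + (k - 2) * step, n)]
                for k, weight in enumerate(B3_KERNEL)
            )
            for i in range(n)
        ]
        # Detail plane = what the smoothing removed at this scale.
        details.append([a - s for a, s in zip(approx, smoothed)])
        approx = smoothed
    return approx, details


def atrous_reconstruct(approx, details):
    """Inverse transform: add every detail plane back to the approximation."""
    out = list(approx)
    for plane in details:
        out = [o + d for o, d in zip(out, plane)]
    return out
```

    Because each detail plane is the difference between two successive approximations, adding the planes back to the final approximation recovers the input up to floating-point rounding. A fusion rule would replace coefficients between decomposition and reconstruction.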

  17. GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.

    PubMed

    Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A

    2016-01-01

    In-silico methods are an integral part of modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large scale computations, making it a highly time consuming task. This process can be speeded up by implementing parallelized algorithms on a Graphical Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in the virtual screening. A ligand based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems has been proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of results produced by our tool on GPU is same as that on a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in training and prediction phases.

  18. Multiscale habitat use and selection in cooperatively breeding Micronesian kingfishers

    USGS Publications Warehouse

    Kesler, D.C.; Haig, S.M.

    2007-01-01

    Information about the interaction between behavior and landscape resources is key to directing conservation management for endangered species. We studied multi-scale occurrence, habitat use, and selection in a cooperatively breeding population of Micronesian kingfishers (Todiramphus cinnamominus) on the island of Pohnpei, Federated States of Micronesia. At the landscape level, point-transect surveys resulted in kingfisher detection frequencies that were higher than those reported in 1994, although they remained 15-40% lower than 1983 indices. Integration of spatially explicit vegetation information with survey results indicated that kingfisher detections were positively associated with the amount of wet forest and grass-urban vegetative cover, and they were negatively associated with agricultural forest, secondary vegetation, and upland forest cover types. We used radiotelemetry and remote sensing to evaluate habitat use by individual kingfishers at the home-range scale. A comparison of habitats in Micronesian kingfisher home ranges with those in randomly placed polygons illustrated that birds used more forested areas than were randomly available in the immediate surrounding area. Further, members of cooperatively breeding groups included more forest in their home ranges than birds in pair-breeding territories, and forested portions of study areas appeared to be saturated with territories. Together, these results suggested that forest habitats were limited for Micronesian kingfishers. Thus, protecting and managing forests is important for the restoration of Micronesian kingfishers to the island of Guam (United States Territory), where they are currently extirpated, as well as to maintaining kingfisher populations on the islands of Pohnpei and Palau. Results further indicated that limited forest resources may restrict dispersal opportunities and, therefore, play a role in delayed dispersal and cooperative behaviors in Micronesian kingfishers.

  19. Modelling above Ground Biomass of Mangrove Forest Using SENTINEL-1 Imagery

    NASA Astrophysics Data System (ADS)

    Labadisos Argamosa, Reginald Jay; Conferido Blanco, Ariel; Balidoy Baloloy, Alvin; Gumbao Candido, Christian; Lovern Caboboy Dumalag, John Bart; Carandang Dimapilis, Lee, , Lady; Camero Paringit, Enrico

    2018-04-01

    Many studies have been conducted in the estimation of forest above ground biomass (AGB) using features from synthetic aperture radar (SAR). Specifically, L-band ALOS/PALSAR (wavelength 23 cm) data is often used. However, few studies have been made on the use of shorter wavelengths (e.g., C-band, 3.75 cm to 7.5 cm) for forest mapping especially in tropical forests since higher attenuation is observed for volumetric objects where energy propagated is absorbed. This study aims to model AGB estimates of mangrove forest using information derived from Sentinel-1 C-band SAR data. Combinations of polarisations (VV, VH), its derivatives, grey level co-occurrence matrix (GLCM), and its principal components were used as features for modelling AGB. Five models were tested with varying combinations of features: a) sigma nought polarisations and its derivatives; b) GLCM textures; c) the first five principal components; d) combination of models a-c; and e) the identified important features by Random Forest variable importance algorithm. Random Forest was used as regressor to compute the AGB estimates to avoid overfitting caused by the introduction of too many features in the model. Model e obtained the highest r2 of 0.79 and an RMSE of 0.44 Mg using only four features, namely, σ°VH GLCM variance, σ°VH GLCM contrast, PC1, and PC2. This study shows that Sentinel-1 C-band SAR data could be used to produce acceptable AGB estimates in mangrove forest to compensate for the unavailability of longer wavelength SAR.

  20. Planning and implementing forest operations to achieve sustainable forests: Proceedings of papers presented at the joint meeting of the Council on Forest Engineering and International Union of Forest Research Organizations.

    Treesearch

    Charles R. Blinn; Michael A. Thompson

    1996-01-01

    Contains a variety of papers presented at the joint meeting of the Council on Forest Engineering and International Union of Forest Research Organizations Subject Group S3.04 and that support the meeting theme "Planning and Implementing Forest Operations to Achieve Sustainable Forests."

  1. Machine Learning Algorithms for prediction of regions of high Reynolds Averaged Navier Stokes Uncertainty

    NASA Astrophysics Data System (ADS)

    Mishra, Aashwin; Iaccarino, Gianluca

    2017-11-01

    In spite of their deficiencies, RANS models represent the workhorse for industrial investigations into turbulent flows. In this context, it is essential to provide diagnostic measures to assess the quality of RANS predictions. To this end, the primary step is to identify feature importances amongst massive sets of potentially descriptive and discriminative flow features. This aids the physical interpretability of the resultant discrepancy model and its extensibility to similar problems. Recent investigations have utilized approaches such as Random Forests, Support Vector Machines and the Least Absolute Shrinkage and Selection Operator for feature selection. With examples, we exhibit how such methods may not be suitable for turbulent flow datasets. The underlying rationale, such as the correlation bias and the required conditions for the success of penalized algorithms, are discussed with illustrative examples. Finally, we provide alternate approaches using convex combinations of regularized regression approaches and randomized sub-sampling in combination with feature selection algorithms, to infer model structure from data. This research was supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo).

  2. Use of DNA markers in forest tree improvement research

    Treesearch

    D.B. Neale; M.E. Devey; K.D. Jermstad; M.R. Ahuja; M.C. Alosi; K.A. Marshall

    1992-01-01

    DNA markers are rapidly being developed for forest trees. The most important markers are restriction fragment length polymorphisms (RFLPs), polymerase chain reaction- (PCR) based markers such as random amplified polymorphic DNA (RAPD), and fingerprinting markers. DNA markers can supplement isozyme markers for monitoring tree improvement activities such as; estimating...

  3. Influence of alternative silviculture on small mammals

    USGS Publications Warehouse

    Waldien, David L.; Hayes, John P.

    2006-01-01

    HIGHLIGHT: A variety of harvest methods promote diversity within forests while still generating income. For example, recent studies have shown that when dead wood is left on the forest floor during harvest, biodiversity increases. A new Cooperative Forest Ecosystem Research (CFER) program fact sheet summarizes how small mammals respond to dead wood in forests that are harvested with alternative methods. CFER is developing a series of fact sheets about responses to changes in young western Oregon forests. The fact sheets are designed to help resource managers balance management needs, including timber and wildlife. The USGS provides a primary source of financial support for CFER, a consortium of federal and state partners conducting research in support of the Northwest Forest Plan.

  4. Northwest research experimental forests: A hundred years in the making

    Treesearch

    Theresa B. Jain

    2015-01-01

    Over the past 100 years, experimental forests and ranges (forests) have supported research that produced long-term knowledge about our forests and ranges, and their resources. These forests are living laboratories and are rare assets that serve as places to conduct forest research to meet society’s natural resource needs.

  5. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Temperature of upland and peatland soils in a north central Minnesota forest

    Treesearch

    Dale S. Nichols

    1998-01-01

    Soil temperature strongly influences physical, chemical, and biological activities in soil. However, soil temperature data for forest landscapes are scarce. For 6 yr, weekly soil temperatures were measured at two upland and four peatland sites in north central Minnesota. One upland site supported mature aspen forest, the other supported short grass. One peatland site...

  7. Forest diversity and disturbance: changing influences and the future of Virginia's Forests

    Treesearch

    Christine J. Small; James L. Chamberlain

    2015-01-01

    The Virginia landscape supports a remarkable diversity of forests, from maritime dunes, swamp forests, and pine savannas of the Atlantic coastal plain, to post-agricultural pine-hardwood forests of the piedmont, to mixed oak, mixed-mesophytic, northern hardwood, and high elevation conifer forests in Appalachian mountain provinces. Virginia’s forests also have been...

  8. Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest.

    PubMed

    Holliday, Jason A; Wang, Tongli; Aitken, Sally

    2012-09-01

    Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits--autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.

  9. Bird distributional patterns support biogeographical histories and are associated with bioclimatic units in the Atlantic Forest, Brazil.

    PubMed

    Carvalho, Cristiano DE Santana; Nascimento, Nayla Fábia Ferreira DO; Araujo, Helder F P DE

    2017-10-17

    Rivers as barriers to dispersal and past forest refugia are two of the hypotheses proposed to explain the patterns of biodiversity in the Atlantic Forest. It has recently been shown that possible past refugia correspond to bioclimatically different regions, so we tested whether patterns of shared distribution of bird taxa in the Atlantic Forest are 1) limited by the Doce and São Francisco rivers or 2) associated with the bioclimatically different southern and northeastern regions. We catalogued lists of forest birds from 45 locations, 36 in the Atlantic Forest and nine in the Amazon, and used parsimony analysis of endemicity to identify groups of shared taxa. We also compared differences between these groups by permutational multivariate analysis of variance and identified the species that best supported the resulting groups. The results showed that the distribution of forest birds is divided into two main regions in the Atlantic Forest, the first with more southern localities and the second with northeastern localities. This distributional pattern is not delimited by riverbanks, but it may be associated with bioclimatic units, surrogated by altitude, that maintain current environmental differences between the two main regions of the Atlantic Forest and may be related to phylogenetic histories of taxa supporting the two groups.

  10. Large-Scale Habitat Corridors for Biodiversity Conservation: A Forest Corridor in Madagascar

    PubMed Central

    Ramiadantsoa, Tanjona; Ovaskainen, Otso; Rybicki, Joel; Hanski, Ilkka

    2015-01-01

    In biodiversity conservation, habitat corridors are assumed to increase landscape-level connectivity and to enhance the viability of otherwise isolated populations. While the role of corridors is supported by empirical evidence, studies have typically been conducted at small spatial scales. Here, we assess the quality and the functionality of a large 95-km long forest corridor connecting two large national parks (416 and 311 km2) in the southeastern escarpment of Madagascar. We analyze the occurrence of 300 species in 5 taxonomic groups in the parks and in the corridor, and combine high-resolution forest cover data with a simulation model to examine various scenarios of corridor destruction. At present, the corridor contains essentially the same communities as the national parks, reflecting its breadth which on average matches that of the parks. In the simulation model, we consider three types of dispersers: passive dispersers, which settle randomly around the source population; active dispersers, which settle only in favorable habitat; and gap-avoiding active dispersers, which avoid dispersing across non-habitat. Our results suggest that long-distance passive dispersers are most sensitive to ongoing degradation of the corridor, because increasing numbers of propagules are lost outside the forest habitat. For a wide range of dispersal parameters, the national parks are large enough to sustain stable populations until the corridor becomes severely broken, which will happen around 2065 if the current rate of forest loss continues. A significant decrease in gene flow along the corridor is expected after 2040, and this will exacerbate the adverse consequences of isolation. Our results demonstrate that simulation studies assessing the role of habitat corridors should pay close attention to the mode of dispersal and the effects of regional stochasticity. PMID:26200351

  11. Large-Scale Habitat Corridors for Biodiversity Conservation: A Forest Corridor in Madagascar.

    PubMed

    Ramiadantsoa, Tanjona; Ovaskainen, Otso; Rybicki, Joel; Hanski, Ilkka

    2015-01-01

    In biodiversity conservation, habitat corridors are assumed to increase landscape-level connectivity and to enhance the viability of otherwise isolated populations. While the role of corridors is supported by empirical evidence, studies have typically been conducted at small spatial scales. Here, we assess the quality and the functionality of a large 95-km long forest corridor connecting two large national parks (416 and 311 km2) in the southeastern escarpment of Madagascar. We analyze the occurrence of 300 species in 5 taxonomic groups in the parks and in the corridor, and combine high-resolution forest cover data with a simulation model to examine various scenarios of corridor destruction. At present, the corridor contains essentially the same communities as the national parks, reflecting its breadth which on average matches that of the parks. In the simulation model, we consider three types of dispersers: passive dispersers, which settle randomly around the source population; active dispersers, which settle only in favorable habitat; and gap-avoiding active dispersers, which avoid dispersing across non-habitat. Our results suggest that long-distance passive dispersers are most sensitive to ongoing degradation of the corridor, because increasing numbers of propagules are lost outside the forest habitat. For a wide range of dispersal parameters, the national parks are large enough to sustain stable populations until the corridor becomes severely broken, which will happen around 2065 if the current rate of forest loss continues. A significant decrease in gene flow along the corridor is expected after 2040, and this will exacerbate the adverse consequences of isolation. Our results demonstrate that simulation studies assessing the role of habitat corridors should pay close attention to the mode of dispersal and the effects of regional stochasticity.

  12. The potential predictability of fire danger provided by ECMWF forecast

    NASA Astrophysics Data System (ADS)

    Di Giuseppe, Francesca

    2017-04-01

    The European Forest Fire Information System (EFFIS) is currently being developed in the framework of the Copernicus Emergency Management Services to monitor and forecast fire danger in Europe. The system provides timely information to civil protection authorities in 38 nations across Europe and mostly concentrates on flagging regions which might be at high danger of spontaneous ignition due to persistent drought. The daily predictions of fire danger conditions are based on the US Forest Service National Fire Danger Rating System (NFDRS), the Canadian forest service Fire Weather Index Rating System (FWI) and the Australian McArthur (MARK-5) rating systems. Weather forcings are provided in real time by the European Centre for Medium-Range Weather Forecasts (ECMWF) forecasting system. The global system's potential predictability is assessed using re-analysis fields as weather forcings. The Global Fire Emissions Database (GFED4) provides 11 years of observed burned areas from satellite measurements and is used as a validation dataset. The fire indices implemented are good predictors to highlight dangerous conditions. High values are correlated with observed fire and low values correspond to non-observed events. A more quantitative skill evaluation was performed using the Extremal Dependency Index, which is a skill score specifically designed for rare events. It revealed that the three indices were more skilful on a global scale than the random forecast to detect large fires. The performance peaks in the boreal forests, in the Mediterranean, the Amazon rain-forests and southeast Asia. The skill-scores were then aggregated at country level to reveal which nations could potentially benefit from the system information in aid of decision making and fire control support. Overall we found that fire danger modelling based on weather forecasts can provide reasonable predictability over large parts of the global landmass.
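
    For readers unfamiliar with the skill score mentioned in this abstract, the sketch below computes the Extremal Dependency Index in its standard form, EDI = (ln F - ln H) / (ln F + ln H), where H is the hit rate and F the false-alarm rate from a 2x2 contingency table. The function name and the counts used in the usage lines are hypothetical; the abstract does not give the study's contingency tables or its implementation.

```python
import math


def extremal_dependency_index(hits, misses, false_alarms, correct_negatives):
    """Extremal Dependency Index from a 2x2 forecast/observation table.

    hit rate H         = hits / (hits + misses)
    false-alarm rate F = false_alarms / (false_alarms + correct_negatives)
    EDI                = (ln F - ln H) / (ln F + ln H)
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    if hit_rate <= 0 or false_alarm_rate <= 0:
        raise ValueError("EDI is undefined when H or F is zero")
    denominator = math.log(false_alarm_rate) + math.log(hit_rate)
    if denominator == 0:
        raise ValueError("EDI is undefined when H and F are both one")
    return (math.log(false_alarm_rate) - math.log(hit_rate)) / denominator


# Hypothetical counts, for illustration only.
random_like = extremal_dependency_index(10, 10, 50, 50)  # H equals F
skilful = extremal_dependency_index(80, 20, 100, 900)    # H well above F
```

    A forecast with H equal to F, which is what a random forecast produces on average, scores zero, and the score approaches one as hits are achieved with few false alarms. That is the sense in which the abstract compares the fire indices against the random forecast.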

  13. Idaho forest carbon projections from 2017 to 2117 under forest disturbance and climate change scenarios

    NASA Astrophysics Data System (ADS)

    Hudak, A. T.; Crookston, N.; Kennedy, R. E.; Domke, G. M.; Fekety, P.; Falkowski, M. J.

    2017-12-01

    Commercial off-the-shelf lidar collections associated with tree measures in field plots allow aboveground biomass (AGB) estimation with high confidence. Predictive models developed from such datasets are used operationally to map AGB across lidar project areas. We use a random selection of these pixel-level AGB predictions as training for predicting AGB annually across Idaho and western Montana, primarily from Landsat time series imagery processed through LandTrendr. At both the landscape and regional scales, Random Forests is used for predictive AGB modeling. To project future carbon dynamics, we use Climate-FVS (Forest Vegetation Simulator), the tree growth engine used by foresters to inform forest planning decisions, under either constant or changing climate scenarios. Disturbance data compiled from LandTrendr (Kennedy et al. 2010) using TimeSync (Cohen et al. 2010) in forested lands of Idaho (n=509) and western Montana (n=288) are used to generate probabilities of disturbance (harvest, fire, or insect) by land ownership class (public, private) as well as the magnitude of disturbance. Our verification approach is to aggregate the regional, annual AGB predictions at the county level and compare them to annual county-level AGB summarized independently from systematic, field-based, annual inventories conducted by the US Forest Inventory and Analysis (FIA) Program nationally. This analysis shows that when federal lands are disturbed the magnitude is generally high and when other lands are disturbed the magnitudes are more moderate. The probability of disturbance in corporate lands is higher than in other lands but the magnitudes are generally lower. This is consistent with the much higher prevalence of fire and insects occurring on federal lands, and greater harvest activity on private lands. We found large forest carbon losses in drier southern Idaho, only partially offset by carbon gains in wetter northern Idaho, due to anticipated climate change. Public and private forest managers can use these forest carbon projections to 2117 to inform 2017 decisions on which tree species and seed sources to select for planting, and implement forest management strategies now that may seek to maximize forest carbon sequestration for greenhouse gas abatement a century from now.

  14. Building capacity for national carbon measurements for reducing emissions from deforestation and forest degradation

    NASA Astrophysics Data System (ADS)

    Goetz, S. J.; Laporte, N.; Horning, N.; Pelletier, J.; Jantz, P.; Ndunda, P.

    2014-12-01

    Many tropical countries are now working on developing their strategies for reducing emissions from deforestation and forest degradation, including activities that result in conservation or enhancement of forest carbon stocks and sustainable management of forests to effectively decrease atmospheric carbon emissions (i.e. REDD+). A new international REDD+ agreement is at the heart of recent negotiations of the parties to the UN Framework Convention on Climate Change (UNFCCC). REDD+ mechanisms could provide an opportunity to not only diminish an important source of emissions, but also to promote large-scale conservation of tropical forests and establish incentives and opportunities to alleviate poverty. Most tropical countries still lack basic information for developing and implementing their forest carbon stock assessments, including the extent of forest area and the rate at which forests are being cleared and/or degraded, and the carbon amounts associated with these losses. These same countries also need support to conduct integrated assessments of the most promising approaches for reducing emissions, and in identifying those policy options that hold the greatest potential while minimizing potential negative impacts of REDD+ policies. The WHRC SERVIR project in East Africa is helping to provide these data sets to countries via best practice tools and methods to support cost effective forest carbon monitoring solutions and more informed decision making processes under REDD+. We will present the results of our capacity building activities in the region and planned future efforts being coordinated with the NASA-SERVIR Hub in Kenya to support REDD+ decision support.

  15. Aspen, climate, and sudden decline in western USA

    Treesearch

    Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

    2009-01-01

    A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...

  16. Variation in Local-Scale Edge Effects: Mechanisms and Landscape Context

    Treesearch

    Therese M. Donovan; Peter W. Jones; Elizabeth M. Annand; Frank R. Thompson III

    1997-01-01

    Ecological processes near habitat edges often differ from processes away from edges. Yet, the generality of "edge effects" has been hotly debated because results vary tremendously. To understand the factors responsible for this variation, we described nest predation and cowbird distribution patterns in forest edge and forest core habitats on 36 randomly...

  17. Mitigating budget constraints on visitation volume surveys: the case of U.S. National forests

    Treesearch

    Ashley E. Askew; Donald B.K. English; Stanley J. Zarnoch; Neelam C. Poudyal; J.M. Bowker

    2014-01-01

    Stratified random sampling (SRS) provides a scientifically based estimate of a population comprising mutually exclusive, homogenous subgroups. In the National Visitor Use Monitoring (NVUM) program, SRS is used to estimate recreation visitation and visitor characteristics across activities on National forests. However, with rising costs and declining budgets, carrying...

  18. Demographic influences on environmental value orientations and normative beliefs about national forest management

    Treesearch

    Jerry J. Vaske; Maureen P. Donnelly; Daniel R. Williams; Sandra Jonker

    2001-01-01

    Using the cognitive hierarchy as the theoretical foundation, this article examines the predictive influence of individuals' demographic characteristics on environmental value orientations and normative beliefs about national forest management. Data for this investigation were obtained from a random sample of Colorado residents (n = 960). As predicted by theory, a...

  19. Subtyping cognitive profiles in Autism Spectrum Disorder using a Functional Random Forest algorithm.

    PubMed

    Feczko, E; Balba, N M; Miranda-Dominguez, O; Cordova, M; Karalunas, S L; Irwin, L; Demeter, D V; Hill, A P; Langhorst, B H; Grieser Painter, J; Van Santen, J; Fombonne, E J; Nigg, J T; Fair, D A

    2018-05-15

    DSM-5 Autism Spectrum Disorder (ASD) comprises a set of neurodevelopmental disorders characterized by deficits in social communication and interaction and repetitive behaviors or restricted interests, and may both affect and be affected by multiple cognitive mechanisms. This study attempts to identify and characterize cognitive subtypes within the ASD population using our Functional Random Forest (FRF) machine learning classification model. This model trained a traditional random forest model on measures from seven tasks that reflect multiple levels of information processing. 47 ASD diagnosed and 58 typically developing (TD) children between the ages of 9 and 13 participated in this study. Our RF model was 72.7% accurate, with 80.7% specificity and 63.1% sensitivity. Using the random forest model, the FRF then measures the proximity of each subject to every other subject, generating a distance matrix between participants. This matrix is then used in a community detection algorithm to identify subgroups within the ASD and TD groups, and revealed 3 ASD and 4 TD putative subgroups with unique behavioral profiles. We then examined differences in functional brain systems between diagnostic groups and putative subgroups using resting-state functional connectivity magnetic resonance imaging (rsfcMRI). Chi-square tests revealed a significantly greater number of between group differences (p < .05) within the cingulo-opercular, visual, and default systems as well as differences in inter-system connections in the somato-motor, dorsal attention, and subcortical systems. Many of these differences were primarily driven by specific subgroups suggesting that our method could potentially parse the variation in brain mechanisms affected by ASD. Copyright © 2017. Published by Elsevier Inc.
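
    The proximity step described in this abstract can be sketched as follows. This is an illustration, not the authors' FRF code. It assumes that the leaf reached by each subject in each tree is already available as a table of integers, that proximity is the usual random forest definition (the fraction of trees in which two subjects share a leaf), and that distance is taken as one minus proximity. The function names and the toy leaf table are hypothetical, and the community detection step is not shown.

```python
def proximity_matrix(leaf_ids):
    """Pairwise random forest proximity.

    leaf_ids[i][t] is the index of the leaf that subject i reaches in tree t.
    Proximity of subjects i and j = fraction of trees where they share a leaf.
    """
    n_subjects = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n_subjects for _ in range(n_subjects)]
    for i in range(n_subjects):
        for j in range(i, n_subjects):
            shared = sum(a == b for a, b in zip(leaf_ids[i], leaf_ids[j]))
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def distance_matrix(prox):
    """Assumed conversion: distance = 1 - proximity."""
    return [[1.0 - p for p in row] for row in prox]


# Toy leaf table: 3 subjects, 3 trees (hypothetical values).
toy_leaves = [
    [1, 2, 3],
    [1, 2, 4],
    [5, 6, 7],
]
toy_proximity = proximity_matrix(toy_leaves)
toy_distance = distance_matrix(toy_proximity)
```

    In the toy table the first two subjects share a leaf in two of three trees, so they end up closer to each other than either is to the third subject. A clustering or community detection method would then operate on the resulting matrix.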

  20. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes: a random forest regression approach

    PubMed Central

    van der Meer, D; Hoekstra, P J; van Donkelaar, M; Bralten, J; Oosterlaan, J; Heslenfeld, D; Faraone, S V; Franke, B; Buitelaar, J K; Hartman, C A

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression is well suited to explore this complexity, as it allows for the analysis of many predictors simultaneously, taking into account any higher-order interactions among them. Using random forest regression, we predicted ADHD severity, measured by Conners’ Parent Rating Scales, from 686 adolescents and young adults (of which 281 were diagnosed with ADHD). The analysis included 17 374 single-nucleotide polymorphisms (SNPs) across 29 genes previously linked to hypothalamic–pituitary–adrenal (HPA) axis activity, together with information on exposure to 24 individual long-term difficulties or stressful life events. The model explained 12.5% of variance in ADHD severity. The most important SNP, which also showed the strongest interaction with stress exposure, was located in a region regulating the expression of telomerase reverse transcriptase (TERT). Other high-ranking SNPs were found in or near NPSR1, ESR1, GABRA6, PER3, NR3C2 and DRD4. Chronic stressors were more influential than single, severe, life events. Top hits were partly shared with conduct problems. We conclude that random forest regression may be used to investigate how multiple genetic and environmental factors jointly contribute to ADHD. It is able to implicate novel SNPs of interest, interacting with stress exposure, and may explain inconsistent findings in ADHD genetics. This exploratory approach may be best combined with more hypothesis-driven research; top predictors and their interactions with one another should be replicated in independent samples. PMID:28585928

  1. Simple to complex modeling of breathing volume using a motion sensor.

    PubMed

    John, Dinesh; Staudenmayer, John; Freedson, Patty

    2013-06-01

    To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical axis ActiGraph™ GT1M activity counts, oxygen consumption and VE were measured during treadmill walking and running, sports, household chores and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min) and high (>35.4 l/min) VEs were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs and vigorous >6.0 METs). We examined the accuracy of two simple techniques (multiple regression and activity count cut-point analyses) and one complex technique (random forest) in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted the high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). ActiGraph™ cut-points for light, medium and high VEs were <1381, 1381 to 3660 and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.
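
    The reported cut-points translate directly into a lookup. The following is a minimal sketch, assuming the boundaries 1381 and 3660 counts per minute are inclusive for the medium category as the wording "1381 to 3660" suggests; the function name `ve_category` is hypothetical and the lowest category is labelled "low" to match the VE categories named earlier in the abstract.

```python
def ve_category(counts_per_minute: float) -> str:
    """Map activity counts (cpm) to a ventilation category using the abstract's cut-points."""
    if counts_per_minute < 1381:
        return "low"
    if counts_per_minute <= 3660:  # assumed inclusive upper bound for "medium"
        return "medium"
    return "high"


# Example: classify a few made-up activity-count readings.
readings = [900, 1381, 2500, 3660, 5200]
categories = [ve_category(c) for c in readings]
```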

  2. Managing salinity in Upper Colorado River Basin streams: Selecting catchments for sediment control efforts using watershed characteristics and random forests models

    USGS Publications Warehouse

    Tillman, Fred; Anning, David W.; Heilman, Julian A.; Buto, Susan G.; Miller, Matthew P.

    2018-01-01

    Elevated concentrations of dissolved-solids (salinity) including calcium, sodium, sulfate, and chloride, among others, in the Colorado River cause substantial problems for its water users. Previous efforts to reduce dissolved solids in upper Colorado River basin (UCRB) streams often focused on reducing suspended-sediment transport to streams, but few studies have investigated the relationship between suspended sediment and salinity, or evaluated which watershed characteristics might be associated with this relationship. Are there catchment properties that may help in identifying areas where control of suspended sediment will also reduce salinity transport to streams? A random forests classification analysis was performed on topographic, climate, land cover, geology, rock chemistry, soil, and hydrologic information in 163 UCRB catchments. Two random forests models were developed in this study: one for exploring stream and catchment characteristics associated with stream sites where dissolved solids increase with increasing suspended-sediment concentration, and the other for predicting where these sites are located in unmonitored reaches. Results of variable importance from the exploratory random forests models indicate that no simple source, geochemical process, or transport mechanism can easily explain the relationship between dissolved solids and suspended sediment concentrations at UCRB monitoring sites. Among the most important watershed characteristics in both models were measures of soil hydraulic conductivity, soil erodibility, minimum catchment elevation, catchment area, and the silt component of soil in the catchment. Predictions at key locations in the basin were combined with observations from selected monitoring sites, and presented in map-form to give a complete understanding of where catchment sediment control practices would also benefit control of dissolved solids in streams.

  3. Remote sensing leaf water stress in coffee (Coffea arabica) using secondary effects of water absorption and random forests

    NASA Astrophysics Data System (ADS)

    Chemura, Abel; Mutanga, Onisimo; Dube, Timothy

    2017-08-01

    Water management is an important component in agriculture, particularly for perennial tree crops such as coffee. Proper detection and monitoring of water stress therefore plays an important role not only in mitigating the associated adverse impacts on crop growth and productivity but also in reducing expensive and environmentally unsustainable irrigation practices. Current methods for water stress detection in coffee production mainly involve monitoring plant physiological characteristics and soil conditions. In this study, we tested the ability of selected wavebands in the VIS/NIR range to predict plant water content (PWC) in coffee using the random forest algorithm. An experiment was set up such that coffee plants were exposed to different levels of water stress and reflectance and plant water content measured. In selecting appropriate parameters, cross-correlation identified 11 wavebands, reflectance difference identified 16 and reflectance sensitivity identified 22 variables related to PWC. Only three wavebands (485 nm, 670 nm and 885 nm) were identified by at least two methods as significant. The selected wavebands were trained (n = 36) and tested on independent data (n = 24) after being integrated into the random forest algorithm to predict coffee PWC. The results showed that the reflectance sensitivity selected bands performed the best in water stress detection (r = 0.87, RMSE = 4.91% and pBias = 0.9%), when compared to reflectance difference (r = 0.79, RMSE = 6.19 and pBias = 2.5%) and cross-correlation selected wavebands (r = 0.75, RMSE = 6.52 and pBias = 1.6). These results indicate that it is possible to reliably predict PWC using wavebands in the VIS/NIR range that correspond with many of the available multispectral scanners using random forests and further research at field and landscape scale is required to operationalize these findings.
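
    The three evaluation statistics quoted above (r, RMSE and pBias) can be computed as in the sketch below. The abstract does not give its formulas, so these are assumptions: Pearson's correlation for r, root mean squared error for RMSE, and percent bias defined as 100 times the summed prediction error divided by the summed observations (sign conventions for percent bias vary between fields). The sample values are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    sd_o = sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_p = sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (sd_o * sd_p)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def percent_bias(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Percent bias: positive when predictions overestimate on average (assumed convention)."""
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)


# Made-up plant water content values (%), not data from the study.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
r_value = pearson_r(observed, predicted)
rmse_value = rmse(observed, predicted)
pbias_value = percent_bias(observed, predicted)
```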

  4. Properties of Protein Drug Target Classes

    PubMed Central

    Bull, Simon C.; Doig, Andrew J.

    2015-01-01

    Accurate identification of drug targets is a crucial part of any drug development program. We mined the human proteome to discover properties of proteins that may be important in determining their suitability for pharmaceutical modulation. Data was gathered concerning each protein’s sequence, post-translational modifications, secondary structure, germline variants, expression profile and drug target status. The data was then analysed to determine features for which the target and non-target proteins had significantly different values. This analysis was repeated for subsets of the proteome consisting of all G-protein coupled receptors, ion channels, kinases and proteases, as well as proteins that are implicated in cancer. Machine learning was used to quantify the proteins in each dataset in terms of their potential to serve as a drug target. This was accomplished by first inducing a random forest that could distinguish between its targets and non-targets, and then using the random forest to quantify the drug target likeness of the non-targets. The properties that can best differentiate targets from non-targets were primarily those that are directly related to a protein’s sequence (e.g. secondary structure). Germline variants, expression levels and interactions between proteins had minimal discriminative power. Overall, the best indicators of drug target likeness were found to be the proteins’ hydrophobicities, in vivo half-lives, propensity for being membrane bound and the fraction of non-polar amino acids in their sequences. In terms of predicting potential targets, datasets of proteases, ion channels and cancer proteins were able to induce random forests that were highly capable of distinguishing between targets and non-targets. The non-target proteins predicted to be targets by these random forests comprise the set of the most suitable potential future drug targets, and should therefore be prioritised when building a drug development programme. PMID:25822509

  5. Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer's Disease: A Systematic Review.

    PubMed

    Sarica, Alessia; Cerasa, Antonio; Quattrone, Aldo

    2017-01-01

    Objective: Machine learning classification has been the most important computational development in recent years to satisfy the primary need of clinicians for automatic early diagnosis and prognosis. The Random Forest (RF) algorithm has been successfully applied for reducing high dimensional and multi-source data in many scientific realms. Our aim was to explore the state of the art of the application of RF on single and multi-modal neuroimaging data for the prediction of Alzheimer's disease. Methods: A systematic review following PRISMA guidelines was conducted on this field of study. In particular, we constructed an advanced query using boolean operators as follows: ("random forest" OR "random forests") AND neuroimaging AND ("alzheimer's disease" OR alzheimer's OR alzheimer) AND (prediction OR classification). The query was then searched in four well-known scientific databases: PubMed, Scopus, Google Scholar and Web of Science. Results: Twelve articles, published between 2007 and 2017, were included in this systematic review after a quantitative and qualitative selection. The lessons learnt from these works suggest that when RF was applied on multi-modal data for prediction of Alzheimer's disease (AD) conversion from Mild Cognitive Impairment (MCI), it produced one of the best accuracies to date. Moreover, RF has important advantages in terms of robustness to overfitting, ability to handle highly non-linear data, stability in the presence of outliers and opportunity for efficient parallel processing, mainly when applied on multi-modality neuroimaging data such as MRI morphometric, diffusion tensor imaging, and PET images. Conclusions: We discussed the strengths of RF, also considering possible limitations and encouraging further studies on the comparisons of this algorithm with other commonly used classification approaches, particularly in the early prediction of the progression from MCI to AD.

  6. Assessing the Status of Wild Felids in a Highly-Disturbed Commercial Forest Reserve in Borneo and the Implications for Camera Trap Survey Design

    PubMed Central

    Wearn, Oliver R.; Rowcliffe, J. Marcus; Carbone, Chris; Bernard, Henry; Ewers, Robert M.

    2013-01-01

    The proliferation of camera-trapping studies has led to a spate of extensions in the known distributions of many wild cat species, not least in Borneo. However, we still do not have a clear picture of the spatial patterns of felid abundance in Southeast Asia, particularly with respect to the large areas of highly-disturbed habitat. An important obstacle to increasing the usefulness of camera trap data is the widespread practice of setting cameras at non-random locations. Non-random deployment interacts with non-random space-use by animals, causing biases in our inferences about relative abundance from detection frequencies alone. This may be a particular problem if surveys do not adequately sample the full range of habitat features present in a study region. Using camera-trapping records and incidental sightings from the Kalabakan Forest Reserve, Sabah, Malaysian Borneo, we aimed to assess the relative abundance of felid species in highly-disturbed forest, as well as investigate felid space-use and the potential for biases resulting from non-random sampling. Although the area has been intensively logged over three decades, it was found to still retain the full complement of Bornean felids, including the bay cat Pardofelis badia, a poorly known Bornean endemic. Camera-trapping using strictly random locations detected four of the five Bornean felid species and revealed inter- and intra-specific differences in space-use. We compare our results with an extensive dataset of >1,200 felid records from previous camera-trapping studies and show that the relative abundance of the bay cat, in particular, may have previously been underestimated due to the use of non-random survey locations. Further surveys for this species using random locations will be crucial in determining its conservation status. We advocate the more wide-spread use of random survey locations in future camera-trapping surveys in order to increase the robustness and generality of inferences that can be made. PMID:24223717

  7. Northeastern Area State and Private Forestry At a Glance

    Treesearch

    Northeastern Area, State & Private Forestry USDA Forest Service

    2006-01-01

    The State and Private Forestry branch of the USDA Forest Service promotes sustainable management of non-Federal forest lands, which make up two-thirds of the forests in the United States. This work supports the Forest Service's role as steward of the Nation's forests and ensures that private forests yield public benefits. Among these benefits are clean air, drinking...

  8. Combining forest inventory, satellite remote sensing, and geospatial data for mapping forest attributes of the conterminous United States

    Treesearch

    Mark Nelson; Greg Liknes; Charles H. Perry

    2009-01-01

    Analysis and display of forest composition, structure, and pattern provides information for a variety of assessments and management decision support. The objective of this study was to produce geospatial datasets and maps of conterminous United States forest land ownership, forest site productivity, timberland, and reserved forest land. Satellite image-based maps of...

  9. In Land of Cypress and Pine: An Environmental History of the Santee Experimental Forest, 1683-1937

    Treesearch

    Hayden R. Smith

    2012-01-01

    The Santee Experimental Forest is a 6,100-acre research facility located within the Francis Marion National Forest, SC. Situated within the Huger Creek watershed in the headwaters of the East Branch of the Cooper River, the Santee Experimental Forest supports research in forest ecology, silviculture, prescribed fire, forest hydrology, ecosystem restoration, and...

  10. Propagation of noise over and through a forest stand

    Treesearch

    Lee P. Herrington; C. Brock

    1977-01-01

    Measurements of the two-dimensional acoustic field in a forest resulting from a source located outside the forest indicated that the attenuation pattern near the ground is significantly different from the pattern higher up in the forest. The patterns of attenuation support the recent theory that the forest floor is the main absorber of acoustic energy in the forest....

  11. Advanced analysis of forest fire clustering

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index. Pattern Recognition, 48, 4070-4081.
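
    The grid-based estimate described above can be sketched as follows. This assumes the usual form of the m-point Morisita index, Q^(m-1) times the sum over cells of n_i(n_i-1)...(n_i-m+1), divided by N(N-1)...(N-m+1), where Q is the number of grid cells, n_i the count in cell i and N the total number of points. The points are assumed to lie in the unit square; the function names and toy coordinates are hypothetical, and the sketch ignores the validity-domain masking used in the study.

```python
from collections import Counter
from typing import List, Tuple


def falling_factorial(value: int, m: int) -> int:
    """value * (value - 1) * ... * (value - m + 1); zero whenever value < m."""
    result = 1
    for i in range(m):
        result *= value - i
    return result


def multipoint_morisita(points: List[Tuple[float, float]], cells_per_side: int, m: int = 2) -> float:
    """m-point Morisita index of 2-D points in the unit square on a regular grid."""
    k = cells_per_side
    n_cells = k * k
    # Count points per grid cell; clamp so that a coordinate of exactly 1.0 stays in range.
    counts = Counter(
        (min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points
    )
    numerator = sum(falling_factorial(c, m) for c in counts.values())
    denominator = falling_factorial(len(points), m)
    return n_cells ** (m - 1) * numerator / denominator


# Toy patterns on a 2 x 2 grid: all points in one cell versus one point per cell.
clustered = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.1, 0.3)]
dispersed = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
index_clustered = multipoint_morisita(clustered, cells_per_side=2, m=2)
index_dispersed = multipoint_morisita(dispersed, cells_per_side=2, m=2)
```

    Values well above 1 indicate clustering relative to a random pattern, and repeating the calculation for several values of `cells_per_side` gives the scaling behaviour the abstract refers to.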

  12. Climate change may restrict dryland forest regeneration in the 21st century

    USGS Publications Warehouse

    Petrie, M.D.; Bradford, John B.; Hubbard, R.M.; Lauenroth, W.K.; Andrews, Caitlin; Schlaepfer, D.R.

    2017-01-01

    The persistence and geographic expansion of dryland forests in the 21st century will be influenced by how climate change supports the demographic processes associated with tree regeneration. Yet, the way that climate change may alter regeneration is unclear. We developed a quantitative framework that estimates forest regeneration potential (RP) as a function of key environmental conditions for ponderosa pine, a key dryland forest species. We integrated meteorological data and climate projections for 47 ponderosa pine forest sites across the western United States, and evaluated RP using an ecosystem water balance model. Our primary goal was to contrast conditions supporting regeneration among historical, mid-21st century and late-21st century time frames. Future climatic conditions supported 50% higher RP in 2020–2059 relative to 1910–2014. As temperatures increased more substantially in 2060–2099, seedling survival decreased, RP declined by 50%, and the frequency of years with very low RP increased from 25% to 58%. Thus, climate change may initially support higher RP and increase the likelihood of successful regeneration events, yet will ultimately reduce average RP and the frequency of years with moderate climate support of regeneration. Our results suggest that climate change alone may begin to restrict the persistence and expansion of dryland forests by limiting seedling survival in the late 21st century.

  13. Climate change may restrict dryland forest regeneration in the 21st century.

    PubMed

    Petrie, M D; Bradford, J B; Hubbard, R M; Lauenroth, W K; Andrews, C M; Schlaepfer, D R

    2017-06-01

    The persistence and geographic expansion of dryland forests in the 21st century will be influenced by how climate change supports the demographic processes associated with tree regeneration. Yet, the way that climate change may alter regeneration is unclear. We developed a quantitative framework that estimates forest regeneration potential (RP) as a function of key environmental conditions for ponderosa pine, a key dryland forest species. We integrated meteorological data and climate projections for 47 ponderosa pine forest sites across the western United States, and evaluated RP using an ecosystem water balance model. Our primary goal was to contrast conditions supporting regeneration among historical, mid-21st century and late-21st century time frames. Future climatic conditions supported 50% higher RP in 2020-2059 relative to 1910-2014. As temperatures increased more substantially in 2060-2099, seedling survival decreased, RP declined by 50%, and the frequency of years with very low RP increased from 25% to 58%. Thus, climate change may initially support higher RP and increase the likelihood of successful regeneration events, yet will ultimately reduce average RP and the frequency of years with moderate climate support of regeneration. Our results suggest that climate change alone may begin to restrict the persistence and expansion of dryland forests by limiting seedling survival in the late 21st century. © 2017 by the Ecological Society of America.

  14. EDITORIAL: Special section on foliage penetration

    NASA Astrophysics Data System (ADS)

    Fiddy, M. A.; Lang, R.; McGahan, R. V.

    2004-04-01

    Waves in Random Media was founded in 1991 to provide a forum for papers dealing with electromagnetic and acoustic waves as they propagate and scatter through media or objects having some degree of randomness. This is a broad charter since, in practice, all scattering obstacles and structures have roughness or randomness, often on the scale of the wavelength being used to probe them. Including this random component leads to some quite different methods for describing propagation effects, for example, when propagating through the atmosphere or the ground. This special section on foliage penetration (FOPEN) focuses on the problems arising from microwave propagation through foliage and vegetation. Applications of such studies include the estimation of forest biomass and the moisture of the underlying soil, as well as detecting objects hidden therein. In addition to the so-called 'direct problem' of trying to describe energy propagating through such media, the complementary inverse problem is of great interest and much harder to solve. The development of theoretical models and associated numerical algorithms for identifying objects concealed by foliage has applications in surveillance, ranging from monitoring drug trafficking to targeting military vehicles. FOPEN can be employed to map the earth's surface in cases when it is under a forest canopy, permitting the identification of objects or targets on that surface, but the process for doing so is not straightforward. There has been an increasing interest in foliage penetration synthetic aperture radar (FOPEN or FOPENSAR) over the last 10 years and this special section provides a broad overview of many of the issues involved. The detection, identification, and geographical location of targets under foliage or otherwise obscured by poor visibility conditions remains a challenge. In particular, a trade-off often needs to be appreciated, namely that diminishing the deleterious effects of multiple scattering from leaves is typically associated with a significant loss in target resolution. Foliage is more or less transparent to some radar frequencies, but longer wavelengths found in the VHF (30 to 300 MHz) and UHF (300 MHz to 3 GHz) portions of the microwave spectrum have more chance of penetrating foliage than do wavelengths at the X band (8 to 12 GHz). Reflection and multiple scattering occur for some other frequencies and models of the processes involved are crucial. Two topical reviews can be found in this issue, one on the microwave radiometry of forests (page S275) and another describing ionospheric effects on space-based radar (page S189). Subsequent papers present new results on modelling coherent backscatter from forests (page S299), modelling forests as discrete random media over a random interface (page S359) and interpreting ranging scatterometer data from forests (page S317). Cloude et al present research on identifying targets beneath foliage using polarimetric SAR interferometry (page S393) while Treuhaft and Siqueira use interferometric radar to describe forest structure and biomass (page S345). Vechhia et al model scattering from leaves (page S333) and Semichaevsky et al address the problem of the trade-off between increasing wavelength, reduction in multiple scattering, and target resolution (page S415).

  15. Exploring prediction uncertainty of spatial data in geostatistical and machine learning approaches

    NASA Astrophysics Data System (ADS)

    Klump, J. F.; Fouedjio, F.

    2017-12-01

    Geostatistical methods such as kriging with external drift as well as machine learning techniques such as quantile regression forest have been intensively used for modelling spatial data. In addition to providing predictions for target variables, both approaches are able to deliver a quantification of the uncertainty associated with the prediction at a target location. Geostatistical approaches are, by essence, adequate for providing such prediction uncertainties and their behaviour is well understood. However, they often require significant data pre-processing and rely on assumptions that are rarely met in practice. Machine learning algorithms such as random forest regression, on the other hand, require less data pre-processing and are non-parametric. This makes the application of machine learning algorithms to geostatistical problems an attractive proposition. The objective of this study is to compare kriging with external drift and quantile regression forest with respect to their ability to deliver reliable prediction uncertainties of spatial data. In our comparison we use both simulated and real world datasets. Apart from classical performance indicators, comparisons make use of accuracy plots, probability interval width plots, and the visual examinations of the uncertainty maps provided by the two approaches. By comparing random forest regression to kriging we found that both methods produced comparable maps of estimated values for our variables of interest. However, the measure of uncertainty provided by random forest seems to be quite different to the measure of uncertainty provided by kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. These preliminary results raise questions about assessing the risks associated with decisions based on the predictions from geostatistical and machine learning algorithms in a spatial context, e.g. mineral exploration.
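
    One way to make the comparison of prediction uncertainties concrete is to check how often observed values fall inside the predicted intervals and how wide those intervals are. The sketch below is illustrative only: the abstract does not define its accuracy plots or interval width plots, so empirical coverage and mean interval width are assumed stand-ins, and all names and numbers are hypothetical.

```python
from typing import Sequence


def interval_coverage(observed: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations falling inside their prediction interval (bounds inclusive)."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


def mean_interval_width(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up validation data: observed values and interval bounds from some model.
observed = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 2.5, 2.0, 3.0]
upper = [2.0, 3.0, 4.0, 3.5]
coverage = interval_coverage(observed, lower, upper)
width = mean_interval_width(lower, upper)
```

    For a well-calibrated method, the coverage of a nominal 90% interval would be close to 0.9; comparing coverage and width for both methods across several nominal levels is one plausible reading of the plots mentioned above.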

  16. Forests of the Oregon Coast Range-considerations for ecological restoration

    Treesearch

    Joe Means; Shu-hei Chen; Jane Kertis; Pete Teensma

    1996-01-01

    The Oregon Coast Range supports some of the most dense and productive forests in North America. In the pre-harvesting period these forests arose as a result of large fires, the largest covering 330,000 ha (Teensma and others 1991). These fires occurred mostly at intervals of 150 to 300 years. The natural disturbance regime supported a diverse fauna and large populations...

  17. Eucalyptus Forest Information System for the Portuguese pulp and paper industry

    Treesearch

    Luis Fonseca; Rita Crespo; Henk Feith; Jose Luis Carvalho; Antonio Macedo; Joao Pedro Pina

    2000-01-01

    To support the management of the Portuguese eucalyptus forest, the Association of Portuguese Pulp and Paper Industries (CELPA) decided to develop a Eucalyptus Forest Information System (EFIS). The specific goals of the EFIS are: characterization and development of the eucalyptus forest over time; planning of successive national eucalyptus forest inventories; estimation...

  18. Managing forest products for community benefit

    Treesearch

    Susan Charnley; Jonathan W. Long

    2014-01-01

    Forest products harvesting and use from national forest lands remain important to local residents and communities in some parts of the Sierra Nevada science synthesis area. Managing national forests for the sustainable production of timber, biomass, nontimber forest products, and forage for livestock can help support forest-based livelihoods in parts of the region where...

  19. Monitoring Strategies for REDD+: Integrating Field, Airborne, and Satellite Observations of Amazon Forests

    NASA Technical Reports Server (NTRS)

    Morton, Douglas; Souza, Carlos, Jr.; Souza, Carlos, Jr.; Keller, Michael

    2012-01-01

    Large-scale tropical forest monitoring efforts in support of REDD+ (Reducing Emissions from Deforestation and forest Degradation plus enhancing forest carbon stocks) confront a range of challenges. REDD+ activities typically have short reporting time scales, diverse data needs, and low tolerance for uncertainties. Meeting these challenges will require innovative use of remote sensing data, including integrating data at different spatial and temporal resolutions. The global scientific community is engaged in developing, evaluating, and applying new methods for regional to global scale forest monitoring. Pilot REDD+ activities are underway across the tropics with support from a range of national and international groups, including SilvaCarbon, an interagency effort to coordinate US expertise on forest monitoring and resource management. Early actions on REDD+ have exposed some of the inherent tradeoffs that arise from the use of incomplete or inaccurate data to quantify forest area changes and related carbon emissions. Here, we summarize recent advances in forest monitoring to identify and target the main sources of uncertainty in estimates of forest area changes, aboveground carbon stocks, and Amazon forest carbon emissions.

  20. Phylogenetic classification of the world's tropical forests.

    PubMed

    Slik, J W Ferry; Franklin, Janet; Arroyo-Rodríguez, Víctor; Field, Richard; Aguilar, Salomon; Aguirre, Nikolay; Ahumada, Jorge; Aiba, Shin-Ichiro; Alves, Luciana F; K, Anitha; Avella, Andres; Mora, Francisco; Aymard C, Gerardo A; Báez, Selene; Balvanera, Patricia; Bastian, Meredith L; Bastin, Jean-François; Bellingham, Peter J; van den Berg, Eduardo; da Conceição Bispo, Polyanna; Boeckx, Pascal; Boehning-Gaese, Katrin; Bongers, Frans; Boyle, Brad; Brambach, Fabian; Brearley, Francis Q; Brown, Sandra; Chai, Shauna-Lee; Chazdon, Robin L; Chen, Shengbin; Chhang, Phourin; Chuyong, George; Ewango, Corneille; Coronado, Indiana M; Cristóbal-Azkarate, Jurgi; Culmsee, Heike; Damas, Kipiro; Dattaraja, H S; Davidar, Priya; DeWalt, Saara J; Din, Hazimah; Drake, Donald R; Duque, Alvaro; Durigan, Giselda; Eichhorn, Karl; Eler, Eduardo Schmidt; Enoki, Tsutomu; Ensslin, Andreas; Fandohan, Adandé Belarmain; Farwig, Nina; Feeley, Kenneth J; Fischer, Markus; Forshed, Olle; Garcia, Queila Souza; Garkoti, Satish Chandra; Gillespie, Thomas W; Gillet, Jean-Francois; Gonmadje, Christelle; Granzow-de la Cerda, Iñigo; Griffith, Daniel M; Grogan, James; Hakeem, Khalid Rehman; Harris, David J; Harrison, Rhett D; Hector, Andy; Hemp, Andreas; Homeier, Jürgen; Hussain, M Shah; Ibarra-Manríquez, Guillermo; Hanum, I Faridah; Imai, Nobuo; Jansen, Patrick A; Joly, Carlos Alfredo; Joseph, Shijo; Kartawinata, Kuswata; Kearsley, Elizabeth; Kelly, Daniel L; Kessler, Michael; Killeen, Timothy J; Kooyman, Robert M; Laumonier, Yves; Laurance, Susan G; Laurance, William F; Lawes, Michael J; Letcher, Susan G; Lindsell, Jeremy; Lovett, Jon; Lozada, Jose; Lu, Xinghui; Lykke, Anne Mette; Mahmud, Khairil Bin; Mahayani, Ni Putu Diana; Mansor, Asyraf; Marshall, Andrew R; Martin, Emanuel H; Calderado Leal Matos, Darley; Meave, Jorge A; Melo, Felipe P L; Mendoza, Zhofre Huberto Aguirre; Metali, Faizah; Medjibe, Vincent P; Metzger, Jean Paul; Metzker, Thiago; Mohandass, D; Munguía-Rosas, Miguel A; Muñoz, Rodrigo; Nurtjahy, Eddy; de Oliveira, Eddie Lenza; Onrizal; Parolin, Pia; Parren, Marc; Parthasarathy, N; Paudel, Ekananda; Perez, Rolando; Pérez-García, Eduardo A; Pommer, Ulf; Poorter, Lourens; Qie, Lan; Piedade, Maria Teresa F; Pinto, José Roberto Rodrigues; Poulsen, Axel Dalberg; Poulsen, John R; Powers, Jennifer S; Prasad, Rama Chandra; Puyravaud, Jean-Philippe; Rangel, Orlando; Reitsma, Jan; Rocha, Diogo S B; Rolim, Samir; Rovero, Francesco; Rozak, Andes; Ruokolainen, Kalle; Rutishauser, Ervan; Rutten, Gemma; Mohd Said, Mohd Nizam; Saiter, Felipe Z; Saner, Philippe; Santos, Braulio; Dos Santos, João Roberto; Sarker, Swapan Kumar; Schmitt, Christine B; Schoengart, Jochen; Schulze, Mark; Sheil, Douglas; Sist, Plinio; Souza, Alexandre F; Spironello, Wilson Roberto; Sposito, Tereza; Steinmetz, Robert; Stevart, Tariq; Suganuma, Marcio Seiji; Sukri, Rahayu; Sultana, Aisha; Sukumar, Raman; Sunderland, Terry; Supriyadi; Suresh, H S; Suzuki, Eizi; Tabarelli, Marcelo; Tang, Jianwei; Tanner, Ed V J; Targhetta, Natalia; Theilade, Ida; Thomas, Duncan; Timberlake, Jonathan; de Morisson Valeriano, Márcio; van Valkenburg, Johan; Van Do, Tran; Van Sam, Hoang; Vandermeer, John H; Verbeeck, Hans; Vetaas, Ole Reidar; Adekunle, Victor; Vieira, Simone A; Webb, Campbell O; Webb, Edward L; Whitfeld, Timothy; Wich, Serge; Williams, John; Wiser, Susan; Wittmann, Florian; Yang, Xiaobo; Adou Yao, C Yves; Yap, Sandra L; Zahawi, Rakan A; Zakaria, Rahmad; Zang, Runguo

    2018-02-20

    Knowledge about the biogeographic affinities of the world's tropical forests helps to better understand regional differences in forest structure, diversity, composition, and dynamics. Such understanding will enable anticipation of region-specific responses to global environmental change. Modern phylogenies, in combination with broad coverage of species inventory data, now allow for global biogeographic analyses that take species evolutionary distance into account. Here we present a classification of the world's tropical forests based on their phylogenetic similarity. We identify five principal floristic regions and their floristic relationships: (i) Indo-Pacific, (ii) Subtropical, (iii) African, (iv) American, and (v) Dry forests. Our results do not support the traditional neo- versus paleotropical forest division but instead separate the combined American and African forests from their Indo-Pacific counterparts. We also find indications for the existence of a global dry forest region, with representatives in America, Africa, Madagascar, and India. Additionally, a northern-hemisphere Subtropical forest region was identified with representatives in Asia and America, providing support for a link between Asian and American northern-hemisphere forests. Copyright © 2018 the Author(s). Published by PNAS.

  1. Phylogenetic classification of the world’s tropical forests

    PubMed Central

    Franklin, Janet; Arroyo-Rodríguez, Víctor; Field, Richard; Aguilar, Salomon; Aguirre, Nikolay; Ahumada, Jorge; Aiba, Shin-Ichiro; K, Anitha; Avella, Andres; Mora, Francisco; Aymard C., Gerardo A.; Báez, Selene; Balvanera, Patricia; Bastian, Meredith L.; Bastin, Jean-François; Bellingham, Peter J.; van den Berg, Eduardo; da Conceição Bispo, Polyanna; Boeckx, Pascal; Boehning-Gaese, Katrin; Bongers, Frans; Boyle, Brad; Brearley, Francis Q.; Brown, Sandra; Chai, Shauna-Lee; Chazdon, Robin L.; Chen, Shengbin; Chhang, Phourin; Chuyong, George; Ewango, Corneille; Coronado, Indiana M.; Cristóbal-Azkarate, Jurgi; Culmsee, Heike; Damas, Kipiro; Dattaraja, H. S.; Davidar, Priya; DeWalt, Saara J.; Din, Hazimah; Drake, Donald R.; Durigan, Giselda; Eichhorn, Karl; Eler, Eduardo Schmidt; Enoki, Tsutomu; Ensslin, Andreas; Fandohan, Adandé Belarmain; Farwig, Nina; Feeley, Kenneth J.; Fischer, Markus; Forshed, Olle; Garcia, Queila Souza; Garkoti, Satish Chandra; Gillespie, Thomas W.; Gillet, Jean-Francois; Gonmadje, Christelle; Granzow-de la Cerda, Iñigo; Griffith, Daniel M.; Grogan, James; Hakeem, Khalid Rehman; Harris, David J.; Harrison, Rhett D.; Hector, Andy; Hemp, Andreas; Hussain, M. Shah; Ibarra-Manríquez, Guillermo; Hanum, I. Faridah; Imai, Nobuo; Jansen, Patrick A.; Joly, Carlos Alfredo; Joseph, Shijo; Kartawinata, Kuswata; Kearsley, Elizabeth; Kelly, Daniel L.; Kessler, Michael; Killeen, Timothy J.; Kooyman, Robert M.; Laumonier, Yves; Laurance, William F.; Lawes, Michael J.; Letcher, Susan G.; Lovett, Jon; Lozada, Jose; Lu, Xinghui; Lykke, Anne Mette; Mahmud, Khairil Bin; Mahayani, Ni Putu Diana; Mansor, Asyraf; Marshall, Andrew R.; Martin, Emanuel H.; Calderado Leal Matos, Darley; Meave, Jorge A.; Melo, Felipe P. 
L.; Mendoza, Zhofre Huberto Aguirre; Metali, Faizah; Medjibe, Vincent P.; Metzger, Jean Paul; Metzker, Thiago; Mohandass, D.; Munguía-Rosas, Miguel A.; Muñoz, Rodrigo; Nurtjahy, Eddy; de Oliveira, Eddie Lenza; Onrizal; Parolin, Pia; Parren, Marc; Parthasarathy, N.; Paudel, Ekananda; Perez, Rolando; Pérez-García, Eduardo A.; Pommer, Ulf; Poorter, Lourens; Qie, Lan; Piedade, Maria Teresa F.; Pinto, José Roberto Rodrigues; Poulsen, Axel Dalberg; Poulsen, John R.; Powers, Jennifer S.; Prasad, Rama Chandra; Puyravaud, Jean-Philippe; Rangel, Orlando; Reitsma, Jan; Rocha, Diogo S. B.; Rolim, Samir; Rovero, Francesco; Ruokolainen, Kalle; Rutishauser, Ervan; Rutten, Gemma; Mohd. Said, Mohd. Nizam; Saiter, Felipe Z.; Saner, Philippe; Santos, Braulio; dos Santos, João Roberto; Sarker, Swapan Kumar; Schoengart, Jochen; Schulze, Mark; Sheil, Douglas; Sist, Plinio; Souza, Alexandre F.; Spironello, Wilson Roberto; Sposito, Tereza; Steinmetz, Robert; Stevart, Tariq; Suganuma, Marcio Seiji; Sukri, Rahayu; Sukumar, Raman; Sunderland, Terry; Supriyadi; Suresh, H. S.; Suzuki, Eizi; Tabarelli, Marcelo; Tang, Jianwei; Tanner, Ed V. J.; Targhetta, Natalia; Theilade, Ida; Thomas, Duncan; Timberlake, Jonathan; de Morisson Valeriano, Márcio; van Valkenburg, Johan; Van Do, Tran; Van Sam, Hoang; Vandermeer, John H.; Verbeeck, Hans; Vetaas, Ole Reidar; Adekunle, Victor; Vieira, Simone A.; Webb, Campbell O.; Webb, Edward L.; Whitfeld, Timothy; Wich, Serge; Williams, John; Wiser, Susan; Wittmann, Florian; Yang, Xiaobo; Adou Yao, C. Yves; Yap, Sandra L.; Zahawi, Rakan A.; Zakaria, Rahmad; Zang, Runguo

    2018-01-01

    Knowledge about the biogeographic affinities of the world’s tropical forests helps to better understand regional differences in forest structure, diversity, composition, and dynamics. Such understanding will enable anticipation of region-specific responses to global environmental change. Modern phylogenies, in combination with broad coverage of species inventory data, now allow for global biogeographic analyses that take species evolutionary distance into account. Here we present a classification of the world’s tropical forests based on their phylogenetic similarity. We identify five principal floristic regions and their floristic relationships: (i) Indo-Pacific, (ii) Subtropical, (iii) African, (iv) American, and (v) Dry forests. Our results do not support the traditional neo- versus paleotropical forest division but instead separate the combined American and African forests from their Indo-Pacific counterparts. We also find indications for the existence of a global dry forest region, with representatives in America, Africa, Madagascar, and India. Additionally, a northern-hemisphere Subtropical forest region was identified with representatives in Asia and America, providing support for a link between Asian and American northern-hemisphere forests. PMID:29432167

  2. Forest climate change Vulnerability and Adaptation Assessment in Himalayas

    NASA Astrophysics Data System (ADS)

    Chitale, V. S.; Shrestha, H. L.; Agarwal, N. K.; Choudhurya, D.; Gilani, H.; Dhonju, H. K.; Murthy, M. S. R.

    2014-11-01

    Forests offer an important basis for creating and safeguarding more climate-resilient communities in the Hindu Kush Himalayan region. Assessing the vulnerability of forest ecosystems to climate change, and developing a knowledge base to identify and support relevant adaptation strategies, is recognized as an urgent need. Multi-scale adaptation strategies become increasingly complex at each level in terms of data requirements, vulnerability understanding, and the decision making needed to choose a particular adaptation strategy. We present here how such complexities could be addressed and how adaptation decisions could be directly supported either by open-source, remote-sensing-based forestry products or by geospatial analysis and modelled products. Forest vulnerability under a climate change scenario, coupled with increasing social dependence on forests, was studied using the IPCC landscape-scale vulnerability framework in the Chitwan-Annapurna Landscape (CHAL) in Nepal. Around twenty layers of geospatial information on climate, forest biophysical characteristics, and forest social dependence were used to assess forest vulnerability and associated adaptation needs using self-learning decision-tree-based approaches. An increase in forest fires and evapotranspiration and a reduction in productivity were observed under the changing climate scenario. Spatially specific adaptation measures for enhancing productivity, improving resilience, and reducing or avoiding pressure are identified to support suitable decision making. The study provides a spatial analytical framework to evaluate a multitude of parameters, understand vulnerabilities, and assess the scope for alternative adaptation strategies in a spatially explicit way.

  3. [Functional diversity characteristics of canopy tree species of Jianfengling tropical montane rainforest on Hainan Island, China].

    PubMed

    Xu, Ge Xi; Shi, Zuo Min; Tang, Jing Chao; Liu, Shun; Ma, Fan Qiang; Xu, Han; Liu, Shi Rong; Li, Yi de

    2016-11-18

    Based on three 1-hm² plots of Jianfengling tropical montane rainforest on Hainan Island, 11 commonly used functional traits of canopy trees were measured. After combining these with topographical factors and tree census data from the three plots, we compared the impacts of species abundance weighting on two functional dispersion indices, mean pairwise distance (MPD) and mean nearest taxon distance (MNTD), using single- and multi-dimensional traits, respectively. The relationship between functional richness of the forest canopies and species abundance was analyzed. We used a null model approach to explore the variations in standardized effect sizes of MPD and MNTD, which were weighted by species abundance and removed the influence of species richness differences among communities, and assessed functional diversity patterns of the forest canopies and their responses to local habitat heterogeneity at the community level. The results showed that variation in MPD was greatly dependent on the dimensionality of functional traits as well as species abundance. The correlations between weighted and non-weighted MPD based on different dimensional traits were relatively weak (R=0.359-0.628). In contrast, functional traits and species abundance had relatively weak effects on MNTD, which resulted in stronger correlations between weighted and non-weighted MNTD based on different dimensional traits (R=0.746-0.820). Functional dispersion of the forest canopies was generally overestimated when using non-weighted MPD and MNTD. Functional richness of the forest canopies showed an exponential relationship with species abundance (F=128.20; R²=0.632; AIC=97.72; P<0.001), suggesting that a species abundance threshold may exist. Patterns of functional diversity of the forest canopies based on different dimensional functional traits, and their habitat responses, varied to some degree. Forest canopies in the valley usually had relatively stronger biological competition, and functional diversity was higher than the expected functional diversity randomized by the null model, which indicated a dispersed distribution of functional traits among canopy tree species in this habitat. However, the functional diversity of the forest canopies tended to be close to or lower than the randomized expectation in the other habitat types, which demonstrated a random or clustered distribution of the functional traits among canopy tree species.
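
    The two dispersion indices named in this abstract can be illustrated with a minimal sketch. It assumes one common convention (pairwise weights equal to the product of the two species' abundances for MPD, and an abundance-weighted mean of nearest-neighbour distances for MNTD); the abstract does not state which weighting the authors used, and the function names and the toy distance matrix are hypothetical.

```python
from itertools import combinations


def mpd(dist, abund=None):
    """Mean pairwise trait distance among species.

    dist  : symmetric matrix (list of lists) of trait distances
    abund : optional abundances; each pair is weighted by a_i * a_j
            (an assumed convention, not taken from the abstract)
    """
    n = len(dist)
    if abund is None:
        abund = [1.0] * n
    num = 0.0
    den = 0.0
    for i, j in combinations(range(n), 2):
        w = abund[i] * abund[j]
        num += w * dist[i][j]
        den += w
    return num / den


def mntd(dist, abund=None):
    """Mean distance from each species to its nearest other species."""
    n = len(dist)
    if abund is None:
        abund = [1.0] * n
    nearest = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    return sum(a * d for a, d in zip(abund, nearest)) / sum(abund)


# Hypothetical three-species community
D = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 3.0],
     [4.0, 3.0, 0.0]]
abundance = [10.0, 1.0, 1.0]

print(mpd(D), mpd(D, abundance))
print(mntd(D), mntd(D, abundance))
```

    In this toy community the abundant species sits close to its neighbour, so both weighted indices come out lower than their unweighted counterparts; other communities can behave differently.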

  4. Comparison of classification algorithms for various methods of preprocessing radar images of the MSTAR base

    NASA Astrophysics Data System (ADS)

    Borodinov, A. A.; Myasnikov, V. V.

    2018-04-01

    The present work compares the accuracy of known classification algorithms in the task of recognizing local objects in radar images under various image preprocessing methods. Preprocessing involves speckle noise filtering and normalization of the object orientation in the image, either by the method of image moments or by a method based on the Hough transform. The following classification algorithms are compared: decision tree, support vector machine, AdaBoost, and random forest. Principal component analysis is used to reduce the dimensionality. The study is carried out on objects from the MSTAR radar image database. The paper presents the results of these studies.

  5. The use of single-date MODIS imagery for estimating large-scale urban impervious surface fraction with spectral mixture analysis and machine learning techniques

    NASA Astrophysics Data System (ADS)

    Deng, Chengbin; Wu, Changshan

    2013-12-01

    Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
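
    The least squares step described in this abstract can be written out under the standard linear mixing model. This is a sketch of the textbook formulation, not necessarily the authors' exact one; the symbols R, F, E and the dimensions n, m, b are introduced here for illustration.

```latex
% Linear mixing model for n sample pixels, m endmembers, b spectral bands:
%   R (n x b): observed reflectances, F (n x m): known abundances,
%   E (m x b): unknown endmember signatures.
\[
  \mathbf{R} = \mathbf{F}\,\mathbf{E} + \boldsymbol{\varepsilon}
\]
% Least squares estimate of the endmember signatures
% (assuming F has full column rank):
\[
  \hat{\mathbf{E}} = \left(\mathbf{F}^{\top}\mathbf{F}\right)^{-1}\mathbf{F}^{\top}\mathbf{R}
\]
% Unmixing a new pixel spectrum r (b x 1) with the estimated signatures;
% the fully constrained variant adds the two conditions on the right,
% the unconstrained variant drops them:
\[
  \hat{\mathbf{f}} = \arg\min_{\mathbf{f}}
  \left\lVert \mathbf{r} - \hat{\mathbf{E}}^{\top}\mathbf{f} \right\rVert_2^{2}
  \quad \text{subject to} \quad \sum_{j=1}^{m} f_j = 1,\; f_j \ge 0 .
\]
```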

  6. Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.

    PubMed

    Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G

    2013-09-23

    The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine-based learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected since they have been shown to be able to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high confidence assignments, international consensus classifications, ASTER (ASsessment Tools for the Evaluation of Risk) predictions, and weight of evidence professional judgment based on an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and were able to produce overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine the acute toxicity MOAs across a large range of structures and mechanisms.
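
    The random 75%/25% division of the 924 chemicals can be sketched as follows. The function name, the seed, and the use of plain index lists are assumptions made for illustration; the abstract does not say how the split was drawn or whether it was stratified by MOA.

```python
import random


def split_indices(n_items, train_fraction=0.75, seed=0):
    """Shuffle item indices and cut them into training and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


# 924 chemicals, as in the abstract
train_idx, valid_idx = split_indices(924)
print(len(train_idx), len(valid_idx))  # 693 231
```

    Models would then be fitted on the training indices only, with the validation indices held back for the accuracy figures.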

  7. Water chemistry in 179 randomly selected Swedish headwater streams related to forest production, clear-felling and climate.

    PubMed

    Löfgren, Stefan; Fröberg, Mats; Yu, Jun; Nisell, Jakob; Ranneby, Bo

    2014-12-01

    From a policy perspective, it is important to understand forestry effects on surface waters from a landscape perspective. The EU Water Framework Directive demands remedial action where good ecological status is not achieved. In Sweden, 44 % of the surface water bodies have moderate ecological status or worse. Many of these drain catchments with a mosaic of managed forests. It is important for the forestry sector and water authorities to be able to identify where, in the forested landscape, special precautions are necessary. The aim of this study was to quantify the relations between forestry parameters and headwater stream concentrations of nutrients, organic matter and acid-base chemistry. The results are put into the context of regional climate, sulphur and nitrogen deposition, as well as marine influences. Water chemistry was measured in 179 randomly selected headwater streams from two regions in southwest and central Sweden, corresponding to 10 % of the Swedish land area. Forest status was determined from satellite images and Swedish National Forest Inventory data using the probabilistic classifier method, which was used to model stream water chemistry with Bayesian model averaging. The results indicate that concentrations of e.g. nitrogen, phosphorus and organic matter are related to factors associated with forest production but that it is not forestry per se that causes the excess losses. Instead, factors simultaneously affecting forest production and stream water chemistry, such as climate, extensive soil pools and nitrogen deposition, are the most likely candidates. The relationships with clear-felled and wetland areas are likely to be direct effects.

  8. High Quality Facade Segmentation Based on Structured Random Forest, Region Proposal Network and Rectangular Fitting

    NASA Astrophysics Data System (ADS)

    Rahmani, K.; Mayer, H.

    2018-05-01

    In this paper we present a pipeline for high quality semantic segmentation of building facades using Structured Random Forest (SRF), Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN) as well as rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We empirically show that this is very effective especially for doors and windows. Our pipeline is evaluated on two datasets where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization on the accuracy of the result.

  9. Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy.

    PubMed

    Katić, Darko; Schuck, Jürgen; Wekerle, Anna-Laura; Kenngott, Hannes; Müller-Stich, Beat Peter; Dillmann, Rüdiger; Speidel, Stefanie

    2016-06-01

    Computer assistance is increasingly common in surgery. However, the amount of information is bound to overload processing abilities of surgeons. We propose methods to recognize the current phase of a surgery for context-aware information filtering. The purpose is to select the most suitable subset of information for surgical situations which require special assistance. We combine formal knowledge, represented by an ontology, and experience-based knowledge, represented by training samples, to recognize phases. For this purpose, we have developed two different methods. Firstly, we use formal knowledge about possible phase transitions to create a composition of random forests. Secondly, we propose a method based on cultural optimization to infer formal rules from experience to recognize phases. The proposed methods are compared with a purely formal knowledge-based approach using rules and a purely experience-based one using regular random forests. The comparative evaluation on laparoscopic pancreas resections and adrenalectomies employs a consistent set of quality criteria on clean and noisy input. The rule-based approaches proved best with noise-free data. The random forest-based ones were more robust in the presence of noise. Formal and experience-based knowledge can be successfully combined for robust phase recognition.

  10. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease

    PubMed Central

    Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation. PMID:26180536
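
    Sample entropy, one of the two feature types mentioned in this abstract, can be sketched in a few lines. This follows the usual definition (negative log of the ratio of template matches of length m + 1 to matches of length m under the Chebyshev distance); the parameter defaults, the absolute tolerance r, and the use of "<=" for a match are assumptions, and the abstract does not specify how the authors applied it to the HHT output.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a 1-D sequence.

    m : template length
    r : absolute tolerance (often chosen as a fraction of the
        series' standard deviation; that scaling is left to the caller)
    """
    n = len(series)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches excluded
                gap = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if gap <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


# A perfectly regular signal has zero sample entropy
regular = [0.0, 1.0] * 10
print(sample_entropy(regular, m=2, r=0.1))
```

    Lower values indicate a more regular signal; a feature vector for a classifier could collect such values over the components of a decomposed pulse signal.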

  11. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease.

    PubMed

    Guo, Rui; Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation.

  12. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data.

    PubMed

    Stevens, Forrest R; Gaughan, Andrea E; Linard, Catherine; Tatem, Andrew J

    2015-01-01

    High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, "Random Forest" estimation technique. We outline the combination of widely available, remotely-sensed and geospatial data that contribute to the modeled dasymetric weights and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and its gains in accuracy and flexibility over those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America.
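
    The dasymetric redistribution step can be sketched as a proportional allocation: each census unit's count is shared among its grid cells in proportion to the weighting surface. The function name, the flat list layout, and the toy numbers are assumptions for illustration; in the approach described above the weights would come from the Random Forest density prediction.

```python
def dasymetric_redistribute(census_counts, cell_unit, cell_weight):
    """Allocate census counts to grid cells in proportion to their weights.

    census_counts : dict mapping census unit id -> population count
    cell_unit     : census unit id of each grid cell
    cell_weight   : non-negative weight of each grid cell
    """
    # Sum of weights inside each census unit
    totals = {}
    for unit, weight in zip(cell_unit, cell_weight):
        totals[unit] = totals.get(unit, 0.0) + weight

    populations = []
    for unit, weight in zip(cell_unit, cell_weight):
        if totals[unit] > 0:
            populations.append(census_counts[unit] * weight / totals[unit])
        else:
            populations.append(0.0)  # no weight anywhere in this unit
    return populations


# Two hypothetical census units covering five grid cells
counts = {"A": 100, "B": 50}
units = ["A", "A", "A", "B", "B"]
weights = [1.0, 3.0, 6.0, 2.0, 2.0]
print(dasymetric_redistribute(counts, units, weights))
# [10.0, 30.0, 60.0, 25.0, 25.0]
```

    By construction the cell values within each unit sum back to that unit's census count, so the gridded output stays consistent with the input census totals.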

  13. Analysis of landslide hazard area in Ludian earthquake based on Random Forests

    NASA Astrophysics Data System (ADS)

    Xie, J.-C.; Liu, R.; Li, H.-W.; Lai, Z.-L.

    2015-04-01

    With the development of machine learning theory, more and more algorithms are being evaluated for seismic landslide assessment. After the Ludian earthquake, the research team combined the special geological structure of the Ludian area with seismic field exploration results and selected slope (PODU), river distance (HL), fault distance (DC), seismic intensity (LD), the digital elevation model (DEM), and the normalized difference vegetation index (NDVI) derived from remote sensing images as evaluation factors. Because the relationships among these factors are fuzzy and the data are noisy and high-dimensional, we introduce the random forest algorithm to tolerate these difficulties and obtain an evaluation of the Ludian landslide areas. To verify the accuracy of the result, ROC graphs are used as the evaluation standard; the AUC is 0.918. Meanwhile, the random forest's generalization error rate, estimated out-of-bag (OOB), decreases as the number of classification trees increases, reaching an ideal 0.08. From the final landslide inversion results, the paper reaches the statistical conclusion that nearly 80% of all landslides and dilapidations fall in areas of high and moderate susceptibility, showing that the forecast results are reasonable and can be adopted.
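
    The AUC used here as the evaluation standard can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses hypothetical labels and scores, not the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels : 1 for positive cases (e.g. landslide), 0 for negative
    scores : predicted susceptibility, higher meaning more likely positive
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 8 of the 9 positive/negative pairs are ordered correctly
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```

    The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; sorting-based formulas give the same value for large data sets.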

  14. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic success. Different drug-drug interactions can be examined through the drug synergy score, which requires efficient regression-based machine learning approaches to minimize prediction errors. Numerous machine learning techniques, such as neural networks, support vector machines, random forests, LASSO, and Elastic Nets, have been used in the past to meet this requirement. However, these techniques individually do not provide significant accuracy in predicting the drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented on the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop an ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weight to the model with a higher prediction score) of the data predicted by the selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms the others in terms of accuracy, root mean square error and coefficient of correlation.
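
    The "biased weighted aggregation" can be illustrated with a minimal sketch in which each model's weight is its prediction score divided by the sum of all scores. That normalization is an assumption; the abstract states only that better-scoring models receive more weight. The function name and the numbers are hypothetical.

```python
def weighted_ensemble(predictions, scores):
    """Combine per-model predictions with weights proportional to model scores.

    predictions : one list of predicted values per model, all of equal length
    scores      : one positive quality score per model (higher is better)
    """
    total = sum(scores)
    weights = [s / total for s in scores]
    n_samples = len(predictions[0])
    return [
        sum(w * model_pred[i] for w, model_pred in zip(weights, predictions))
        for i in range(n_samples)
    ]


# Two hypothetical models; the first is trusted three times as much
model_predictions = [[10.0, 20.0], [14.0, 24.0]]
model_scores = [3.0, 1.0]
print(weighted_ensemble(model_predictions, model_scores))  # [11.0, 21.0]
```

    With equal scores this reduces to a plain average, so the weighting only matters to the extent that the models' validation scores differ.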

  15. On the information content of hydrological signatures and their relationship to catchment attributes

    NASA Astrophysics Data System (ADS)

    Addor, Nans; Clark, Martyn P.; Prieto, Cristina; Newman, Andrew J.; Mizukami, Naoki; Nearing, Grey; Le Vine, Nataliya

    2017-04-01

    Hydrological signatures, which are indices characterizing hydrologic behavior, are increasingly used for the evaluation, calibration and selection of hydrological models. Their key advantage is to provide more direct insights into specific hydrological processes than aggregated metrics (e.g., the Nash-Sutcliffe efficiency). A plethora of signatures now exists, which enable characterizing a variety of hydrograph features, but also makes the selection of signatures for new studies challenging. Here we propose that the selection of signatures should be based on their information content, which we estimated using several approaches, all leading to similar conclusions. To explore the relationship between hydrological signatures and the landscape, we extended a previously published data set of hydrometeorological time series for 671 catchments in the contiguous United States, by characterizing the climatic conditions, topography, soil, vegetation and stream network of each catchment. This new catchment attributes data set will soon be in open access, and we are looking forward to introducing it to the community. We used this data set in a data-learning algorithm (random forests) to explore whether hydrological signatures could be inferred from catchment attributes alone. We find that some signatures can be predicted remarkably well by random forests and, interestingly, the same signatures are well captured when simulating discharge using a conceptual hydrological model. We discuss what this result reveals about our understanding of hydrological processes shaping hydrological signatures. We also identify which catchment attributes exert the strongest control on catchment behavior, in particular during extreme hydrological events. Overall, climatic attributes have the most significant influence, and strongly condition how well hydrological signatures can be predicted by random forests and simulated by the hydrological model. 
In contrast, soil characteristics at the catchment scale are not found to be significant predictors by random forests, which raises questions on how to best use soil data for hydrological modeling, for instance for parameter estimation. We finally demonstrate that signatures with high spatial variability are poorly captured by random forests and model simulations, which makes their regionalization delicate. We conclude with a ranking of signatures based on their information content, and propose that the signatures with high information content are best suited for model calibration, model selection and understanding hydrologic similarity.

  16. U.S. forest products module : a technical document supporting the Forest Service 2010 RPA Assessment

    Treesearch

    Peter J. Ince; Andrew D. Kramp; Kenneth E. Skog; Henry N. Spelter; David N. Wear

    2011-01-01

    The U.S. Forest Products Module (USFPM) is a partial market equilibrium model of the U.S. forest sector that operates within the Global Forest Products Model (GFPM) to provide long-range timber market projections in relation to global economic scenarios. USFPM was designed specifically for the 2010 RPA forest assessment, but it is being used also in other applications...

  17. Discriminant forest classification method and system

    DOEpatents

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
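
    The node-level idea described in this abstract, splitting the data with a linear discriminant rather than a single-feature threshold, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: it uses a two-class Fisher linear discriminant on two features with the threshold at the midpoint of the class means, and the function names (lda_split, goes_right) and the toy data are hypothetical. How the actual system samples data and features for each tree is not given in the abstract and is not modeled here.

```python
def lda_split(points, labels):
    """Fit a two-class Fisher discriminant on 2-D points; return (weights, threshold)."""
    class0 = [p for p, y in zip(points, labels) if y == 0]
    class1 = [p for p, y in zip(points, labels) if y == 1]
    if not class0 or not class1:
        raise ValueError("both classes must be present at the node")

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

    m0, m1 = mean(class0), mean(class1)

    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    # Tiny ridge term so the matrix stays invertible (an assumption of this sketch).
    s[0][0] += 1e-9
    s[1][1] += 1e-9

    # Closed-form inverse of the 2x2 scatter matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    # Discriminant direction w = S^-1 (m1 - m0).
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    # Threshold: projection of the midpoint between the two class means.
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def goes_right(point, w, threshold):
    """Route a sample to the right child if its projection exceeds the threshold."""
    return w[0] * point[0] + w[1] * point[1] > threshold


# Toy node data, for illustration only.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 4), (5, 4), (4, 5), (5, 5)]
lab = [0, 0, 0, 0, 1, 1, 1, 1]
weights, cut = lda_split(pts, lab)
```

    In a forest built this way, each node would fit its own discriminant on the samples that reach it, so a split can follow an oblique boundary that an axis-aligned threshold would need several levels to approximate.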

  18. Effect of inventory method on niche models: random versus systematic error

    Treesearch

    Heather E. Lintz; Andrew N. Gray; Bruce McCune

    2013-01-01

    Data from large-scale biological inventories are essential for understanding and managing Earth's ecosystems. The Forest Inventory and Analysis Program (FIA) of the U.S. Forest Service is the largest biological inventory in North America; however, the FIA inventory recently changed from an amalgam of different approaches to a nationally-standardized approach in...

  19. Modeling species’ realized climatic niche space and predicting their response to global warming for several western forest species with small geographic distributions

    Treesearch

    Marcus V. Warwell; Gerald E. Rehfeldt; Nicholas L. Crookston

    2010-01-01

    The Random Forests multiple regression tree was used to develop an empirically based bioclimatic model of the presence-absence of species occupying small geographic distributions in western North America. The species assessed were subalpine larch (Larix lyallii), smooth Arizona cypress (Cupressus arizonica ssp. glabra...

  20. Determining soil erosion from roads in coastal plain of Alabama

    Treesearch

    McFero Grace; W.J. Elliot

    2008-01-01

    This paper reports soil losses and observed sediment deposition for 16 randomly selected forest road sections in the National Forests of Alabama. Visible sediment deposition zones were tracked along the stormwater flow path to the most remote location as a means of quantifying soil loss from road sections. Volumes of sediment in deposition zones were determined by...

  1. Quantifying the abundance of co-occurring conifers along Inland Northwest (USA) climate gradients

    Treesearch

    Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

    2008-01-01

    The occurrence and abundance of conifers along climate gradients in the Inland Northwest (USA) was assessed using data from 5082 field plots, 81% of which were forested. Analyses using the Random Forests classification tree revealed that the sequential distribution of species along an altitudinal gradient could be predicted with reasonable accuracy from a single...

  2. Patterns among the ashes: Exploring the relationship between landscape pattern and the emerald ash borer

    Treesearch

    Susan J. Crocker; Dacia M. Meneguzzo; Greg C. Liknes

    2010-01-01

    Landscape metrics, including host abundance and population density, were calculated using forest inventory and land cover data to assess the relationship between landscape pattern and the presence or absence of the emerald ash borer (EAB) (Agrilus planipennis Fairmaire). The Random Forests classification algorithm in the R statistical environment was...

  3. Quantitative Trait Inheritance in a Forty-Year-Old Longleaf Pine Partial Diallel Test

    Treesearch

    Michael Stine; Jim Roberds; C. Dana Nelson; David P. Gwaze; Todd Shupe; Les Groom

    2002-01-01

    A longleaf pine (Pinus palustris Mill.) 13 parent partial diallel field experiment was established at two locations on the Harrison Experimental Forest in 1960. Parent trees were randomly selected from a natural population growing on the Harrison Experimental Forest, near Gulfport, Miss. Distance between trees chosen as parents ranged from 13 to 357...

  4. Modeling long-term suspended-sediment export from an undisturbed forest catchment

    NASA Astrophysics Data System (ADS)

    Zimmermann, Alexander; Francke, Till; Elsenbeer, Helmut

    2013-04-01

    Most estimates of suspended sediment yields from humid, undisturbed, and geologically stable forest environments fall within a range of 5-30 t km⁻² a⁻¹. These low natural erosion rates in small headwater catchments (≤ 1 km²) support the common impression that a well-developed forest cover prevents surface erosion. Interestingly, those estimates originate exclusively from areas with prevailing vertical hydrological flow paths. Forest environments dominated by (near-) surface flow paths (overland flow, pipe flow, and return flow) and a fast response to rainfall, however, are not an exceptional phenomenon, yet only very few sediment yields have been estimated for these areas. Not surprisingly, even fewer long-term (≥ 10 years) records exist. In this contribution we present our latest research which aims at quantifying long-term suspended-sediment export from an undisturbed rainforest catchment prone to frequent overland flow. A key aspect of our approach is the application of machine-learning techniques (Random Forest, Quantile Regression Forest) which allows not only the handling of non-Gaussian data, non-linear relations between predictors and response, and correlations between predictors, but also the assessment of prediction uncertainty. For the current study we provided the machine-learning algorithms exclusively with information from a high-resolution rainfall time series to reconstruct discharge and suspended sediment dynamics for a 21-year period. The significance of our results is threefold. First, our estimates clearly show that forest cover does not necessarily prevent erosion if wet antecedent conditions and large rainfalls coincide. During these situations, overland flow is widespread and sediment fluxes increase in a non-linear fashion due to the mobilization of new sediment sources. Second, our estimates indicate that annual suspended sediment yields of the undisturbed forest catchment show large fluctuations.
Depending on the frequency of large events, annual suspended-sediment yield varies between 74 and 416 t km⁻² a⁻¹. Third, the estimated sediment yields exceed former benchmark values by an order of magnitude and provide evidence that the erosion footprint of undisturbed, forested catchments can be indistinguishable from that of sustainably managed, but hydrologically less responsive areas. Because of the susceptibility to soil loss we argue that any land use should be avoided in natural erosion hotspots.

  5. Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.

    PubMed

    Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan

    2017-12-01

    A trained classification model is commonly applied to operating data that deviate from the training data because of noise. This paper tests an ensemble method, Diversified Multiple Tree (DMT), on its capability for classifying instances from a new laboratory using a classifier built on the instances of another laboratory. DMT is tested on three real-world biomedical data sets from different laboratories in comparison with four benchmark ensemble methods: AdaBoost, Bagging, Random Forests, and Random Trees. Experiments were also conducted to study the limitations of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than the other benchmark ensemble classifiers at classifying new instances from a laboratory other than the one whose instances were used to build the classifier. This paper demonstrates that an ensemble classifier, DMT, is more robust in classifying noisy data than other widely used ensemble methods. DMT works on data sets that support multiple simple trees.

  6. Recognising discourse causality triggers in the biomedical domain.

    PubMed

    Mihăilă, Claudiu; Ananiadou, Sophia

    2013-12-01

    Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with and compare various parameter settings for three algorithms, i.e. Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). We also evaluate the impact of lexical, syntactic, and semantic features on each of the algorithms, showing that semantics improves the performance in all cases. We test our comprehensive feature set on two corpora containing gold standard annotations of causal relations, and demonstrate the need for more gold standard data. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.
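
    The F-score reported here combines precision and recall into a single number. The abstract does not state which variant was used; the sketch below assumes the balanced F1 score, and the function name and counts are hypothetical illustrations rather than values from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Balanced F-score: harmonic mean of precision and recall.

    Algebraically equal to 2*TP / (2*TP + FP + FN).
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        raise ValueError("F1 is undefined when there are no positives at all")
    return 2 * true_pos / denominator


# Hypothetical counts: 8 triggers found correctly, 2 spurious, 2 missed.
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

    Because true negatives do not enter the formula, the score stays informative when the items of interest (here, causality triggers) are rare relative to everything else in the text.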

  7. LiDAR based prediction of forest biomass using hierarchical models with spatially varying coefficients

    USGS Publications Warehouse

    Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.

    2015-01-01

    Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
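
    The 10-fold cross-validation procedure mentioned in this abstract can be sketched as a partition of plot indices into ten held-out folds. This is a generic illustration, assuming a simple random partition; the function name, the seed, and the sample size are hypothetical, and the study's own fold assignment (for example, whether it accounted for spatial structure) is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train, test) index pairs; every sample is held out exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible random assignment

    # The first `extra` folds get one additional sample when n is not divisible by k.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical example: 25 inventory plots split into 10 folds.
folds = k_fold_indices(25, k=10)
```

    Each model specification would be refit on every training set and scored on the matching held-out fold, so that a model whose fit improves while its held-out error worsens can be flagged as over-fitting.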

  8. Change in phylogenetic community structure during succession of traditionally managed tropical rainforest in southwest China.

    PubMed

    Mo, Xiao-Xue; Shi, Ling-Ling; Zhang, Yong-Jiang; Zhu, Hua; Slik, J W Ferry

    2013-01-01

    Tropical rainforests in Southeast Asia are facing increasing and ever more intense human disturbance that often negatively affects biodiversity. The aim of this study was to determine how tree species phylogenetic diversity is affected by traditional forest management types and to understand the change in community phylogenetic structure during succession. Four types of forests with different management histories were selected for this purpose: old growth forests, understorey planted old growth forests, old secondary forests (∼200-years after slash and burn), and young secondary forests (15-50-years after slash and burn). We found that tree phylogenetic community structure changed from clustering to over-dispersion from early to late successional forests and finally became random in old-growth forest. We also found that the phylogenetic structure of the tree overstorey and understorey responded differentially to change in environmental conditions during succession. In addition, we show that slash and burn agriculture (swidden cultivation) can increase landscape level plant community evolutionary information content.

  9. Change in Phylogenetic Community Structure during Succession of Traditionally Managed Tropical Rainforest in Southwest China

    PubMed Central

    Mo, Xiao-Xue; Shi, Ling-Ling; Zhang, Yong-Jiang; Zhu, Hua; Slik, J. W. Ferry

    2013-01-01

    Tropical rainforests in Southeast Asia are facing increasing and ever more intense human disturbance that often negatively affects biodiversity. The aim of this study was to determine how tree species phylogenetic diversity is affected by traditional forest management types and to understand the change in community phylogenetic structure during succession. Four types of forests with different management histories were selected for this purpose: old growth forests, understorey planted old growth forests, old secondary forests (∼200-years after slash and burn), and young secondary forests (15–50-years after slash and burn). We found that tree phylogenetic community structure changed from clustering to over-dispersion from early to late successional forests and finally became random in old-growth forest. We also found that the phylogenetic structure of the tree overstorey and understorey responded differentially to change in environmental conditions during succession. In addition, we show that slash and burn agriculture (swidden cultivation) can increase landscape level plant community evolutionary information content. PMID:23936268

  10. The structure of tropical forests and sphere packings

    PubMed Central

    Jahn, Markus Wilhelm; Dobner, Hans-Jürgen; Wiegand, Thorsten; Huth, Andreas

    2015-01-01

    The search for simple principles underlying the complex architecture of ecological communities such as forests still challenges ecological theorists. We use tree diameter distributions—fundamental for deriving other forest attributes—to describe the structure of tropical forests. Here we argue that tree diameter distributions of natural tropical forests can be explained by stochastic packing of tree crowns representing a forest crown packing system: a method usually used in physics or chemistry. We demonstrate that tree diameter distributions emerge accurately from a surprisingly simple set of principles that include site-specific tree allometries, random placement of trees, competition for space, and mortality. The simple static model also successfully predicted the canopy structure, revealing that most trees in our two studied forests grow up to 30–50 m in height and that the highest packing density of about 60% is reached between the 25- and 40-m height layer. Our approach is an important step toward identifying a minimal set of processes responsible for generating the spatial structure of tropical forests. PMID:26598678

  11. Geographic patterns of at-risk species: A technical document supporting the USDA Forest Service Interim Update of the 2000 RPA Assessment

    Treesearch

    Curtis H. Flather; Michael S. Knowles; Jason McNees

    2008-01-01

    This technical document supports the Forest Service's requirement to assess the status of renewable natural resources as mandated by the Forest and Rangeland Renewable Resources Planning Act of 1974. It updates past reports on the trends and geographic patterns of species formally listed as threatened or endangered under the Endangered Species Act of 1973. We...

  12. AgRISTARS: Renewable resources inventory. Land information support system implementation plan and schedule. [San Juan National Forest pilot test]

    NASA Technical Reports Server (NTRS)

    Yao, S. S. (Principal Investigator)

    1981-01-01

    The planning and scheduling of the use of remote sensing and computer technology to support the land management planning effort at the national forests level are outlined. The task planning and system capability development were reviewed. A user evaluation is presented along with technological transfer methodology. A land management planning pilot test of the San Juan National Forest is discussed.

  13. The relationship between the understory shrub component of coastal forests and the conservation of forest carnivores

    Treesearch

    Keith M. Slauson; William J. Zielinski

    2007-01-01

    The physical structure of vegetation is an important predictor of habitat for wildlife species. The coastal forests of the Redwood region are highly productive, supporting structurally-diverse forest habitats. The major elements of structural diversity in these forests include trees, shrubs, and herbaceous plants, which together create three-dimensional complexity. In...

  14. Penobscot Experimental Forest: resources, administration, and mission

    Treesearch

    Alan J. Kimball

    2014-01-01

    The Penobscot Experimental Forest (PEF) was established more than 60 years ago as a result of private forest landowners' interest in supporting forest research in Maine. In 1950, nine pulp and paper and land-holding companies pooled resources and purchased almost 4,000 acres of land in east-central Maine. The property was named the Penobscot Experimental Forest...

  15. Fragmentation of eastern United States forest types

    Treesearch

    Kurt H. Riitters; John W. Coulston

    2013-01-01

    Fragmentation is a continuing threat to the sustainability of forests in the Eastern United States, where land use changes supporting a growing human population are the primary driver of forest fragmentation (Stein and others 2009). While once mostly forested, approximately 40 percent of the original forest area has been converted to other land uses, and most of the...

  16. Proceedings of the fifth Lake States forest tree improvement conference

    Treesearch

    Lake States Forest Experiment Station

    1962-01-01

    The Lake States Forest Experiment Station has given active support to the Lake States Forest Tree Improvement Committee since the Committee's inception in 1953. In the interests of encouraging and coordinating forest genetics activities in this region, we are happy to publish this Proceedings of the Fifth Lake States Forest Tree Improvement Conference, as we did...

  17. Forest Service programs, authorities, and relationships: A technical document supporting the 2000 USDA Forest Service RPA Assessment

    Treesearch

    Ervin G. Schuster; Michael A. Krebs

    2003-01-01

    The Forest and Rangeland Renewable Resources Planning Act (RPA) of 1974, as amended, directs the Forest Service to prepare and update a renewable resources assessment that would include "a description of Forest Service programs and responsibilities, their interrelationships, and the relationship of these programs and responsibilities to public and private...

  18. Southern forest science: past, present, and future

    Treesearch

    H. Michael Rauscher; Kurt Johnsen

    2004-01-01

    Southern forests provide innumerable benefits. Forest scientists, managers, owners, and users have in common the desire to improve the condition of these forests and the ecosystems they support. A first step is to understand the contributions science has made and continues to make to the care and management of forests. This book represents a celebration of past...

  19. Trends in national forest values among forestry professionals, environmentalists, and the news media, 1982-1993

    Treesearch

    Zhi Xu; David N. Bengston

    1997-01-01

    This study empirically analyzes the evolution of national forest values in recent years. Four broad categories of forest values are distinguished: economic/utilitarian, life support, aesthetic, and moral/spiritual. A computerized content analysis procedure was developed to identify expressions of these four forest values related to the national forests. With this...

  20. National workshop on forest productivity & technology: cooperative research to support a sustainable & competitive future - progress and strategy

    Treesearch

    Eric D. Vance

    2010-01-01

    The Agenda 2020 Program is a partnership among government agencies, the forest products industry, and academia to develop technology capable of enhancing forest productivity, sustaining environmental values, increasing energy efficiency, and improving the economic competitiveness of the United States forest sector. In November 2006, the USDA Forest Service, in...

  1. Summer inventory of landbirds in Kenai Fjords National Park

    USGS Publications Warehouse

    2006-01-01

    As part of the National Park Service Inventory and Monitoring Program, we conducted a summer inventory of landbirds within Kenai Fjords National Park. Using a stratified random sampling design of areas accessible by boat or on foot, we selected sites that encompassed the breadth of habitat types within the Park. We detected 101 species across 52 transects, including 62 species of landbirds, which confirmed presence of 87% of landbird species expected to occur in the Park during the summer breeding season. We found evidence of breeding for three Partners in Flight Watch List species, Rufous Hummingbird (Selasphorus rufus), Olive-sided Flycatcher (Contopus cooperi), and Rusty Blackbird (Euphagus carolinus), which are of particular conservation concern due to recent population declines. Kenai Fjords National Park supports extremely high densities of Hermit Thrush, Orange-crowned Warbler, and Wilson’s Warbler (Wilsonia pusilla) compared with other regions of Alaska. Other commonly observed species included Fox Sparrow (Passerella iliaca), Varied Thrush (Ixoreus naevius), Ruby-crowned Kinglet (Regulus calendula), and Yellow Warbler (Dendroica petechia). More than half of the landbird species we observed occurred in needleleaf forests, and several of these species were strongly associated with the coast-forest interface. Tall shrub habitats, which occurred across all elevations and in recently deglaciated areas, supported high densities and a diverse array of passerines. Two major riparian corridors, with their broadleaf forests, wetlands, and connectivity to interior Alaska, provided unique and important landbird habitats within the region.

  2. FINAL TECHNICAL REPORT FOR FORESTRY BIOFUEL STATEWIDE COLLABORATION CENTER (MICHIGAN)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    LaCourt, Donna M.; Miller, Raymond O.; Shonnard, David R.

    A team composed of scientists from Michigan State University (MSU) and Michigan Technological University (MTU) assembled to better understand, document, and improve systems for using forest-based biomass feedstocks in the production of energy products within Michigan. Work was funded by a grant (DE-EE-0000280) from the U.S. Department of Energy (DOE) and was administered by the Michigan Economic Development Corporation (MEDC). The goal of the project was to improve the forest feedstock supply infrastructure to sustainably provide woody biomass for biofuel production in Michigan over the long term. Work was divided into four broad areas with associated objectives: • TASK A: Develop a Forest-Based Biomass Assessment for Michigan – Define forest-based feedstock inventory, availability, and the potential of forest-based feedstock to support state and federal renewable energy goals while maintaining current uses. • TASK B: Improve Harvesting, Processing and Transportation Systems – Identify and develop cost, energy, and carbon efficient harvesting, processing and transportation systems. • TASK C: Improve Forest Feedstock Productivity and Sustainability – Identify and develop sustainable feedstock production systems through the establishment and monitoring of a statewide network of field trials in forests and energy plantations. • TASK D: Engage Stakeholders – Increase understanding of forest biomass production systems for biofuels by a broad range of stakeholders. The goal and objectives of this research and development project were fulfilled with key model deliverables including: 1) The Forest Biomass Inventory System (Sub-task A1) of feedstock inventory and availability and, 2) The Supply Chain Model (Sub-task B2). Both models are vital to Michigan’s forest biomass industry and support forecasting delivered cost, as well as carbon and energy balance. All of these elements are important to facilitate investor, operational and policy decisions.
All other sub-tasks supported the development of these two tools either directly or by building out supporting information in the forest biomass supply chain. Outreach efforts have delivered, and continue to deliver, these user-friendly models and information to decision makers to support biomass feedstock supply chain decisions across the areas of biomass inventory and availability, procurement, harvest, forwarding, transportation and processing. Outreach will continue on the project website at http://www.michiganforestbiofuels.org/ and http://www.michiganwoodbiofuels.org/

  3. Community assembly in epiphytic lichens in early stages of colonization.

    PubMed

    Gjerde, Ivar; Blom, Hans H; Lindblom, Louise; Saetersdal, Magne; Schei, Fride Høstad

    2012-04-01

    Colonization studies may function as natural experiments and have the potential of addressing important questions about community assembly. We studied colonization for a guild of epiphytic lichens in a formerly treeless heathland area of 170 km² in southwest Norway. We investigated if epiphytic lichen species richness and composition on aspen (Populus tremula) trees corresponded to a random draw of lichen individuals from the regional species pool. We compared lichen communities of isolated young (55-120 yr) and old (140-200 yr) forest patches in the heathland area to those of aspen forest in an adjacent reference area that has been forested for a long time. All thalli (lichen bodies) of 32 selected lichen species on trunks of aspen were recorded in 35 aspen sites. When data for each site category (young, old, and reference) were pooled, we found the species richness by rarefaction to be similar for reference sites and old sites, but significantly lower for young sites. The depauperated species richness of young sites was accompanied by a skew in species composition and absence of several species that were common in the reference sites. In contrast, genetic variation screened with neutral microsatellite markers in the lichen species Lobaria pulmonaria showed no significant differences between site categories. Our null hypothesis of a neutral species assembly in young sites corresponding to a random draw from the regional species pool was rejected, whereas an alternative hypothesis based on differences in colonization capacity among species was supported. The results indicate that for the habitat configuration in the heathland area (isolated patches constituting < 0.4% of the area) lichen communities may need a colonization time of 100-150 yr for species richness to level off, but given enough time, isolation will not affect species richness.
We suggest that this contradiction to expectations from classical island equilibrium theory results from low extinction rates.
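
    Comparing species richness "by rarefaction" means asking how many species would be expected if every site category were reduced to the same number of individuals. The abstract does not say which formulation was used; the sketch below assumes classic individual-based rarefaction, in which the expected richness of a random subsample of n individuals is the sum over species of one minus the probability that the species is missed entirely. The function name and the thallus counts are hypothetical.

```python
import math


def rarefied_richness(counts, subsample_size):
    """Expected number of species in a random draw of `subsample_size` individuals.

    counts: number of individuals recorded for each species in the pooled sample.
    """
    total = sum(counts)
    if not 0 < subsample_size <= total:
        raise ValueError("subsample size must be between 1 and the total count")
    all_draws = math.comb(total, subsample_size)
    expected = 0.0
    for count in counts:
        # Number of draws that contain no individual of this species
        # (math.comb returns 0 when the draw cannot avoid the species).
        draws_missing_species = math.comb(total - count, subsample_size)
        expected += 1.0 - draws_missing_species / all_draws
    return expected


# Hypothetical thallus counts for five species at one pooled site category.
site_counts = [40, 25, 10, 3, 1]
richness_at_20 = rarefied_richness(site_counts, 20)
```

    Rarefying each site category to a common subsample size keeps a difference in sampling effort from being mistaken for a difference in richness.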

  4. A Comparative Object-Based Sugarcane Classification from Sentinel-2 Data Using Random Forests and Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Chen, C. R.; Chen, C. F.; Nguyen, S. T.; Lau, K.; Lay, J. G.

    2016-12-01

    Sugarcane, mostly grown in tropical and subtropical regions, is one of the important commercial crops worldwide, providing significant employment, foreign exchange earnings, and other social and environmental benefits. The sugar industry is a vital component of Belize's economy as it provides employment to 15% of the country's population and 60% of the national agricultural exports. Sugarcane mapping is thus an important task due to official initiatives to provide reliable information on sugarcane-growing areas in respect to improved accuracy in monitoring sugarcane production and yield estimates. Policymakers need such monitoring information to formulate timely plans to ensure sustainable socioeconomic development. Sugarcane monitoring in Belize is traditionally carried out through time-consuming and costly field surveys. Remote sensing is an indispensable tool for crop monitoring on national, regional and global scales. The use of high and low resolution satellites for sugarcane monitoring in Belize is often restricted due to cost limitations and mixed pixel problems because sugarcane fields are small and fragmented. With the launch of the Sentinel-2 satellite, it is possible to collectively map small patches of sugarcane fields over a large region as the data are free of charge and have high spectral, spatial, and temporal resolutions. This study aims to develop an object-based classification approach to comparatively map sugarcane fields in Belize from Sentinel-2 data using random forests (RF) and support vector machines (SVM). The data were processed through four main steps: (1) data pre-processing, (2) image segmentation, (3) sugarcane classification, and (4) accuracy assessment. The mapping results compared with the ground reference data indicated satisfactory results. For both classifiers, the overall accuracies and Kappa coefficients were generally higher than 80% and 0.7, respectively. The RF produced slightly more accurate mapping results than SVM.
This study demonstrates the realization of the potential application of Sentinel-2 data for sugarcane mapping in Belize with the aid of RF and SVM methods. The methods are thus proposed for monitoring purposes in the country.
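
    The two accuracy measures quoted in this abstract, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix of reference versus mapped classes. The sketch below shows the standard calculation (Cohen's kappa); the function name and the matrix values are hypothetical and are not the study's results.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of reference class i mapped as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class matrix (sugarcane vs. other), for illustration only.
matrix = [[45, 5],
          [10, 40]]
overall_accuracy, kappa = accuracy_and_kappa(matrix)
```

    Kappa discounts the agreement that would occur by chance given the class proportions, which is why it is typically reported alongside overall accuracy rather than in place of it.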

  5. Population and harvest trends of big game and small game species: a technical document supporting the USDA Forest Service Interim Update of the 2000 RPA Assessment

    Treesearch

    Curtis H. Flather; Michael S. Knowles; Stephen J. Brady

    2009-01-01

    This technical document supports the Forest Service's requirement to assess the status of renewable natural resources as mandated by the Forest and Rangeland Renewable Resources Planning Act of 1974 (RPA). It updates past reports on national and regional trends in population and harvest estimates for species classified as big game and small game. The trends...

  6. Novelty and its ecological implications to dry forest functioning and conservation

    Treesearch

    Ariel Lugo; Heather Erickson

    2017-01-01

    Tropical and subtropical dry forest life zones support forests with lower stature and species richness than do tropical and subtropical life zones with greater water availability. The number of naturalized species that can thrive and mix with native species to form novel forests in dry forest conditions in Puerto Rico and the US Virgin Islands is lower than in other...

  7. Basal area growth for 15 tropical tree species in Puerto Rico. Forest

    Treesearch

    B. R. Parresol

    1995-01-01

    The tabonuco forest of Puerto Rico supports a diverse population of tree species valued for timber, fuel, food, wildlife food and cover, and erosion control, among other uses. Tree basal area growth data spanning 39 years are available on 15 species from eight permanent plots in Luquillo Experimental Forest. The complexity of the rain forest challenges current forest...

  8. Using FIESTA, an R-based tool for analysts, to look at temporal trends in forest estimates

    Treesearch

    Tracey S. Frescino; Paul L. Patterson; Elizabeth A. Freeman; Gretchen G. Moisen

    2012-01-01

    FIESTA (Forest Inventory Estimation for Analysis) is a user-friendly R package that supports the production of estimates for forest resources based on procedures from Bechtold and Patterson (2005). The package produces output consistent with current tools available for the Forest Inventory and Analysis National Program, such as FIDO (Forest Inventory Data Online) and...

  9. Water quality and fish dynamics in forested wetlands associated with an oxbow lake

    USGS Publications Warehouse

    Andrews, Caroline S.; Miranda, Leandro E.; Kroger, Robert

    2015-01-01

    Forested wetlands represent some of the most distinct environments in the Lower Mississippi Alluvial Valley. Depending on season, water in forested wetlands can be warm, stagnant, and oxygen-depleted, yet may support high fish diversity. Fish assemblages in forested wetlands are not well studied because of difficulties in sampling heavily structured environments. During the April–July period, we surveyed and compared the water quality and assemblages of small fish in a margin wetland (forested fringe along a lake shore), contiguous wetland (forested wetland adjacent to a lake), and the open water of an oxbow lake. Dissolved-oxygen levels measured hourly 0.5 m below the surface were higher in the open water than in either of the forested wetlands. Despite reduced water quality, fish-species richness and catch rates estimated with light traps were greater in the forested wetlands than in the open water. The forested wetlands supported large numbers of fish and unique fish assemblages that included some rare species, likely because of their structural complexity. Programs developed to refine agricultural practices, preserve riparian zones, and restore lakes should include guidance to protect and reestablish forested wetlands.

  10. Measurement guidelines for the sequestration of forest carbon

    Treesearch

    Timothy R.H. Pearson; Sandra L. Brown; Richard A. Birdsey

    2007-01-01

    Measurement guidelines for forest carbon sequestration were developed to support reporting by public and private entities to greenhouse gas registries. These guidelines are intended to be a reference for designing a forest carbon inventory and monitoring system by professionals with a knowledge of sampling, statistical estimation, and forest measurements. This report...

  11. A decision support system for forest harvest planning in North Carolina

    Treesearch

    D.G. Jones

    2010-01-01

    Forest preharvest planning (FPP) can enhance recognition of environmentally-sensitive areas in advance of forest harvesting, including soil and water resources. While preharvest planning is often a standard component of many forest harvesting operations, either explicitly with paper-based checklists or implicitly with best professional judgment, Geographic Information...

  12. Keeping your forest soils healthy and productive.

    Treesearch

    Ole T. Helgerson; Richard E. Miller

    2008-01-01

    Soils are an integral structural part of your woodland and the larger forest ecosystem. Important forest soil functions include: providing water, nutrients, and physical support for the growth of trees and other forest plants; allowing an exchange of carbon dioxide, oxygen, and other gasses that affect root growth and...

  13. Estimating forest canopy fuel parameters using LIDAR data.

    Treesearch

    Hans-Erik Andersen; Robert J. McGaughey; Stephen E. Reutebuch

    2005-01-01

    Fire researchers and resource managers are dependent upon accurate, spatially-explicit forest structure information to support the application of forest fire behavior models. In particular, reliable estimates of several critical forest canopy structure metrics, including canopy bulk density, canopy height, canopy fuel weight, and canopy base height, are required to...

  14. Forest Structure Characterization Using JPL's UAVSAR Multi-Baseline Polarimetric SAR Interferometry and Tomography

    NASA Technical Reports Server (NTRS)

    Neumann, Maxim; Hensley, Scott; Lavalle, Marco; Ahmed, Razi

    2013-01-01

    This paper concerns forest remote sensing using JPL's multi-baseline polarimetric interferometric UAVSAR data. It presents exemplary results and analyzes the possibilities and limitations of using SAR Tomography and Polarimetric SAR Interferometry (PolInSAR) techniques for the estimation of forest structure. Performance and error indicators for the applicability and reliability of the used multi-baseline (MB) multi-temporal (MT) PolInSAR random volume over ground (RVoG) model are discussed. Experimental results are presented based on JPL's L-band repeat-pass polarimetric interferometric UAVSAR data over temperate and tropical forest biomes in the Harvard Forest, Massachusetts, and in the La Amistad Park, Panama and Costa Rica. The results are partially compared with ground field measurements and with air-borne LVIS lidar data.

  16. Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

    PubMed Central

    Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
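
    The abstract describes viewing the ensemble prediction as a mixture distribution over probabilistic trees. As a rough illustration of that view only (not the authors' implementation), the sketch below computes the mean and variance of a weighted mixture from per-tree means and variances, using the standard mixture-moment identities. The function name and the example numbers are hypothetical.

```python
def mixture_mean_and_variance(means, variances, weights):
    """Mean and variance of a weighted mixture of per-tree distributions.

    means[i] and variances[i] describe tree i's predictive distribution;
    weights[i] is its (non-negative) tree weight. Weights are normalized
    here so that they sum to one.
    """
    total = sum(weights)
    w = [x / total for x in weights]

    # Mixture mean: weighted average of the component means.
    mean = sum(wi * m for wi, m in zip(w, means))

    # Mixture second moment: weighted average of (variance + mean^2).
    second_moment = sum(wi * (v + m * m) for wi, m, v in zip(w, means, variances))

    return mean, second_moment - mean * mean


# Hypothetical two-tree ensemble with equal weights.
mix_mean, mix_var = mixture_mean_and_variance(
    means=[1.0, 3.0], variances=[1.0, 1.0], weights=[1.0, 1.0]
)
```

    In this toy case the mixture mean is 2.0 and the variance is 2.0: disagreement between the trees adds to the within-tree variance. Shifting weight toward trees that agree would narrow the spread, which is in the spirit of the weight selection the abstract mentions.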

  17. Computer-Aided Screening of Conjugated Polymers for Organic Solar Cell: Classification by Random Forest.

    PubMed

    Nagasawa, Shinji; Al-Naamani, Eman; Saeki, Akinori

    2018-05-17

    Owing to the diverse chemical structures, organic photovoltaic (OPV) applications with a bulk heterojunction framework have greatly evolved over the last two decades, which has produced numerous organic semiconductors exhibiting improved power conversion efficiencies (PCEs). Despite the recent fast progress in materials informatics and data science, data-driven molecular design of OPV materials remains challenging. We report a screening of conjugated molecules for polymer-fullerene OPV applications by supervised learning methods (artificial neural network (ANN) and random forest (RF)). Approximately 1000 experimental parameters including PCE, molecular weight, and electronic properties are manually collected from the literature and subjected to machine learning with digitized chemical structures. Contrary to the low correlation coefficient in ANN, RF yields an acceptable accuracy, which is twice that of random classification. We demonstrate the application of RF screening for the design, synthesis, and characterization of a conjugated polymer, which facilitates a rapid development of optoelectronic materials.

  18. Quantifying and mapping spatial variability in simulated forest plots

    Treesearch

    Gavin R. Corral; Harold E. Burkhart

    2016-01-01

    We used computer simulations to test the efficacy of multivariate statistical methods to detect, quantify, and map spatial variability of forest stands. Simulated stands were developed of regularly-spaced plantations of loblolly pine (Pinus taeda L.). We assumed no effects of competition or mortality, but random variability was added to individual tree characteristics...

  19. Mountain Pine Beetles and Invasive Plant Species Findings from a Survey of Colorado Community Residents

    Treesearch

    Courtney Flint; Hua Qin; Michael Daab

    2008-01-01

    The US Forest Service, Pacific Northwest Research Station funded research to assess community responses to forest disturbance by mountain pine beetles (Dendroctonus ponderosae) and public reaction to invasive plants in north central Colorado. In the spring of 2007, 4,027 16-page questionnaires were mailed to randomly selected households with addresses in Breckenridge,...

  20. Effects of soil compaction, forest leaf litter and nitrogen fertilizer on two oak species and microbial activity

    Treesearch

    D. Jordan; F. Ponder, Jr.; V. C. Hubbard

    2003-01-01

    A greenhouse study examined the effects of soil compaction and forest leaf litter on the growth and nitrogen (N) uptake and recovery of red oak (Quercus rubra L.) and scarlet oak (Quercus coccinea Muenchh.) seedlings and selected microbial activity over a 6-month period. The experiment had a randomized complete block design with...

  1. Stemflow estimation in a redwood forest using model-based stratified random sampling

    Treesearch

    Jack Lewis

    2003-01-01

    Model-based stratified sampling is illustrated by a case study of stemflow volume in a redwood forest. The approach is actually a model-assisted sampling design in which auxiliary information (tree diameter) is utilized in the design of stratum boundaries to optimize the efficiency of a regression or ratio estimator. The auxiliary information is utilized in both the...

  2. Estimating erosion risk on forest lands using improved methods of discriminant analysis

    Treesearch

    J. Lewis; R. M. Rice

    1990-01-01

    A population of 638 timber harvest areas in northwestern California was sampled for data related to the occurrence of critical amounts of erosion (>153 m³ within 0.81 ha). Separate analyses were done for forest roads and logged areas. Linear discriminant functions were computed in each analysis to contrast site conditions at critical plots with randomly selected...

  3. Sample-based estimation of tree species richness in a wet tropical forest compartment

    Treesearch

    Steen Magnussen; Raphael Pelissier

    2007-01-01

    Petersen's capture-recapture ratio estimator and the well-known bootstrap estimator are compared across a range of simulated low-intensity simple random sampling with fixed-area plots of 100 m² in a rich wet tropical forest compartment with 93 tree species in the Western Ghats of India. Petersen's ratio estimator was uniformly superior to the bootstrap...

  4. Rates and Implications of Rainfall Interception in a Coastal Redwood Forest

    Treesearch

    Leslie M. Reid; Jack Lewis

    2007-01-01

    Throughfall was measured for a year at five-min intervals in 11 collectors randomly located on two plots in a second-growth redwood forest at the Caspar Creek Experimental Watersheds. Monitoring at one plot continued two more years, during which stemflow from 24 trees was also measured. Comparison of throughfall and stemflow to rainfall measured in adjacent clearings...

  5. Variation in soil and forest floor characteristics along gradients of ericaceous, evergreen shrub cover in the southern Appalachians

    Treesearch

    Jonatha L. Horton; Barton D. Clinton; John F. Walker; Colin M. Beir; Erik T. Nilsen

    2009-01-01

    Ericaceous shrubs can influence soil properties in many ecosystems. In this study, we examined how soil and forest floor properties vary among sites with different ericaceous evergreen shrub basal area in the southern Appalachian mountains. We randomly located plots along transects that included open understories and understories with varying amounts of Rhododendron...

  6. Predicting relative species composition within mixed conifer forest pixels using zero‐inflated models and Landsat imagery

    Treesearch

    Shannon L. Savage; Rick L. Lawrence; John R. Squires

    2015-01-01

    Ecological and land management applications would often benefit from maps of relative canopy cover of each species present within a pixel, instead of traditional remote-sensing based maps of either dominant species or percent canopy cover without regard to species composition. Widely used statistical models for remote sensing, such as randomForest (RF),...

  7. 'Pygmy' old-growth redwood characteristics on an edaphic ecotone in Mendocino County, California

    Treesearch

    Will Russell; Suzie Woolhouse

    2012-01-01

    The 'pygmy forest' is a specialized community that is adapted to highly acidic, hydrophobic, nutrient deprived soils, and exists in pockets within the coast redwood forest in Mendocino County. While coast redwood is known as an exceptionally tall tree, stunted trees exhibit unusual growth-forms on pygmy soils. We used a stratified random sampling procedure to...

  8. Ecological impacts and management strategies for western larch in the face of climate-change

    Treesearch

    Gerald E. Rehfeldt; Barry C. Jaquish

    2010-01-01

    Approximately 185,000 forest inventory and ecological plots from both USA and Canada were used to predict the contemporary distribution of western larch (Larix occidentalis Nutt.) from climate variables. The random forests algorithm, using an 8-variable model, produced an overall error rate of about 2.9 %, nearly all of which consisted of predicting presence at...

  9. Simulation of long-term landscape-level fuel treatment effects on large wildfires

    Treesearch

    Mark A. Finney; Rob C. Seli; Charles W. McHugh; Alan A. Ager; Bernhard Bahro; James K. Agee

    2008-01-01

    A simulation system was developed to explore how fuel treatments placed in topologically random and optimal spatial patterns affect the growth and behaviour of large fires when implemented at different rates over the course of five decades. The system consisted of a forest and fuel dynamics simulation module (Forest Vegetation Simulator, FVS), logic for deriving fuel...

  10. Geomorphology and forest management in New Zealand's erodible steeplands: An overview

    NASA Astrophysics Data System (ADS)

    Phillips, Chris; Marden, Michael; Basher, Les R.

    2018-04-01

    In this paper we outline how geomorphological understanding has underpinned forest management in New Zealand's erodible steeplands, where it contributes to current forest management, and suggest where it will be of value in the future. We focus on the highly erodible soft-rock hill country of the East Coast region of North Island, but cover other parts of New Zealand where appropriate. We conclude that forestry will continue to make a significant contribution to New Zealand's economy, but several issues need to be addressed. The most pressing concerns are the incidence of post-harvest, storm-initiated landslides and debris flows arising from steepland forests following timber harvesting. There are three areas where geomorphological information and understanding are required to support the forest industry - development of an improved national erosion susceptibility classification to support a new national standard for plantation forestry; terrain analysis to support improved hazard and risk assessment at detailed operational scales; and understanding of post-harvest shallow landslide-debris flows, including their prediction and management.

  11. Looking for age-related growth decline in natural forests: unexpected biomass patterns from tree rings and simulated mortality

    USGS Publications Warehouse

    Foster, Jane R.; D'Amato, Anthony W.; Bradford, John B.

    2014-01-01

    Forest biomass growth is almost universally assumed to peak early in stand development, near canopy closure, after which it will plateau or decline. The chronosequence and plot remeasurement approaches used to establish the decline pattern suffer from limitations and coarse temporal detail. We combined annual tree ring measurements and mortality models to address two questions: first, how do assumptions about tree growth and mortality influence reconstructions of biomass growth? Second, under what circumstances does biomass production follow the model that peaks early, then declines? We integrated three stochastic mortality models with a census tree-ring data set from eight temperate forest types to reconstruct stand-level biomass increments (in Minnesota, USA). We compared growth patterns among mortality models, forest types and stands. Timing of peak biomass growth varied significantly among mortality models, peaking 20–30 years earlier when mortality was random with respect to tree growth and size, than when mortality favored slow-growing individuals. Random or u-shaped mortality (highest in small or large trees) produced peak growth 25–30 % higher than the surviving tree sample alone. Growth trends for even-aged, monospecific Pinus banksiana or Acer saccharum forests were similar to the early peak and decline expectation. However, we observed continually increasing biomass growth in older, low-productivity forests of Quercus rubra, Fraxinus nigra, and Thuja occidentalis. Tree-ring reconstructions estimated annual changes in live biomass growth and identified more diverse development patterns than previous methods. These detailed, long-term patterns of biomass development are crucial for detecting recent growth responses to global change and modeling future forest dynamics.

  12. Selection of forest canopy gaps by male Cerulean Warblers in West Virginia

    USGS Publications Warehouse

    Perkins, Kelly A.; Wood, Petra Bohall

    2014-01-01

    Forest openings, or canopy gaps, are an important resource for many forest songbirds, such as Cerulean Warblers (Setophaga cerulea). We examined canopy gap selection by this declining species to determine if male Cerulean Warblers selected particular sizes, vegetative heights, or types of gaps. We tested whether these parameters differed among territories, territory core areas, and randomly-placed sample plots. We used enhanced territory mapping techniques (burst sampling) to define habitat use within the territory. Canopy gap densities were higher within core areas of territories than within territories or random plots, indicating that Cerulean Warblers selected habitat within their territories with the highest gap densities. Selection of regenerating gaps with woody vegetation >12 m within the gap, and canopy heights >24 m surrounding the gap, occurred within territory core areas. These findings differed between two sites, indicating that gap selection may vary based on forest structure. Differences were also found regarding the placement of territories with respect to gaps. Larger gaps, such as wildlife food plots, were located on the periphery of territories more often than other types and sizes of gaps, while smaller gaps, such as treefalls, were located within territory boundaries more often than expected. The creation of smaller canopy gaps, <100 m², within dense stands is likely compatible with forest management for this species.

  13. Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

    PubMed

    Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

    2010-03-19

    This letter shows a computer aided diagnosis (CAD) technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on a partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. An RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers, with its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA, yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. In addition, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
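
    The abstract states that the forest's output is determined by a majority vote of its trees. A minimal sketch of that aggregation step alone, assuming each tree has already produced a class label, might look as follows. The function name and the labels are hypothetical, and the PLS feature extraction and tree training described in the abstract are not reproduced here.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees.

    tree_predictions is a non-empty sequence of class labels, one per tree.
    On a tie, the label that appears first in the sequence wins
    (Counter.most_common keeps first-seen order among equal counts).
    """
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical per-tree labels for one image.
votes = ["AD", "normal", "AD", "AD", "normal"]
decision = majority_vote(votes)
```

    Here three of the five hypothetical trees vote "AD", so that label is returned.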

  14. Decision support for the integrated restoration and protection strategy of the Forest Service, Northern Region

    Treesearch

    Keith Reynolds; Barry Bollenbacher; Chip Fisher; Melissa Hart; Mary Manning; Eric Henderson; Bruce Sims

    2016-01-01

    This report documents a decision-support process developed in the U.S. Department of Agriculture, Forest Service, Northern Region to assess management opportunities as part of an ecosystem-based approach to management that emphasizes ecological resilience. The decision-support system described in this work implements what is known as the Integrated Restoration and...

  15. Forest Resources of the United States, 2012: a technical document supporting the Forest Service 2010 update of the RPA Assessment

    Treesearch

    Sonja N. Oswalt; W. Brad Smith; Patrick D. Miles; Scott A. Pugh

    2014-01-01

    Forest resource statistics from the 2010 Resources Planning Act (RPA) Assessment were updated to provide current information on the Nation's forests as a baseline for the 2015 national assessment. Resource tables present estimates of forest area, volume, mortality, growth, removals, and timber products output in various ways, such as by ownership, region, or State...

  16. Outlook to 2060 for world forests and forest industries: a technical document supporting the Forest Service 2010 RPA assessment

    Treesearch

    Joseph Buongiorno; Shushuai Zhu; Ronald Raunikar; Jeffrey P. Prestemon

    2012-01-01

    Four RPA scenarios corresponding with scenarios from the Third and Fourth Assessments of the Intergovernmental Panel on Climate Change were simulated with the Global Forest Products Model to project forest area, volume, products demand and supply, international trade, prices, and value added up to 2060 for Africa, Asia, Europe, North America, Oceania, South America,...

  17. A preview of Kentucky's forest resource

    Treesearch

    Joseph E. Barnard; Teresa M. Bowers

    1977-01-01

    Forty-eight percent of the total land area of Kentucky is forest. Sixty-three percent of this forest land is the oak-hickory forest type and 47 percent of the forest area supports sawtimber stands. There has been a 23-percent increase in the volume of growing stock and a 24-percent increase in the volume of sawtimber since the 1963 inventory. Total volume of growing...

  18. Forest inventory-based estimation of carbon stocks and flux in California forests in 1990.

    Treesearch

    Jeremy S. Fried; Xiaoping Zhou

    2008-01-01

    Estimates of forest carbon stores and flux for California circa 1990 were modeled from forest inventory data in support of California’s legislatively mandated greenhouse gas inventory. Reliable estimates of live-tree carbon stores and flux on timberlands outside of national forest could be calculated from periodic inventory data collected in the 1980s and 1990s;...

  19. Michigan's forests 1993: An analysis. Forest Service resource bulletin

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schmidt, T.L.; Spencer, J.S.; Bertsch, R.

    1997-02-04

    Michigan's forests are abundant, diverse, healthy, productive, and expanding. These forests make important contributions to the quality of life by providing a wide array of benefits, including wildlife habitat, biological diversity, outdoor recreation, improved air and water quality, and economic resources such as the estimated $12 billion of value added and 200,000 jobs annually supported by forest-based industries/tourism/recreation.

  20. Contributions of water supply from the weathered bedrock zone to forest soil quality

    Treesearch

    James H. Witty; Robert C. Graham; Kenneth R. Hubbert; James A. Doolittle; Jonathan A. Wald

    2003-01-01

    One measure of forest soil quality is the ability of the soil to support tree growth. In mediterranean-type ecosystems, such as most of California's forests, there is virtually no rainfall during the summer growing season, so trees must rely on water stored within the substrate. Water is the primary limitation to productivity in these forests. Many forest soils in...

  1. Diversity of Medicinal Plants among Different Forest-use Types of the Pakistani Himalaya.

    PubMed

    Adnan, Muhammad; Hölscher, Dirk

    2012-12-01

    Medicinal plants collected in Himalayan forests play a vital role in the livelihoods of regional rural societies and are also increasingly recognized at the international level. However, these forests are being heavily transformed by logging. Here we ask how forest transformation influences the diversity and composition of medicinal plants in northwestern Pakistan, where we studied old-growth forests, forests degraded by logging, and regrowth forests. First, an approximate map indicating these forest types was established and then 15 study plots per forest type were randomly selected. We found a total of 59 medicinal plant species consisting of herbs and ferns, most of which occurred in the old-growth forest. Species number was lowest in forest degraded by logging and intermediate in regrowth forest. The most valuable economic species, including six Himalayan endemics, occurred almost exclusively in old-growth forest. Species composition and abundance of forest degraded by logging differed markedly from that of old-growth forest, while regrowth forest was more similar to old-growth forest. The density of medicinal plants positively correlated with tree canopy cover in old-growth forest and negatively in degraded forest, which indicates that species adapted to open conditions dominate in logged forest. Thus, old-growth forests are important as refuge for vulnerable endemics. Forest degraded by logging has the lowest diversity of relatively common medicinal plants. Forest regrowth may foster the reappearance of certain medicinal species valuable to local livelihoods and as such promote acceptance of forest expansion and medicinal plants conservation in the region. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12231-012-9213-4) contains supplementary material, which is available to authorized users.

  2. Prediction of body mass index status from voice signals based on machine learning for automated medical applications.

    PubMed

    Lee, Bum Ju; Kim, Keun Ho; Ku, Boncho; Jang, Jun-Su; Kim, Jong Yeol

    2013-05-01

    The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of diseases such as cardiovascular disease, diabetes, and stroke. We propose a method for the prediction of normal, overweight, and obese classes based only on the combination of voice features that are associated with BMI status, independently of weight and height measurements. A total of 1568 subjects were divided into 4 groups according to age and gender differences. We performed statistical analyses by analysis of variance (ANOVA) and Scheffe test to find significant features in each group. We predicted BMI status (normal, overweight, and obese) by a logistic regression algorithm and two ensemble classification algorithms (bagging and random forests) based on statistically significant features. In the Female-2030 group (females aged 20-40 years), classification experiments using an imbalanced (original) data set gave area under the receiver operating characteristic curve (AUC) values of 0.569-0.731 by logistic regression, whereas experiments using a balanced data set gave AUC values of 0.893-0.994 by random forests. AUC values in Female-4050 (females aged 41-60 years), Male-2030 (males aged 20-40 years), and Male-4050 (males aged 41-60 years) groups by logistic regression in imbalanced data were 0.585-0.654, 0.581-0.614, and 0.557-0.653, respectively. AUC values in Female-4050, Male-2030, and Male-4050 groups in balanced data were 0.629-0.893 by bagging, 0.707-0.916 by random forests, and 0.695-0.854 by bagging, respectively. In each group, we found discriminatory features showing statistical differences among normal, overweight, and obese classes. The results showed that the classification models built by logistic regression in imbalanced data were better than those built by the other two algorithms, and significant features differed according to age and gender groups. Our results could support the development of BMI diagnosis tools for real-time monitoring; such tools are considered helpful in improving automated BMI status diagnosis in remote healthcare or telemedicine and are expected to have applications in forensic and medical science. Copyright © 2013 Elsevier B.V. All rights reserved.
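
    The results above are reported as AUC values. As a small illustration of what that number measures, the sketch below computes a two-class AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic binary formulation with hypothetical scores; how the study handled its three BMI classes is not stated in the abstract and is not assumed here.

```python
def pairwise_auc(scores_positive, scores_negative):
    """Two-class AUC via pairwise comparison of classifier scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs contribute one half.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical scores for an imbalanced sample (3 positives, 4 negatives).
auc_value = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

    In this toy sample, 11 of the 12 positive-negative pairs are ranked correctly, giving an AUC of 11/12. Because the measure depends only on the ranking of scores within each pair, it does not change with the ratio of positives to negatives for a fixed scoring model, although a model trained on balanced data may rank cases differently.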

  3. 36 CFR Appendix - Figures to Part 1194

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 36 Parks, Forests, and Public Property 3 2013-07-01 2012-07-01 true Figures to Part 1194 Parks, Forests, and Public Property ARCHITECTURAL AND TRANSPORTATION BARRIERS COMPLIANCE BOARD ELECTRONIC AND INFORMATION TECHNOLOGY ACCESSIBILITY STANDARDS Information, Documentation, and Support Information, documentation, and support. Pt. 1194, Figs....

  4. 36 CFR Appendix - Figures to Part 1194

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 36 Parks, Forests, and Public Property 3 2012-07-01 2012-07-01 false Figures to Part 1194 Parks, Forests, and Public Property ARCHITECTURAL AND TRANSPORTATION BARRIERS COMPLIANCE BOARD ELECTRONIC AND INFORMATION TECHNOLOGY ACCESSIBILITY STANDARDS Information, Documentation, and Support Information, documentation, and support. Pt. 1194, Figs....

  5. 36 CFR Appendix - Figures to Part 1194

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 36 Parks, Forests, and Public Property 3 2014-07-01 2014-07-01 false Figures to Part 1194 Parks, Forests, and Public Property ARCHITECTURAL AND TRANSPORTATION BARRIERS COMPLIANCE BOARD ELECTRONIC AND INFORMATION TECHNOLOGY ACCESSIBILITY STANDARDS Information, Documentation, and Support Information, documentation, and support. Pt. 1194, Figs....

  6. 36 CFR Appendix - Figures to Part 1194

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 36 Parks, Forests, and Public Property 3 2011-07-01 2011-07-01 false Figures to Part 1194 Parks, Forests, and Public Property ARCHITECTURAL AND TRANSPORTATION BARRIERS COMPLIANCE BOARD ELECTRONIC AND INFORMATION TECHNOLOGY ACCESSIBILITY STANDARDS Information, Documentation, and Support Information, documentation, and support. Pt. 1194, Figs....

  7. 36 CFR Appendix - Figures to Part 1194

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false Figures to Part 1194 Parks, Forests, and Public Property ARCHITECTURAL AND TRANSPORTATION BARRIERS COMPLIANCE BOARD ELECTRONIC AND INFORMATION TECHNOLOGY ACCESSIBILITY STANDARDS Information, Documentation, and Support Information, documentation, and support. Pt. 1194, Figs....

  8. Carbon storage and accumulation in United States forest ecosystems

    Treesearch

    Richard A. Birdsey

    1992-01-01

    Historically, assessments of the forest resource situation have focused on timber supply, and the data used to support the assessments came from traditional forest inventories designed to provide reliable estimates of timber volume, growth, removals, and mortality (U.S. Department of Agriculture, Forest Service 1982). The most recent assessment included data and...

  9. Forest resources and conditions

    Treesearch

    William H. McWilliams; Linda S. Heath; Gordon C. Reese; Thomas L. Schmidt

    2000-01-01

    The forests of the northern United States support a rich mix of floral and faunal communities that provide inestimable benefits to society. Today's forests face a range of biotic and abiotic stressors, not the least of which may be environmental change. This chapter reviews the compositional traits of presettlement forests and traces the major land use patterns...

  10. 36 CFR 262.4 - Audit of expenditures.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 36 Parks, Forests, and Public Property 2 2010-07-01 2010-07-01 false Audit of expenditures. 262.4 Section 262.4 Parks, Forests, and Public Property FOREST SERVICE, DEPARTMENT OF AGRICULTURE LAW ENFORCEMENT SUPPORT ACTIVITIES Rewards and Payments § 262.4 Audit of expenditures. The Chief of the Forest Service shall, through appropriate directives...

  11. Managing ecosystems for forest health: An approach and the effects on uses and values

    Treesearch

    Chadwick D. Oliver; Dennis E. Ferguson; Alan E. Harvey; Herbert S. Malany; John M. Mandzak; Robert W. Mutch

    1994-01-01

    Forest health is most appropriately based on the scientific paradigm of dynamic, constantly changing forest ecosystems. Many forests in the Inland West now support high levels of insect infestations, disease epidemics, fire susceptibilities, and imbalances in stand structures and habitats because of natural processes and past management practices. Impending,...

  12. Roundwood markets and utilization in West Virginia and Ohio

    Treesearch

    Shawn T. Grushecky; Jan Wiedenbeck; Ben Spong

    2011-01-01

    West Virginia and Ohio have similar forest resources and extensive forest-based economies. Roundwood is harvested throughout this central Appalachian region and supports a diverse primary and secondary forest products sector. The objective of this research was to investigate the utilization of the forest resource harvested in West Virginia and Ohio. Utilization and...

  13. Incidence and impact of damage to and mortality trends of Florida's timber, 1987

    Treesearch

    Elizabeth A. Brantley; Clair Redmond; Michael Thompson

    1994-01-01

    The Southeastern Forest Experiment Station, headquartered in Asheville, NC, periodically inventories and evaluates forest resources in Florida, Georgia, North Carolina, South Carolina, and Virginia. The Southern Region, Forest Health Staff Unit, provides training, field support, and evaluation of the data on forest insects, diseases, and other damaging agents.

  14. Automated retrieval of forest structure variables based on multi-scale texture analysis of VHR satellite imagery

    NASA Astrophysics Data System (ADS)

    Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine

    2014-10-01

    The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on automating the identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is its underlying parameters, which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters, hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, which we call Random PRiF, is proposed. It relies on random sampling in feature space and carefully addresses the multicollinearity issue in multiple linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ~1.1 m is expected on crown diameter, ~0.9 m on tree spacing, ~3 m on height and ~0.06 m on diameter at breast height.

  15. Remote sensing based detection of forested wetlands: An evaluation of LiDAR, aerial imagery, and their data fusion

    NASA Astrophysics Data System (ADS)

    Suiter, Ashley Elizabeth

    Multi-spectral imagery provides a robust and low-cost dataset for assessing wetland extent and quality over broad regions and is frequently used for wetland inventories. However, in forested wetlands, hydrology is obscured by tree canopy, making it difficult to detect with multi-spectral imagery alone. Because of this, classification of forested wetlands often includes greater errors than that of other wetland types. Elevation and terrain derivatives have been shown to be useful for modelling wetland hydrology, but few studies have addressed the use of LiDAR intensity data for detecting hydrology in forested wetlands. Due to the tendency of the LiDAR signal to be attenuated by water, this research proposed the fusion of LiDAR intensity data with LiDAR elevation, terrain data, and aerial imagery for the detection of forested wetland hydrology. We examined the utility of LiDAR intensity data and determined whether the fusion of LiDAR-derived data with multi-spectral imagery increased the accuracy of forested wetland classification compared with a classification performed with only multi-spectral imagery. Four classifications were performed: Classification A -- All Imagery, Classification B -- All LiDAR, Classification C -- LiDAR without Intensity, and Classification D -- Fusion of All Data. These classifications were performed using random forest and each resulted in a 3-foot resolution thematic raster of forested upland and forested wetland locations in Vermilion County, Illinois. The accuracies of these classifications were compared using the Kappa Coefficient of Agreement. Importance statistics produced within the random forest classifier were evaluated in order to understand the contribution of individual datasets. Classification D, which used the fusion of LiDAR and multi-spectral imagery as input variables, had moderate to strong agreement between reference data and classification results. It was found that Classification B, performed using all the LiDAR data and its derivatives (intensity, elevation, slope, aspect, curvatures, and Topographic Wetness Index), was the most accurate classification with Kappa: 78.04%, indicating moderate to strong agreement. However, Classification C, performed with LiDAR derivatives without intensity data, had less agreement than would be expected by chance, indicating that LiDAR intensity contributed significantly to the accuracy of Classification B.
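
    The Kappa Coefficient of Agreement used in the record above can be illustrated with a minimal sketch. The confusion matrices below are hypothetical and are not taken from the study; the function name is an assumption made for illustration only.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical upland/wetland counts, for illustration only.
good_map = [[45, 5], [10, 40]]
worse_than_chance = [[10, 40], [40, 10]]

kappa_good = cohens_kappa(good_map)
kappa_bad = cohens_kappa(worse_than_chance)
```

    A negative value, as in the second hypothetical matrix, corresponds to less agreement than would be expected by chance.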

  16. Text mining approach to predict hospital admissions using early medical records from the emergency department.

    PubMed

    Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz

    2017-04-01

    Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (Kernel linear) and Nu-Support Vector Machine (Kernel linear). Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  17. Design and evaluation of an aerial spray trial with true replicates to test the efficacy of Bacillus thuringiensis insecticide in a boreal forest.

    PubMed

    Cadogan, Beresford L; Scharbach, Roger D

    2003-04-01

    A field trial using true replicates was conducted successfully in a boreal forest in 1996 to evaluate the efficacy of two aerially applied Bacillus thuringiensis formulations, ABG 6429 and ABG 6430. A completely randomized design with four replicates per treatment was chosen. Twelve to 15 balsam fir (Abies balsamea [L.] Mill.) per plot were randomly selected as sample trees. Interplot buffer zones, ≥ 200 m wide, adequately prevented cross contamination from sprays that were atomized with four rotary atomizers (volume median diameters ranging from 64.6 to 139.4 µm) and released approximately 30 m above the ground. The B. thuringiensis formulations were not significantly different (P > 0.05) from each other in reducing spruce budworm (Choristoneura fumiferana [Clem.]) populations and protecting balsam trees from defoliation, but both formulations were significantly more efficacious than the controls. The results suggest that true replicates are a feasible alternative to pseudoreplication in experimental forest aerial applications.

  18. Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests.

    PubMed

    Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling; Chen, Shu-Ching; Ishwaran, Hemant

    2016-07-01

    Recent developments in social media and cloud storage lead to an exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches lag from successfully bridging the semantic gap. To solve this problem, a multi-stage random forest framework is proposed to generate predictor variables based on multivariate regressions using variable importance (VIMP). By fine tuning the forests and significantly reducing the predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., having little collaboration with other high level concepts. Using classical multivariate statistics, estimating the value of one coordinate using other coordinates standardizes the covariates and it depends upon the variance of the correlations instead of the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms those approaches in the comparison in terms of the Mean Average Precision (MAP) values.

  19. Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing

    PubMed Central

    Cai, Tianxi; Karlson, Elizabeth W.

    2013-01-01

    Objectives: To test whether data extracted from full text patient visit notes from an electronic medical record (EMR) would improve the classification of PsA compared to an algorithm based on codified data. Methods: From the > 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and three random forest algorithms trained using coded, narrative, and combined predictors. The receiver operator curve (ROC) was used to identify the optimal algorithm and a cut point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results: The PPV of a single PsA code was 57% (95% CI 55%–58%). Using a combination of coded data and NLP the random forest algorithm reached a PPV of 90% (95% CI 86%–93%) at sensitivity of 87% (95% CI 83%–91%) in the training data. The PPV was 93% (95% CI 89%–96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC (p < 0.001). Conclusions: Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. PMID:20701955
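
    The cut-point rule described in the record above (maximum sensitivity subject to a PPV of at least 90%) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the scores and the labels are all hypothetical, and the scores stand in for any classifier's predicted probabilities.

```python
def choose_cut_point(scores, labels, min_ppv=0.90):
    """Pick the score threshold with the highest sensitivity whose PPV
    is at least min_ppv. Returns (threshold, sensitivity, ppv), or None
    if no threshold qualifies. A case is called positive when its score
    is greater than or equal to the threshold.
    """
    positives = sum(labels)
    if positives == 0:
        return None
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if tp + fp == 0:
            continue
        ppv = tp / (tp + fp)
        sensitivity = tp / positives
        if ppv >= min_ppv and (best is None or sensitivity > best[1]):
            best = (threshold, sensitivity, ppv)
    return best


# Hypothetical predicted probabilities and chart-review labels (1 = case).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 1, 0]

strict = choose_cut_point(scores, labels, min_ppv=0.90)
relaxed = choose_cut_point(scores, labels, min_ppv=0.75)
```

    Relaxing the PPV floor lets the threshold drop, which trades precision for sensitivity.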

  20. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

    PubMed

    Ozçift, Akin

    2011-05-01

    Supervised classification algorithms are commonly used in the designing of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the class's output by individual trees. In this way, an RF ensemble classifier performs better than a single tree from classification performance point of view. In general, multiclass datasets having unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset that has multiple classes with small sample sizes and it is therefore adequate to test our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from cardiac arrhythmia dataset. (ii) RF machine learning algorithm is used to evaluate the performance of selected features with and without simple random sampling to evaluate the efficiency of proposed training strategy. The resultant accuracy of the classifier is found to be 90.0% and this is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of random sampling strategy in training RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
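
    The voting rule stated in the record above, where the forest outputs the class that is the mode of the individual trees' outputs, can be sketched in a few lines. The function name and the class labels are hypothetical, and the per-tree predictions are supplied directly rather than produced by trained trees.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """Return the most common class among the per-tree predictions.

    Ties are resolved in favor of the class seen first, which is an
    arbitrary choice made for this sketch.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for one sample.
votes = ["normal", "arrhythmia", "normal", "normal", "arrhythmia"]
decision = forest_vote(votes)
```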

  1. Assessing the Potential of Land Use Modification to Mitigate Ambient NO₂ and Its Consequences for Respiratory Health.

    PubMed

    Rao, Meenakshi; George, Linda A; Shandas, Vivek; Rosenstiel, Todd N

    2017-07-10

    Understanding how local land use and land cover (LULC) shapes intra-urban concentrations of atmospheric pollutants, and thus human health, is a key component in designing healthier cities. Here, NO₂ is modeled based on spatially dense summer and winter NO₂ observations in Portland-Hillsboro-Vancouver (USA), and the spatial variation of NO₂ with LULC investigated using random forest, an ensemble data learning technique. The NO₂ random forest model, together with BenMAP, is further used to develop a better understanding of the relationship among LULC, ambient NO₂ and respiratory health. The impact of land use modifications on ambient NO₂, and consequently on respiratory health, is also investigated using a sensitivity analysis. We find that NO₂ associated with roadways and tree-canopied areas may be affecting annual incidence rates of asthma exacerbation in 4-12 year olds by +3000 per 100,000 and -1400 per 100,000, respectively. Our model shows that increasing local tree canopy by 5% may reduce local incidence rates of asthma exacerbation by 6%, indicating that targeted local tree-planting efforts may have a substantial impact on reducing city-wide incidence of respiratory distress. Our findings demonstrate the utility of random forest modeling in evaluating LULC modifications for enhanced respiratory health.

  2. A Robust Random Forest-Based Approach for Heart Rate Monitoring Using Photoplethysmography Signal Contaminated by Intense Motion Artifacts.

    PubMed

    Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin

    2017-02-16

    The estimation of heart rate (HR) based on wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to estimate HR due to low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust approach based on random forest is proposed for accurately estimating HR from the photoplethysmography signal contaminated by intense motion artifacts, consisting of two stages. Stage 1 proposes a hybrid method to effectively remove MA with a low computation complexity, where two MA removal algorithms are combined by an accurate binary decision algorithm whose aim is to decide whether or not to adopt the second MA removal algorithm. Stage 2 proposes a random forest-based spectral peak-tracking algorithm, whose aim is to locate the spectral peak corresponding to HR, formulating the problem of spectral peak tracking into a pattern classification problem. Experiments on the PPG datasets including 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved the average absolute error of 1.65 beats per minute (BPM) on the 22 PPG datasets. Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking.

  3. Comparing spatial regression to random forests for large ...

    EPA Pesticide Factsheets

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and po

  4. Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

    PubMed

    Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

    2015-01-01

    Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (2.7% versus 75.2%, p<0.001). This method also showed statistically significantly improved recall rate compared to a 2-cue baseline method using fewer vessel cues (30.7% versus 67.7%, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse. Published by Elsevier B.V.

  5. A scattering model for forested area

    NASA Technical Reports Server (NTRS)

    Karam, M. A.; Fung, A. K.

    1988-01-01

    A forested area is modeled as a volume of randomly oriented and distributed disc-shaped or needle-shaped leaves shading a distribution of branches modeled as randomly oriented, finite-length dielectric cylinders above an irregular soil surface. Since the radii of branches have a wide range of sizes, the model only requires the length of a branch to be large compared with its radius, which may be any size relative to the incident wavelength. In addition, the model also assumes the thickness of a disc-shaped leaf or the radius of a needle-shaped leaf is much smaller than the electromagnetic wavelength. The scattering phase matrices for disc, needle, and cylinder are developed in terms of the scattering amplitudes of the corresponding fields, which are computed by the forward scattering theorem. These quantities along with the Kirchhoff scattering model for a randomly rough surface are used in the standard radiative transfer formulation to compute the backscattering coefficient. Numerical illustrations for the backscattering coefficient are given as a function of the shading factor, incidence angle, leaf orientation distribution, branch orientation distribution, and the number density of leaves. Also illustrated are the properties of the extinction coefficient as a function of leaf and branch orientation distributions. Comparisons are made with measured backscattering coefficients from forested areas reported in the literature.

  6. Source localization in an ocean waveguide using supervised machine learning.

    PubMed

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  7. Prediction of Nursing Workload in Hospital.

    PubMed

    Fiebig, Madlen; Hunstein, Dirk; Bartholomeyczik, Sabine

    2018-01-01

    A dissertation project at the Witten/Herdecke University [1] is investigating which (nursing sensitive) patient characteristics are suitable for predicting a higher or lower degree of nursing workload. For this research project four predictive modelling methods were selected. In a first step, SUPPORT VECTOR MACHINE, RANDOM FOREST, and GRADIENT BOOSTING were used to identify potential predictors from the nursing sensitive patient characteristics. The results were compared via FEATURE IMPORTANCE. To predict nursing workload the predictors identified in step 1 were modelled using MULTINOMIAL LOGISTIC REGRESSION. First results from the data mining process will be presented. A prognostic determination of nursing workload can be used not only as a basis for human resource planning in hospital, but also to respond to health policy issues.

  8. Tropical savannas and dry forests.

    PubMed

    Pennington, R Toby; Lehmann, Caroline E R; Rowland, Lucy M

    2018-05-07

    In the tropics, research, conservation and public attention focus on rain forests, but this neglects that half of the global tropics have a seasonally dry climate. These regions are home to dry forests and savannas (Figures 1 and 2), and are the focus of this Primer. The attention given to rain forests is understandable. Their high species diversity, sheer stature and luxuriance thrill biologists today as much as they did the first explorers in the Age of Discovery. Although dry forest and savanna may make less of a first impression, they support a fascinating diversity of plant strategies to cope with stress and disturbance including fire, drought and herbivory. Savannas played a fundamental role in human evolution, and across Africa and India they support iconic megafauna. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. National Satellite Forest Monitoring systems for REDD+

    NASA Astrophysics Data System (ADS)

    Jonckheere, I. G.

    2012-12-01

    Reducing Emissions from Deforestation and Forest Degradation (REDD) is an effort to create a financial value for the carbon stored in forests, offering incentives for developing countries to reduce emissions from forested lands and invest in low-carbon paths to sustainable development. "REDD+" goes beyond deforestation and forest degradation, and includes the role of conservation, sustainable management of forests and enhancement of forest carbon stocks. In the framework of getting countries ready for REDD+, the UN-REDD Programme assists developing countries to prepare and implement national REDD+ strategies. For the monitoring, reporting and verification, FAO supports the countries to develop national satellite forest monitoring systems that allow for credible measurement, reporting and verification (MRV) of REDD+ activities. These are among the most critical elements for the successful implementation of any REDD+ mechanism. The UN-REDD Programme, through a joint effort of FAO and Brazil's National Space Agency, INPE, is supporting countries to develop cost-effective, robust and compatible national monitoring and MRV systems, providing tools, methodologies, training and knowledge sharing that help countries to strengthen their technical and institutional capacity for effective MRV systems. To develop strong nationally-owned forest monitoring systems, technical and institutional capacity building is key. The UN-REDD Programme, through FAO, has taken on intensive training together with INPE, and has provided technical help and assistance for in-country training and implementation for national satellite forest monitoring. The goal of the support to UN-REDD pilot countries in this capacity building effort is to train technical forestry and IT staff from interested REDD+ countries and to set up the national satellite forest monitoring systems. The Brazilian forest monitoring system, TerraAmazon, which is used as a basis for this initiative, allows countries to adapt it to country needs, and the training on the TerraAmazon system is a tool to enhance existing capacity on carbon monitoring systems. The support with the National Forest Monitoring System will allow these countries to follow all actions related to the implementation of their national REDD+ policies and measures. The monitoring system will work as a platform to obtain information on their REDD+ results and actions, related directly or indirectly to national REDD+ strategies, and may also include actions unrelated to carbon assessment, such as forest law enforcement. With the technical assistance of FAO, INPE and other stakeholders, the countries will set up an autonomous operational forest monitoring system. An initial version and the methodologies of the system for DRC and PNG were launched in Durban, South Africa, during COP 17, and in 2012 the systems for Paraguay, Viet Nam and Zambia will be launched in Doha, Qatar, at COP 18. The access to high-quality satellite data for these countries is crucial for the set-up.

  10. Recognizing pedestrian's unsafe behaviors in far-infrared imagery at night

    NASA Astrophysics Data System (ADS)

    Lee, Eun Ju; Ko, Byoung Chul; Nam, Jae-Yeal

    2016-05-01

    Pedestrian behavior recognition is important for early accident prevention in advanced driver assistance systems (ADAS). In particular, because most pedestrian-vehicle crashes occur between late night and early dawn, our study focuses on recognizing unsafe pedestrian behavior using thermal images captured from a moving vehicle at night. To recognize unsafe behavior, this study uses a convolutional neural network (CNN), which shows high recognition performance. However, because a traditional CNN requires very expensive training time and memory, we design a light CNN consisting of two convolutional layers and two subsampling layers for real-time processing in vehicle applications. In addition, we combine the light CNN with a boosted random forest (Boosted RF) classifier so that the output of the CNN is not fully connected to the classifier but randomly connected to the Boosted RF. We call this network the randomly connected CNN (RC-CNN). The proposed method was successfully applied to the pedestrian unsafe behavior (PUB) dataset captured with a far-infrared camera at night, and its behavior recognition accuracy was confirmed to be higher than that of some CNN-related algorithms, with a shorter processing time.

  11. Field strategies for the calibration and validation of high-resolution forest carbon maps: Scaling from plots to a three state region MD, DE, & PA, USA.

    NASA Astrophysics Data System (ADS)

    Dolan, K. A.; Huang, W.; Johnson, K. D.; Birdsey, R.; Finley, A. O.; Dubayah, R.; Hurtt, G. C.

    2016-12-01

    In 2010 Congress directed NASA to initiate research towards the development of Carbon Monitoring Systems (CMS). In response, our team has worked to develop a robust, replicable framework to quantify and map aboveground forest biomass at high spatial resolutions. Crucial to this framework has been the collection of field-based estimates of aboveground tree biomass, combined with remotely detected canopy and structural attributes, for calibration and validation. Here we evaluate the field-based calibration and validation strategies within this carbon monitoring framework and discuss the implications for local to national monitoring systems. Through project development, the domain of this research has expanded from two counties in MD (2,181 km2), to the entire state of MD (32,133 km2), and most recently the tri-state region of MD, PA, and DE (157,868 km2), and covers forests in four major USDA ecological provinces. While there are approximately 1000 Forest Inventory and Analysis (FIA) plots distributed across the state of MD, 60% fell in areas considered non-forest or had conditions that precluded them from being measured in the last forest inventory. Across the two pilot counties, where population and land-use competition are high, that proportion rose to 70%. Thus, during the initial phases of this project 850 independent field plots were established for model calibration following a random stratified design to ensure the adequate representation of height and vegetation classes found across the state, while FIA data were used as an independent data source for validation. As the project expanded to cover the larger tri-state domain, the strategy was flipped to base calibration on more than 3,300 measured FIA plots, as they provide a standardized, consistent and available data source across the nation. An additional 350 stratified random plots were deployed in the Northern Mixed forests of PA and the Coastal Plains forests of DE for validation.
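
    The random stratified plot design mentioned in this abstract can be illustrated with a short sketch. It is only a generic example of stratified random sampling, not the project's actual protocol: the function name, the `stratum_of` callback, and the per-stratum quota are all assumptions introduced here.

```python
import random
from collections import defaultdict

def stratified_sample(candidates, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` random candidates from every stratum.

    `stratum_of` is a hypothetical callback mapping a candidate plot to its
    stratum label, e.g. a (height class, vegetation class) pair.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for candidate in candidates:
        by_stratum[stratum_of(candidate)].append(candidate)

    chosen = []
    for key in sorted(by_stratum):  # fixed order keeps the draw reproducible
        pool = by_stratum[key]
        chosen.extend(rng.sample(pool, min(per_stratum, len(pool))))
    return chosen

# Illustrative candidates: (plot id, height class); the values are made up.
plots = [(i, "tall" if i % 3 == 0 else "short") for i in range(30)]
selected = stratified_sample(plots, stratum_of=lambda p: p[1], per_stratum=4)
```

    Sampling within each stratum separately is what guarantees that rare height or vegetation classes are not missed by chance, which a simple random draw over all candidates cannot promise.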

  12. Comparative genetic responses to climate for the varieties of Pinus ponderosa and Pseudotsuga menziesii: realized climate niches

    Treesearch

    Gerald E. Rehfeldt; Barry C. Jaquish; Javier Lopez-Upton; Cuauhtemoc Saenz-Romero; J. Bradley St Clair; Laura P. Leites; Dennis G. Joyce

    2014-01-01

    The Random Forests classification algorithm was used to predict the occurrence of the realized climate niche for two sub-specific varieties of Pinus ponderosa and three varieties of Pseudotsuga menziesii from presence-absence data in forest inventory ground plots. Analyses were based on ca. 271,000 observations for P. ponderosa and ca. 426,000 observations for P....

  13. Sensitivity of a Riparian Large Woody Debris Recruitment Model to the Number of Contributing Banks and Tree Fall Pattern

    Treesearch

    Don C. Bragg; Jeffrey L. Kershner

    2004-01-01

    Riparian large woody debris (LWD) recruitment simulations have traditionally applied a random angle of tree fall from two well-forested stream banks. We used a riparian LWD recruitment model (CWD, version 1.4) to test the validity of these assumptions. Both the number of contributing forest banks and predominant tree fall direction significantly influenced simulated...

  14. Ten-year response of a forest bird community to an operational herbicide-shelterwood treatment in Allegheny hardwoods

    Treesearch

    Scott H. Stoleson; Todd E. Ristau; David S. deCalesta; Stephen B. Horsley

    2011-01-01

    Use of herbicides in forestry to direct successional trajectories has raised concerns over possible direct or indirect effects on non-target organisms. We studied the response of forest birds to an operational application of glyphosate and sulfometuron methyl herbicides, using a randomized block design in which half of each 8 ha block received herbicide and the other...

  15. Habitat use of two songbird species in pine-hardwood forests treated with prescribed burning and thinning: first year results

    Treesearch

    Jill M. Wick; Yong Wang

    2010-01-01

    We evaluated habitat use and home range size of hooded warblers (Wilsonia citrina) and worm-eating warblers (Helmitheros vermivorus) in six treated mixed oak-pine stands on the Bankhead National Forest in north-central AL. Study design is a randomized complete block with a factorial arrangement of three thinning levels (no thin, 11...

  16. A comparison of three erosion control mulches on decommissioned forest road corridors in the northern Rocky Mountains, United States

    Treesearch

    R. B. Foltz

    2012-01-01

    This study tested the erosion mitigation effectiveness of agricultural straw and two wood-based mulches for four years on decommissioned forest roads. Plots were installed on the loosely consolidated, bare soil to measure sediment production, mulch cover, and plant regrowth. The experimental design was a repeated measures, randomized block on two soil types common in...

  17. Effects of low intensity prescribed fires on ponderosa pine forests in wilderness areas of Zion National Park, Utah

    Treesearch

    Henry V. Bastian

    2001-01-01

    Vegetation and fuel loading plots were monitored and sampled in wilderness areas treated with prescribed fire. Changes in ponderosa pine (Pinus ponderosa) forest structure tree species and fuel loading are presented. Plots were randomly stratified and established in burn units in 1995. Preliminary analysis of nine plots 2 years after burning show litter was reduced 54....

  18. Bayesian spatial prediction of the site index in the study of the Missouri Ozark Forest Ecosystem Project

    Treesearch

    Xiaoqian Sun; Zhuoqiong He; John Kabrick

    2008-01-01

    This paper presents a Bayesian spatial method for analysing the site index data from the Missouri Ozark Forest Ecosystem Project (MOFEP). Based on ecological background and availability, we select three variables, the aspect class, the soil depth and the land type association as covariates for analysis. To allow great flexibility of the smoothness of the random field,...

  19. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions

    PubMed Central

    Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; Leenaars, Johan G. B.; Walsh, Markus G.; Shepherd, Keith D.; Sila, Andrew; MacMillan, Robert A.; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E.

    2015-01-01

    80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data. PMID:26110833
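
    The evaluation described in this abstract (5-fold cross-validation, RMSE, and a percentage decrease in RMSE relative to a baseline) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AfSIS workflow: all function names are hypothetical, and `fit` stands for any routine that takes training inputs and targets and returns a prediction function.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error of paired observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_rmse(xs, ys, fit, k=5, seed=0):
    """Pool the out-of-fold predictions from k fits and score them with RMSE."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = predict(xs[i])
    return rmse(ys, preds)

def relative_rmse_decrease(rmse_baseline, rmse_model):
    """Percentage by which the model lowers RMSE relative to the baseline."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline
```

    Calling `cross_validated_rmse` once with a linear-regression `fit` and once with a random-forest `fit`, then passing the two scores to `relative_rmse_decrease`, would yield the kind of figure the abstract reports per soil property and depth.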

  20. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    PubMed

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

    According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. 
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
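
    The pair-wise Kolmogorov distance used in this abstract as the input to k-medoids clustering is the largest absolute gap between two empirical cumulative distribution functions. The sketch below shows that computation only; the function names and the dictionary layout (unit name mapped to a list of measurements) are assumptions made for illustration, and the clustering step itself is not reproduced.

```python
from bisect import bisect_right

def kolmogorov_distance(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both CDFs are step functions, so the maximum gap occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def pairwise_distances(groups):
    """Distance between every pair of named groups, as a dict keyed by name pairs."""
    names = sorted(groups)
    return {(p, q): kolmogorov_distance(groups[p], groups[q])
            for p in names for q in names}
```

    A clustering routine that accepts a precomputed distance matrix could then group units whose measurement distributions are similar, without assuming any particular distributional shape.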

  1. Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.

    PubMed

    Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold

    2016-12-07

    Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. 
The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
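
    Two ideas from this abstract, the Brier score and a greedily grown sub-ensemble selected on prediction accuracy, can be sketched as follows. This is a simplified illustration and not the authors' procedure: it assumes each tree's 0/1 votes on a validation set are already available as `votes[tree][sample]`, and every function name is hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def ensemble_probs(votes, members):
    """Per-sample fraction of the selected trees that vote for class 1."""
    n_samples = len(votes[0])
    return [sum(votes[t][i] for t in members) / len(members) for i in range(n_samples)]

def accuracy(probs, labels):
    """Share of samples classified correctly at a 0.5 threshold."""
    return sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels)) / len(labels)

def greedy_prune(votes, labels, size):
    """Grow a sub-ensemble by adding, at each step, the tree that helps accuracy most."""
    chosen, remaining = [], list(range(len(votes)))
    while remaining and len(chosen) < size:
        best = max(remaining,
                   key=lambda t: accuracy(ensemble_probs(votes, chosen + [t]), labels))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

    In an unbalanced data set, plain accuracy is dominated by the majority class, which is one reason the abstract also reports AUC and the Brier score separately for the majority and minority classes.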

  2. Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healthy elderly, MCI, cMCI and alzheimer's disease patients: From the alzheimer's disease neuroimaging initiative (ADNI) database.

    PubMed

    Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N

    2018-05-15

    In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 CE were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
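
    The ensemble classification via majority voting mentioned in this abstract can be sketched as below. This is a generic illustration, not the authors' pipeline: the function names are hypothetical, and the tie-breaking rule (the label seen first among the tied ones wins) is an assumption of this sketch.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-classifier label lists into one label per subject.

    `predictions[c][s]` is the label that classifier c assigns to subject s.
    Ties go to the tied label that appears first among the classifiers.
    """
    fused = []
    for subject_votes in zip(*predictions):
        fused.append(Counter(subject_votes).most_common(1)[0][0])
    return fused

def classification_accuracy(predicted, actual):
    """Share of subjects whose fused label matches the known label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Three hypothetical classifiers voting on three subjects (labels are made up).
votes = [
    ["HC", "MCI", "AD"],
    ["HC", "cMCI", "AD"],
    ["MCI", "cMCI", "cMCI"],
]
fused = majority_vote(votes)
```

    With four classes, a vote can split without any label reaching a strict majority, so the tie-breaking rule matters more here than in binary problems.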

  3. Automated segmentation of dental CBCT image with prior-guided sequential random forests

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Li; Gao, Yaozong; Shi, Feng

    Purpose: Cone-beam computed tomography (CBCT) is an increasingly utilized imaging modality for the diagnosis and treatment planning of the patients with craniomaxillofacial (CMF) deformities. Accurate segmentation of CBCT image is an essential step to generate 3D models for the diagnosis and treatment planning of the patients with CMF deformities. However, due to the image artifacts caused by beam hardening, imaging noise, inhomogeneity, truncation, and maximal intercuspation, it is difficult to segment the CBCT. Methods: In this paper, the authors present a new automatic segmentation method to address these problems. Specifically, the authors first employ a majority voting method to estimate the initial segmentation probability maps of both mandible and maxilla based on multiple aligned expert-segmented CBCT images. These probability maps provide an important prior guidance for CBCT segmentation. The authors then extract both the appearance features from CBCTs and the context features from the initial probability maps to train the first-layer of random forest classifier that can select discriminative features for segmentation. Based on the first-layer of trained classifier, the probability maps are updated, which will be employed to further train the next layer of random forest classifier. By iteratively training the subsequent random forest classifier using both the original CBCT features and the updated segmentation probability maps, a sequence of classifiers can be derived for accurate segmentation of CBCT images. Results: Segmentation results on CBCTs of 30 subjects were both quantitatively and qualitatively validated based on manually labeled ground truth. The average Dice ratios of mandible and maxilla by the authors’ method were 0.94 and 0.91, respectively, which are significantly better than the state-of-the-art method based on sparse representation (p-value < 0.001). Conclusions: The authors have developed and validated a novel fully automated method for CBCT segmentation.
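
    The Dice ratio reported in this abstract measures the overlap between an automatic segmentation and the ground truth: twice the size of the intersection divided by the sum of the two sizes. A minimal sketch follows, assuming each segmentation is represented as a collection of voxel coordinates; that representation and the function name are choices made here for illustration.

```python
def dice_ratio(segmentation, ground_truth):
    """Dice overlap: 2 * |A intersect B| / (|A| + |B|), from 0 (disjoint) to 1 (identical)."""
    a, b = set(segmentation), set(ground_truth)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))

# Two toy masks given as (row, column) voxel coordinates.
auto_mask = {(0, 0), (0, 1), (1, 1)}
manual_mask = {(0, 1), (1, 1), (2, 1)}
overlap = dice_ratio(auto_mask, manual_mask)
```

    Because the score is normalized by the combined size of both masks, it penalizes over-segmentation and under-segmentation alike.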

  4. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

    NASA Astrophysics Data System (ADS)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-11-01

    The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. 
Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
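
    Two of the steps listed in this abstract lend themselves to a short sketch: the bivariate Frequency Ratio of step iii and the AUC of step vii. The code below uses the common definition of the frequency ratio (share of landslide cells in a class divided by the share of all cells in that class) and the rank-based form of the AUC. Whether the authors used exactly these formulations is an assumption, and all names are hypothetical.

```python
def frequency_ratio(class_of_cell, is_landslide):
    """Frequency ratio per factor class; values above 1 mean the class is
    over-represented among landslide cells. Assumes at least one landslide cell."""
    total_cells = len(class_of_cell)
    total_slides = sum(is_landslide)
    ratios = {}
    for cls in set(class_of_cell):
        cells = sum(1 for c in class_of_cell if c == cls)
        slides = sum(1 for c, s in zip(class_of_cell, is_landslide) if c == cls and s)
        ratios[cls] = (slides / total_slides) / (cells / total_cells)
    return ratios

def auc(scores, labels):
    """Probability that a random landslide cell outscores a random stable cell
    (ties count as half). Assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    The frequency ratio gives a quick per-factor overview before multivariate modelling, while the AUC summarizes how well a fitted model ranks landslide cells above stable ones.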

  5. Monitoring grass nutrients and biomass as indicators of rangeland quality and quantity using random forest modelling and WorldView-2 data

    NASA Astrophysics Data System (ADS)

    Ramoelo, Abel; Cho, M. A.; Mathieu, R.; Madonsela, S.; van de Kerchove, R.; Kaszta, Z.; Wolff, E.

    2015-12-01

    Land use and climate change could have huge impacts on food security and the health of various ecosystems. Leaf nitrogen (N) and above-ground biomass are some of the key factors limiting agricultural production and ecosystem functioning. Leaf N and biomass can be used as indicators of rangeland quality and quantity. Conventional methods for assessing these vegetation parameters at landscape scale are time consuming and tedious. Remote sensing provides a bird's-eye view of the landscape, which creates an opportunity to assess these vegetation parameters over wider rangeland areas. Estimation of leaf N has been successful during peak productivity or high biomass, and few studies have estimated leaf N in the dry season. The estimation of above-ground biomass has been hindered by signal saturation problems when using conventional vegetation indices. The objective of this study is to monitor leaf N and above-ground biomass as indicators of rangeland quality and quantity using WorldView-2 satellite images and the random forest technique in the north-eastern part of South Africa. A series of field campaigns to collect samples for leaf N and biomass were undertaken in March 2013, April or May 2012 (end of wet season) and July 2012 (dry season). Several conventional and red edge based vegetation indices were computed. Overall results indicate that random forest and vegetation indices explained over 89% of leaf N concentrations for grass and trees, and less than 89% for all the years of assessment. The red edge based vegetation indices were among the important variables for predicting leaf N. For the biomass, the random forest model explained over 84% of biomass variation in all years, and visible bands including red edge based vegetation indices were found to be important. The study demonstrated that leaf N could be monitored using high spatial resolution with the red edge band capability, and is important for rangeland assessment and monitoring.

  6. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions.

    PubMed

    Hengl, Tomislav; Heuvelink, Gerard B M; Kempen, Bas; Leenaars, Johan G B; Walsh, Markus G; Shepherd, Keith D; Sila, Andrew; MacMillan, Robert A; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E

    2015-01-01

    80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management--organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data.

  7. A New Method for Predicting Patient Survivorship Using Efficient Bayesian Network Learning

    PubMed Central

    Jiang, Xia; Xue, Diyang; Brufsky, Adam; Khan, Seema; Neapolitan, Richard

    2014-01-01

    The purpose of this investigation is to develop and evaluate a new Bayesian network (BN)-based patient survivorship prediction method. The central hypothesis is that the method predicts patient survivorship well, while having the capability to handle high-dimensional data and be incorporated into a clinical decision support system (CDSS). We have developed EBMC_Survivorship (EBMC_S), which predicts survivorship for each year individually. EBMC_S is based on the EBMC BN algorithm, which has been shown to handle high-dimensional data. BNs have excellent architecture for decision support systems. In this study, we evaluate EBMC_S using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which concerns breast tumors. A 5-fold cross-validation study indicates that EBMC_S performs better than the Cox proportional hazard model and is comparable to the random survival forest method. We show that EBMC_S provides additional information such as sensitivity analyses, which covariates predict each year, and yearly areas under the ROC curve (AUROCs). We conclude that our investigation supports the central hypothesis. PMID:24558297

  8. A new method for predicting patient survivorship using efficient bayesian network learning.

    PubMed

    Jiang, Xia; Xue, Diyang; Brufsky, Adam; Khan, Seema; Neapolitan, Richard

    2014-01-01

    The purpose of this investigation is to develop and evaluate a new Bayesian network (BN)-based patient survivorship prediction method. The central hypothesis is that the method predicts patient survivorship well, while having the capability to handle high-dimensional data and be incorporated into a clinical decision support system (CDSS). We have developed EBMC_Survivorship (EBMC_S), which predicts survivorship for each year individually. EBMC_S is based on the EBMC BN algorithm, which has been shown to handle high-dimensional data. BNs have excellent architecture for decision support systems. In this study, we evaluate EBMC_S using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which concerns breast tumors. A 5-fold cross-validation study indicates that EBMC_S performs better than the Cox proportional hazard model and is comparable to the random survival forest method. We show that EBMC_S provides additional information such as sensitivity analyses, which covariates predict each year, and yearly areas under the ROC curve (AUROCs). We conclude that our investigation supports the central hypothesis.

  9. Disentangling Biodiversity and Climatic Determinants of Wood Production

    PubMed Central

    Vilà, Montserrat; Carrillo-Gavilán, Amparo; Vayreda, Jordi; Bugmann, Harald; Fridman, Jonas; Grodzki, Wojciech; Haase, Josephine; Kunstler, Georges; Schelhaas, MartJan; Trasobares, Antoni

    2013-01-01

    Background: Despite empirical support for an increase in ecosystem productivity with species diversity in synthetic systems, there is ample evidence that this relationship is dependent on environmental characteristics, especially in structurally more complex natural systems. Empirical support for this relationship in forests is urgently needed, as these ecosystems play an important role in carbon sequestration. Methodology/Principal Findings: We tested whether tree wood production is positively related to tree species richness while controlling for climatic factors, by analyzing 55265 forest inventory plots in 11 forest types across five European countries. On average, wood production was 24% higher in mixed than in monospecific forests. Taken alone, wood production was enhanced with increasing tree species richness in almost all forest types. In some forests, wood production was also greater with increasing numbers of tree types. Structural Equation Modeling indicated that the increase in wood production with tree species richness was largely mediated by a positive association between stand basal area and tree species richness. Mean annual temperature and mean annual precipitation affected wood production and species richness directly. However, the direction and magnitude of the influence of climatic variables on wood production and species richness were not consistent and varied depending on forest type. Conclusions: Our analysis is the first to find a local scale positive relationship between tree species richness and tree wood production occurring across a continent. Our results strongly support incorporating the role of biodiversity in management and policy plans for forest carbon sequestration. PMID:23437038

  10. Research in Support of Forest Management. Final report, 1986--1991

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marx, D.H.

    1991-12-01

    This final research report on Research in Support of Forest Management for the Savannah River Forest Station covers the period 1986 thru 1991. This report provides a list of publications resulting from research accomplished by SEFES scientists and their cooperators, and a list of continuing research study titles. Output is 22 research publications, 23 publications involving technology transfer of results to various user groups, and 11 manuscripts in pre-publication format. DOE funding contributed approximately 15 percent of the total cost of the research.

  11. Effects of oil-palm plantations on diversity of tropical anurans.

    PubMed

    Faruk, Aisyah; Belabut, Daicus; Ahmad, Norhayati; Knell, Robert J; Garner, Trenton W J

    2013-06-01

    Agriculturally altered vegetation, especially oil-palm plantations, is rapidly increasing in Southeast Asia. Low species diversity is associated with this commodity, but data on anuran diversity in oil-palm plantations are lacking. We investigated how anuran biological diversity differs between forest and oil-palm plantation, and whether observed differences in biological diversity of these areas are linked to specific environmental factors. We hypothesized that biological diversity is lower in plantations and that plantations support a larger proportion of disturbance-tolerant species than forest. We compared species richness, abundance, and community composition between plantation and forest areas and between site types within plantation and forest (forest stream vs. plantation stream, forest riparian vs. plantation riparian, forest terrestrial vs. plantation terrestrial). Not all measures of biological diversity differed between oil-palm plantations and secondary forest sites. Anuran community composition, however, differed greatly between forest and plantation, and communities of anurans in plantations contained species that prosper in disturbed areas. Although plantations supported large numbers of breeding anurans, we concluded the community consisted of common species that were of little conservation concern (commonly found species include Fejervarya limnocharis, Microhyla heymonsi, and Hylarana erythrea). We believe that with a number of management interventions, oil-palm plantations can provide habitat for species that dwell in secondary forests. © 2013 Society for Conservation Biology.

  12. Analysis of needed forest in Universitas Negeri Semarang (UNNES) based on calculation of carbon dioxide emissions

    NASA Astrophysics Data System (ADS)

    Rahayuningsih, M.; Kartijono, N. E.; Arifin, M. S.

    2018-03-01

    The growing number of staff and academics, a result of UNNES's popularity as a favourite university in Indonesia, has increased demand for facilities to support the learning process, student activities and campus operations. This has reduced the forest-covered area on campus, even though an optimum extent must be maintained to support ecological functions in campus areas. This research was conducted to determine the optimum area of campus forest needed, based on CO2 emissions in the UNNES area in Sekaran sub-district. The results showed that the forest needed for the UNNES campus in 2017 is 14.25 ha, but the existing area is only 13.103 ha. Campus forest in the western campus area is sufficient to absorb CO2 emissions: about 8.147 ha is available, while about 4.47 ha is required. Campus forest in the eastern campus area is not sufficient to absorb CO2 emissions: the requirement of 9.78 ha is much larger than the approximately 4.956 ha available. The results of this study can be used as a reference in the development of green space both on campus and in the city of UNNES Semarang.
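The west/east figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below reads the decimal commas in the reported values as decimal points (an assumption about the source's notation); the zone labels and the function name `balance` are made up for the illustration.

```python
# Figures as reported in the abstract, in hectares.
# Assumption: "8,147", "9,78" and "4,956" use a decimal comma.
required = {"west": 4.47, "east": 9.78}
available = {"west": 8.147, "east": 4.956}


def balance(required, available):
    """Return available minus required forest area per zone, plus the campus total."""
    result = {zone: round(available[zone] - required[zone], 3) for zone in required}
    result["total"] = round(sum(available.values()) - sum(required.values()), 3)
    return result


campus_balance = balance(required, available)
for zone, surplus in campus_balance.items():
    status = "surplus" if surplus >= 0 else "deficit"
    print(f"{zone}: {surplus:+.3f} ha ({status})")
```

Under that reading, the zone totals add up to the campus-wide figures quoted in the abstract (14.25 ha required, 13.103 ha existing), which supports interpreting the commas as decimal separators.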

  13. A comparative study: classification vs. user-based collaborative filtering for clinical prediction.

    PubMed

    Hao, Fang; Blair, Rachael Hageman

    2016-12-08

    Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individual's prior satisfaction with items, as well as the satisfaction of individuals that are "similar". Recently, there have been applications of collaborative filtering based recommender systems for clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, which includes an outcome. Application of recommender systems to a problem of this type requires recasting a supervised learning problem as unsupervised. The rationale is that patients with similar clinical features carry a similar disease risk. As the "Big Data" era progresses, it is likely that approaches of this type will be reached for as biomedical data continues to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four different publicly available medical data sets: National Health and Nutrition Examination Survey (NHANES, 2011-2012 on Obesity), Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance using simulated data with observations that are Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable. Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests with different imputations on real and simulated data. The results warrant caution in using collaborative filtering (CF) for clinical risk prediction when traditional classification is feasible and practical. CF may not be desirable in datasets where classification is an acceptable alternative. We describe some natural applications related to "Big Data" where CF would be preferred and conclude with some insights as to why caution may be warranted in this context.
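The rationale that patients with similar clinical features carry a similar disease risk can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the cosine similarity, the neighbourhood size `k`, the function names, and the toy patient records are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity over the features that both patients have observed."""
    shared = [key for key in u if key in v]
    if not shared:
        return 0.0
    dot = sum(u[key] * v[key] for key in shared)
    norm_u = math.sqrt(sum(u[key] ** 2 for key in shared))
    norm_v = math.sqrt(sum(v[key] ** 2 for key in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_outcome(target, neighbours, k=2):
    """Similarity-weighted mean outcome of the k patients most similar to `target`."""
    scored = sorted(
        ((cosine(target, features), outcome) for features, outcome in neighbours),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(similarity for similarity, _ in scored)
    if total <= 0:
        return None  # no usable neighbours
    return sum(similarity * outcome for similarity, outcome in scored) / total


# Hypothetical, pre-scaled records: (features, outcome); "bp" is missing for some patients.
patients = [
    ({"bmi": 1.0, "age": 0.9}, 1),
    ({"bmi": 0.9, "age": 1.0}, 1),
    ({"bmi": 0.1, "age": 0.2, "bp": 0.9}, 0),
]
new_patient = {"bmi": 0.95, "age": 0.95}
risk = predict_outcome(new_patient, patients, k=2)
print(f"predicted risk: {risk:.2f}")
```

Restricting the similarity to shared features is one simple way to cope with missing values; the study itself compares several imputation strategies, which this sketch does not attempt to reproduce.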

  14. Water Quality Effects of Forest Roads in Bottomland Hardwood Stands

    Treesearch

    Robert B. Rummer

    1999-01-01

    Management of bottomland hardwood sites requires adequate access to support forest operations. A study conducted in a bottomland forest in central Georgia has evaluated the effect of forest road design on sediment movement and water quality. Five years of measurement indicate that a conventional crowned road design is a net sink for sediment, primarily due to settling...

  15. An Integrated Approach to Forest Ecosystem Services

    Treesearch

    José Joaquin Campos; Francisco Alpizar; Bastiaan Louman; John A. Parrotta

    2005-01-01

    Forest ecosystem services (FES) are fundamental for the Earth’s life support systems. This chapter discusses the different services provided by forest ecosystems and the effects that land use and forest management practices have on their provision. It also discusses the role of markets in providing an enabling environment for a sustainable and equitable provision of...

  16. Current Research on Wood Decay in the USDA Forest Service

    Treesearch

    Harold H. Burdsall Jr.

    1991-01-01

    The Forest Service's research on decay fungi and decay caused by fungi is done mainly in two research work units at the Forest Products Laboratory. One unit, the Center for Forest Mycology Research, performs biosystematic research on root-rot and products-rot fungi in the genera Armillaria, Phellinus, and Phlebia and maintains the culture collection supporting...

  17. DIY visualizations: opportunities for story-telling with esri tools

    Treesearch

    Charles H. Perry; Barry T. Wilson

    2015-01-01

    The Forest Service and Esri recently entered into a partnership: (1) to distribute FIA and other Forest Service data with the public and stakeholders through ArcGIS Online, and (2) to facilitate the application of the ArcGIS platform within the Forest Service to develop forest management and landscape management plans, and support their scientific research activities....

  18. Role of fire in restoration of a ponderosa pine forest, Washington

    Treesearch

    Richy J. Harrod; Richard W. Fonda; Mara K. McGrath

    2007-01-01

    Ponderosa pine forests in the Eastern Cascades of Washington support dense, overstocked stands in which crown fires are probable, owing to postsettlement sheep grazing, logging, and fire exclusion. In 1991, the Okanogan-Wenatchee National Forests began to apply long-term management techniques to reverse postsettlement changes in ponderosa pine forests. For 9 years, the...

  19. An overview of African Americans' historical, religious, and spiritual ties to forests

    Treesearch

    Earl C. Leatherberry

    2000-01-01

    Forests have played a significant role in the development of American cultural values. Research has consistently demonstrated that a wide range of benefits accrues to people from contact with natural environments such as forests. Governmental agencies provide access to forests and support various forestry programs. It is generally known that African Americans are...

  20. Mapping deforestation and forest degradation using Landsat time series: a case of Sumatra—Indonesia

    Treesearch

    Belinda Arunarwati Margono

    2013-01-01

    Indonesia experiences the second highest rate of deforestation among tropical countries (FAO 2005, 2010). Consequently, timely and accurate forest data are required to combat deforestation and forest degradation in support of climate change mitigation and biodiversity conservation policy initiatives. Remote sensing is considered as a significant data source for forest...

  1. RPA tree-level database users guide

    Treesearch

    Patrick D. Miles; Scott A. Pugh; Brad Smith; Sonja N. Oswalt

    2014-01-01

    The Forest and Rangeland Renewable Resources Planning Act (RPA) of 1974 calls for a periodic assessment of the Nation's renewable resources. The Forest Inventory and Analysis (FIA) program of the U.S. Forest Service supports the RPA effort by providing information on the forest resources of the United States. The RPA tree-level database (RPAtreeDB) was generated...

  2. Forests: the potential consequences of climate variability and change

    Treesearch

    USDA Forest Service

    2001-01-01

    This pamphlet reports the recent scientific assessment that analyzed how future climate variability and change may affect forests in the United States. The assessment, sponsored by the USDA Forest Service, and supported, in part, by the U.S. Department of Energy, and the National Aeronautics and Space Administration, describes the suite of potential impacts on forests....

  3. Urban forests and parks as privacy refuges

    Treesearch

    William E. Hammitt

    2002-01-01

    Urban forests and parks are forested areas that can serve as refuges for privacy. This article presents a conceptual argument for urban forests and parks as privacy refuges, and data that support the argument. On-site visitors (n = 610) to four Cleveland, Ohio, U.S., Metroparks were surveyed in 1995. Results indicated that considerable amounts of privacy were obtained...

  4. Frugivory in Canopy Plants in a Western Amazonian Forest: Dispersal Systems, Phylogenetic Ensembles and Keystone Plants

    PubMed Central

    Stevenson, Pablo R.; Link, Andrés; González-Caro, Sebastian; Torres-Jiménez, María Fernanda

    2015-01-01

    Frugivory is a widespread mutualistic interaction in which frugivores obtain nutritional resources while favoring plant recruitment through their seed dispersal services. Nonetheless, how these complex interactions are organized in diverse communities, such as tropical forests, is not fully understood. In this study we evaluated the existence of plant-frugivore sub-assemblages and their phylogenetic organization in an undisturbed western Amazonian forest in Colombia. We also searched for potential keystone plants, based on network analyses and an estimate of the amount of fruit going from plants to frugivores. We carried out diurnal observations on 73 canopy plant species during a period of two years. During focal tree sampling, we recorded frugivore identity, the duration of each individual visit, and feeding rates. We did not find support for the existence of sub-assemblages, such as specialized vs. generalized dispersal systems. Visitation rates on the vast majority of canopy species were associated with the relative abundance of frugivores, in which ateline monkeys (i.e. Lagothrix and Ateles) played the most important roles. All fruiting plants were visited by a variety of frugivores and the phylogenetic assemblage was random in more than 67% of the cases. In cases of aggregation, the plant species were consumed by only primates or only birds, and filters were associated with fruit protection and likely chemical content. Plants suggested as keystone species based on the amount of pulp going from plants to frugivores differ from those suggested based on network approaches. Our results suggest that in tropical forests most tree-frugivore interactions are generalized, and abundance should be taken into account when assessing the most important plants for frugivores. PMID:26492037

  5. Exploiting the capabilities of the Sentinel-2 multi spectral instrument for predicting growing stock volume in forest ecosystems

    NASA Astrophysics Data System (ADS)

    Mura, Matteo; Bottalico, Francesca; Giannetti, Francesca; Bertani, Remo; Giannini, Raffaello; Mancini, Marco; Orlandini, Simone; Travaglini, Davide; Chirici, Gherardo

    2018-04-01

    The spatial prediction of growing stock volume is one of the most frequent applications of remote sensing for supporting the sustainable management of forest ecosystems. For such a purpose, data from active or passive sensors are used as predictor variables in combination with measures taken in the field in sampling plots. The Sentinel-2 (S2) satellites are equipped with a Multi Spectral Instrument (MSI) capable of acquiring 13 bands in the visible and infrared domains with a spatial resolution varying between 10 and 60 m. The present study aimed at evaluating the performance of the S2-MSI imagery for estimating the growing stock volume of forest ecosystems. To do so we used 240 plots measured in two study areas in Italy. The imputation was carried out with eight k-Nearest Neighbours (k-NN) methods available in the open source yaImpute R package. In order to evaluate the S2-MSI performance we also repeated the experimental protocol with two other sets of images acquired by two well-known satellites equipped with multi spectral instruments: Landsat 8 OLI and RapidEye scanner. We found that S2 worked better than Landsat in 37.5% of the cases and in 62.5% of the cases better than RapidEye. In one study area the best performance was obtained with Landsat OLI (RMSD = 6.84%) and in the other with S2 (RMSD = 22.94%), both with the k-NN system based on a distance matrix calculated with the Random Forest algorithm. The results confirmed that S2 images are suitable for predicting growing stock volume obtaining good performances (average RMSD for both the test areas of less than 19%).

  6. Watershed forest management using decision support technology

    Treesearch

    Mark Twery; Robert Northrop

    2004-01-01

    Using innovative partnerships and a variety of decision support tools, we identified the needs and goals of Baltimore, Maryland, for their reservoir properties containing over 17000 forested acres; developed a management plan; determined the information necessary to evaluate conditions, processes, and context; chose tools to use; collected, organized, and analyzed data...

  7. PCDD/F and Aromatic Emissions from Simulated Forest and Grassland Fires

    EPA Science Inventory

    Emissions of polychlorinated dibenzodioxin and polychlorinated dibenzofuran (PCDD/F) from simulated grassland and forest fires were quantitatively sampled to derive emission factors in support of PCDD/F inventory development. Grasses from Kentucky and Minnesota; forest shrubs fro...

  8. An evaluation of supervised classifiers for indirectly detecting salt-affected areas at irrigation scheme level

    NASA Astrophysics Data System (ADS)

    Muller, Sybrand Jacobus; van Niekerk, Adriaan

    2016-07-01

    Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationships between the input features and electrical conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R² < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tended to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.

  9. The effect of IDH1 mutation on the structural connectome in malignant astrocytoma.

    PubMed

    Kesler, Shelli R; Noll, Kyle; Cahill, Daniel P; Rao, Ganesh; Wefel, Jeffrey S

    2017-02-01

    Mutation of the IDH1 gene is associated with differences in malignant astrocytoma growth characteristics that impact phenotypic severity, including cognitive impairment. We previously demonstrated greater cognitive impairment in patients with IDH1 wild type tumor compared to those with IDH1 mutant, and therefore we hypothesized that brain network organization would be lower in patients with wild type tumors. Volumetric, T1-weighted MRI scans were obtained retrospectively from 35 patients with IDH1 mutant and 32 patients with wild type malignant astrocytoma (mean age = 45 ± 14 years) and used to extract individual level, gray matter connectomes. Graph theoretical analysis was then applied to measure efficiency and other connectome properties for each patient. Cognitive performance was categorized as impaired or not and random forest classification was used to explore factors associated with cognitive impairment. Patients with wild type tumor demonstrated significantly lower network efficiency in several medial frontal, posterior parietal and subcortical regions (p < 0.05, corrected for multiple comparisons). Patients with wild type tumor also demonstrated significantly higher incidence of cognitive impairment (p = 0.03). Random forest analysis indicated that network efficiency was inversely, though nonlinearly associated with cognitive impairment in both groups (p < 0.0001). Cognitive reserve appeared to mediate this relationship in patients with mutant tumor suggesting greater neuroplasticity and/or benefit from neuroprotective factors. Tumor volume was the greatest contributor to cognitive impairment in patients with wild type tumor, supporting our hypothesis that greater lesion momentum between grades may cause more disconnection of core neurocircuitry and consequently lower efficiency of information processing.

  10. Problematic internet use (PIU): Associations with the impulsive-compulsive spectrum. An application of machine learning in psychiatry.

    PubMed

    Ioannidis, Konstantinos; Chamberlain, Samuel R; Treder, Matthias S; Kiraly, Franz; Leppink, Eric W; Redden, Sarah A; Stein, Dan J; Lochner, Christine; Grant, Jon E

    2016-12-01

    Problematic internet use is common, functionally impairing, and in need of further study. Its relationship with obsessive-compulsive and impulsive disorders is unclear. Our objective was to evaluate whether problematic internet use can be predicted from recognised forms of impulsive and compulsive traits and symptomatology. We recruited volunteers aged 18 and older using media advertisements at two sites (Chicago, USA, and Stellenbosch, South Africa) to complete an extensive online survey. State-of-the-art out-of-sample evaluation of machine learning predictive models was used, which included Logistic Regression, Random Forests and Naïve Bayes. Problematic internet use was identified using the Internet Addiction Test (IAT). A total of 2006 complete cases were analysed, of whom 181 (9.0%) had moderate/severe problematic internet use. Using Logistic Regression and Naïve Bayes we produced a classification prediction with a receiver operating characteristic area under the curve (ROC-AUC) of 0.83 (SD 0.03) whereas using a Random Forests algorithm the prediction ROC-AUC was 0.84 (SD 0.03) [all three models superior to baseline models p < 0.0001]. The models showed robust transfer between the study sites in all validation sets [p < 0.0001]. Prediction of problematic internet use was possible using specific measures of impulsivity and compulsivity in a population of volunteers. Moreover, this study offers proof-of-concept in support of using machine learning in psychiatry to demonstrate replicability of results across geographically and culturally distinct settings. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
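The ROC-AUC values reported here have a direct interpretation: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = problematic use) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

The quadratic pairwise loop is fine for a demonstration; for data sets of the size described in the abstract a rank-based computation would be the more usual choice.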

  11. Development of an automated assessment tool for MedWatch reports in the FDA adverse event reporting system.

    PubMed

    Han, Lichy; Ball, Robert; Pamer, Carol A; Altman, Russ B; Proestel, Scott

    2017-09-01

    As the US Food and Drug Administration (FDA) receives over a million adverse event reports associated with medication use every year, a system is needed to aid FDA safety evaluators in identifying reports most likely to demonstrate causal relationships to the suspect medications. We combined text mining with machine learning to construct and evaluate such a system to identify medication-related adverse event reports. FDA safety evaluators assessed 326 reports for medication-related causality. We engineered features from these reports and constructed random forest, L1 regularized logistic regression, and support vector machine models. We evaluated model accuracy and further assessed utility by generating report rankings that represented a prioritized report review process. Our random forest model showed the best performance in report ranking and accuracy, with an area under the receiver operating characteristic curve of 0.66. The generated report ordering assigns reports with a higher probability of medication-related causality a higher rank and is significantly correlated to a perfect report ordering, with a Kendall's tau of 0.24 (P = .002). Our models produced prioritized report orderings that enable FDA safety evaluators to focus on reports that are more likely to contain valuable medication-related adverse event information. Applying our models to all FDA adverse event reports has the potential to streamline the manual review process and greatly reduce reviewer workload. Published by Oxford University Press on behalf of the American Medical Informatics Association 2017. This work is written by US Government employees and is in the public domain in the United States.
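Kendall's tau, used here to compare the generated report ordering with a perfect ordering, counts how many pairs of items two rankings place in the same relative order. The sketch below implements the simplest variant (tau-a, which ignores ties); the abstract does not say which variant the authors used, and the two rankings are made-up examples.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; tied pairs count as neither."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rankings of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks of five reports: model ordering vs. an ideal ordering.
model_rank = [1, 2, 3, 4, 5]
ideal_rank = [2, 1, 3, 5, 4]
tau = kendall_tau_a(model_rank, ideal_rank)
print(f"Kendall's tau = {tau:.2f}")
```

A value of 1 means identical orderings, -1 a fully reversed ordering, and values near 0 little agreement, which gives some context for the modest 0.24 reported in the abstract.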

  12. Modelling Biophysical Parameters of Maize Using Landsat 8 Time Series

    NASA Astrophysics Data System (ADS)

    Dahms, Thorsten; Seissiger, Sylvia; Conrad, Christopher; Borg, Erik

    2016-06-01

    Open and free access to multi-frequent high-resolution data (e.g. Sentinel-2) will fortify agricultural applications based on satellite data. The temporal and spatial resolution of these remote sensing datasets directly affects the applicability of remote sensing methods, for instance the robust retrieval of biophysical parameters over the entire growing season with very high geometric resolution. In this study we use machine learning methods to predict biophysical parameters, namely the fraction of absorbed photosynthetic radiation (FPAR), the leaf area index (LAI) and the chlorophyll content, from high resolution remote sensing. 30 Landsat 8 OLI scenes were available in our study region in Mecklenburg-Western Pomerania, Germany. In-situ data were collected weekly to bi-weekly on 18 maize plots throughout the summer season 2015. The study aims at an optimized prediction of biophysical parameters and the identification of the best explaining spectral bands and vegetation indices. For this purpose, we used the entire in-situ dataset from 24.03.2015 to 15.10.2015. Random forest and conditional inference forests were used because of their explicit strong exploratory and predictive character. Variable importance measures allowed for analysing the relation between the biophysical parameters with respect to the spectral response, and the performance of the two approaches over the course of plant stock development. Classical random forest regression outperformed conditional inference forests, in particular when modelling the biophysical parameters over the entire growing period. For example, modelling biophysical parameters of maize for the entire vegetation period using random forests yielded: FPAR: R² = 0.85; RMSE = 0.11; LAI: R² = 0.64; RMSE = 0.9 and chlorophyll content (SPAD): R² = 0.80; RMSE = 4.9. Our results demonstrate the great potential in using machine-learning methods for the interpretation of long-term multi-frequent remote sensing datasets to model biophysical parameters.
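The two accuracy measures quoted for each biophysical parameter, R² and RMSE, can be computed from paired observed and predicted values as sketched below. The abstract does not state which definition of R² was used; this example assumes the coefficient of determination (one minus the ratio of residual to total sum of squares), and the LAI values are invented.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observed variable."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical leaf area index values: field measurements vs. model predictions.
lai_observed = [1.0, 2.0, 3.0, 4.0]
lai_predicted = [1.5, 2.0, 2.5, 4.0]
print(f"R2 = {r_squared(lai_observed, lai_predicted):.2f}")
print(f"RMSE = {rmse(lai_observed, lai_predicted):.3f}")
```

Because RMSE carries the units of the target variable, the values reported for FPAR, LAI and SPAD are not comparable with one another, whereas the R² values are.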

  13. Simulating high spatial resolution high severity burned area in Sierra Nevada forests for California Spotted Owl habitat climate change risk assessment and management.

    NASA Astrophysics Data System (ADS)

    Keyser, A.; Westerling, A. L.; Jones, G.; Peery, M. Z.

    2017-12-01

    Sierra Nevada forests have experienced an increase in very large fires with significant areas of high burn severity, such as the Rim (2013) and King (2014) fires, that have impacted habitat of endangered species such as the California spotted owl. In order to support land manager forest management planning and risk assessment activities, we used historical wildfire histories from the Monitoring Trends in Burn Severity project and gridded hydroclimate and land surface characteristics data to develop statistical models to simulate the frequency, location and extent of high severity burned area in Sierra Nevada forest wildfires as functions of climate and land surface characteristics. We define high severity here as BA90 area: the area comprising patches with ninety percent or more basal area killed within a larger fire. We developed a system of statistical models to characterize the probability of large fire occurrence, the probability of significant BA90 area present given a large fire, and the total extent of BA90 area in a fire on a 1/16 degree lat/lon grid over the Sierra Nevada. Repeated draws from binomial and generalized Pareto distributions using these probabilities generated a library of simulated histories of high severity fire for a range of near (50 yr) future climate and fuels management scenarios. Fuels management scenarios were provided by USFS Region 5. Simulated BA90 area was then downscaled to 30 m resolution using a statistical model we developed using Random Forest techniques to estimate the probability of adjacent 30 m pixels burning with ninety percent basal kill as a function of fire size and vegetation and topographic features. The result is a library of simulated high resolution maps of BA90 burned areas for a range of climate and fuels management scenarios with which we estimated conditional probabilities of owl nesting sites being impacted by high severity wildfire.
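The three-stage simulation described above (large-fire occurrence, presence of BA90 area given a fire, then BA90 extent) can be sketched as a chain of random draws. This is only a schematic under stated assumptions: the probabilities, the generalized Pareto scale and shape, and the function names are hypothetical placeholders, whereas in the study these quantities are modelled as functions of climate and land surface characteristics for each grid cell.

```python
import math
import random


def draw_gpd(rng, scale, shape, loc=0.0):
    """Draw from a generalized Pareto distribution by inverting its CDF."""
    u = rng.random()  # uniform on [0, 1)
    if shape == 0:
        return loc - scale * math.log(1.0 - u)
    return loc + scale / shape * ((1.0 - u) ** (-shape) - 1.0)


def simulate_cell_year(rng, p_fire, p_ba90, scale, shape):
    """Return simulated BA90 extent for one grid cell and year (0.0 if none)."""
    if rng.random() >= p_fire:  # Bernoulli draw: does a large fire occur?
        return 0.0
    if rng.random() >= p_ba90:  # Bernoulli draw: significant BA90 area present?
        return 0.0
    return draw_gpd(rng, scale, shape)  # extent of BA90 area


# Hypothetical parameters for a single cell; three 50-year simulated histories.
rng = random.Random(42)
library = [
    [simulate_cell_year(rng, p_fire=0.05, p_ba90=0.4, scale=150.0, shape=0.3)
     for _ in range(50)]
    for _ in range(3)
]
for i, history in enumerate(library):
    print(f"history {i}: {sum(1 for a in history if a > 0)} high-severity years")
```

Repeating the draws many times, as the abstract describes, builds a library of histories from which event probabilities can be estimated empirically.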

  14. FOCIS: A forest classification and inventory system using LANDSAT and digital terrain data

    NASA Technical Reports Server (NTRS)

    Strahler, A. H.; Franklin, J.; Woodcock, C. E.; Logan, T. L.

    1981-01-01

    Accurate, cost-effective stratification of forest vegetation and timber inventory is the primary goal of a Forest Classification and Inventory System (FOCIS). Conventional timber stratification using photointerpretation can be time-consuming, costly, and inconsistent from analyst to analyst. FOCIS was designed to overcome these problems by using machine processing techniques to extract and process tonal, textural, and terrain information from registered LANDSAT multispectral and digital terrain data. Comparison of samples from timber strata identified by conventional procedures showed that both have about the same potential to reduce the variance of timber volume estimates over simple random sampling.

  15. Treatments that enhance the decomposition of forest fuels for use in partially harvested stands in the moist forests of the northern Rocky Mountains (Priest River Experimental Forest)

    Treesearch

    Russell T. Graham; Theresa B. Jain

    2007-01-01

    The moist forests of the Rocky Mountains typically support late seral western hemlock, moist grand fir, or western redcedar forests. In addition to these species, Douglas-fir, western white pine, western larch, ponderosa pine, and lodgepole pine can occur creating a multitude of species compositions, structures, and successional stages that can be arrayed in a variety...

  16. Faster Trees: Strategies for Accelerated Training and Prediction of Random Forests for Classification of PolSAR Images

    NASA Astrophysics Data System (ADS)

    Hänsch, Ronny; Hellwich, Olaf

    2018-04-01

    Random Forests have continuously proven to be one of the most accurate, robust, as well as efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focuses on improving classification accuracy, we aim to accelerate the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches, we mainly consider algorithmic changes to stay as independent as possible of platform and programming language. The final model achieves approximately 60 times faster training and 500 times faster prediction, while the accuracy decreases only marginally, by roughly 1%.

  17. Underwater image enhancement through depth estimation based on random forest

    NASA Astrophysics Data System (ADS)

    Tai, Shen-Chuan; Tsai, Ting-Chou; Huang, Jyun-Han

    2017-11-01

    Light absorption and scattering in underwater environments can result in low-contrast images with a distinct color cast. This paper proposes a systematic framework for the enhancement of underwater images. Light transmission is estimated using the random forest algorithm. RGB values, luminance, color difference, blurriness, and the dark channel are treated as features in training and estimation. Transmission is calculated using an ensemble machine learning algorithm to deal with a variety of conditions encountered in underwater environments. A color compensation and contrast enhancement algorithm based on depth information was also developed with the aim of improving the visual quality of underwater images. Experimental results demonstrate that the proposed scheme outperforms existing methods with regard to subjective visual quality as well as objective measurements.

  18. QUANTIFYING FOREST ABOVEGROUND CARBON POOLS AND FLUXES USING MULTI-TEMPORAL LIDAR A report on field monitoring, remote sensing MMV, GIS integration, and modeling results for forestry field validation test to quantify aboveground tree biomass and carbon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee Spangler; Lee A. Vierling; Eva K. Stand

    2012-04-01

    Sound policy recommendations relating to the role of forest management in mitigating atmospheric carbon dioxide (CO₂) depend upon establishing accurate methodologies for quantifying forest carbon pools for large tracts of land that can be dynamically updated over time. Light Detection and Ranging (LiDAR) remote sensing is a promising technology for achieving accurate estimates of aboveground biomass and thereby carbon pools; however, not much is known about the accuracy of estimating biomass change and carbon flux from repeat LiDAR acquisitions containing different data sampling characteristics. In this study, discrete return airborne LiDAR data was collected in 2003 and 2009 across approximately 20,000 hectares (ha) of an actively managed, mixed conifer forest landscape in northern Idaho, USA. Forest inventory plots, established via a random stratified sampling design, were sampled in 2003 and 2009. The Random Forest machine learning algorithm was used to establish statistical relationships between inventory data and forest structural metrics derived from the LiDAR acquisitions. Aboveground biomass maps were created for the study area based on statistical relationships developed at the plot level. Over this 6-year period, we found that the mean increase in biomass due to forest growth across the non-harvested portions of the study area was 4.8 metric tons per hectare (Mg/ha). In these non-harvested areas, we found a significant difference in biomass increase among forest successional stages, with a higher biomass increase in mature and old forest compared to stand initiation and young forest. Approximately 20% of the landscape had been disturbed by harvest activities during the six-year time period, representing a biomass loss of >70 Mg/ha in these areas. During the study period, these harvest activities outweighed growth at the landscape scale, resulting in an overall loss in aboveground carbon at this site. The 30-fold increase in sampling density between the 2003 and 2009 acquisitions did not affect the biomass estimates. Overall, LiDAR data coupled with field reference data offer a powerful method for calculating pools and changes in aboveground carbon in forested systems. The results of our study suggest that multitemporal LiDAR-based approaches are likely to be useful for high quality estimates of aboveground carbon change in conifer forest systems.
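    The statement that harvest outweighed growth at the landscape scale can be checked with a rough area-weighted calculation using only the figures quoted in the abstract above. This is a back-of-the-envelope sketch, not the authors' method: it assumes the harvested share is exactly 20%, that the 4.8 Mg/ha gain applies to all of the remaining 80%, and it uses 70 Mg/ha as a lower bound on the harvest loss (the abstract reports ">70"). The function and constant names are hypothetical.

```python
# Rough landscape-scale biomass balance from the abstract's quoted figures.
# All three constants are approximations taken from the abstract text.
GROWTH_MG_PER_HA = 4.8          # mean gain on non-harvested land, 2003 to 2009
HARVEST_LOSS_MG_PER_HA = 70.0   # reported as ">70", so a lower bound on loss
HARVESTED_FRACTION = 0.20       # reported as "approximately 20%"


def net_change_per_ha(harvested_fraction, growth, loss):
    """Area-weighted mean biomass change per hectare of landscape."""
    return (1.0 - harvested_fraction) * growth - harvested_fraction * loss


def break_even_loss(harvested_fraction, growth):
    """Harvest loss per hectare at which gains and losses cancel."""
    return (1.0 - harvested_fraction) * growth / harvested_fraction


if __name__ == "__main__":
    net = net_change_per_ha(
        HARVESTED_FRACTION, GROWTH_MG_PER_HA, HARVEST_LOSS_MG_PER_HA
    )
    limit = break_even_loss(HARVESTED_FRACTION, GROWTH_MG_PER_HA)
    print(f"net change: {net:.2f} Mg/ha")          # net change: -10.16 Mg/ha
    print(f"break-even loss: {limit:.1f} Mg/ha")   # break-even loss: 19.2 Mg/ha
```

    Under these assumptions the landscape loses at least about 10 Mg/ha on average. Harvest losses would have had to stay below roughly 19 Mg/ha for growth to balance them, which is consistent with the net loss the abstract reports.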

  19. Foraging behaviour and landscape utilisation by the endangered golden-crowned flying fox (Acerodon jubatus), the Philippines.

    PubMed

    de Jong, Carol; Field, Hume; Tagtag, Anson; Hughes, Tom; Dechmann, Dina; Jayme, Sarah; Epstein, Jonathan H; Smith, Craig; Santos, Imelda; Catbagan, Davinio; Lim, Mundita; Benigno, Carolyn; Daszak, Peter; Newman, Scott

    2013-01-01

    Species of Old World fruit-bats (family Pteropodidae) have been identified as the natural hosts of a number of novel and highly pathogenic viruses threatening livestock and human health. We used GPS data loggers to record the nocturnal foraging movements of Acerodon jubatus, the golden-crowned flying fox, in the Philippines to better understand the landscape utilisation of this iconic species, with the dual objectives of pre-empting disease emergence and supporting conservation management. Data loggers were deployed on eight of 54 A. jubatus (two males and six females) captured near Subic Bay on the Philippine island of Luzon between 22 November and 2 December 2010. Bodyweight ranged from 730 g to 1002 g, translating to a logger weight burden of 3-4% of bodyweight. Six of the eight loggers yielded useful data over 2-10 days, showing variability in the nature and range of individual bat movements. The majority of foraging locations were in closed forest and most were remote from evident human activity. Forty-six discrete foraging locations and five previously unrecorded roost locations were identified. Our findings indicate that foraging is not a random event, with the majority of bats exhibiting repetitious foraging movements night-to-night; that apparently intact forest provides the primary foraging resource; and that known roost locations substantially underestimate the true number (and location) of roosts. Our initial findings support policy and decision-making across perspectives including landscape management, species conservation, and potentially disease emergence.

  20. Incidence and effects of endemic populations of forest pests in young mixed-conifer forests of the Sierra Nevada

    Treesearch

    Carroll B. Williams; David L. Azuma; George T. Ferrell

    1992-01-01

    Approximately 3,200 trees in young mixed-conifer stands were examined for pest activity and human-caused or mechanical injuries, and approximately 25 percent of these trees were randomly selected for stem analyses. The examination of trees felled for stem analyses showed that 409 (47 percent) were free of pests and 466 (53 percent) had one or more pest categories....
