Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests
2014-10-02
Kogalur, Blackstone , & Lauer, 2008; Ishwaran & Kogalur, 2010). Random survival forest is a sur- vival analysis extension of Random Forests (Breiman, 2001...Statistics & probability letters, 80(13), 1056–1064. Ishwaran, H., Kogalur, U. B., Blackstone , E. H., & Lauer, M. S. (2008). Random survival forests. The
Marino, S R; Lin, S; Maiers, M; Haagenson, M; Spellman, S; Klein, J P; Binkowski, T A; Lee, S J; van Besien, K
2012-02-01
The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good or intermediate risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions in 33 positions that were associated with reduced 100 day survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all 13 had been previously reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.
Screening large-scale association study data: exploiting interactions using random forests.
Lunetta, Kathryn L; Hayward, L Brooke; Segal, Jonathan; Van Eerdewegh, Paul
2004-12-10
Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction. Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact. In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
Random Bits Forest: a Strong Classifier/Regressor for Big Data
NASA Astrophysics Data System (ADS)
Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li
2016-07-01
Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).
Nasejje, Justine B; Mwambi, Henry
2017-09-07
Uganda just like any other Sub-Saharan African country, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to critically identify factors strongly associated with under-five child mortality rates. The Cox proportional hazards model has been a common choice in analysing data to understand factors strongly associated with high child mortality rates taking age as the time-to-event variable. However, due to its restrictive proportional hazards (PH) assumption, some covariates of interest which do not satisfy the assumption are often excluded in the analysis to avoid mis-specifying the model. Otherwise using covariates that clearly violate the assumption would mean invalid results. Survival trees and random survival forests are increasingly becoming popular in analysing survival data particularly in the case of large survey data and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests which have never been used in understanding factors affecting under-five child mortality rates in Uganda using Demographic and Health Survey data. Thus the first part of the analysis is based on the use of the classical Cox PH model and the second part of the analysis is based on the use of random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption. Random survival forests and the Cox proportional hazards model agree that the sex of the household head, sex of the child, number of births in the past 1 year are strongly associated to under-five child mortality in Uganda given all the three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates that were originally excluded from the earlier analysis due to violation of the PH assumption were important in explaining under-five child mortality rates. These covariates include the number of children under the age of five in a household, number of births in the past 5 years, wealth index, total number of children ever born and the child's birth order. The results further indicated that the predictive performance for random survival forests built using covariates including those that violate the PH assumption was higher than that for random survival forests built using only covariates that satisfy the PH assumption. Random survival forests are appealing methods in analysing public health data to understand factors strongly associated with under-five child mortality rates especially in the presence of covariates that violate the proportional hazards assumption.
Sara A. Goeking; Paul L. Patterson
2013-01-01
The USDA Forest Serviceâs Forest Inventory and Analysis (FIA) Program applies specific sampling and analysis procedures to estimate a variety of forest attributes. FIAâs Interior West region uses post-stratification, where strata consist of forest/nonforest polygons based on MODIS imagery, and assumes that nonresponse plots are distributed at random across each stratum...
Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
An application of quantile random forests for predictive mapping of forest attributes
E.A. Freeman; G.G. Moisen
2015-01-01
Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Nasejje, Justine B; Mwambi, Henry; Dheda, Keertan; Lesosky, Maia
2017-07-28
Random survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for the best covariate to split on from that of the best split point search for the selected covariate. In this study, we compare the random survival forest model to the conditional inference model (CIF) using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under-five years of age in Uganda and it consists of categorical covariates with most of them having more than two levels (many split-points). The second dataset is based on the survival of patients with extremely drug resistant tuberculosis (XDR TB) which consists of mainly categorical covariates with two levels (few split-points). The study findings indicate that the conditional inference forest model is superior to random survival forest models in analysing time-to-event data that consists of covariates with many split-points based on the values of the bootstrap cross-validated estimates for integrated Brier scores. However, conditional inference forests perform comparably similar to random survival forests models in analysing time-to-event data consisting of covariates with fewer split-points. Although survival forests are promising methods in analysing time-to-event data, it is important to identify the best forest model for analysis based on the nature of covariates of the dataset in question.
Klingensmith, Jon D; Haggard, Asher; Fedewa, Russell J; Qiang, Beidi; Cummings, Kenneth; DeGrande, Sean; Vince, D Geoffrey; Elsharkawy, Hesham
2018-04-19
Spectral analysis of ultrasound radiofrequency backscatter has the potential to identify intercostal blood vessels during ultrasound-guided placement of paravertebral nerve blocks and intercostal nerve blocks. Autoregressive models were used for spectral estimation, and bandwidth, autoregressive order and region-of-interest size were evaluated. Eight spectral parameters were calculated and used to create random forests. An autoregressive order of 10, bandwidth of 6 dB and region-of-interest size of 1.0 mm resulted in the minimum out-of-bag error. An additional random forest, using these chosen values, was created from 70% of the data and evaluated independently from the remaining 30% of data. The random forest achieved a predictive accuracy of 92% and Youden's index of 0.85. These results suggest that spectral analysis of ultrasound radiofrequency backscatter has the potential to identify intercostal blood vessels. (jokling@siue.edu) © 2018 World Federation for Ultrasound in Medicine and Biology. Copyright © 2018 World Federation for Ultrasound in Medicine and Biology. Published by Elsevier Inc. All rights reserved.
Extrapolating intensified forest inventory data to the surrounding landscape using landsat
Evan B. Brooks; John W. Coulston; Valerie A. Thomas; Randolph H. Wynne
2015-01-01
In 2011, a collection of spatially intensified plots was established on three of the Experimental Forests and Ranges (EFRs) sites with the intent of facilitating FIA program objectives for regional extrapolation. Characteristic coefficients from harmonic regression (HR) analysis of associated Landsat stacks are used as inputs into a conditional random forests model to...
NASA Astrophysics Data System (ADS)
Wu, J.; Yao, W.; Zhang, J.; Li, Y.
2018-04-01
Labeling 3D point cloud data with traditional supervised learning methods requires considerable labelled samples, the collection of which is cost and time expensive. This work focuses on adopting domain adaption concept to transfer existing trained random forest classifiers (based on source domain) to new data scenes (target domain), which aims at reducing the dependence of accurate 3D semantic labeling in point clouds on training samples from the new data scene. Firstly, two random forest classifiers were firstly trained with existing samples previously collected for other data. They were different from each other by using two different decision tree construction algorithms: C4.5 with information gain ratio and CART with Gini index. Secondly, four random forest classifiers adapted to the target domain are derived through transferring each tree in the source random forest models with two types of operations: structure expansion and reduction-SER and structure transfer-STRUT. Finally, points in target domain are labelled by fusing the four newly derived random forest classifiers using weights of evidence based fusion model. To validate our method, experimental analysis was conducted using 3 datasets: one is used as the source domain data (Vaihingen data for 3D Semantic Labelling); another two are used as the target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5 % and 83.3 % for 3D labelling were achieved for Jinmen city and Dunhuang city data respectively, with only 1/3 newly labelled samples compared to the cases without domain adaption.
Comparative analysis of used car price evaluation models
NASA Astrophysics Data System (ADS)
Chen, Chuancan; Hao, Lulu; Xu, Cong
2017-05-01
An accurate used car price evaluation is a catalyst for the healthy development of used car market. Data mining has been applied to predict used car price in several articles. However, little is studied on the comparison of using different algorithms in used car price estimation. This paper collects more than 100,000 used car dealing records throughout China to do empirical analysis on a thorough comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car price in three different models: model for a certain car make, model for a certain car series and universal model. Results show that random forest has a stable but not ideal effect in price evaluation model for a certain car make, but it shows great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with less variables.
Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.
2013-01-01
In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.
NASA Astrophysics Data System (ADS)
Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.
2017-11-01
Coastal monitoring plays a vital role in environmental planning and hazard management related issues. Since shorelines are fundamental data for environment management, disaster management, coastal erosion studies, modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract shorelines. Random Forest is one of these techniques which is used in this study for shoreline extraction.. This algorithm is a machine learning method based on decision trees. Decision trees analyse classes of training data creates rules for classification. In this study, Terkos region has been chosen for the proposed method within the scope of "TUBITAK Project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". Random Forest algorithm has been implemented to extract the shoreline of the Black Sea where near the lake from LANDSAT-8 and GOKTURK-2 satellite imageries taken in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method has been applied to NIR bands of LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imageries. Each image has been digitized manually and shorelines obtained for accuracy assessment. According to accuracy assessment results, Random Forest method is efficient for both medium and high resolution images for shoreline extraction studies.
Estimating erosion risk on forest lands using improved methods of discriminant analysis
J. Lewis; R. M. Rice
1990-01-01
A population of 638 timber harvest areas in northwestern California was sampled for data related to the occurrence of critical amounts of erosion (>153 m3 within 0.81 ha). Separate analyses were done for forest roads and logged areas. Linear discriminant functions were computed in each analysis to contrast site conditions at critical plots with randomly selected...
A tale of two "forests": random forest machine learning AIDS tropical forest carbon mapping.
Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana
2014-01-01
Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including--in the latter case--x, and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called "out-of-bag"), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best preforming run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha(-1) when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043
Advanced analysis of forest fire clustering
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail; Pereira, Mario; Golay, Jean
2017-04-01
Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index. Pattern Recognition, 48, 4070-4081.
Discriminant forest classification method and system
Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.
2012-11-06
A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
A Tale of Two “Forests”: Random Forest Machine Learning Aids Tropical Forest Carbon Mapping
Mascaro, Joseph; Asner, Gregory P.; Knapp, David E.; Kennedy-Bowdoin, Ty; Martin, Roberta E.; Anderson, Christopher; Higgins, Mark; Chadwick, K. Dana
2014-01-01
Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including—in the latter case—x, and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called “out-of-bag”), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best preforming run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha−1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation. PMID:24489686
NASA Astrophysics Data System (ADS)
Deng, Chengbin; Wu, Changshan
2013-12-01
Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
NASA Astrophysics Data System (ADS)
Ahmed, Oumer S.; Franklin, Steven E.; Wulder, Michael A.; White, Joanne C.
2015-03-01
Many forest management activities, including the development of forest inventories, require spatially detailed forest canopy cover and height data. Among the various remote sensing technologies, LiDAR (Light Detection and Ranging) offers the most accurate and consistent means for obtaining reliable canopy structure measurements. A potential solution to reduce the cost of LiDAR data, is to integrate transects (samples) of LiDAR data with frequently acquired and spatially comprehensive optical remotely sensed data. Although multiple regression is commonly used for such modeling, often it does not fully capture the complex relationships between forest structure variables. This study investigates the potential of Random Forest (RF), a machine learning technique, to estimate LiDAR measured canopy structure using a time series of Landsat imagery. The study is implemented over a 2600 ha area of industrially managed coastal temperate forests on Vancouver Island, British Columbia, Canada. We implemented a trajectory-based approach to time series analysis that generates time since disturbance (TSD) and disturbance intensity information for each pixel and we used this information to stratify the forest land base into two strata: mature forests and young forests. Canopy cover and height for three forest classes (i.e. mature, young and mature and young (combined)) were modeled separately using multiple regression and Random Forest (RF) techniques. For all forest classes, the RF models provided improved estimates relative to the multiple regression models. The lowest validation error was obtained for the mature forest strata in a RF model (R2 = 0.88, RMSE = 2.39 m and bias = -0.16 for canopy height; R2 = 0.72, RMSE = 0.068% and bias = -0.0049 for canopy cover). This study demonstrates the value of using disturbance and successional history to inform estimates of canopy structure and obtain improved estimates of forest canopy cover and height using the RF algorithm.
Temporal changes in randomness of bird communities across Central Europe.
Renner, Swen C; Gossner, Martin M; Kahl, Tiemo; Kalko, Elisabeth K V; Weisser, Wolfgang W; Fischer, Markus; Allan, Eric
2014-01-01
Many studies have examined whether communities are structured by random or deterministic processes, and both are likely to play a role, but relatively few studies have attempted to quantify the degree of randomness in species composition. We quantified, for the first time, the degree of randomness in forest bird communities based on an analysis of spatial autocorrelation in three regions of Germany. The compositional dissimilarity between pairs of forest patches was regressed against the distance between them. We then calculated the y-intercept of the curve, i.e. the 'nugget', which represents the compositional dissimilarity at zero spatial distance. We therefore assume, following similar work on plant communities, that this represents the degree of randomness in species composition. We then analysed how the degree of randomness in community composition varied over time and with forest management intensity, which we expected to reduce the importance of random processes by increasing the strength of environmental drivers. We found that a high portion of the bird community composition could be explained by chance (overall mean of 0.63), implying that most of the variation in local bird community composition is driven by stochastic processes. Forest management intensity did not consistently affect the mean degree of randomness in community composition, perhaps because the bird communities were relatively insensitive to management intensity. We found a high temporal variation in the degree of randomness, which may indicate temporal variation in assembly processes and in the importance of key environmental drivers. We conclude that the degree of randomness in community composition should be considered in bird community studies, and the high values we find may indicate that bird community composition is relatively hard to predict at the regional scale.
Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A
2018-04-29
Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.
Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O Halloran, John
2015-01-01
Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1-98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.
Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O`Halloran, John
2015-01-01
Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681
Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.
Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S
2018-04-10
Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that when machine learning-based classifiers are applied to such data sets for risk stratification, leads to lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers if replaced by a median configuration will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from the six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers when replaced by computed medians will improve the risk stratification accuracy. Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that on replacing the missing values and outliers by group median and median values, respectively and further using the combination of random forest feature selection and random forest classification technique yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve as: 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in literature. The system was validated for its stability and reliability. RF-based model showed the best performance when outliers are replaced by median values.
NASA Astrophysics Data System (ADS)
Mangla, Rohit; Kumar, Shashi; Nandy, Subrata
2016-05-01
SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. SAR sensor in its fully polarimetric mode has an advantage to retrieve scattering property of different component of forest structure and LiDAR has the capability to measure structural information with very high accuracy. This study was focused on retrieval of forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) based point clouds and scattering property of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data was acquired for 14 plots of Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees and random sampling with plot size of 0.1 ha (31.62m*31.62m) was adopted for TLS and field data collection. RISAT-1 data was processed to retrieve SAR data based variables and TLS point clouds based 3D imaging was done to retrieve LiDAR based variables. Surface scattering, double-bounce scattering, volume scattering, helix and wire scattering were the SAR based variables retrieved from polarimetric decomposition. Tree heights and stem diameters were used as LiDAR based variables retrieved from single tree vertical height and least square circle fit methods respectively. All the variables obtained for forest plots were used as an input in a machine learning based Random Forest Regression Model, which was developed in this study for forest AGB estimation. Modelled output for forest AGB showed reliable accuracy (RMSE = 27.68 t/ha) and a good coefficient of determination (0.63) was obtained through the linear regression between modelled AGB and field-estimated AGB. The sensitivity analysis showed that the model was more sensitive for the major contributed variables (stem diameter and volume scattering) and these variables were measured from two different remote sensing techniques. This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
Effect of inventory method on niche models: random versus systematic error
Heather E. Lintz; Andrew N. Gray; Bruce McCune
2013-01-01
Data from large-scale biological inventories are essential for understanding and managing Earth's ecosystems. The Forest Inventory and Analysis Program (FIA) of the U.S. Forest Service is the largest biological inventory in North America; however, the FIA inventory recently changed from an amalgam of different approaches to a nationally-standardized approach in...
Calibrating random forests for probability estimation.
Dankowski, Theresa; Ziegler, Andreas
2016-09-30
Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Applying a weighted random forests method to extract karst sinkholes from LiDAR data
NASA Astrophysics Data System (ADS)
Zhu, Junfeng; Pierskalla, William P.
2016-02-01
Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface in high-resolution and high-density and has shown great potentials to drastically improve locating and delineating sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of the depressions commonly generated from LiDAR data. In this study, we applied the random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through a visual inspection and field verification process. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% for the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and makes its more tractable to map sinkholes using LiDAR data for large areas. However, the random forests method cannot totally replace manual procedures, such as visual inspection and field verification.
Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.
Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G
2013-09-23
The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to a relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine based learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected since they have been shown to be able to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high confidence assignments, international consensus classifications, ASTER (ASessment Tools for the Evaluation of Risk) predictions, and weight of evidence professional judgment based an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and were able to produce overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine the acute toxicity MOAs across a large range of structures and mechanisms.
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
van der Meer, D; Hoekstra, P J; van Donkelaar, M; Bralten, J; Oosterlaan, J; Heslenfeld, D; Faraone, S V; Franke, B; Buitelaar, J K; Hartman, C A
2017-01-01
Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression is well suited to explore this complexity, as it allows for the analysis of many predictors simultaneously, taking into account any higher-order interactions among them. Using random forest regression, we predicted ADHD severity, measured by Conners’ Parent Rating Scales, from 686 adolescents and young adults (of which 281 were diagnosed with ADHD). The analysis included 17 374 single-nucleotide polymorphisms (SNPs) across 29 genes previously linked to hypothalamic–pituitary–adrenal (HPA) axis activity, together with information on exposure to 24 individual long-term difficulties or stressful life events. The model explained 12.5% of variance in ADHD severity. The most important SNP, which also showed the strongest interaction with stress exposure, was located in a region regulating the expression of telomerase reverse transcriptase (TERT). Other high-ranking SNPs were found in or near NPSR1, ESR1, GABRA6, PER3, NR3C2 and DRD4. Chronic stressors were more influential than single, severe, life events. Top hits were partly shared with conduct problems. We conclude that random forest regression may be used to investigate how multiple genetic and environmental factors jointly contribute to ADHD. It is able to implicate novel SNPs of interest, interacting with stress exposure, and may explain inconsistent findings in ADHD genetics. This exploratory approach may be best combined with more hypothesis-driven research; top predictors and their interactions with one another should be replicated in independent samples. PMID:28585928
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J
2015-12-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved).
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.
2016-01-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
Serdukova, Larissa; Zheng, Yayun; Duan, Jinqiao; Kurths, Jürgen
2017-08-24
For the tipping elements in the Earth's climate system, the most important issue to address is how stable is the desirable state against random perturbations. Extreme biotic and climatic events pose severe hazards to tropical rainforests. Their local effects are extremely stochastic and difficult to measure. Moreover, the direction and intensity of the response of forest trees to such perturbations are unknown, especially given the lack of efficient dynamical vegetation models to evaluate forest tree cover changes over time. In this study, we consider randomness in the mathematical modelling of forest trees by incorporating uncertainty through a stochastic differential equation. According to field-based evidence, the interactions between fires and droughts are a more direct mechanism that may describe sudden forest degradation in the south-eastern Amazon. In modeling the Amazonian vegetation system, we include symmetric α-stable Lévy perturbations. We report results of stability analysis of the metastable fertile forest state. We conclude that even a very slight threat to the forest state stability represents L´evy noise with large jumps of low intensity, that can be interpreted as a fire occurring in a non-drought year. During years of severe drought, high-intensity fires significantly accelerate the transition between a forest and savanna state.
Henry V. Bastian
2001-01-01
Vegetation and fuel loading plots were monitored and sampled in wilderness areas treated with prescribed fire. Changes in ponderosa pine (Pinus ponderosa) forest structure tree species and fuel loading are presented. Plots were randomly stratified and established in burn units in 1995. Preliminary analysis of nine plots 2 years after burning show litter was reduced 54....
Xiaoqian Sun; Zhuoqiong He; John Kabrick
2008-01-01
This paper presents a Bayesian spatial method for analysing the site index data from the Missouri Ozark Forest Ecosystem Project (MOFEP). Based on ecological background and availability, we select three variables, the aspect class, the soil depth and the land type association as covariates for analysis. To allow great flexibility of the smoothness of the random field,...
NASA Astrophysics Data System (ADS)
Hudak, A. T.; Crookston, N.; Kennedy, R. E.; Domke, G. M.; Fekety, P.; Falkowski, M. J.
2017-12-01
Commercial off-the-shelf lidar collections associated with tree measures in field plots allow aboveground biomass (AGB) estimation with high confidence. Predictive models developed from such datasets are used operationally to map AGB across lidar project areas. We use a random selection of these pixel-level AGB predictions as training for predicting AGB annually across Idaho and western Montana, primarily from Landsat time series imagery processed through LandTrendr. At both the landscape and regional scales, Random Forests is used for predictive AGB modeling. To project future carbon dynamics, we use Climate-FVS (Forest Vegetation Simulator), the tree growth engine used by foresters to inform forest planning decisions, under either constant or changing climate scenarios. Disturbance data compiled from LandTrendr (Kennedy et al. 2010) using TimeSync (Cohen et al. 2010) in forested lands of Idaho (n=509) and western Montana (n=288) are used to generate probabilities of disturbance (harvest, fire, or insect) by land ownership class (public, private) as well as the magnitude of disturbance. Our verification approach is to aggregate the regional, annual AGB predictions at the county level and compare them to annual county-level AGB summarized independently from systematic, field-based, annual inventories conducted by the US Forest Inventory and Analysis (FIA) Program nationally. This analysis shows that when federal lands are disturbed the magnitude is generally high and when other lands are disturbed the magnitudes are more moderate. The probability of disturbance in corporate lands is higher than in other lands but the magnitudes are generally lower. This is consistent with the much higher prevalence of fire and insects occurring on federal lands, and greater harvest activity on private lands. We found large forest carbon losses in drier southern Idaho, only partially offset by carbon gains in wetter northern Idaho, due to anticipated climate change. Public and private forest managers can use these forest carbon projections to 2117 to inform 2017 decisions on which tree species and seed sources to select for planting, and implement forest management strategies now that may seek to maximize forest carbon sequestration for greenhouse gas abatement a century from now.
Non-random species loss in a forest herbaceous layer following nitrogen addition
Christopher A. Walter; Mary Beth Adams; Frank S. Gilliam; William T. Peterjohn
2017-01-01
Nitrogen (N) additions have decreased species richness (S) in hardwood forest herbaceous layers, yet the functional mechanisms for these decreases have not been explicitly evaluated.We tested two hypothesized mechanisms, random species loss (RSL) and non-random species loss (NRSL), in the hardwood forest herbaceous layer of a long-term, plot-scale...
Analysis of Machine Learning Techniques for Heart Failure Readmissions.
Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M
2016-11-01
The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry
2014-01-01
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914
Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry
2014-03-15
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
Approximating prediction uncertainty for random forest regression models
John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne
2016-01-01
Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...
Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan
2017-08-28
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
NASA Astrophysics Data System (ADS)
Fedrigo, Melissa; Newnham, Glenn J.; Coops, Nicholas C.; Culvenor, Darius S.; Bolton, Douglas K.; Nitschke, Craig R.
2018-02-01
Light detection and ranging (lidar) data have been increasingly used for forest classification due to its ability to penetrate the forest canopy and provide detail about the structure of the lower strata. In this study we demonstrate forest classification approaches using airborne lidar data as inputs to random forest and linear unmixing classification algorithms. Our results demonstrated that both random forest and linear unmixing models identified a distribution of rainforest and eucalypt stands that was comparable to existing ecological vegetation class (EVC) maps based primarily on manual interpretation of high resolution aerial imagery. Rainforest stands were also identified in the region that have not previously been identified in the EVC maps. The transition between stand types was better characterised by the random forest modelling approach. In contrast, the linear unmixing model placed greater emphasis on field plots selected as endmembers which may not have captured the variability in stand structure within a single stand type. The random forest model had the highest overall accuracy (84%) and Cohen's kappa coefficient (0.62). However, the classification accuracy was only marginally better than linear unmixing. The random forest model was applied to a region in the Central Highlands of south-eastern Australia to produce maps of stand type probability, including areas of transition (the 'ecotone') between rainforest and eucalypt forest. The resulting map provided a detailed delineation of forest classes, which specifically recognised the coalescing of stand types at the landscape scale. This represents a key step towards mapping the structural and spatial complexity of these ecosystems, which is important for both their management and conservation.
Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack
2011-01-01
Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.
Preference heterogeneity in a count data model of demand for off-highway vehicle recreation
Thomas P Holmes; Jeffrey E Englin
2010-01-01
This paper examines heterogeneity in the preferences for OHV recreation by applying the random parameters Poisson model to a data set of off-highway vehicle (OHV) users at four National Forest sites in North Carolina. The analysis develops estimates of individual consumer surplus and finds that estimates are systematically affected by the random parameter specification...
The experimental design of the Missouri Ozark Forest Ecosystem Project
Steven L. Sheriff; Shuoqiong He
1997-01-01
The Missouri Ozark Forest Ecosystem Project (MOFEP) is an experiment that examines the effects of three forest management practices on the forest community. MOFEP is designed as a randomized complete block design using nine sites divided into three blocks. Treatments of uneven-aged, even-aged, and no-harvest management were randomly assigned to sites within each block...
ERIC Educational Resources Information Center
Golino, Hudson F.; Gomes, Cristiano M. A.
2016-01-01
This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…
Using Random Forest Models to Predict Organizational Violence
NASA Technical Reports Server (NTRS)
Levine, Burton; Bobashev, Georgly
2012-01-01
We present a methodology to access the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minority at Risk Organizational Behavior (MAROS) dataset. The MAROS data is longitudinal; so, individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for the violation of the independence assumption. We present the results of the model fit, an example of predicting violence for an organization; and finally, we present a summary of the forest in a "meta-tree,"
Forest fire spatial pattern analysis in Galicia (NW Spain).
Fuentes-Santos, I; Marey-Pérez, M F; González-Manteiga, W
2013-10-15
Knowledge of fire behaviour is of key importance in forest management. In the present study, we analysed the spatial structure of forest fire with spatial point pattern analysis and inference techniques recently developed in the Spatstat package of R. Wildfires have been the primary threat to Galician forests in recent years. The district of Fonsagrada-Ancares is one of the most seriously affected by fire in the region and, therefore, the central focus of the study. Our main goal was to determine the spatial distribution of ignition points to model and predict fire occurrence. These data are of great value in establishing enhanced fire prevention and fire fighting plans. We found that the spatial distribution of wildfires is not random and that fire occurrence may depend on ownership conflicts. We also found positive interaction between small and large fires and spatial independence between wildfires in consecutive years. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Elmore, K. L.
2016-12-01
The Metorological Phenomemna Identification NeartheGround (mPING) project is an example of a crowd-sourced, citizen science effort to gather data of sufficeint quality and quantity needed by new post processing methods that use machine learning. Transportation and infrastructure are particularly sensitive to precipitation type in winter weather. We extract attributes from operational numerical forecast models and use them in a random forest to generate forecast winter precipitation types. We find that random forests applied to forecast soundings are effective at generating skillful forecasts of surface ptype with consideralbly more skill than the current algorithms, especuially for ice pellets and freezing rain. We also find that three very different forecast models yuield similar overall results, showing that random forests are able to extract essentially equivalent information from different forecast models. We also show that the random forest for each model, and each profile type is unique to the particular forecast model and that the random forests developed using a particular model suffer significant degradation when given attributes derived from a different model. This implies that no single algorithm can perform well across all forecast models. Clearly, random forests extract information unavailable to "physically based" methods because the physical information in the models does not appear as we expect. One intersting result is that results from the classic "warm nose" sounding profile are, by far, the most sensitive to the particular forecast model, but this profile is also the one for which random forests are most skillful. Finally, a method for calibrarting probabilties for each different ptype using multinomial logistic regression is shown.
Buma, Brian
2012-06-01
Forest disturbances around the world have the potential to alter forest type and cover, with impacts on diversity, carbon storage, and landscape composition. These disturbances, especially fire, are common and often large, making ground investigation of forest recovery difficult. Remote sensing offers a means to monitor forest recovery in real time, over the entire landscape. Typically, recovery monitoring via remote sensing consists of measuring vegetation indices (e.g., NDVI) or index-derived metrics, with the assumption that recovery in NDVI (for example) is a meaningful measure of ecosystem recovery. This study tests that assumption using MODIS 16-day imagery from 2000 to 2010 in the area of the Colorado's Routt National Forest Hinman burn (2002) and seedling density counts taken in the same area. Results indicate that NDVI is rarely correlated with forest recovery, and is dominated by annual and perennial forb cover, although topography complicates analysis. Utility of NDVI as a means to delineate areas of recovery or non-recovery are in doubt, as bootstrapped analysis indicates distinguishing power only slightly better than random. NDVI in revegetation analyses should carefully consider the ecology and seasonal patterns of the system in question.
A multi-site analysis of random error in tower-based measurements of carbon and energy fluxes
Andrew D. Richardson; David Y. Hollinger; George G. Burba; Kenneth J. Davis; Lawrence B. Flanagan; Gabriel G. Katul; J. William Munger; Daniel M. Ricciuto; Paul C. Stoy; Andrew E. Suyker; Shashi B. Verma; Steven C. Wofsy; Steven C. Wofsy
2006-01-01
Measured surface-atmosphere fluxes of energy (sensible heat, H, and latent heat, LE) and CO2 (FCO2) represent the ``true?? flux plus or minus potential random and systematic measurement errors. Here, we use data from seven sites in the AmeriFlux network, including five forested sites (two of which include ``tall tower?? instrumentation), one grassland site, and one...
NASA Astrophysics Data System (ADS)
Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine
2014-10-01
The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on the automatization of the process of identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is the underlying parameters which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, we called Random PRiF, is proposed. It relies on random sampling in feature space, carefully addresses the multicollinearity issue in multiple-linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ˜ 1.1 m is expected on crown diameter, ˜ 0.9 m on tree spacing, ˜ 3 m on height and ˜ 0.06 m on diameter at breast height.
A Random Forest-based ensemble method for activity recognition.
Feng, Zengtao; Mo, Lingfei; Li, Meng
2015-01-01
This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People), which is a standard, publicly available database, was utilized to train and test. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while the training is faster than others. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Dung Nguyen, The; Kappas, Martin
2017-04-01
In the last several years, the interest in forest biomass and carbon stock estimation has increased due to its importance for forest management, modelling carbon cycle, and other ecosystem services. However, no estimates of biomass and carbon stocks of deferent forest cover types exist throughout in the Xuan Lien Nature Reserve, Thanh Hoa, Viet Nam. This study investigates the relationship between above ground carbon stock and different vegetation indices and to identify the most likely vegetation index that best correlate with forest carbon stock. The terrestrial inventory data come from 380 sample plots that were randomly sampled. Individual tree parameters such as DBH and tree height were collected to calculate the above ground volume, biomass and carbon for different forest types. The SPOT6 2013 satellite data was used in the study to obtain five vegetation indices NDVI, RDVI, MSR, RVI, and EVI. The relationships between the forest carbon stock and vegetation indices were investigated using a multiple linear regression analysis. R-square, RMSE values and cross-validation were used to measure the strength and validate the performance of the models. The methodology presented here demonstrates the possibility of estimating forest volume, biomass and carbon stock. It can also be further improved by addressing more spectral bands data and/or elevation.
Pigmented skin lesion detection using random forest and wavelet-based texture
NASA Astrophysics Data System (ADS)
Hu, Ping; Yang, Tie-jun
2016-10-01
The incidence of cutaneous malignant melanoma, a disease of worldwide distribution and is the deadliest form of skin cancer, has been rapidly increasing over the last few decades. Because advanced cutaneous melanoma is still incurable, early detection is an important step toward a reduction in mortality. Dermoscopy photographs are commonly used in melanoma diagnosis and can capture detailed features of a lesion. A great variability exists in the visual appearance of pigmented skin lesions. Therefore, in order to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, an automatic detection approach is required. The objectives of this paper were to propose a hybrid method using random forest and Gabor wavelet transformation to accurately differentiate which part belong to lesion area and the other is not in a dermoscopy photographs and analyze segmentation accuracy. A random forest classifier consisting of a set of decision trees was used for classification. Gabor wavelets transformation are the mathematical model of visual cortical cells of mammalian brain and an image can be decomposed into multiple scales and multiple orientations by using it. The Gabor function has been recognized as a very useful tool in texture analysis, due to its optimal localization properties in both spatial and frequency domain. Texture features based on Gabor wavelets transformation are found by the Gabor filtered image. Experiment results indicate the following: (1) the proposed algorithm based on random forest outperformed the-state-of-the-art in pigmented skin lesions detection (2) and the inclusion of Gabor wavelet transformation based texture features improved segmentation accuracy significantly.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high dimensional genome-wide association (GWA) case-control data of complex disease, there are usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant with the disease. A simple random sampling method in random forest using default mtry parameter to choose feature subspace, will select too many subspaces without informative SNPs. Exhaustive searching an optimal mtry is often required in order to include useful and relevant SNPs and get rid of vast of non-informative SNPs. However, it is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure can make sure each subspace contains enough useful SNPs, but can avoid a very high computational cost of exhaustive search of an optimal mtry, and maintain the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and it can generate better random forest with higher accuracy and lower error bound than those by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.
Teh, Seng Khoon; Zheng, Wei; Lau, David P; Huang, Zhiwei
2009-06-01
In this work, we evaluated the diagnostic ability of near-infrared (NIR) Raman spectroscopy associated with the ensemble recursive partitioning algorithm based on random forests for identifying cancer from normal tissue in the larynx. A rapid-acquisition NIR Raman system was utilized for tissue Raman measurements at 785 nm excitation, and 50 human laryngeal tissue specimens (20 normal; 30 malignant tumors) were used for NIR Raman studies. The random forests method was introduced to develop effective diagnostic algorithms for classification of Raman spectra of different laryngeal tissues. High-quality Raman spectra in the range of 800-1800 cm(-1) can be acquired from laryngeal tissue within 5 seconds. Raman spectra differed significantly between normal and malignant laryngeal tissues. Classification results obtained from the random forests algorithm on tissue Raman spectra yielded a diagnostic sensitivity of 88.0% and specificity of 91.4% for laryngeal malignancy identification. The random forests technique also provided variables importance that facilitates correlation of significant Raman spectral features with cancer transformation. This study shows that NIR Raman spectroscopy in conjunction with random forests algorithm has a great potential for the rapid diagnosis and detection of malignant tumors in the larynx.
Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie
2015-01-01
Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation. PMID:26180536
Guo, Rui; Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie
2015-01-01
Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation.
Analysis of landslide hazard area in Ludian earthquake based on Random Forests
NASA Astrophysics Data System (ADS)
Xie, J.-C.; Liu, R.; Li, H.-W.; Lai, Z.-L.
2015-04-01
With the development of machine learning theory, more and more algorithms are evaluated for seismic landslides. After the Ludian earthquake, the research team combine with the special geological structure in Ludian area and the seismic filed exploration results, selecting SLOPE(PODU); River distance(HL); Fault distance(DC); Seismic Intensity(LD) and Digital Elevation Model(DEM), the normalized difference vegetation index(NDVI) which based on remote sensing images as evaluation factors. But the relationships among these factors are fuzzy, there also exists heavy noise and high-dimensional, we introduce the random forest algorithm to tolerate these difficulties and get the evaluation result of Ludian landslide areas, in order to verify the accuracy of the result, using the ROC graphs for the result evaluation standard, AUC covers an area of 0.918, meanwhile, the random forest's generalization error rate decreases with the increase of the classification tree to the ideal 0.08 by using Out Of Bag(OOB) Estimation. Studying the final landslides inversion results, paper comes to a statistical conclusion that near 80% of the whole landslides and dilapidations are in areas with high susceptibility and moderate susceptibility, showing the forecast results are reasonable and adopted.
NASA Astrophysics Data System (ADS)
Purnomo, B.; Anggoro, S.; Izzati, M.
2017-06-01
The concept of forest management at the site level in the form of forest management units (KPH) implemented by the government in an effort to improve forest governance in Indonesia. Forest management must ensure fairness for all stakeholders, especially indigenous and local communities that have been the most marginalized groups. Local communities have become an important part in the efforts to achieve sustainable forest management. Public perception as one of the stakeholders in forest management need to be analyzed to determine their perspectives on the forest. This study aimed to analyze the perception and the level of community participation in forest management activities in KPHP Model Unit VII-Hulu Sarolangun, as well as examine the relationship between these two variables. Perception variables are divided into three categories: good, moderate and bad, while the participation variable is also divided into three categories: high, medium, and low. Data was obtained through semi-structured interviews with the key informants and questionnaires to randomly selected respondents. Statistical analysis was conducted to determine whether there are differences of perception and participation between the two villages and the relationship between perceptions of participation or not. The results showed 90,16 % of people have a good perception and the remaining 9,84% have a moderate perception. In general, community participation is at a low level that is as much as 76,17 % and only 1,55% had a high participation rate. The analysis showed differences in levels of participation between the two villages and there is no relationship between the perception and the level of community participation in forest management. The results of this study can be taken into consideration for KPHP and other stakeholders in forest management policy in the region KPHP.
Adapting GNU random forest program for Unix and Windows
NASA Astrophysics Data System (ADS)
Jirina, Marcel; Krayem, M. Said; Jirina, Marcel, Jr.
2013-10-01
The Random Forest is a well-known method and also a program for data clustering and classification. Unfortunately, the original Random Forest program is rather difficult to use. Here we describe a new version of this program originally written in Fortran 77. The modified program in Fortran 95 needs to be compiled only once and information for different tasks is passed with help of arguments. The program was tested with 24 data sets from UCI MLR and results are available on the net.
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.
Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers-Logistic Regression, Naïve Bayes and Random Forest-with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970's and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.
Dimitriadis, Stavros I; Liparas, Dimitris
2018-06-01
Neuroinformatics is a fascinating research field that applies computational models and analytical tools to high dimensional experimental neuroscience data for a better understanding of how the brain functions or dysfunctions in brain diseases. Neuroinformaticians work in the intersection of neuroscience and informatics supporting the integration of various sub-disciplines (behavioural neuroscience, genetics, cognitive psychology, etc.) working on brain research. Neuroinformaticians are the pathway of information exchange between informaticians and clinicians for a better understanding of the outcome of computational models and the clinical interpretation of the analysis. Machine learning is one of the most significant computational developments in the last decade giving tools to neuroinformaticians and finally to radiologists and clinicians for an automatic and early diagnosis-prognosis of a brain disease. Random forest (RF) algorithm has been successfully applied to high-dimensional neuroimaging data for feature reduction and also has been applied to classify the clinical label of a subject using single or multi-modal neuroimaging datasets. Our aim was to review the studies where RF was applied to correctly predict the Alzheimer's disease (AD), the conversion from mild cognitive impairment (MCI) and its robustness to overfitting, outliers and handling of non-linear data. Finally, we described our RF-based model that gave us the 1 st position in an international challenge for automated prediction of MCI from MRI data.
Mapping Deforestation area in North Korea Using Phenology-based Multi-Index and Random Forest
NASA Astrophysics Data System (ADS)
Jin, Y.; Sung, S.; Lee, D. K.; Jeong, S.
2016-12-01
Forest ecosystem provides ecological benefits to both humans and wildlife. Growing global demand for food and fiber is accelerating the pressure on the forest ecosystem in whole world from agriculture and logging. In recently, North Korea lost almost 40 % of its forests to crop fields for food production and cut-down of forest for fuel woods between 1990 and 2015. It led to the increased damage caused by natural disasters and is known to be one of the most forest degraded areas in the world. The characteristic of forest landscape in North Korea is complex and heterogeneous, the major landscape types in the forest are hillside farm, unstocked forest, natural forest and plateau vegetation. Remote sensing can be used for the forest degradation mapping of a dynamic landscape at a broad scale of detail and spatial distribution. Confusion mostly occurred between hillside farmland and unstocked forest, but also between unstocked forest and forest. Most previous forest degradation that used focused on the classification of broad types such as deforests area and sand from the perspective of land cover classification. The objective of this study is using random forest for mapping degraded forest in North Korea by phenological based vegetation index derived from MODIS products, which has various environmental factors such as vegetation, soil and water at a regional scale for improving accuracy. The model created by random forest resulted in an overall accuracy was 91.44%. Class user's accuracy of hillside farmland and unstocked forest were 97.2% and 84%%, which indicate the degraded forest. Unstocked forest had relative low user accuracy due to misclassified hillside farmland and forest samples. Producer's accuracy of hillside farmland and unstocked forest were 85.2% and 93.3%, repectly. In this case hillside farmland had lower produce accuracy mainly due to confusion with field, unstocked forest and forest. Such a classification of degraded forest could supply essential information to decide the priority of forest management and restoration in degraded forest area.
Jeffrey T. Walton
2008-01-01
Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields have not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant, modifying local temperature and precipitation. While dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
A System-Level Pathway-Phenotype Association Analysis Using Synthetic Feature Random Forest
Pan, Qinxin; Hu, Ting; Malley, James D.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.
2015-01-01
As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system-level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, embracing the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways needs to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations. PMID:24535726
Global patterns of tropical forest fragmentation
NASA Astrophysics Data System (ADS)
Taubert, Franziska; Fischer, Rico; Groeneveld, Jürgen; Lehmann, Sebastian; Müller, Michael S.; Rödig, Edna; Wiegand, Thorsten; Huth, Andreas
2018-02-01
Remote sensing enables the quantification of tropical deforestation with high spatial resolution. This in-depth mapping has led to substantial advances in the analysis of continent-wide fragmentation of tropical forests. Here we identified approximately 130 million forest fragments in three continents that show surprisingly similar power-law size and perimeter distributions as well as fractal dimensions. Power-law distributions have been observed in many natural phenomena such as wildfires, landslides and earthquakes. The principles of percolation theory provide one explanation for the observed patterns, and suggest that forest fragmentation is close to the critical point of percolation; simulation modelling also supports this hypothesis. The observed patterns emerge not only from random deforestation, which can be described by percolation theory, but also from a wide range of deforestation and forest-recovery regimes. Our models predict that additional forest loss will result in a large increase in the total number of forest fragments—at maximum by a factor of 33 over 50 years—as well as a decrease in their size, and that these consequences could be partly mitigated by reforestation and forest protection.
Tanpitukpongse, Teerath P.; Mazurowski, Maciej A.; Ikhena, John; Petrella, Jeffrey R.
2016-01-01
Background and Purpose To assess prognostic efficacy of individual versus combined regional volumetrics in two commercially-available brain volumetric software packages for predicting conversion of patients with mild cognitive impairment to Alzheimer's disease. Materials and Methods Data was obtained through the Alzheimer's Disease Neuroimaging Initiative. 192 subjects (mean age 74.8 years, 39% female) diagnosed with mild cognitive impairment at baseline were studied. All had T1WI MRI sequences at baseline and 3-year clinical follow-up. Analysis was performed with NeuroQuant® and Neuroreader™. Receiver operating characteristic curves assessing the prognostic efficacy of each software package were generated using a univariable approach employing individual regional brain volumes, as well as two multivariable approaches (multiple regression and random forest), combining multiple volumes. Results On univariable analysis of 11 NeuroQuant® and 11 Neuroreader™ regional volumes, hippocampal volume had the highest area under the curve for both software packages (0.69 NeuroQuant®, 0.68 Neuroreader™), and was not significantly different (p > 0.05) between packages. Multivariable analysis did not increase the area under the curve for either package (0.63 logistic regression, 0.60 random forest NeuroQuant®; 0.65 logistic regression, 0.62 random forest Neuroreader™). Conclusion Of the multiple regional volume measures available in FDA-cleared brain volumetric software packages, hippocampal volume remains the best single predictor of conversion of mild cognitive impairment to Alzheimer's disease at 3-year follow-up. Combining volumetrics did not add additional prognostic efficacy. Therefore, future prognostic studies in MCI, combining such tools with demographic and other biomarker measures, are justified in using hippocampal volume as the only volumetric biomarker. PMID:28057634
2015-01-01
Color is one of the most prominent features of an image and used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space in terms of skin and face classification performance which can address issues like illumination variations, various camera characteristics and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing the Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color in over seventeen existing color spaces. Genetic Algorithm heuristic is used to find the optimal color component combination setup in terms of skin detection accuracy while the Principal Component Analysis projects the optimal Genetic Algorithm solution to a less complex dimension. Pixel wise skin detection was used to evaluate the performance of the proposed color space. We have employed four classifiers including Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron in order to generate the human skin color predictive model. The proposed color space was compared to some existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that by using Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and False Positive Rate of 0.0482 which outperformed the existing color spaces in terms of pixel wise skin detection accuracy. The results also indicate that among the classifiers used in this study, Random Forest is the most suitable classifier for pixel wise skin detection applications. PMID:26267377
Genome analysis of Legionella pneumophila strains using a mixed-genome microarray.
Euser, Sjoerd M; Nagelkerke, Nico J; Schuren, Frank; Jansen, Ruud; Den Boer, Jeroen W
2012-01-01
Legionella, the causative agent for Legionnaires' disease, is ubiquitous in both natural and man-made aquatic environments. The distribution of Legionella genotypes within clinical strains is significantly different from that found in environmental strains. Developing novel genotypic methods that offer the ability to distinguish clinical from environmental strains could help to focus on more relevant (virulent) Legionella species in control efforts. Mixed-genome microarray data can be used to perform a comparative-genome analysis of strain collections, and advanced statistical approaches, such as the Random Forest algorithm are available to process these data. Microarray analysis was performed on a collection of 222 Legionella pneumophila strains, which included patient-derived strains from notified cases in The Netherlands in the period 2002-2006 and the environmental strains that were collected during the source investigation for those patients within the Dutch National Legionella Outbreak Detection Programme. The Random Forest algorithm combined with a logistic regression model was used to select predictive markers and to construct a predictive model that could discriminate between strains from different origin: clinical or environmental. Four genetic markers were selected that correctly predicted 96% of the clinical strains and 66% of the environmental strains collected within the Dutch National Legionella Outbreak Detection Programme. The Random Forest algorithm is well suited for the development of prediction models that use mixed-genome microarray data to discriminate between Legionella strains from different origin. The identification of these predictive genetic markers could offer the possibility to identify virulence factors within the Legionella genome, which in the future may be implemented in the daily practice of controlling Legionella in the public health environment.
Tillman, Fred; Anning, David W.; Heilman, Julian A.; Buto, Susan G.; Miller, Matthew P.
2018-01-01
Elevated concentrations of dissolved-solids (salinity) including calcium, sodium, sulfate, and chloride, among others, in the Colorado River cause substantial problems for its water users. Previous efforts to reduce dissolved solids in upper Colorado River basin (UCRB) streams often focused on reducing suspended-sediment transport to streams, but few studies have investigated the relationship between suspended sediment and salinity, or evaluated which watershed characteristics might be associated with this relationship. Are there catchment properties that may help in identifying areas where control of suspended sediment will also reduce salinity transport to streams? A random forests classification analysis was performed on topographic, climate, land cover, geology, rock chemistry, soil, and hydrologic information in 163 UCRB catchments. Two random forests models were developed in this study: one for exploring stream and catchment characteristics associated with stream sites where dissolved solids increase with increasing suspended-sediment concentration, and the other for predicting where these sites are located in unmonitored reaches. Results of variable importance from the exploratory random forests models indicate that no simple source, geochemical process, or transport mechanism can easily explain the relationship between dissolved solids and suspended sediment concentrations at UCRB monitoring sites. Among the most important watershed characteristics in both models were measures of soil hydraulic conductivity, soil erodibility, minimum catchment elevation, catchment area, and the silt component of soil in the catchment. Predictions at key locations in the basin were combined with observations from selected monitoring sites, and presented in map-form to give a complete understanding of where catchment sediment control practices would also benefit control of dissolved solids in streams.
Properties of Protein Drug Target Classes
Bull, Simon C.; Doig, Andrew J.
2015-01-01
Accurate identification of drug targets is a crucial part of any drug development program. We mined the human proteome to discover properties of proteins that may be important in determining their suitability for pharmaceutical modulation. Data was gathered concerning each protein’s sequence, post-translational modifications, secondary structure, germline variants, expression profile and drug target status. The data was then analysed to determine features for which the target and non-target proteins had significantly different values. This analysis was repeated for subsets of the proteome consisting of all G-protein coupled receptors, ion channels, kinases and proteases, as well as proteins that are implicated in cancer. Machine learning was used to quantify the proteins in each dataset in terms of their potential to serve as a drug target. This was accomplished by first inducing a random forest that could distinguish between its targets and non-targets, and then using the random forest to quantify the drug target likeness of the non-targets. The properties that can best differentiate targets from non-targets were primarily those that are directly related to a protein’s sequence (e.g. secondary structure). Germline variants, expression levels and interactions between proteins had minimal discriminative power. Overall, the best indicators of drug target likeness were found to be the proteins’ hydrophobicities, in vivo half-lives, propensity for being membrane bound and the fraction of non-polar amino acids in their sequences. In terms of predicting potential targets, datasets of proteases, ion channels and cancer proteins were able to induce random forests that were highly capable of distinguishing between targets and non-targets. The non-target proteins predicted to be targets by these random forests comprise the set of the most suitable potential future drug targets, and should therefore be prioritised when building a drug development programme. PMID:25822509
Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu
2016-01-14
Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR systems.
Pseudo CT estimation from MRI using patch-based random forest
NASA Astrophysics Data System (ADS)
Yang, Xiaofeng; Lei, Yang; Shu, Hui-Kuo; Rossi, Peter; Mao, Hui; Shim, Hyunsuk; Curran, Walter J.; Liu, Tian
2017-02-01
Recently, MR simulators gain popularity because of unnecessary radiation exposure of CT simulators being used in radiation therapy planning. We propose a method for pseudo CT estimation from MR images based on a patch-based random forest. Patient-specific anatomical features are extracted from the aligned training images and adopted as signatures for each voxel. The most robust and informative features are identified using feature selection to train the random forest. The well-trained random forest is used to predict the pseudo CT of a new patient. This prediction technique was tested with human brain images and the prediction accuracy was assessed using the original CT images. Peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) indexes were used to quantify the differences between the pseudo and original CT images. The experimental results showed the proposed method could accurately generate pseudo CT images from MR images. In summary, we have developed a new pseudo CT prediction method based on patch-based random forest, demonstrated its clinical feasibility, and validated its prediction accuracy. This pseudo CT prediction technique could be a useful tool for MRI-based radiation treatment planning and attenuation correction in a PET/MRI scanner.
Detecting targets hidden in random forests
NASA Astrophysics Data System (ADS)
Kouritzin, Michael A.; Luo, Dandan; Newton, Fraser; Wu, Biao
2009-05-01
Military tanks, cargo or troop carriers, missile carriers or rocket launchers often hide themselves from detection in the forests. This plagues the detection problem of locating these hidden targets. An electro-optic camera mounted on a surveillance aircraft or unmanned aerial vehicle is used to capture the images of the forests with possible hidden targets, e.g., rocket launchers. We consider random forests of longitudinal and latitudinal correlations. Specifically, foliage coverage is encoded with a binary representation (i.e., foliage or no foliage), and is correlated in adjacent regions. We address the detection problem of camouflaged targets hidden in random forests by building memory into the observations. In particular, we propose an efficient algorithm to generate random forests, ground, and camouflage of hidden targets with two dimensional correlations. The observations are a sequence of snapshots consisting of foliage-obscured ground or target. Theoretically, detection is possible because there are subtle differences in the correlations of the ground and camouflage of the rocket launcher. However, these differences are well beyond human perception. To detect the presence of hidden targets automatically, we develop a Markov representation for these sequences and modify the classical filtering equations to allow the Markov chain observation. Particle filters are used to estimate the position of the targets in combination with a novel random weighting technique. Furthermore, we give positive proof-of-concept simulations.
Application of lifting wavelet and random forest in compound fault diagnosis of gearbox
NASA Astrophysics Data System (ADS)
Chen, Tang; Cui, Yulian; Feng, Fuzhou; Wu, Chunzhi
2018-03-01
Aiming at the weakness of compound fault characteristic signals of a gearbox of an armored vehicle and difficult to identify fault types, a fault diagnosis method based on lifting wavelet and random forest is proposed. First of all, this method uses the lifting wavelet transform to decompose the original vibration signal in multi-layers, reconstructs the multi-layer low-frequency and high-frequency components obtained by the decomposition to get multiple component signals. Then the time-domain feature parameters are obtained for each component signal to form multiple feature vectors, which is input into the random forest pattern recognition classifier to determine the compound fault type. Finally, a variety of compound fault data of the gearbox fault analog test platform are verified, the results show that the recognition accuracy of the fault diagnosis method combined with the lifting wavelet and the random forest is up to 99.99%.
Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P
2010-03-19
This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Rao, Meenakshi; George, Linda A; Shandas, Vivek; Rosenstiel, Todd N
2017-07-10
Understanding how local land use and land cover (LULC) shapes intra-urban concentrations of atmospheric pollutants-and thus human health-is a key component in designing healthier cities. Here, NO₂ is modeled based on spatially dense summer and winter NO₂ observations in Portland-Hillsboro-Vancouver (USA), and the spatial variation of NO₂ with LULC investigated using random forest, an ensemble data learning technique. The NO 2 random forest model, together with BenMAP, is further used to develop a better understanding of the relationship among LULC, ambient NO₂ and respiratory health. The impact of land use modifications on ambient NO₂, and consequently on respiratory health, is also investigated using a sensitivity analysis. We find that NO₂ associated with roadways and tree-canopied areas may be affecting annual incidence rates of asthma exacerbation in 4-12 year olds by +3000 per 100,000 and -1400 per 100,000, respectively. Our model shows that increasing local tree canopy by 5% may reduce local incidences rates of asthma exacerbation by 6%, indicating that targeted local tree-planting efforts may have a substantial impact on reducing city-wide incidence of respiratory distress. Our findings demonstrate the utility of random forest modeling in evaluating LULC modifications for enhanced respiratory health.
Garland, Ellen C; Castellote, Manuel; Berchok, Catherine L
2015-06-01
Beluga whales, Delphinapterus leucas, have a graded call system; call types exist on a continuum making classification challenging. A description of vocalizations from the eastern Beaufort Sea beluga population during its spring migration are presented here, using both a non-parametric classification tree analysis (CART), and a Random Forest analysis. Twelve frequency and duration measurements were made on 1019 calls recorded over 14 days off Icy Cape, Alaska, resulting in 34 identifiable call types with 83% agreement in classification for both CART and Random Forest analyses. This high level of agreement in classification, with an initial subjective classification of calls into 36 categories, demonstrates that the methods applied here provide a quantitative analysis of a graded call dataset. Further, as calls cannot be attributed to individuals using single sensor passive acoustic monitoring efforts, these methods provide a comprehensive analysis of data where the influence of pseudo-replication of calls from individuals is unknown. This study is the first to describe the vocal repertoire of a beluga population using a robust and repeatable methodology. A baseline eastern Beaufort Sea beluga population repertoire is presented here, against which the call repertoire of other seasonally sympatric Alaskan beluga populations can be compared.
Barreto-Silva, Juan Sebastian; López, Dairon Cárdenas; Montoya, Alvaro Javier Duque
2014-03-01
The effect of environmental variation on the structure of tree communities in tropical forests is still under debate. There is evidence that in landscapes like Tierra Firme forest, where the environmental gradient decreases at a local level, the effect of soil on the distribution patterns of plant species is minimal, happens to be random or is due to biological processes. In contrast, in studies with different kinds of plants from tropical forests, a greater effect on floristic composition of varying soil and topography has been reported. To assess this, the current study was carried out in a permanent plot of ten hectares in the Amacayacu National Park, Colombian Amazonia. To run the analysis, floristic and environmental variations were obtained according to tree species abundance categories and growth forms. In order to quantify the role played by both environmental filtering and dispersal limitation, the variation of the spatial configuration was included. We used Detrended Correspondence Analysis and Canonical Correspondence Analysis, followed by a variation partitioning, to analyze the species distribution patterns. The spatial template was evaluated using the Principal Coordinates of Neighbor Matrix method. We recorded 14 074 individuals from 1 053 species and 80 families. The most abundant families were Myristicaceae, Moraceae, Meliaceae, Arecaceae and Lecythidaceae, coinciding with other studies from Northwest Amazonia. Beta diversity was relatively low within the plot. Soils were very poor, had high aluminum concentration and were predominantly clayey. The floristic differences explained along the ten hectares plot were mainly associated to biological processes, such as dispersal limitation. The largest proportion of community variation in our dataset was unexplained by either environmental or spatial data. In conclusion, these results support random processes as the major drivers of the spatial variation of tree species at a local scale on Tierra Firme forests of Amacayacu National Park, and suggest reserve's size as a key element to ensure the conservation of plant diversity at both regional and local levels.
Using random forests for assistance in the curation of G-protein coupled receptor databases.
Shkurin, Aleksei; Vellido, Alfredo
2017-08-18
Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.
Carlos Alberto Silva; Carine Klauberg; Andrew Thomas Hudak; Lee Alexander Vierling; Wan Shafrina Wan Mohd Jaafar; Midhun Mohan; Mariano Garcia; Antonio Ferraz; Adrian Cardil; Sassan Saatchi
2017-01-01
Improvements in the management of pine plantations result in multiple industrial and environmental benefits. Remote sensing techniques can dramatically increase the efficiency of plantation management by reducing or replacing time-consuming field sampling. We tested the utility and accuracy of combining field and airborne lidar data with Random Forest, a supervised...
Uncertainty in Random Forests: What does it mean in a spatial context?
NASA Astrophysics Data System (ADS)
Klump, Jens; Fouedjio, Francky
2017-04-01
Geochemical surveys are an important part of exploration for mineral resources and in environmental studies. The samples and chemical analyses are often laborious and difficult to obtain and therefore come at a high cost. As a consequence, these surveys are characterised by datasets with large numbers of variables but relatively few data points when compared to conventional big data problems. With more remote sensing platforms and sensor networks being deployed, large volumes of auxiliary data of the surveyed areas are becoming available. The use of these auxiliary data has the potential to improve the prediction of chemical element concentrations over the whole study area. Kriging is a well established geostatistical method for the prediction of spatial data but requires significant pre-processing and makes some basic assumptions about the underlying distribution of the data. Some machine learning algorithms, on the other hand, may require less data pre-processing and are non-parametric. In this study we used a dataset provided by Kirkwood et al. [1] to explore the potential use of Random Forest in geochemical mapping. We chose Random Forest because it is a well understood machine learning method and has the advantage that it provides us with a measure of uncertainty. By comparing Random Forest to Kriging we found that both methods produced comparable maps of estimated values for our variables of interest. Kriging outperformed Random Forest for variables of interest with relatively strong spatial correlation. The measure of uncertainty provided by Random Forest seems to be quite different to the measure of uncertainty provided by Kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. In conclusion, our preliminary results show that the model driven approach in geostatistics gives us more reliable estimates for our target variables than Random Forest for variables with relatively strong spatial correlation. However, in cases of weak spatial correlation Random Forest, as a nonparametric method, may give the better results once we have a better understanding of the meaning of its uncertainty measures in a spatial context. References [1] Kirkwood, C., M. Cave, D. Beamish, S. Grebby, and A. Ferreira (2016), A machine learning approach to geochemical mapping, Journal of Geochemical Exploration, 163, 28-40, doi:10.1016/j.gexplo.2016.05.003.
NASA Astrophysics Data System (ADS)
Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.
2017-10-01
Crop maps are essential inputs for the agricultural planning done at various governmental and agribusinesses agencies. Remote sensing offers timely and costs efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, a RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are of 83, 82 and 83% respectively. Adding vegetation indices to the analysis result in the classification accuracy of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies
Theis, Fabian J.
2017-01-01
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza
2018-04-24
We aimed to identify the associated risk factors of type 2 diabetes mellitus (T2DM) using data mining approach, decision tree and random forest techniques using the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. A cross-sectional study. The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, namely decision trees, and random forests, are used for predicting T2DM when some other characteristics are observed on 9528 subjects recruited from MASHAD database. This paper makes a comparison between these two models in terms of accuracy, sensitivity, specificity and the area under ROC curve. The prevalence rate of T2DM was 14% among these subjects. The decision tree model has 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and area under the ROC curve measuring 68.6%, while the random forest model has 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and area under the ROC curve measuring 77.3% respectively. The random forest model, when used with demographic, clinical, and anthropometric and biochemical measurements, can provide a simple tool to identify associated risk factors for type 2 diabetes. Such identification can substantially use for managing the health policy to reduce the number of subjects with T2DM .
Applications of random forest feature selection for fine-scale genetic population assignment.
Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G
2018-02-01
Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F ST ranking for selection of single nucleotide polymorphisms (SNP) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon ( Salmo salar ) and a published SNP data set for Alaskan Chinook salmon ( Oncorhynchus tshawytscha ). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than F ST -selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F ST -selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
Do little interactions get lost in dark random forests?
Wright, Marvin N; Ziegler, Andreas; König, Inke R
2016-03-31
Random forests have often been claimed to uncover interaction effects. However, if and how interaction effects can be differentiated from marginal effects remains unclear. In extensive simulation studies, we investigate whether random forest variable importance measures capture or detect gene-gene interactions. With capturing interactions, we define the ability to identify a variable that acts through an interaction with another one, while detection is the ability to identify an interaction effect as such. Of the single importance measures, the Gini importance captured interaction effects in most of the simulated scenarios, however, they were masked by marginal effects in other variables. With the permutation importance, the proportion of captured interactions was lower in all cases. Pairwise importance measures performed about equal, with a slight advantage for the joint variable importance method. However, the overall fraction of detected interactions was low. In almost all scenarios the detection fraction in a model with only marginal effects was larger than in a model with an interaction effect only. Random forests are generally capable of capturing gene-gene interactions, but current variable importance measures are unable to detect them as interactions. In most of the cases, interactions are masked by marginal effects and interactions cannot be differentiated from marginal effects. Consequently, caution is warranted when claiming that random forests uncover interactions.
NASA Astrophysics Data System (ADS)
Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.
2018-01-01
Comparative analysis of soils geospatial modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machines algorithms was conducted. The visual interpretation of the digital maps obtained and their comparison with the existing map, as well as the quantitative assessment of the individual soil groups detection overall accuracy and of the models kappa showed that multiple logistic regression, support vector method, and random forest models application with spatial prediction of the conditional soil groups distribution can be reliably used for mapping of the study area. It has shown the most accurate detection for sod-podzolics soils (Phaeozems Albic) lightly eroded and moderately eroded soils. In second place, according to the mean overall accuracy of the prediction, there are sod-podzolics soils - non-eroded and warp one, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavy eroded sod-podzolics and gray forest soils (Phaeozems Albic) were detected by methods of automatic classification worst of all.
Comparing spatial regression to random forests for large environmental data sets
Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputatio...
NASA Astrophysics Data System (ADS)
Abdolmanafi, Atefeh; Prasad, Arpan Suravi; Duong, Luc; Dahdah, Nagib
2016-03-01
Intravascular imaging modalities, such as Optical Coherence Tomography (OCT) allow nowadays improving diagnosis, treatment, follow-up, and even prevention of coronary artery disease in the adult. OCT has been recently used in children following Kawasaki disease (KD), the most prevalent acquired coronary artery disease during childhood with devastating complications. The assessment of coronary artery layers with OCT and early detection of coronary sequelae secondary to KD is a promising tool for preventing myocardial infarction in this population. More importantly, OCT is promising for tissue quantification of the inner vessel wall, including neo intima luminal myofibroblast proliferation, calcification, and fibrous scar deposits. The goal of this study is to classify the coronary artery layers of OCT imaging obtained from a series of KD patients. Our approach is focused on developing a robust Random Forest classifier built on the idea of randomly selecting a subset of features at each node and based on second- and higher-order statistical texture analysis which estimates the gray-level spatial distribution of images by specifying the local features of each pixel and extracting the statistics from their distribution. The average classification accuracy for intima and media are 76.36% and 73.72% respectively. Random forest classifier with texture analysis promises for classification of coronary artery tissue.
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...
Sankari, E Siva; Manimegalai, D
2017-12-21
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest are analysed. Among the various decision tree classifiers Random forest performs well in less time with good accuracy of 96.35%. Another inference is RUS boost decision tree classifier is able to classify one or two samples in the class with very less samples while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive for the classes with fewer samples. Also the performance of decision tree classifiers is compared with SVM (Support Vector Machine) and Naive Bayes classifier. Copyright © 2017 Elsevier Ltd. All rights reserved.
Xiao, Li-Hong; Chen, Pei-Ran; Gou, Zhong-Ping; Li, Yong-Zhong; Li, Mei; Xiang, Liang-Cheng; Feng, Ping
2017-01-01
The aim of this study is to evaluate the ability of the random forest algorithm that combines data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than either diagnostic indicator on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
Rao, Meenakshi; George, Linda A.; Shandas, Vivek; Rosenstiel, Todd N.
2017-01-01
Understanding how local land use and land cover (LULC) shapes intra-urban concentrations of atmospheric pollutants—and thus human health—is a key component in designing healthier cities. Here, NO2 is modeled based on spatially dense summer and winter NO2 observations in Portland-Hillsboro-Vancouver (USA), and the spatial variation of NO2 with LULC investigated using random forest, an ensemble data learning technique. The NO2 random forest model, together with BenMAP, is further used to develop a better understanding of the relationship among LULC, ambient NO2 and respiratory health. The impact of land use modifications on ambient NO2, and consequently on respiratory health, is also investigated using a sensitivity analysis. We find that NO2 associated with roadways and tree-canopied areas may be affecting annual incidence rates of asthma exacerbation in 4–12 year olds by +3000 per 100,000 and −1400 per 100,000, respectively. Our model shows that increasing local tree canopy by 5% may reduce local incidences rates of asthma exacerbation by 6%, indicating that targeted local tree-planting efforts may have a substantial impact on reducing city-wide incidence of respiratory distress. Our findings demonstrate the utility of random forest modeling in evaluating LULC modifications for enhanced respiratory health. PMID:28698523
Missouri Ozark Forest Ecosystem Project: the experiment
Steven L. Sheriff
2002-01-01
Missouri Ozark Forest Ecosystem Project (MOFEP) is a unique experiment to learn about the impacts of management practices on a forest system. Three forest management practices (uneven-aged management, even-aged management, and no-harvest management) as practiced by the Missouri Department of Conservation were randomly assigned to nine forest management sites using a...
Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...
Random Forests for Evaluating Pedagogy and Informing Personalized Learning
ERIC Educational Resources Information Center
Spoon, Kelly; Beemer, Joshua; Whitmer, John C.; Fan, Juanjuan; Frazee, James P.; Stronach, Jeanne; Bohonak, Andrew J.; Levine, Richard A.
2016-01-01
Random forests are presented as an analytics foundation for educational data mining tasks. The focus is on course- and program-level analytics including evaluating pedagogical approaches and interventions and identifying and characterizing at-risk students. As part of this development, the concept of individualized treatment effects (ITE) is…
USDA-ARS?s Scientific Manuscript database
Palmer amaranth (Amaranthus palmeri S. Wats.) invasion negatively impacts cotton (Gossypium hirsutum L.) production systems throughout the United States. The objective of this study was to evaluate canopy hyperspectral narrowband data as input into the random forest machine learning algorithm to dis...
Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran
2017-02-01
The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to the gene selection procedures by use of different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related with breast cancer metastasis, which is based on the important scores of two variable selection algorithms, including the mean decrease Gini (MDG) criteria of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm can be called PPIRF. The improved prediction accuracy fully illustrated the reliability and high interpretability of gene list selected by the PPIRF approach.
An assessment of the effectiveness of a random forest classifier for land-cover classification
NASA Astrophysics Data System (ADS)
Rodriguez-Galiano, V. F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J. P.
2012-01-01
Land cover monitoring using remotely sensed data requires robust classification methods which allow for the accurate mapping of complex land cover and land use categories. Random forest (RF) is a powerful machine learning classifier that is relatively unknown in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques. Key advantages of RF include: their non-parametric nature; high classification accuracy; and capability to determine variable importance. However, the split rules for classification are unknown, therefore RF can be considered to be black box type classifier. RF provides an algorithm for estimating missing values; and flexibility to perform several types of data analysis, including regression, classification, survival analysis, and unsupervised learning. In this paper, the performance of the RF classifier for land cover classification of a complex area is explored. Evaluation was based on several criteria: mapping accuracy, sensitivity to data set size and noise. Landsat-5 Thematic Mapper data captured in European spring and summer were used with auxiliary variables derived from a digital terrain model to classify 14 different land categories in the south of Spain. Results show that the RF algorithm yields accurate land cover classifications, with 92% overall accuracy and a Kappa index of 0.92. RF is robust to training data reduction and noise because significant differences in kappa values were only observed for data reduction and noise addition values greater than 50 and 20%, respectively. Additionally, variables that RF identified as most important for classifying land cover coincided with expectations. A McNemar test indicates an overall better performance of the random forest model over a single decision tree at the 0.00001 significance level.
Doyle, T.W.; Krauss, K.W.; Wells, C.J.
2009-01-01
The Everglades ecosystem contains the largest contiguous tract of mangrove forest outside the tropics that were also coincidentally intersected by a major Category 5 hurricane. Airborne videography was flown to capture the landscape pattern and process of forest damage in relation to storm trajectory and circulation. Two aerial video transects, representing different topographic positions, were used to quantify forest damage from video frame analysis in relation to prevailing wind force, treefall direction, and forest height. A hurricane simulation model was applied to reconstruct wind fields corresponding to the ground location of each video frame and to correlate observed treefall and destruction patterns with wind speed and direction. Mangrove forests within the storm's eyepath and in the right-side (forewind) quadrants suffered whole or partial blowdowns, while left-side (backwind) sites south of the eyewall zone incurred moderate canopy reduction and defoliation. Sites along the coastal transect sustained substantially more storm damage than sites along the inland transect which may be attributed to differences in stand exposure and/or stature. Observed treefall directions were shown to be non-random and associated with hurricane trajectory and simulated forewind azimuths. Wide-area sampling using airborne videography provided an efficient adjunct to limited ground observations and improved our spatial understanding of how hurricanes imprint landscape-scale patterns of disturbance. ?? 2009 The Society of Wetland Scientists.
NASA Technical Reports Server (NTRS)
Aldrich, R. C.; Greentree, W. J.; Heller, R. C.; Norick, N. X.
1970-01-01
In October 1969, an investigation was begun near Atlanta, Georgia, to explore the possibilities of developing predictors for forest land and stand condition classifications using space photography. It has been found that forest area can be predicted with reasonable accuracy on space photographs using ocular techniques. Infrared color film is the best single multiband sensor for this purpose. Using the Apollo 9 infrared color photographs taken in March 1969 photointerpreters were able to predict forest area for small units consistently within 5 to 10 percent of ground truth. Approximately 5,000 density data points were recorded for 14 scan lines selected at random from five study blocks. The mean densities and standard deviations were computed for 13 separate land use classes. The results indicate that forest area cannot be separated from other land uses with a high degree of accuracy using optical film density alone. If, however, densities derived by introducing red, green, and blue cutoff filters in the optical system of the microdensitometer are combined with their differences and their ratios in regression analysis techniques, there is a good possibility of discriminating forest from all other classes.
NASA Astrophysics Data System (ADS)
Dolan, K. A.; Huang, W.; Johnson, K. D.; Birdsey, R.; Finley, A. O.; Dubayah, R.; Hurtt, G. C.
2016-12-01
In 2010 Congress directed NASA to initiate research towards the development of Carbon Monitoring Systems (CMS). In response, our team has worked to develop a robust, replicable framework to quantify and map aboveground forest biomass at high spatial resolutions. Crucial to this framework has been the collection of field-based estimates of aboveground tree biomass, combined with remotely detected canopy and structural attributes, for calibration and validation. Here we evaluate the field- based calibration and validation strategies within this carbon monitoring framework and discuss the implications on local to national monitoring systems. Through project development, the domain of this research has expanded from two counties in MD (2,181 km2), to the entire state of MD (32,133 km2), and most recently the tri-state region of MD, PA, and DE (157,868 km2) and covers forests in four major USDA ecological providences. While there are approximately 1000 Forest Inventory and Analysis (FIA) plots distributed across the state of MD, 60% fell in areas considered non-forest or had conditions that precluded them from being measured in the last forest inventory. Across the two pilot counties, where population and landuse competition is high, that proportion rose to 70% Thus, during the initial phases of this project 850 independent field plots were established for model calibration following a random stratified design to insure the adequate representation of height and vegetation classes found across the state, while FIA data were used as an independent data source for validation. As the project expanded to cover the larger spatial tri-state domain, the strategy was flipped to base calibration on more than 3,300 measured FIA plots, as they provide a standardized, consistent and available data source across the nation. An additional 350 stratified random plots were deployed in the Northern Mixed forests of PA and the Coastal Plains forests of DE for validation.
Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien
2015-09-01
According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
Time fluctuation analysis of forest fire sequences
NASA Astrophysics Data System (ADS)
Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.
2013-04-01
Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value depends on the threshold which helps to understand the time pattern of the studied events. Our findings detected the presence of overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered as a multifractal process with a degree of time-clustering of the events. Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process. Acknowledgements This work was partly supported by the SNFS Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions". References - Kanevski M. (Editor). 2008. Advanced Mapping of Environmental Data: Geostatistics, Machine Learning and Bayesian Maximum Entropy. London / Hoboken: iSTE / Wiley. - Telesca L. and Pereira M.G. 2010. Time-clustering investigation of fire temporal fluctuations in Portugal, Nat. Hazards Earth Syst. Sci., vol. 10(4): 661-666. - Vega Orozco C., Tonini M., Conedera M., Kanevski M. (2012) Cluster recognition in spatial-temporal sequences: the case of forest fires, Geoinformatica, vol. 16(4): 653-673.
Old-growth and mature forests near spotted owl nests in western Oregon
NASA Technical Reports Server (NTRS)
Ripple, William J.; Johnson, David H.; Hershey, K. T.; Meslow, E. Charles
1995-01-01
We investigated how the amount of old-growth and mature forest influences the selection of nest sites by northern spotted owls (Strix occidentalis caurina) in the Central Cascade Mountains of Oregon. We used 7 different plot sizes to compare the proportion of mature and old-growth forest between 30 nest sites and 30 random sites. The proportion of old-growth and mature forest was significantly greater at nests sites than at random sites for all plot sizes (P less than or equal to 0.01). Thus, management of the spotted owl might require setting the percentage of old-growth and mature forest retained from harvesting at least 1 standard deviation above the mean for the 30 nest sites we examined.
NASA Astrophysics Data System (ADS)
Polan, Daniel F.; Brady, Samuel L.; Kaufman, Robert A.
2016-09-01
There is a need for robust, fully automated whole body organ segmentation for diagnostic CT. This study investigates and optimizes a Random Forest algorithm for automated organ segmentation; explores the limitations of a Random Forest algorithm applied to the CT environment; and demonstrates segmentation accuracy in a feasibility study of pediatric and adult patients. To the best of our knowledge, this is the first study to investigate a trainable Weka segmentation (TWS) implementation using Random Forest machine-learning as a means to develop a fully automated tissue segmentation tool developed specifically for pediatric and adult examinations in a diagnostic CT environment. Current innovation in computed tomography (CT) is focused on radiomics, patient-specific radiation dose calculation, and image quality improvement using iterative reconstruction, all of which require specific knowledge of tissue and organ systems within a CT image. The purpose of this study was to develop a fully automated Random Forest classifier algorithm for segmentation of neck-chest-abdomen-pelvis CT examinations based on pediatric and adult CT protocols. Seven materials were classified: background, lung/internal air or gas, fat, muscle, solid organ parenchyma, blood/contrast enhanced fluid, and bone tissue using Matlab and the TWS plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance evaluated over a voxel radius of 2 n , (n from 0 to 4), along with noise reduction and edge preserving filters: Gaussian, bilateral, Kuwahara, and anisotropic diffusion. The Random Forest algorithm used 200 trees with 2 features randomly selected per node. The optimized auto-segmentation algorithm resulted in 16 image features including features derived from maximum, mean, variance Gaussian and Kuwahara filters. Dice similarity coefficient (DSC) calculations between manually segmented and Random Forest algorithm segmented images from 21 patient image sections, were analyzed. The automated algorithm produced segmentation of seven material classes with a median DSC of 0.86 ± 0.03 for pediatric patient protocols, and 0.85 ± 0.04 for adult patient protocols. Additionally, 100 randomly selected patient examinations were segmented and analyzed, and a mean sensitivity of 0.91 (range: 0.82-0.98), specificity of 0.89 (range: 0.70-0.98), and accuracy of 0.90 (range: 0.76-0.98) were demonstrated. In this study, we demonstrate that this fully automated segmentation tool was able to produce fast and accurate segmentation of the neck and trunk of the body over a wide range of patient habitus and scan parameters.
Tanpitukpongse, T P; Mazurowski, M A; Ikhena, J; Petrella, J R
2017-03-01
Alzheimer disease is a prevalent neurodegenerative disease. Computer assessment of brain atrophy patterns can help predict conversion to Alzheimer disease. Our aim was to assess the prognostic efficacy of individual-versus-combined regional volumetrics in 2 commercially available brain volumetric software packages for predicting conversion of patients with mild cognitive impairment to Alzheimer disease. Data were obtained through the Alzheimer's Disease Neuroimaging Initiative. One hundred ninety-two subjects (mean age, 74.8 years; 39% female) diagnosed with mild cognitive impairment at baseline were studied. All had T1-weighted MR imaging sequences at baseline and 3-year clinical follow-up. Analysis was performed with NeuroQuant and Neuroreader. Receiver operating characteristic curves assessing the prognostic efficacy of each software package were generated by using a univariable approach using individual regional brain volumes and 2 multivariable approaches (multiple regression and random forest), combining multiple volumes. On univariable analysis of 11 NeuroQuant and 11 Neuroreader regional volumes, hippocampal volume had the highest area under the curve for both software packages (0.69, NeuroQuant; 0.68, Neuroreader) and was not significantly different ( P > .05) between packages. Multivariable analysis did not increase the area under the curve for either package (0.63, logistic regression; 0.60, random forest NeuroQuant; 0.65, logistic regression; 0.62, random forest Neuroreader). Of the multiple regional volume measures available in FDA-cleared brain volumetric software packages, hippocampal volume remains the best single predictor of conversion of mild cognitive impairment to Alzheimer disease at 3-year follow-up. Combining volumetrics did not add additional prognostic efficacy. Therefore, future prognostic studies in mild cognitive impairment, combining such tools with demographic and other biomarker measures, are justified in using hippocampal volume as the only volumetric biomarker. © 2017 by American Journal of Neuroradiology.
Measuring socioeconomic status in multicountry studies: results from the eight-country MAL-ED study
2014-01-01
Background There is no standardized approach to comparing socioeconomic status (SES) across multiple sites in epidemiological studies. This is particularly problematic when cross-country comparisons are of interest. We sought to develop a simple measure of SES that would perform well across diverse, resource-limited settings. Methods A cross-sectional study was conducted with 800 children aged 24 to 60 months across eight resource-limited settings. Parents were asked to respond to a household SES questionnaire, and the height of each child was measured. A statistical analysis was done in two phases. First, the best approach for selecting and weighting household assets as a proxy for wealth was identified. We compared four approaches to measuring wealth: maternal education, principal components analysis, Multidimensional Poverty Index, and a novel variable selection approach based on the use of random forests. Second, the selected wealth measure was combined with other relevant variables to form a more complete measure of household SES. We used child height-for-age Z-score (HAZ) as the outcome of interest. Results Mean age of study children was 41 months, 52% were boys, and 42% were stunted. Using cross-validation, we found that random forests yielded the lowest prediction error when selecting assets as a measure of household wealth. The final SES index included access to improved water and sanitation, eight selected assets, maternal education, and household income (the WAMI index). A 25% difference in the WAMI index was positively associated with a difference of 0.38 standard deviations in HAZ (95% CI 0.22 to 0.55). Conclusions Statistical learning methods such as random forests provide an alternative to principal components analysis in the development of SES scores. Results from this multicountry study demonstrate the validity of a simplified SES index. With further validation, this simplified index may provide a standard approach for SES adjustment across resource-limited settings. PMID:24656134
Neyeloff, Jeruza L; Fuchs, Sandra C; Moreira, Leila B
2012-01-20
Meta-analyses are necessary to synthesize data obtained from primary research, and in many situations reviews of observational studies are the only available alternative. General purpose statistical packages can meta-analyze data, but usually require external macros or coding. Commercial specialist software is available, but may be expensive and focused in a particular type of primary data. Most available softwares have limitations in dealing with descriptive data, and the graphical display of summary statistics such as incidence and prevalence is unsatisfactory. Analyses can be conducted using Microsoft Excel, but there was no previous guide available. We constructed a step-by-step guide to perform a meta-analysis in a Microsoft Excel spreadsheet, using either fixed-effect or random-effects models. We have also developed a second spreadsheet capable of producing customized forest plots. It is possible to conduct a meta-analysis using only Microsoft Excel. More important, to our knowledge this is the first description of a method for producing a statistically adequate but graphically appealing forest plot summarizing descriptive data, using widely available software.
2012-01-01
Background Meta-analyses are necessary to synthesize data obtained from primary research, and in many situations reviews of observational studies are the only available alternative. General purpose statistical packages can meta-analyze data, but usually require external macros or coding. Commercial specialist software is available, but may be expensive and focused in a particular type of primary data. Most available softwares have limitations in dealing with descriptive data, and the graphical display of summary statistics such as incidence and prevalence is unsatisfactory. Analyses can be conducted using Microsoft Excel, but there was no previous guide available. Findings We constructed a step-by-step guide to perform a meta-analysis in a Microsoft Excel spreadsheet, using either fixed-effect or random-effects models. We have also developed a second spreadsheet capable of producing customized forest plots. Conclusions It is possible to conduct a meta-analysis using only Microsoft Excel. More important, to our knowledge this is the first description of a method for producing a statistically adequate but graphically appealing forest plot summarizing descriptive data, using widely available software. PMID:22264277
NASA Technical Reports Server (NTRS)
Kumar, Uttam; Nemani, Ramakrishna R.; Ganguly, Sangram; Kalia, Subodh; Michaelis, Andrew
2017-01-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS-national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91 percent was achieved, which is a 6 percent improvement in unmixing based classification relative to per-pixel-based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
NASA Astrophysics Data System (ADS)
Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.
2017-12-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
NASA Astrophysics Data System (ADS)
Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.
2016-12-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
Mitchell, Michael; Wilson, R. Randy; Twedt, Daniel J.; Mini, Anne E.; James, J. Dale
2016-01-01
The Mississippi Alluvial Valley is a floodplain along the southern extent of the Mississippi River extending from southern Missouri to the Gulf of Mexico. This area once encompassed nearly 10 million ha of floodplain forests, most of which has been converted to agriculture over the past two centuries. Conservation programs in this region revolve around protection of existing forest and reforestation of converted lands. Therefore, an accurate and up to date classification of forest cover is essential for conservation planning, including efforts that prioritize areas for conservation activities. We used object-based image analysis with Random Forest classification to quickly and accurately classify forest cover. We used Landsat band, band ratio, and band index statistics to identify and define similar objects as our training sets instead of selecting individual training points. This provided a single rule-set that was used to classify each of the 11 Landsat 5 Thematic Mapper scenes that encompassed the Mississippi Alluvial Valley. We classified 3,307,910±85,344 ha (32% of this region) as forest. Our overall classification accuracy was 96.9% with Kappa statistic of 0.96. Because this method of forest classification is rapid and accurate, assessment of forest cover can be regularly updated and progress toward forest habitat goals identified in conservation plans can be periodically evaluated.
NASA Astrophysics Data System (ADS)
Markman, Adam; Carnicer, Artur; Javidi, Bahram
2017-05-01
We overview our recent work [1] on utilizing three-dimensional (3D) optical phase codes for object authentication using the random forest classifier. A simple 3D optical phase code (OPC) is generated by combining multiple diffusers and glass slides. This tag is then placed on a quick-response (QR) code, which is a barcode capable of storing information and can be scanned under non-uniform illumination conditions, rotation, and slight degradation. A coherent light source illuminates the OPC and the transmitted light is captured by a CCD to record the unique signature. Feature extraction on the signature is performed and inputted into a pre-trained random-forest classifier for authentication.
NASA Astrophysics Data System (ADS)
Banerjee, Priyanka; Preissner, Robert
2018-04-01
Taste of a chemical compounds present in food stimulates us to take in nutrients and avoid poisons. However, the perception of taste greatly depends on the genetic as well as evolutionary perspectives. The aim of this work was the development and validation of a machine learning model based on molecular fingerprints to discriminate between sweet and bitter taste of molecules. BitterSweetForest is the first open access model based on KNIME workflow that provides platform for prediction of bitter and sweet taste of chemical compounds using molecular fingerprints and Random Forest based classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. In independent test set, BitterSweetForest achieved an accuracy of 96 % and an AUC of 0.98 for bitter and sweet taste prediction. The constructed model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs as well as on an acute toxicity compound data set. BitterSweetForest suggests 70% of the natural product space, as bitter and 10 % of the natural product space as sweet with confidence score of 0.60 and above. 77 % of the approved drug set was predicted as bitter and 2% as sweet with a confidence scores of 0.75 and above. Similarly, 75% of the total compounds from acute oral toxicity class were predicted only as bitter with a minimum confidence score of 0.75, revealing toxic compounds are mostly bitter. Furthermore, we applied a Bayesian based feature analysis method to discriminate the most occurring chemical features between sweet and bitter compounds from the feature space of a circular fingerprint.
Banerjee, Priyanka; Preissner, Robert
2018-01-01
Taste of a chemical compound present in food stimulates us to take in nutrients and avoid poisons. However, the perception of taste greatly depends on the genetic as well as evolutionary perspectives. The aim of this work was the development and validation of a machine learning model based on molecular fingerprints to discriminate between sweet and bitter taste of molecules. BitterSweetForest is the first open access model based on KNIME workflow that provides platform for prediction of bitter and sweet taste of chemical compounds using molecular fingerprints and Random Forest based classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. In independent test set, BitterSweetForest achieved an accuracy of 96% and an AUC of 0.98 for bitter and sweet taste prediction. The constructed model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs as well as on an acute toxicity compound data set. BitterSweetForest suggests 70% of the natural product space, as bitter and 10% of the natural product space as sweet with confidence score of 0.60 and above. 77% of the approved drug set was predicted as bitter and 2% as sweet with a confidence score of 0.75 and above. Similarly, 75% of the total compounds from acute oral toxicity class were predicted only as bitter with a minimum confidence score of 0.75, revealing toxic compounds are mostly bitter. Furthermore, we applied a Bayesian based feature analysis method to discriminate the most occurring chemical features between sweet and bitter compounds using the feature space of a circular fingerprint. PMID:29696137
Exploring diversity in ensemble classification: Applications in large area land cover mapping
NASA Astrophysics Data System (ADS)
Mellor, Andrew; Boukir, Samia
2017-07-01
Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
Application and partial validation of a habitat model for moose in the Lake Superior region
Allen, A.W.; Terrell, J.W.; Mangus, W.L.; Lindquist, E.L.
1991-01-01
A modified version of the dormant-season portion of a Habitat Suitability Index (HSI) model developed for assessing moose (Alces alces) habitat in the Lake Superior Region was incorporated in a Geographic Information System (GIS) for 490 km2 of Minnesota's Superior National Forest. Moose locations (n=235) were plotted during aerial surveys conducted in December 1988 and January 1990-1991. Dormant-season forage and cover quality for 1,000-m, 500-m, and 200-m radii plots around random points and moose locations were compared using U.S. Forest Service stand examination data. Cover quality indices were lower than forage quality indices within all plots. The median value for the average cover quality index was greater (P=0.003) within 200-m plots around cow moose locations than for plots around random points for the most severe winter of the study. The proportion of highest-quality winter cover, such as mixed stands dominated by mid-age class white spruce (Picea glauca) and balsam fir (Abies balsanea), was greater within 500-m and 200-m plots around cow moose than within similar plots around random points during the two most severe winters. These results indicate that suboptimum ratings of winter habitat quality used in the GIS for dormant-season forage >100 m from cover, as suggested in the original HSI model, are reasonable. Integrating the habitat model with forest stand data using a GIS permitted analysis of moose habitat within a relatively large geographic area. Simulation of habitat quality indicated a potential shortage of late-winter cover in the study area. The effects of forest management actions on moose habitat quality can be simulated without collecting additional data.
Liu, J.; Liu, S.; Loveland, Thomas R.; Tieszen, L.L.
2008-01-01
Land cover change is one of the key driving forces for ecosystem carbon (C) dynamics. We present an approach for using sequential remotely sensed land cover observations and a biogeochemical model to estimate contemporary and future ecosystem carbon trends. We applied the General Ensemble Biogeochemical Modelling System (GEMS) for the Laurentian Plains and Hills ecoregion in the northeastern United States for the period of 1975-2025. The land cover changes, especially forest stand-replacing events, were detected on 30 randomly located 10-km by 10-km sample blocks, and were assimilated by GEMS for biogeochemical simulations. In GEMS, each unique combination of major controlling variables (including land cover change history) forms a geo-referenced simulation unit. For a forest simulation unit, a Monte Carlo process is used to determine forest type, forest age, forest biomass, and soil C, based on the Forest Inventory and Analysis (FIA) data and the U.S. General Soil Map (STATSGO) data. Ensemble simulations are performed for each simulation unit to incorporate input data uncertainty. Results show that on average forests of the Laurentian Plains and Hills ecoregion have been sequestrating 4.2 Tg C (1 teragram = 1012 gram) per year, including 1.9 Tg C removed from the ecosystem as the consequences of land cover change. ?? 2008 Elsevier B.V.
Fast image interpolation via random forests.
Huang, Jun-Jie; Siu, Wan-Chi; Liu, Tian-Rui
2015-10-01
This paper proposes a two-stage framework for fast image interpolation via random forests (FIRF). The proposed FIRF method gives high accuracy, as well as requires low computation. The underlying idea of this proposed work is to apply random forests to classify the natural image patch space into numerous subspaces and learn a linear regression model for each subspace to map the low-resolution image patch to high-resolution image patch. The FIRF framework consists of two stages. Stage 1 of the framework removes most of the ringing and aliasing artifacts in the initial bicubic interpolated image, while Stage 2 further refines the Stage 1 interpolated image. By varying the number of decision trees in the random forests and the number of stages applied, the proposed FIRF method can realize computationally scalable image interpolation. Extensive experimental results show that the proposed FIRF(3, 2) method achieves more than 0.3 dB improvement in peak signal-to-noise ratio over the state-of-the-art nonlocal autoregressive modeling (NARM) method. Moreover, the proposed FIRF(1, 1) obtains similar or better results as NARM while only takes its 0.3% computational time.
CW-SSIM kernel based random forest for image classification
NASA Astrophysics Data System (ADS)
Fan, Guangzhe; Wang, Zhou; Wang, Jiheng
2010-07-01
Complex wavelet structural similarity (CW-SSIM) index has been proposed as a powerful image similarity metric that is robust to translation, scaling and rotation of images, but how to employ it in image classification applications has not been deeply investigated. In this paper, we incorporate CW-SSIM as a kernel function into a random forest learning algorithm. This leads to a novel image classification approach that does not require a feature extraction or dimension reduction stage at the front end. We use hand-written digit recognition as an example to demonstrate our algorithm. We compare the performance of the proposed approach with random forest learning based on other kernels, including the widely adopted Gaussian and the inner product kernels. Empirical evidences show that the proposed method is superior in its classification power. We also compared our proposed approach with the direct random forest method without kernel and the popular kernel-learning method support vector machine. Our test results based on both simulated and realworld data suggest that the proposed approach works superior to traditional methods without the feature selection procedure.
Bang, Kyung Sook; Lee, In Sook; Kim, Sung Jae; Song, Min Kyung; Park, Se Eun
2016-02-01
This study was performed to determine the physical and psychological effects of an urban forest-walking program for office workers. For many workers, sedentary lifestyles can lead to low levels of physical activity causing various health problems despite an increased interest in health promotion. Fifty four office workers participated in this study. They were assigned to two groups (experimental group and control group) in random order and the experimental group performed 5 weeks of walking exercise based on Information-Motivation-Behavioral skills Model. The data were collected from October to November 2014. SPSS 21.0 was used for the statistical analysis. The results showed that the urban forest walking program had positive effects on the physical activity level (U=65.00, p<.001), health promotion behavior (t=-2.20, p=.033), and quality of life (t=-2.42, p=.020). However, there were no statistical differences in depression, waist size, body mass index, blood pressure, or bone density between the groups. The current findings of the study suggest the forest-walking program may have positive effects on improving physical activity, health promotion behavior, and quality of life. The program can be used as an effective and efficient strategy for physical and psychological health promotion for office workers.
Random forests as cumulative effects models: A case study of lakes and rivers in Muskoka, Canada.
Jones, F Chris; Plewes, Rachel; Murison, Lorna; MacDougall, Mark J; Sinclair, Sarah; Davies, Christie; Bailey, John L; Richardson, Murray; Gunn, John
2017-10-01
Cumulative effects assessment (CEA) - a type of environmental appraisal - lacks effective methods for modeling cumulative effects, evaluating indicators of ecosystem condition, and exploring the likely outcomes of development scenarios. Random forests are an extension of classification and regression trees, which model response variables by recursive partitioning. Random forests were used to model a series of candidate ecological indicators that described lakes and rivers from a case study watershed (The Muskoka River Watershed, Canada). Suitability of the candidate indicators for use in cumulative effects assessment and watershed monitoring was assessed according to how well they could be predicted from natural habitat features and how sensitive they were to human land-use. The best models explained 75% of the variation in a multivariate descriptor of lake benthic-macroinvertebrate community structure, and 76% of the variation in the conductivity of river water. Similar results were obtained by cross-validation. Several candidate indicators detected a simulated doubling of urban land-use in their catchments, and a few were able to detect a simulated doubling of agricultural land-use. The paper demonstrates that random forests can be used to describe the combined and singular effects of multiple stressors and natural environmental factors, and furthermore, that random forests can be used to evaluate the performance of monitoring indicators. The numerical methods presented are applicable to any ecosystem and indicator type, and therefore represent a step forward for CEA. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Improved high-dimensional prediction with Random Forests by the use of co-data.
Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A
2017-12-28
Prediction in high dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables by co-data moderated sampling probabilities. Co-data here are defined as any type information that is available on the variables of the primary data, but does not use its response labels. These moderated sampling probabilities are, inspired by empirical Bayes, learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A
2017-09-15
Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code available at http://insilico.utulsa.edu/software/privateEC . brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Aspen succession in the Intermountain West: A deterministic model
Dale L. Bartos; Frederick R. Ward; George S. Innis
1983-01-01
A deterministic model of succession in aspen forests was developed using existing data and intuition. The degree of uncertainty, which was determined by allowing the parameter values to vary at random within limits, was larger than desired. This report presents results of an analysis of model sensitivity to changes in parameter values. These results have indicated...
Kristen Finch; Edgard Espinoza; F. Andrew Jones; Richard Cronn
2017-01-01
Premise of the study: We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Methods: Three annual ring mass...
What does it take to get family forest owners to enroll in a forest stewardship-type program?
Michael A. Kilgore; Stephanie A. Snyder; Joseph Schertz; Steven J. Taff
2008-01-01
We estimated the probability of enrollment and factors influencing participation in a forest stewardship-type program, Minnesota's Sustainable Forest Incentives Act, using data from a mail survey of over 1000 randomly-selected Minnesota family forest owners. Of the 15 variables tested, only five were significant predictors of a landowner's interest in...
Karin Riley; Isaac C. Grenfell; Mark A. Finney
2016-01-01
Maps of the number, size, and species of trees in forests across the western United States are desirable for many applications such as estimating terrestrial carbon resources, predicting tree mortality following wildfires, and for forest inventory. However, detailed mapping of trees for large areas is not feasible with current technologies, but statistical...
NASA Technical Reports Server (NTRS)
Treuhaft, Robert N.; Law, Beverly E.; Siqueira, Paul R.
2000-01-01
Parameters describing the vertical structure of forests, for example tree height, height-to-base-of-live-crown, underlying topography, and leaf area density, bear on land-surface, biogeochemical, and climate modeling efforts. Single, fixed-baseline interferometric synthetic aperture radar (INSAR) normalized cross-correlations constitute two observations from which to estimate forest vertical structure parameters: Cross-correlation amplitude and phase. Multialtitude INSAR observations increase the effective number of baselines potentially enabling the estimation of a larger set of vertical-structure parameters. Polarimetry and polarimetric interferometry can further extend the observation set. This paper describes the first acquisition of multialtitude INSAR for the purpose of estimating the parameters describing a vegetated land surface. These data were collected over ponderosa pine in central Oregon near longitude and latitude -121 37 25 and 44 29 56. The JPL interferometric TOPSAR system was flown at the standard 8-km altitude, and also at 4-km and 2-km altitudes, in a race track. A reference line including the above coordinates was maintained at 35 deg for both the north-east heading and the return southwest heading, at all altitudes. In addition to the three altitudes for interferometry, one line was flown with full zero-baseline polarimetry at the 8-km altitude. A preliminary analysis of part of the data collected suggests that they are consistent with one of two physical models describing the vegetation: 1) a single-layer, randomly oriented forest volume with a very strong ground return or 2) a multilayered randomly oriented volume; a homogeneous, single-layer model with no ground return cannot account for the multialtitude correlation amplitudes. Below the inconsistency of the data with a single-layer model is followed by analysis scenarios which include either the ground or a layered structure. The ground returns suggested by this preliminary analysis seem too strong to be plausible, but parameters describing a two-layer compare reasonably well to a field-measured probability distribution of tree heights in the area.
Factors influencing consumption of nutrient rich forest foods in rural Cameroon.
Fungo, Robert; Muyonga, John H; Kabahenda, Margaret; Okia, Clement A; Snook, Laura
2016-02-01
Studies show that a number of forest foods consumed in Cameroon are highly nutritious and rich in health boosting bioactive compounds. This study assessed the knowledge and perceptions towards the nutritional and health promoting properties of forest foods among forest dependent communities. The relationship between knowledge, perceptions and socio-demographic attributes on consumption of forest foods was also determined. A total of 279 females in charge of decision making with respect to food preparation were randomly selected from 12 villages in southern and eastern Cameroon and interviewed using researcher administered questionnaires. Multivariate logistic regression analysis was used to identify the factors affecting consumption of forest foods. Baillonella toxisperma (98%) and Irvingia gabonesis (81%) were the most known nutrient rich forest foods by the respondents. About 31% of the respondents were aware of the nutritional value and health benefits of forest foods. About 10%-61% of the respondents expressed positive attitudes to questions related with health benefits of specific forest foods. Consumption of forest foods was found to be higher among polygamous families and also positively related to length of stay in the forest area and age of respondent with consumption of forest foods. Education had an inverse relationship with use of forest foods. Knowledge and positive attitude towards the nutritional value of forest foods were also found to positively influence consumption of forest foods. Since knowledge was found to influence attitude and consumption, there is need to invest in awareness campaigns to strengthen the current knowledge levels among the study population. This should positively influence the attitudes and perceptions towards increased consumption of forest foods. Copyright © 2015 Elsevier Ltd. All rights reserved.
The Random Forests Statistical Technique: An Examination of Its Value for the Study of Reading
ERIC Educational Resources Information Center
Matsuki, Kazunaga; Kuperman, Victor; Van Dyke, Julie A.
2016-01-01
Studies investigating individual differences in reading ability often involve data sets containing a large number of collinear predictors and a small number of observations. In this article, we discuss the method of Random Forests and demonstrate its suitability for addressing the statistical concerns raised by such data sets. The method is…
ERIC Educational Resources Information Center
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Random location of fuel treatments in wildland community interfaces: a percolation approach
Michael Bevers; Philip N. Omi; John G. Hof
2004-01-01
We explore the use of spatially correlated random treatments to reduce fuels in landscape patterns that appear somewhat natural while forming fully connected fuelbreaks between wildland forests and developed protection zones. From treatment zone maps partitioned into grids of hexagonal forest cells representing potential treatment sites, we selected cells to be treated...
Road Network State Estimation Using Random Forest Ensemble Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hou, Yi; Edara, Praveen; Chang, Yohan
Network-scale travel time prediction not only enables traffic management centers (TMC) to proactively implement traffic management strategies, but also allows travelers make informed decisions about route choices between various origins and destinations. In this paper, a random forest estimator was proposed to predict travel time in a network. The estimator was trained using two years of historical travel time data for a case study network in St. Louis, Missouri. Both temporal and spatial effects were considered in the modeling process. The random forest models predicted travel times accurately during both congested and uncongested traffic conditions. The computational times for themore » models were low, thus useful for real-time traffic management and traveler information applications.« less
Adaptive economic and ecological forest management under risk
Joseph Buongiorno; Mo Zhou
2015-01-01
Background: Forest managers must deal with inherently stochastic ecological and economic processes. The future growth of trees is uncertain, and so is their value. The randomness of low-impact, high frequency or rare catastrophic shocks in forest growth has significant implications in shaping the mix of tree species and the forest landscape...
Karin L. Riley; Isaac C. Grenfell; Mark A. Finney
2015-01-01
Mapping the number, size, and species of trees in forests across the western United States has utility for a number of research endeavors, ranging from estimation of terrestrial carbon resources to tree mortality following wildfires. For landscape fire and forest simulations that use the Forest Vegetation Simulator (FVS), a tree-level dataset, or âtree listâ, is a...
NASA Astrophysics Data System (ADS)
Forkert, Nils Daniel; Fiehler, Jens
2015-03-01
The tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require a previous training and, until now, it is ambiguous how many subjects are required for this, which is the focus of this work. 23 MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by- voxel level for all patients and used for training of a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from using all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using the training set consisting of the voxel information from only one other patient, which increases to 0.53 if using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and effort required for data acquisition and preparation.
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong
2012-01-01
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
Schönig, Sarah; Recke, Andreas; Hirose, Misa; Ludwig, Ralf J; Seeger, Karsten
2013-06-26
Epidermolysis bullosa acquisita (EBA) is a rare skin blistering disease with a prevalence of 0.2/ million people. EBA is characterized by autoantibodies against type VII collagen. Type VII collagen builds anchoring fibrils that are essential for the dermal-epidermal junction. The pathogenic relevance of antibodies against type VII collagen subdomains has been demonstrated both in vitro and in vivo. Despite the multitude of clinical and immunological data, no information on metabolic changes exists. We used an animal model of EBA to obtain insights into metabolomic changes during EBA. Sera from mice with immunization-induced EBA and control mice were obtained and metabolites were isolated by filtration. Proton nuclear magnetic resonance (NMR) spectra were recorded and analyzed by principal component analysis (PCA), partial least squares discrimination analysis (PLS-DA) and random forest. The metabolic pattern of immunized mice and control mice could be clearly distinguished with PCA and PLS-DA. Metabolites that contribute to the discrimination could be identified via random forest. The observed changes in the metabolic pattern of EBA sera, i.e. increased levels of amino acid, point toward an increased energy demand in EBA. Knowledge about metabolic changes due to EBA could help in future to assess the disease status during treatment. Confirming the metabolic changes in patients needs probably large cohorts.
Mapping permafrost in the boreal forest with Thematic Mapper satellite data
NASA Technical Reports Server (NTRS)
Morrissey, L. A.; Strong, L. L.; Card, D. H.
1986-01-01
A geographic data base incorporating Landsat TM data was used to develop and evaluate logistic discriminant functions for predicting the distribution of permafrost in a boreal forest watershed. The data base included both satellite-derived information and ancillary map data. Five permafrost classifications were developed from a stratified random sample of the data base and evaluated by comparison with a photo-interpreted permafrost map using contingency table analysis and soil temperatures recorded at sites within the watershed. A classification using a TM thermal band and a TM-derived vegetation map as independent variables yielded the highest mapping accuracy for all permafrost categories.
Na, X D; Zang, S Y; Wu, C S; Li, W L
2015-11-01
Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.
Introducing two Random Forest based methods for cloud detection in remote sensing images
NASA Astrophysics Data System (ADS)
Ghasemian, Nafiseh; Akhoondzadeh, Mehdi
2018-07-01
Cloud detection is a necessary phase in satellite images processing to retrieve the atmospheric and lithospheric parameters. Currently, some cloud detection methods based on Random Forest (RF) model have been proposed but they do not consider both spectral and textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF) to incorporate visible, infrared (IR) and thermal spectral and textural features (FLFRF) including Gray Level Co-occurrence Matrix (GLCM) and Robust Extended Local Binary Pattern (RELBP_CI) or visible, IR and thermal classifiers (DLFRF) for highly accurate cloud detection on remote sensing images. FLFRF first fuses visible, IR and thermal features. Thereafter, it uses the RF model to classify pixels to cloud, snow/ice and background or thick cloud, thin cloud and background. DLFRF considers visible, IR and thermal features (both spectral and textural) separately and inserts each set of features to RF model. Then, it holds vote matrix of each run of the model. Finally, it fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used in this paper. Quantitative analyses are based on manually selected ground truth data. Results show that after adding RELBP_CI to input feature set cloud detection accuracy improves. Also, the average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are higher than other machine learning methods, Linear Discriminate Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than other traditional methods. The quantitative values on Landsat 8 images show similar trend. Consequently, while SVM and K-nearest neighbor show overestimation in predicting cloud and snow/ice pixels, our Random Forest (RF) based models can achieve higher cloud, snow/ice kappa values on MODIS and thin cloud, thick cloud and snow/ice kappa values on Landsat 8 images. Our algorithms predict both thin and thick cloud on Landsat 8 images while the existing cloud detection algorithm, Fmask cannot discriminate them. Compared to the state-of-the-art methods, our algorithms have acquired higher average cloud and snow/ice kappa values for different spatial resolutions.
Multiple filters affect tree species assembly in mid-latitude forest communities.
Kubota, Y; Kusumoto, B; Shiono, T; Ulrich, W
2018-05-01
Species assembly patterns of local communities are shaped by the balance between multiple abiotic/biotic filters and dispersal that both select individuals from species pools at the regional scale. Knowledge regarding functional assembly can provide insight into the relative importance of the deterministic and stochastic processes that shape species assembly. We evaluated the hierarchical roles of the α niche and β niches by analyzing the influence of environmental filtering relative to functional traits on geographical patterns of tree species assembly in mid-latitude forests. Using forest plot datasets, we examined the α niche traits (leaf and wood traits) and β niche properties (cold/drought tolerance) of tree species, and tested non-randomness (clustering/over-dispersion) of trait assembly based on null models that assumed two types of species pools related to biogeographical regions. For most plots, species assembly patterns fell within the range of random expectation. However, particularly for cold/drought tolerance-related β niche properties, deviation from randomness was frequently found; non-random clustering was predominant in higher latitudes with harsh climates. Our findings demonstrate that both randomness and non-randomness in trait assembly emerged as a result of the α and β niches, although we suggest the potential role of dispersal processes and/or species equalization through trait similarities in generating the prevalence of randomness. Clustering of β niche traits along latitudinal climatic gradients provides clear evidence of species sorting by filtering particular traits. Our results reveal that multiple filters through functional niches and stochastic processes jointly shape geographical patterns of species assembly across mid-latitude forests.
Li, Qinghong; Freeman, Lisa M; Rush, John E; Huggins, Gordon S; Kennedy, Adam D; Labuda, Jeffrey A; Laflamme, Dorothy P; Hannah, Steven S
2015-08-01
Canine degenerative mitral valve disease (DMVD) is the most common form of heart disease in dogs. The objective of this study was to identify cellular and metabolic pathways that play a role in DMVD by performing metabolomics and transcriptomics analyses on serum and tissue (mitral valve and left ventricle) samples previously collected from dogs with DMVD or healthy hearts. Gas or liquid chromatography followed by mass spectrophotometry were used to identify metabolites in serum. Transcriptomics analysis of tissue samples was completed using RNA-seq, and selected targets were confirmed by RT-qPCR. Random Forest analysis was used to classify the metabolites that best predicted the presence of DMVD. Results identified 41 known and 13 unknown serum metabolites that were significantly different between healthy and DMVD dogs, representing alterations in fat and glucose energy metabolism, oxidative stress, and other pathways. The three metabolites with the greatest single effect in the Random Forest analysis were γ-glutamylmethionine, oxidized glutathione, and asymmetric dimethylarginine. Transcriptomics analysis identified 812 differentially expressed transcripts in left ventricle samples and 263 in mitral valve samples, representing changes in energy metabolism, antioxidant function, nitric oxide signaling, and extracellular matrix homeostasis pathways. Many of the identified alterations may benefit from nutritional or medical management. Our study provides evidence of the growing importance of integrative approaches in multi-omics research in veterinary and nutritional sciences.
Lee, Soo Yee; Mediani, Ahmed; Maulidiani, Maulidiani; Khatib, Alfi; Ismail, Intan Safinar; Zawawi, Norhasnida; Abas, Faridah
2018-01-01
Neptunia oleracea is a plant consumed as a vegetable and which has been used as a folk remedy for several diseases. Herein, two regression models (partial least squares, PLS; and random forest, RF) in a metabolomics approach were compared and applied to the evaluation of the relationship between phenolics and bioactivities of N. oleracea. In addition, the effects of different extraction conditions on the phenolic constituents were assessed by pattern recognition analysis. Comparison of the PLS and RF showed that RF exhibited poorer generalization and hence poorer predictive performance. Both the regression coefficient of PLS and the variable importance of RF revealed that quercetin and kaempferol derivatives, caffeic acid and vitexin-2-O-rhamnoside were significant towards the tested bioactivities. Furthermore, principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) results showed that sonication and absolute ethanol are the preferable extraction method and ethanol ratio, respectively, to produce N. oleracea extracts with high phenolic levels and therefore high DPPH scavenging and α-glucosidase inhibitory activities. Both PLS and RF are useful regression models in metabolomics studies. This work provides insight into the performance of different multivariate data analysis tools and the effects of different extraction conditions on the extraction of desired phenolics from plants. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.
Hand pose estimation in depth image using CNN and random forest
NASA Astrophysics Data System (ADS)
Chen, Xi; Cao, Zhiguo; Xiao, Yang; Fang, Zhiwen
2018-03-01
Thanks to the availability of low cost depth cameras, like Microsoft Kinect, 3D hand pose estimation attracted special research attention in these years. Due to the large variations in hand`s viewpoint and the high dimension of hand motion, 3D hand pose estimation is still challenging. In this paper we propose a two-stage framework which joint with CNN and Random Forest to boost the performance of hand pose estimation. First, we use a standard Convolutional Neural Network (CNN) to regress the hand joints` locations. Second, using a Random Forest to refine the joints from the first stage. In the second stage, we propose a pyramid feature which merges the information flow of the CNN. Specifically, we get the rough joints` location from first stage, then rotate the convolutional feature maps (and image). After this, for each joint, we map its location to each feature map (and image) firstly, then crop features at each feature map (and image) around its location, put extracted features to Random Forest to refine at last. Experimentally, we evaluate our proposed method on ICVL dataset and get the mean error about 11mm, our method is also real-time on a desktop.
Mapping of land cover in northern California with simulated hyperspectral satellite imagery
NASA Astrophysics Data System (ADS)
Clark, Matthew L.; Kilham, Nina E.
2016-09-01
Land-cover maps are important science products needed for natural resource and ecosystem service management, biodiversity conservation planning, and assessing human-induced and natural drivers of land change. Analysis of hyperspectral, or imaging spectrometer, imagery has shown an impressive capacity to map a wide range of natural and anthropogenic land cover. Applications have been mostly with single-date imagery from relatively small spatial extents. Future hyperspectral satellites will provide imagery at greater spatial and temporal scales, and there is a need to assess techniques for mapping land cover with these data. Here we used simulated multi-temporal HyspIRI satellite imagery over a 30,000 km2 area in the San Francisco Bay Area, California to assess its capabilities for mapping classes defined by the international Land Cover Classification System (LCCS). We employed a mapping methodology and analysis framework that is applicable to regional and global scales. We used the Random Forests classifier with three sets of predictor variables (reflectance, MNF, hyperspectral metrics), two temporal resolutions (summer, spring-summer-fall), two sample scales (pixel, polygon) and two levels of classification complexity (12, 20 classes). Hyperspectral metrics provided a 16.4-21.8% and 3.1-6.7% increase in overall accuracy relative to MNF and reflectance bands, respectively, depending on pixel or polygon scales of analysis. Multi-temporal metrics improved overall accuracy by 0.9-3.1% over summer metrics, yet increases were only significant at the pixel scale of analysis. Overall accuracy at pixel scales was 72.2% (Kappa 0.70) with three seasons of metrics. Anthropogenic and homogenous natural vegetation classes had relatively high confidence and producer and user accuracies were over 70%; in comparison, woodland and forest classes had considerable confusion. We next focused on plant functional types with relatively pure spectra by removing open-canopy shrublands, woodlands and mixed forests from the classification. This 12-class map had significantly improved accuracy of 85.1% (Kappa 0.83) and most classes had over 70% producer and user accuracies. Finally, we summarized important metrics from the multi-temporal Random Forests to infer the underlying chemical and structural properties that best discriminated our land-cover classes across seasons.
Recent drought conditions in the Conterminous United States
Frank H. Koch; William D. Smith; John W. Coulston
2013-01-01
Droughts are common in virtually all U.S. forests, but their frequency and intensity vary widely both between and within forest ecosystems (Hanson and Weltzin 2000). Forests in the Western United States generally exhibit a pattern of annual seasonal droughts. Forests in the Eastern United States tend to exhibit one of two prevailing patterns: random occasional droughts...
Random forests for classification in ecology
Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.
2007-01-01
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. ?? 2007 by the Ecological Society of America.
2018-01-01
Background Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160
Random Forest Segregation of Drug Responses May Define Regions of Biological Significance.
Bukhari, Qasim; Borsook, David; Rudin, Markus; Becerra, Lino
2016-01-01
The ability to assess brain responses in unsupervised manner based on fMRI measure has remained a challenge. Here we have applied the Random Forest (RF) method to detect differences in the pharmacological MRI (phMRI) response in rats to treatment with an analgesic drug (buprenorphine) as compared to control (saline). Three groups of animals were studied: two groups treated with different doses of the opioid buprenorphine, low (LD), and high dose (HD), and one receiving saline. PhMRI responses were evaluated in 45 brain regions and RF analysis was applied to allocate rats to the individual treatment groups. RF analysis was able to identify drug effects based on differential phMRI responses in the hippocampus, amygdala, nucleus accumbens, superior colliculus, and the lateral and posterior thalamus for drug vs. saline. These structures have high levels of mu opioid receptors. In addition these regions are involved in aversive signaling, which is inhibited by mu opioids. The results demonstrate that buprenorphine mediated phMRI responses comprise characteristic features that allow a supervised differentiation from placebo treated rats as well as the proper allocation to the respective drug dose group using the RF method, a method that has been successfully applied in clinical studies.
Texture analysis of common renal masses in multiple MR sequences for prediction of pathology
NASA Astrophysics Data System (ADS)
Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua
2017-03-01
This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from one single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.
NASA Astrophysics Data System (ADS)
Melville, Bethany; Lucieer, Arko; Aryal, Jagannath
2018-04-01
This paper presents a random forest classification approach for identifying and mapping three types of lowland native grassland communities found in the Tasmanian Midlands region. Due to the high conservation priority assigned to these communities, there has been an increasing need to identify appropriate datasets that can be used to derive accurate and frequently updateable maps of community extent. Therefore, this paper proposes a method employing repeat classification and statistical significance testing as a means of identifying the most appropriate dataset for mapping these communities. Two datasets were acquired and analysed; a Landsat ETM+ scene, and a WorldView-2 scene, both from 2010. Training and validation data were randomly subset using a k-fold (k = 50) approach from a pre-existing field dataset. Poa labillardierei, Themeda triandra and lowland native grassland complex communities were identified in addition to dry woodland and agriculture. For each subset of randomly allocated points, a random forest model was trained based on each dataset, and then used to classify the corresponding imagery. Validation was performed using the reciprocal points from the independent subset that had not been used to train the model. Final training and classification accuracies were reported as per class means for each satellite dataset. Analysis of Variance (ANOVA) was undertaken to determine whether classification accuracy differed between the two datasets, as well as between classifications. Results showed mean class accuracies between 54% and 87%. Class accuracy only differed significantly between datasets for the dry woodland and Themeda grassland classes, with the WorldView-2 dataset showing higher mean classification accuracies. The results of this study indicate that remote sensing is a viable method for the identification of lowland native grassland communities in the Tasmanian Midlands, and that repeat classification and statistical significant testing can be used to identify optimal datasets for vegetation community mapping.
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.
Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B
2016-01-01
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
Statistical Approaches to Type Determination of the Ejector Marks on Cartridge Cases.
Warren, Eric M; Sheets, H David
2018-03-01
While type determination on bullets has been performed for over a century, type determination on cartridge cases is often overlooked. Presented here is an example of type determination of ejector marks on cartridge cases from Glock and Smith & Wesson Sigma series pistols using Naïve Bayes and Random Forest classification methods. The shapes of ejector marks were captured from images of test-fired cartridge cases and subjected to multivariate analysis. Naïve Bayes and Random Forest methods were used to assign the ejector shapes to the correct class of firearm with success rates as high as 98%. This method is easily implemented with equipment already available in crime laboratories and can serve as an investigative lead in the form of a list of firearms that could have fired the evidence. Paired with the FBI's General Rifling Characteristics (GRC) database, this could be an invaluable resource for firearm evidence at crime scenes. © 2017 American Academy of Forensic Sciences.
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
NASA Astrophysics Data System (ADS)
Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.
2016-03-01
This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.
Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques
NASA Technical Reports Server (NTRS)
Lee, Hanbong; Malik, Waqar; Jung, Yoon C.
2016-01-01
Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural networks model are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.
Ship Detection Based on Multiple Features in Random Forest Model for Hyperspectral Images
NASA Astrophysics Data System (ADS)
Li, N.; Ding, L.; Zhao, H.; Shi, J.; Wang, D.; Gong, X.
2018-04-01
A novel method for detecting ships which aim to make full use of both the spatial and spectral information from hyperspectral images is proposed. Firstly, the band which is high signal-noise ratio in the range of near infrared or short-wave infrared spectrum, is used to segment land and sea on Otsu threshold segmentation method. Secondly, multiple features that include spectral and texture features are extracted from hyperspectral images. Principal components analysis (PCA) is used to extract spectral features, the Grey Level Co-occurrence Matrix (GLCM) is used to extract texture features. Finally, Random Forest (RF) model is introduced to detect ships based on the extracted features. To illustrate the effectiveness of the method, we carry out experiments over the EO-1 data by comparing single feature and different multiple features. Compared with the traditional single feature method and Support Vector Machine (SVM) model, the proposed method can stably achieve the target detection of ships under complex background and can effectively improve the detection accuracy of ships.
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
Multisource passive acoustic tracking: an application of random finite set data fusion
NASA Astrophysics Data System (ADS)
Ali, Andreas M.; Hudson, Ralph E.; Lorenzelli, Flavio; Yao, Kung
2010-04-01
Multisource passive acoustic tracking is useful in animal bio-behavioral study by replacing or enhancing human involvement during and after field data collection. Multiple simultaneous vocalizations are a common occurrence in a forest or a jungle, where many species are encountered. Given a set of nodes that are capable of producing multiple direction-of-arrivals (DOAs), such data needs to be combined into meaningful estimates. Random Finite Set provides the mathematical probabilistic model, which is suitable for analysis and optimal estimation algorithm synthesis. Then the proposed algorithm has been verified using a simulation and a controlled test experiment.
Diggins, Corinne A.; Silvis, Alexander; Kelly, Christine A.; Ford, W. Mark
2017-01-01
Context: Understanding habitat selection is important for determining conservation and management strategies for endangered species. The Carolina northern flying squirrel (CNFS; Glaucomys sabrinus coloratus) is an endangered subspecies found in the high-elevation montane forests of the southern Appalachians, USA. The primary use of nest boxes to monitor CNFS has provided biased information on habitat use for this subspecies, as nest boxes are typically placed in suitable denning habitat.Aims: We conducted a radio-telemetry study on CNFS to determine home range, den site selection and habitat use at multiple spatial scales.Methods: We radio-collared 21 CNFS in 2012 and 2014–15. We tracked squirrels to diurnal den sites and during night-time activity.Key results: The MCP (minimum convex polygon) home range at 95% for males was 5.2 ± 1.2 ha and for females was 4.0 ± 0.7. The BRB (biased random bridge) home range at 95% for males was 10.8 ± 3.8 ha and for females was 8.3 ± 2.1. Den site (n = 81) selection occurred more frequently in montane conifer dominate forests (81.4%) vs northern hardwood forests or conifer–northern hardwood forests (9.9% and 8.7%, respectively). We assessed habitat selection using Euclidean distance-based analysis at the 2nd order and 3rd order scale. We found that squirrels were non-randomly selecting for habitat at both 2nd and 3rd order scales.Conclusions: At both spatial scales, CNFS preferentially selected for montane conifer forests more than expected based on availability on the landscape. Squirrels selected neither for nor against northern hardwood forests, regardless of availability on the landscape. Additionally, CNFS denned in montane conifer forests more than other habitat types.Implications: Our results highlight the importance of montane conifer to CNFS in the southern Appalachians. Management and restoration activities that increase the quality, connectivity and extent of this naturally rare forest type may be important for long-term conservation of this subspecies, especially with the impending threat of anthropogenic climate change.
Nunes, Matheus Henrique
2016-01-01
Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects. PMID:27187074
Electromagnetic wave extinction within a forested canopy
NASA Technical Reports Server (NTRS)
Karam, M. A.; Fung, A. K.
1989-01-01
A forested canopy is modeled by a collection of randomly oriented finite-length cylinders shaded by randomly oriented and distributed disk- or needle-shaped leaves. For a plane wave exciting the forested canopy, the extinction coefficient is formulated in terms of the extinction cross sections (ECSs) in the local frame of each forest component and the Eulerian angles of orientation (used to describe the orientation of each component). The ECSs in the local frame for the finite-length cylinders used to model the branches are obtained by using the forward-scattering theorem. ECSs in the local frame for the disk- and needle-shaped leaves are obtained by the summation of the absorption and scattering cross-sections. The behavior of the extinction coefficients with the incidence angle is investigated numerically for both deciduous and coniferous forest. The dependencies of the extinction coefficients on the orientation of the leaves are illustrated numerically.
Nunes, Matheus Henrique; Görgens, Eric Bastos
2016-01-01
Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects.
E. Freeman; G. Moisen; J. Coulston; B. Wilson
2014-01-01
Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
Relationship of field and LiDAR estimates of forest canopy cover with snow accumulation and melt
Mariana Dobre; William J. Elliot; Joan Q. Wu; Timothy E. Link; Brandon Glaza; Theresa B. Jain; Andrew T. Hudak
2012-01-01
At the Priest River Experimental Forest in northern Idaho, USA, snow water equivalent (SWE) was recorded over a period of six years on random, equally-spaced plots in ~4.5 ha small watersheds (n=10). Two watersheds were selected as controls and eight as treatments, with two watersheds randomly assigned per treatment as follows: harvest (2007) followed by mastication (...
Elizabeth A. Freeman; Gretchen G. Moisen; John W. Coulston; Barry T. (Ty) Wilson
2015-01-01
As part of the development of the 2011 National Land Cover Database (NLCD) tree canopy cover layer, a pilot project was launched to test the use of high-resolution photography coupled with extensive ancillary data to map the distribution of tree canopy cover over four study regions in the conterminous US. Two stochastic modeling techniques, random forests (RF...
Chapter4 - Drought patterns in the conterminous United States and Hawaii.
Frank H. Koch; William D. Smith; John W. Coulston
2014-01-01
Droughts are common in virtually all U.S. forests, but their frequency and intensity vary widely both between and within forest ecosystems (Hanson and Weltzin 2000). Forests in the Western United States generally exhibit a pattern of annual seasonal droughts. Forests in the Eastern United States tend to exhibit one of two prevailing patterns: random occasional droughts...
Steve Zack; William F. Laudenslayer; Luke George; Carl Skinner; William Oliver
1999-01-01
At two different locations in northeast California, an interdisciplinary team of scientists is initiating long-term studies to quantify the effects of forest manipulations intended to accelerate andlor enhance late-successional structure of eastside pine forest ecosystems. One study, at Blacks Mountain Experimental Forest, uses a split-plot, factorial, randomized block...
Probabilistic risk models for multiple disturbances: an example of forest insects and wildfires
Haiganoush K. Preisler; Alan A. Ager; Jane L. Hayes
2010-01-01
Building probabilistic risk models for highly random forest disturbances like wildfire and forest insect outbreaks is a challenging. Modeling the interactions among natural disturbances is even more difficult. In the case of wildfire and forest insects, we looked at the probability of a large fire given an insect outbreak and also the incidence of insect outbreaks...
Utilizing random forests imputation of forest plot data for landscape-level wildfire analyses
Karin L. Riley; Isaac C. Grenfell; Mark A. Finney; Nicholas L. Crookston
2014-01-01
Maps of the number, size, and species of trees in forests across the United States are desirable for a number of applications. For landscape-level fire and forest simulations that use the Forest Vegetation Simulator (FVS), a spatial tree-level dataset, or âtree listâ, is a necessity. FVS is widely used at the stand level for simulating fire effects on tree mortality,...
Machine Learning Methods for Production Cases Analysis
NASA Astrophysics Data System (ADS)
Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.
2018-03-01
Approach to analysis of events occurring during the production process were proposed. Described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of applied models. k-Nearest Neighbors and Random forest methods were used to illustrate and analyze proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.
Carvajal, Thaddeus M; Viacrusis, Katherine M; Hernandez, Lara Fides T; Ho, Howell T; Amalin, Divina M; Watanabe, Kozo
2018-04-17
Several studies have applied ecological factors such as meteorological variables to develop models and accurately predict the temporal pattern of dengue incidence or occurrence. With the vast amount of studies that investigated this premise, the modeling approaches differ from each study and only use a single statistical technique. It raises the question of whether which technique would be robust and reliable. Hence, our study aims to compare the predictive accuracy of the temporal pattern of Dengue incidence in Metropolitan Manila as influenced by meteorological factors from four modeling techniques, (a) General Additive Modeling, (b) Seasonal Autoregressive Integrated Moving Average with exogenous variables (c) Random Forest and (d) Gradient Boosting. Dengue incidence and meteorological data (flood, precipitation, temperature, southern oscillation index, relative humidity, wind speed and direction) of Metropolitan Manila from January 1, 2009 - December 31, 2013 were obtained from respective government agencies. Two types of datasets were used in the analysis; observed meteorological factors (MF) and its corresponding delayed or lagged effect (LG). After which, these datasets were subjected to the four modeling techniques. The predictive accuracy and variable importance of each modeling technique were calculated and evaluated. Among the statistical modeling techniques, Random Forest showed the best predictive accuracy. Moreover, the delayed or lag effects of the meteorological variables was shown to be the best dataset to use for such purpose. Thus, the model of Random Forest with delayed meteorological effects (RF-LG) was deemed the best among all assessed models. Relative humidity was shown to be the top-most important meteorological factor in the best model. The study exhibited that there are indeed different predictive outcomes generated from each statistical modeling technique and it further revealed that the Random forest model with delayed meteorological effects to be the best in predicting the temporal pattern of Dengue incidence in Metropolitan Manila. It is also noteworthy that the study also identified relative humidity as an important meteorological factor along with rainfall and temperature that can influence this temporal pattern.
Random Forest Application for NEXRAD Radar Data Quality Control
NASA Astrophysics Data System (ADS)
Keem, M.; Seo, B. C.; Krajewski, W. F.
2017-12-01
Identification and elimination of non-meteorological radar echoes (e.g., returns from ground, wind turbines, and biological targets) are the basic data quality control steps before radar data use in quantitative applications (e.g., precipitation estimation). Although WSR-88Ds' recent upgrade to dual-polarization has enhanced this quality control and echo classification, there are still challenges to detect some non-meteorological echoes that show precipitation-like characteristics (e.g., wind turbine or anomalous propagation clutter embedded in rain). With this in mind, a new quality control method using Random Forest is proposed in this study. This classification algorithm is known to produce reliable results with less uncertainty. The method introduces randomness into sampling and feature selections and integrates consequent multiple decision trees. The multidimensional structure of the trees can characterize the statistical interactions of involved multiple features in complex situations. The authors explore the performance of Random Forest method for NEXRAD radar data quality control. Training datasets are selected using several clear cases of precipitation and non-precipitation (but with some non-meteorological echoes). The model is structured using available candidate features (from the NEXRAD data) such as horizontal reflectivity, differential reflectivity, differential phase shift, copolar correlation coefficient, and their horizontal textures (e.g., local standard deviation). The influence of each feature on classification results are quantified by variable importance measures that are automatically estimated by the Random Forest algorithm. Therefore, the number and types of features in the final forest can be examined based on the classification accuracy. The authors demonstrate the capability of the proposed approach using several cases ranging from distinct to complex rain/no-rain events and compare the performance with the existing algorithms (e.g., MRMS). They also discuss operational feasibility based on the observed strength and weakness of the method.
LaManna, Joseph A.; Martin, Thomas E.
2017-01-01
Understanding the causes underlying changes in species diversity is a fundamental pursuit of ecology. Animal species richness and composition often change with decreased forest structural complexity associated with logging. Yet differences in latitude and forest type may strongly influence how species diversity responds to logging. We performed a meta-analysis of logging effects on local species richness and composition of birds across the world and assessed responses by different guilds (nesting strata, foraging strata, diet, and body size). This approach allowed identification of species attributes that might underlie responses to this anthropogenic disturbance. We only examined studies that allowed forests to regrow naturally following logging, and accounted for logging intensity, spatial extent, successional regrowth after logging, and the change in species composition expected due to random assembly from regional species pools. Selective logging in the tropics and clearcut logging in temperate latitudes caused loss of species from nearly all forest strata (ground to canopy), leading to substantial declines in species richness (up to 27% of species). Few species were lost or gained following any intensity of logging in lower-latitude temperate forests, but the relative abundances of these species changed substantially. Selective logging at higher-temperate latitudes generally replaced late-successional specialists with early-successional specialists, leading to no net changes in species richness but large changes in species composition. Removing less basal area during logging mitigated the loss of avian species from all forests and, in some cases, increased diversity in temperate forests. This meta-analysis provides insights into the important role of habitat specialization in determining differential responses of animal communities to logging across tropical and temperate latitudes.
LaManna, Joseph A; Martin, Thomas E
2017-08-01
Understanding the causes underlying changes in species diversity is a fundamental pursuit of ecology. Animal species richness and composition often change with decreased forest structural complexity associated with logging. Yet differences in latitude and forest type may strongly influence how species diversity responds to logging. We performed a meta-analysis of logging effects on local species richness and composition of birds across the world and assessed responses by different guilds (nesting strata, foraging strata, diet, and body size). This approach allowed identification of species attributes that might underlie responses to this anthropogenic disturbance. We only examined studies that allowed forests to regrow naturally following logging, and accounted for logging intensity, spatial extent, successional regrowth after logging, and the change in species composition expected due to random assembly from regional species pools. Selective logging in the tropics and clearcut logging in temperate latitudes caused loss of species from nearly all forest strata (ground to canopy), leading to substantial declines in species richness (up to 27% of species). Few species were lost or gained following any intensity of logging in lower-latitude temperate forests, but the relative abundances of these species changed substantially. Selective logging at higher-temperate latitudes generally replaced late-successional specialists with early-successional specialists, leading to no net changes in species richness but large changes in species composition. Removing less basal area during logging mitigated the loss of avian species from all forests and, in some cases, increased diversity in temperate forests. This meta-analysis provides insights into the important role of habitat specialization in determining differential responses of animal communities to logging across tropical and temperate latitudes. © 2016 Cambridge Philosophical Society.
Li, Guo Chun; Song, Hua Dong; Li, Qi; Bu, Shu Hai
2017-11-01
In Abies fargesii forests of the giant panda's habitats in Mt. Taibai, the spatial distribution patterns and interspecific associations of main tree species and their spatial associations with the understory flowering Fargesia qinlingensis were analyzed at multiple scales by univariate and bivaria-te O-ring function in point pattern analysis. The results showed that in the A. fargesii forest, the number of A. fargesii was largest but its population structure was in decline. The population of Betula platyphylla was relatively young, with a stable population structure, while the population of B. albo-sinensis declined. The three populations showed aggregated distributions at small scales and gradually showed random distributions with increasing spatial scales. Spatial associations among tree species were mainly showed at small scales and gradually became not spatially associated with increasing scale. A. fargesii and B. platyphylla were positively associated with flowering F. qinlingensis at large and medium scales, whereas B. albo-sinensis showed negatively associated with flowering F. qinlingensis at large and medium scales. The interaction between trees and F. qinlingensis in the habitats of giant panda promoted the dynamic succession and development of forests, which changed the environment of giant panda's habitats in Qinling.
Diversity and habitat association of small mammals in Aridtsy forest, Awi Zone, Ethiopia.
Bantihun, Getachew; Bekele, Afework
2015-03-18
Here, we conducted a survey to examine the diversity, distribution and habitat association of small mammals from August 2011 to February 2012 incorporating both wet and dry seasons in Aridtsy forest, Awi Zone, Ethiopia. Using Sherman live traps and snap traps in four randomly selected trapping grids, namely, natural forest, bushland, grassland and farmland, a total of 468 individuals comprising eight species of small mammals (live traps) and 89 rodents of six species (snap traps) were trapped in 2352 and 1200 trap nights, respectively. The trapped small mammals included seven rodents and one insectivore: Lophuromys flavopuntatus (30.6%), Arvicanthis dembeensis (25.8%), Stenocephalemys albipes (20%), Mastomys natalensis (11.6%), Pelomys harringtoni (6.4%), Acomys cahirinus (4.3%), Lemniscomys zebra (0.2%) and the greater red musk shrew (Crocidura flavescens, 1.1%). Analysis showed statistically significant variations in the abundance and habitat preferences of small mammals between habitats during wet and dry seasons.
Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure
Park, Wookje; Jung, Sikhang
2014-01-01
Research on fault detection algorithm was developed with the similarity measure and random forest algorithm. The organized algorithm was applied to unmanned aircraft vehicle (UAV) that was readied by us. Similarity measure was designed by the help of distance information, and its usefulness was also verified by proof. Fault decision was carried out by calculation of weighted similarity measure. Twelve available coefficients among healthy and faulty status data group were used to determine the decision. Similarity measure weighting was done and obtained through random forest algorithm (RFA); RF provides data priority. In order to get a fast response of decision, a limited number of coefficients was also considered. Relation of detection rate and amount of feature data were analyzed and illustrated. By repeated trial of similarity calculation, useful data amount was obtained. PMID:25057508
Bühnemann, Claudia; Li, Simon; Yu, Haiyue; Branford White, Harriet; Schäfer, Karl L; Llombart-Bosch, Antonio; Machado, Isidro; Picci, Piero; Hogendoorn, Pancras C W; Athanasou, Nicholas A; Noble, J Alison; Hassan, A Bassim
2014-01-01
Driven by genomic somatic variation, tumour tissues are typically heterogeneous, yet unbiased quantitative methods are rarely used to analyse heterogeneity at the protein level. Motivated by this problem, we developed automated image segmentation of images of multiple biomarkers in Ewing sarcoma to generate distributions of biomarkers between and within tumour cells. We further integrate high dimensional data with patient clinical outcomes utilising random survival forest (RSF) machine learning. Using material from cohorts of genetically diagnosed Ewing sarcoma with EWSR1 chromosomal translocations, confocal images of tissue microarrays were segmented with level sets and watershed algorithms. Each cell nucleus and cytoplasm were identified in relation to DAPI and CD99, respectively, and protein biomarkers (e.g. Ki67, pS6, Foxo3a, EGR1, MAPK) localised relative to nuclear and cytoplasmic regions of each cell in order to generate image feature distributions. The image distribution features were analysed with RSF in relation to known overall patient survival from three separate cohorts (185 informative cases). Variation in pre-analytical processing resulted in elimination of a high number of non-informative images that had poor DAPI localisation or biomarker preservation (67 cases, 36%). The distribution of image features for biomarkers in the remaining high quality material (118 cases, 104 features per case) were analysed by RSF with feature selection, and performance assessed using internal cross-validation, rather than a separate validation cohort. A prognostic classifier for Ewing sarcoma with low cross-validation error rates (0.36) was comprised of multiple features, including the Ki67 proliferative marker and a sub-population of cells with low cytoplasmic/nuclear ratio of CD99. Through elimination of bias, the evaluation of high-dimensionality biomarker distribution within cell populations of a tumour using random forest analysis in quality controlled tumour material could be achieved. Such an automated and integrated methodology has potential application in the identification of prognostic classifiers based on tumour cell heterogeneity.
Ito, Natsumi; Iwanaga, Hiroko; Charles, Suliana; Diway, Bibian; Sabang, John; Chong, Lucy; Nanami, Satoshi; Kamiya, Koichi; Lum, Shawn; Siregar, Ulfah J; Harada, Ko; Miyashita, Naohiko T
2017-09-12
Geographical variation in soil bacterial community structure in 26 tropical forests in Southeast Asia (Malaysia, Indonesia and Singapore) and two temperate forests in Japan was investigated to elucidate the environmental factors and mechanisms that influence biogeography of soil bacterial diversity and composition. Despite substantial environmental differences, bacterial phyla were represented in similar proportions, with Acidobacteria and Proteobacteria the dominant phyla in all forests except one mangrove forest in Sarawak, although highly significant heterogeneity in frequency of individual phyla was detected among forests. In contrast, species diversity (α-diversity) differed to a much greater extent, being nearly six-fold higher in the mangrove forest (Chao1 index = 6,862) than in forests in Singapore and Sarawak (~1,250). In addition, natural mixed dipterocarp forests had lower species diversity than acacia and oil palm plantations, indicating that aboveground tree composition does not influence soil bacterial diversity. Shannon and Chao1 indices were correlated positively, implying that skewed operational taxonomic unit (OTU) distribution was associated with the abundance of overall and rare (singleton) OTUs. No OTUs were represented in all 28 forests, and forest-specific OTUs accounted for over 70% of all detected OTUs. Forests that were geographically adjacent and/or of the same forest type had similar bacterial species composition, and a positive correlation was detected between species divergence (β-diversity) and direct distance between forests. Both α- and β-diversities were correlated with soil pH. These results suggest that soil bacterial communities in different forests evolve largely independently of each other and that soil bacterial communities adapt to their local environment, modulated by bacterial dispersal (distance effect) and forest type. Therefore, we conclude that the biogeography of soil bacteria communities described here is non-random, reflecting the influences of contemporary environmental factors and evolutionary history.
Classification of large-sized hyperspectral imagery using fast machine learning algorithms
NASA Astrophysics Data System (ADS)
Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira
2017-07-01
We present a framework of fast machine learning algorithms in the context of large-sized hyperspectral images classification from the theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to the support vector machines. To give the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as the computational cost and robustness against the noise are also discussed.
A primer on stand and forest inventory designs
H. Gyde Lund; Charles E. Thomas
1989-01-01
Covers designs for the inventory of stands and forests in detail and with worked-out examples. For stands, random sampling, line transects, ricochet plot, systematic sampling, single plot, cluster, subjective sampling and complete enumeration are discussed. For forests inventory, the main categories are subjective sampling, inventories without prior stand mapping,...
Uncertain Photometric Redshifts with Deep Learning Methods
NASA Astrophysics Data System (ADS)
D'Isanto, A.
2017-06-01
The need for accurate photometric redshifts estimation is a topic that has fundamental importance in Astronomy, due to the necessity of efficiently obtaining redshift information without the need of spectroscopic analysis. We propose a method for determining accurate multi-modal photo-z probability density functions (PDFs) using Mixture Density Networks (MDN) and Deep Convolutional Networks (DCN). A comparison with a Random Forest (RF) is performed.
NASA Astrophysics Data System (ADS)
Schünemann, Adriano Luis; Inácio Fernandes Filho, Elpídio; Rocha Francelino, Marcio; Rodrigues Santos, Gérson; Thomazini, Andre; Batista Pereira, Antônio; Gonçalves Reynaud Schaefer, Carlos Ernesto
2017-04-01
The knowledge of environmental variables values, in non-sampled sites from a minimum data set can be accessed through interpolation technique. Kriging and the classifier Random Forest algorithm are examples of predictors with this aim. The objective of this work was to compare methods of soil attributes spatialization in a recent deglaciated environment with complex landforms. Prediction of the selected soil attributes (potassium, calcium and magnesium) from ice-free areas were tested by using morphometric covariables, and geostatistical models without these covariables. For this, 106 soil samples were collected at 0-10 cm depth in Keller Peninsula, King George Island, Maritime Antarctica. Soil chemical analysis was performed by the gravimetric method, determining values of potassium, calcium and magnesium for each sampled point. Digital terrain models (DTMs) were obtained by using Terrestrial Laser Scanner. DTMs were generated from a cloud of points with spatial resolutions of 1, 5, 10, 20 and 30 m. Hence, 40 morphometric covariates were generated. Simple Kriging was performed using the R package software. The same data set coupled with morphometric covariates, was used to predict values of the studied attributes in non-sampled sites through Random Forest interpolator. Little differences were observed on the DTMs generated by Simple kriging and Random Forest interpolators. Also, DTMs with better spatial resolution did not improved the quality of soil attributes prediction. Results revealed that Simple Kriging can be used as interpolator when morphometric covariates are not available, with little impact regarding quality. It is necessary to go further in soil chemical attributes prediction techniques, especially in periglacial areas with complex landforms.
Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine
NASA Astrophysics Data System (ADS)
Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry
2017-10-01
Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha-1 and for live biomass of about 2 t ha-1 over the study area.
John Yarie
1983-01-01
The forest vegetation of 3,600,000 hectares in northeast interior Alaska was classified. A total of 365 plots located in a stratified random design were run through the ordination programs SIMORD and TWINSPAN. A total of 40 forest communities were described vegetatively and, to a limited extent, environmentally. The area covered by each community was similar, ranging...
JoAnn M. Hanowski; Gerald J. Niemi
1995-01-01
We established bird monitoring programs in two regions of Minnesota: the Chippewa National Forest and the Superior National Forest. The experimental design defined forest cover types as strata in which samples of forest stands were randomly selected. Subsamples (3 point counts) were placed in each stand to maximize field effort and to assess within-stand and between-...
Predicting live and dead tree basal area of bark beetle affected forests from discrete-return lidar
Benjamin C. Bright; Andrew T. Hudak; Robert McGaughey; Hans-Erik Andersen; Jose Negron
2013-01-01
Bark beetle outbreaks have killed large numbers of trees across North America in recent years. Lidar remote sensing can be used to effectively estimate forest biomass, but prediction of both live and dead standing biomass in beetle-affected forests using lidar alone has not been demonstrated. We developed Random Forest (RF) models predicting total, live, dead, and...
Valuing the Recreational Benefits from the Creation of Nature Reserves in Irish Forests
Riccardo Scarpa; Susan M. Chilton; W. George Hutchinson; Joseph Buongiorno
2000-01-01
Data from a large-scale contingent valuation study are used to investigate the effects of forest attribum on willingness to pay for forest recreation in Ireland. In particular, the presence of a nature reserve in the forest is found to significantly increase the visitors' willingness to pay. A random utility model is used to estimate the welfare change associated...
Elizabeth A. Freeman; Gretchen G. Moisen; Tracy S. Frescino
2012-01-01
Random Forests is frequently used to model species distributions over large geographic areas. Complications arise when data used to train the models have been collected in stratified designs that involve different sampling intensity per stratum. The modeling process is further complicated if some of the target species are relatively rare on the landscape leading to an...
Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
A random forest learning assisted "divide and conquer" approach for peptide conformation search.
Chen, Xin; Yang, Bing; Lin, Zijing
2018-06-11
Computational determination of peptide conformations is challenging as it is a problem of finding minima in a high-dimensional space. The "divide and conquer" approach is promising for reliably reducing the search space size. A random forest learning model is proposed here to expand the scope of applicability of the "divide and conquer" approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units ("words"). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units ("grammar"). It is found that amino acid residues may be grouped as equivalent "words", while the φ-ψ combinations in low-energy peptide conformations follow a distinct "grammar". The finding of equivalent words empowers the "divide and conquer" method with the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the "divide and conquer" method by removing unfavorable φ-ψ combinations without the need of dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG through assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.
An Evaluation of Data Fusion Products for the Analysis of Dryland Forest Phenology
NASA Astrophysics Data System (ADS)
Walker, J. J.; de Beurs, K.; Wynne, R. H.; Gao, F.
2010-12-01
Semi-arid forest areas cover a significant proportion of the world’s land surface; in the interior western U.S. alone, dryland forests extend across more than 56 million hectares. The scarcity of water in these systems makes them acutely sensitive to sustained weather fluctuations, such as the higher temperatures and altered water regimes predicted under most climate change scenarios. To understand, monitor, and predict the anticipated spatial and temporal changes in these areas, it is vital to characterize current phenological patterns. Phenological analysis of western U.S. drylands is complicated by patchy land cover and mosaics of plant phenology states at a variety of spatial scales. Our aim is to use complementary satellite sensors to mitigate these difficulties and gain greater insight into phenological patterns in dryland forests. In this study we applied the spatial and temporal adaptive reflectance model (STARFM; Gao et al. 2006) to fuse Landsat and MODIS imagery to create synthetic images at Landsat spatial resolution and MODIS temporal resolution. To determine which MODIS dataset is most appropriate for the creation of synthetic images intended for the analysis of dryland forest phenology, we examined the effect of temporal compositing and BRDF function adjustment on the accuracy of STARFM imagery. We assembled seven Landsat 5 scenes (path/row 37/36) and temporally-coincident 500m MODIS datasets (seven daily (MOD09GA), seven 8-day composite (MOD09A1), and fourteen 16-day nadir BRDF-adjusted composite (MCD43A4) images) spanning the 2006 April - October growing season in northern Arizona, which is characterized by large tracts of dryland forest. The STARFM algorithm was applied to each MODIS data series to produce four synthetic images (one daily; one 8-day composite; and two 16-day composites) corresponding to each Landsat image. Validation of the accuracy of the synthetic images was achieved by comparing the reflectance values of a random sample of the identified dryland forest pixels in both images. Preliminary data analysis of the effect of the temporal resolution and dataset parameters indicates that the MODIS 8-day composite image may be a suitable and sufficient dataset for phenological analysis in this dryland forest ecosystem. Overall, this work demonstrates the feasibility of using data fusion products to assemble an imagery dataset at sufficiently high temporal and spatial scales to permit a more detailed examination of the underlying phenological processes and trends in dryland forest areas.
Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
Wei, Runmin; Wang, Jingye; Su, Mingming; Jia, Erik; Chen, Shaoqiu; Chen, Tianlu; Ni, Yan
2018-01-12
Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the selection can significantly affect following data analyses. Typically, there are three types of missing values, missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC)) for different types of missing values using four metabolomics datasets. Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate imputation accuracy. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes analysis were used to evaluate the overall sample distribution. Student's t-test followed by correlation analysis was conducted to evaluate the effects on univariate statistics. Our findings demonstrated that RF performed the best for MCAR/MAR and QRILC was the favored one for left-censored MNAR. Finally, we proposed a comprehensive strategy and developed a public-accessible web-tool for the application of missing value imputation in metabolomics ( https://metabolomics.cc.hawaii.edu/software/MetImp/ ).
Metabolite analysis distinguishes between mice with epidermolysis bullosa acquisita and healthy mice
2013-01-01
Background Epidermolysis bullosa acquisita (EBA) is a rare skin blistering disease with a prevalence of 0.2/ million people. EBA is characterized by autoantibodies against type VII collagen. Type VII collagen builds anchoring fibrils that are essential for the dermal-epidermal junction. The pathogenic relevance of antibodies against type VII collagen subdomains has been demonstrated both in vitro and in vivo. Despite the multitude of clinical and immunological data, no information on metabolic changes exists. Methods We used an animal model of EBA to obtain insights into metabolomic changes during EBA. Sera from mice with immunization-induced EBA and control mice were obtained and metabolites were isolated by filtration. Proton nuclear magnetic resonance (NMR) spectra were recorded and analyzed by principal component analysis (PCA), partial least squares discrimination analysis (PLS-DA) and random forest. Results The metabolic pattern of immunized mice and control mice could be clearly distinguished with PCA and PLS-DA. Metabolites that contribute to the discrimination could be identified via random forest. The observed changes in the metabolic pattern of EBA sera, i.e. increased levels of amino acid, point toward an increased energy demand in EBA. Conclusions Knowledge about metabolic changes due to EBA could help in future to assess the disease status during treatment. Confirming the metabolic changes in patients needs probably large cohorts. PMID:23800341
Hong, Haoyuan; Tsangaratos, Paraskevas; Ilia, Ioanna; Liu, Junzhi; Zhu, A-Xing; Xu, Chong
2018-07-15
The main objective of the present study was to utilize Genetic Algorithms (GA) in order to obtain the optimal combination of forest fire related variables and apply data mining methods for constructing a forest fire susceptibility map. In the proposed approach, a Random Forest (RF) and a Support Vector Machine (SVM) was used to produce a forest fire susceptibility map for the Dayu County which is located in southwest of Jiangxi Province, China. For this purpose, historic forest fires and thirteen forest fire related variables were analyzed, namely: elevation, slope angle, aspect, curvature, land use, soil cover, heat load index, normalized difference vegetation index, mean annual temperature, mean annual wind speed, mean annual rainfall, distance to river network and distance to road network. The Natural Break and the Certainty Factor method were used to classify and weight the thirteen variables, while a multicollinearity analysis was performed to determine the correlation among the variables and decide about their usability. The optimal set of variables, determined by the GA limited the number of variables into eight excluding from the analysis, aspect, land use, heat load index, distance to river network and mean annual rainfall. The performance of the forest fire models was evaluated by using the area under the Receiver Operating Characteristic curve (ROC-AUC) based on the validation dataset. Overall, the RF models gave higher AUC values. Also the results showed that the proposed optimized models outperform the original models. Specifically, the optimized RF model gave the best results (0.8495), followed by the original RF (0.8169), while the optimized SVM gave lower values (0.7456) than the RF, however higher than the original SVM (0.7148) model. The study highlights the significance of feature selection techniques in forest fire susceptibility, whereas data mining methods could be considered as a valid approach for forest fire susceptibility modeling. Copyright © 2018 Elsevier B.V. All rights reserved.
Tustison, Nicholas J; Shrinidhi, K L; Wintermark, Max; Durst, Christopher R; Kandel, Benjamin M; Gee, James C; Grossman, Murray C; Avants, Brian B
2015-04-01
Segmenting and quantifying gliomas from MRI is an important task for diagnosis, planning intervention, and for tracking tumor changes over time. However, this task is complicated by the lack of prior knowledge concerning tumor location, spatial extent, shape, possible displacement of normal tissue, and intensity signature. To accommodate such complications, we introduce a framework for supervised segmentation based on multiple modality intensity, geometry, and asymmetry feature sets. These features drive a supervised whole-brain and tumor segmentation approach based on random forest-derived probabilities. The asymmetry-related features (based on optimal symmetric multimodal templates) demonstrate excellent discriminative properties within this framework. We also gain performance by generating probability maps from random forest models and using these maps for a refining Markov random field regularized probabilistic segmentation. This strategy allows us to interface the supervised learning capabilities of the random forest model with regularized probabilistic segmentation using the recently developed ANTsR package--a comprehensive statistical and visualization interface between the popular Advanced Normalization Tools (ANTs) and the R statistical project. The reported algorithmic framework was the top-performing entry in the MICCAI 2013 Multimodal Brain Tumor Segmentation challenge. The challenge data were widely varying consisting of both high-grade and low-grade glioma tumor four-modality MRI from five different institutions. Average Dice overlap measures for the final algorithmic assessment were 0.87, 0.78, and 0.74 for "complete", "core", and "enhanced" tumor components, respectively.
NASA Astrophysics Data System (ADS)
Molinario, G.; Hansen, M.; Potapov, P.
2016-12-01
High resolution satellite imagery obtained from the National Geospatial Intelligence Agency through NASA was used to photo-interpret sample areas within the DRC. The area sampled is a stratifcation of the forest cover loss from circa 2014 that either occurred completely within the previosly mapped homogenous area of the Rural Complex, at it's interface with primary forest, or in isolated forest perforations. Previous research resulted in a map of these areas that contextualizes forest loss depending on where it occurs and with what spatial density, leading to a better understading of the real impacts on forest degradation of livelihood shifting cultivation. The stratified random sampling approach of these areas allows the characterization of the constituent land cover types within these areas, and their variability throughout the DRC. Shifting cultivation has a variable forest degradation footprint in the DRC depending on many factors that drive it, but it's role in forest degradation and deforestation had been disputed, leading us to investigate and quantify the clearing and reuse rates within the strata throughout the country.
Measuring the effect of fuel treatments on forest carbon using landscape risk analysis
NASA Astrophysics Data System (ADS)
Ager, A. A.; Finney, M. A.; McMahan, A.; Cathcart, J.
2010-12-01
Wildfire simulation modelling was used to examine whether fuel reduction treatments can potentially reduce future wildfire emissions and provide carbon benefits. In contrast to previous reports, the current study modelled landscape scale effects of fuel treatments on fire spread and intensity, and used a probabilistic framework to quantify wildfire effects on carbon pools to account for stochastic wildfire occurrence. The study area was a 68 474 ha watershed located on the Fremont-Winema National Forest in southeastern Oregon, USA. Fuel reduction treatments were simulated on 10% of the watershed (19% of federal forestland). We simulated 30 000 wildfires with random ignition locations under both treated and untreated landscapes to estimate the change in burn probability by flame length class resulting from the treatments. Carbon loss functions were then calculated with the Forest Vegetation Simulator for each stand in the study area to quantify change in carbon as a function of flame length. We then calculated the expected change in carbon from a random ignition and wildfire as the sum of the product of the carbon loss and the burn probabilities by flame length class. The expected carbon difference between the non-treatment and treatment scenarios was then calculated to quantify the effect of fuel treatments. Overall, the results show that the carbon loss from implementing fuel reduction treatments exceeded the expected carbon benefit associated with lowered burn probabilities and reduced fire severity on the treated landscape. Thus, fuel management activities resulted in an expected net loss of carbon immediately after treatment. However, the findings represent a point in time estimate (wildfire immediately after treatments), and a temporal analysis with a probabilistic framework used here is needed to model carbon dynamics over the life cycle of the fuel treatments. Of particular importance is the long-term balance between emissions from the decay of dead trees killed by fire and carbon sequestration by forest regeneration following wildfire.
Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method
NASA Astrophysics Data System (ADS)
Shamsoddini, A.; Aboodi, M. R.; Karami, J.
2017-09-01
Air pollution as one of the most serious forms of environmental pollutions poses huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of the Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide and nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by Random Forest feature selection method performed more accurate than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than the other air contaminants whereas the nitrogen dioxide was predicted more accurate than the other pollutants.
NASA Astrophysics Data System (ADS)
Shi, Jing; Shi, Yunli; Tan, Jian; Zhu, Lei; Li, Hu
2018-02-01
Traditional power forecasting models cannot efficiently take various factors into account, neither to identify the relation factors. In this paper, the mutual information in information theory and the artificial intelligence random forests algorithm are introduced into the medium and long-term electricity demand prediction. Mutual information can identify the high relation factors based on the value of average mutual information between a variety of variables and electricity demand, different industries may be highly associated with different variables. The random forests algorithm was used for building the different industries forecasting models according to the different correlation factors. The data of electricity consumption in Jiangsu Province is taken as a practical example, and the above methods are compared with the methods without regard to mutual information and the industries. The simulation results show that the above method is scientific, effective, and can provide higher prediction accuracy.
Landscape Analysis of Adult Florida Panther Habitat.
Frakes, Robert A; Belden, Robert C; Wood, Barry E; James, Frederick E
2015-01-01
Historically occurring throughout the southeastern United States, the Florida panther is now restricted to less than 5% of its historic range in one breeding population located in southern Florida. Using radio-telemetry data from 87 prime-aged (≥3 years old) adult panthers (35 males and 52 females) during the period 2004 through 2013 (28,720 radio-locations), we analyzed the characteristics of the occupied area and used those attributes in a random forest model to develop a predictive distribution map for resident breeding panthers in southern Florida. Using 10-fold cross validation, the model was 87.5 % accurate in predicting presence or absence of panthers in the 16,678 km2 study area. Analysis of variable importance indicated that the amount of forests and forest edge, hydrology, and human population density were the most important factors determining presence or absence of panthers. Sensitivity analysis showed that the presence of human populations, roads, and agriculture (other than pasture) had strong negative effects on the probability of panther presence. Forest cover and forest edge had strong positive effects. The median model-predicted probability of presence for panther home ranges was 0.81 (0.82 for females and 0.74 for males). The model identified 5579 km2 of suitable breeding habitat remaining in southern Florida; 1399 km2 (25%) of this habitat is in non-protected private ownership. Because there is less panther habitat remaining than previously thought, we recommend that all remaining breeding habitat in south Florida should be maintained, and the current panther range should be expanded into south-central Florida. This model should be useful for evaluating the impacts of future development projects, in prioritizing areas for panther conservation, and in evaluating the potential impacts of sea-level rise and changes in hydrology.
Random Forest Segregation of Drug Responses May Define Regions of Biological Significance
Bukhari, Qasim; Borsook, David; Rudin, Markus; Becerra, Lino
2016-01-01
The ability to assess brain responses in unsupervised manner based on fMRI measure has remained a challenge. Here we have applied the Random Forest (RF) method to detect differences in the pharmacological MRI (phMRI) response in rats to treatment with an analgesic drug (buprenorphine) as compared to control (saline). Three groups of animals were studied: two groups treated with different doses of the opioid buprenorphine, low (LD), and high dose (HD), and one receiving saline. PhMRI responses were evaluated in 45 brain regions and RF analysis was applied to allocate rats to the individual treatment groups. RF analysis was able to identify drug effects based on differential phMRI responses in the hippocampus, amygdala, nucleus accumbens, superior colliculus, and the lateral and posterior thalamus for drug vs. saline. These structures have high levels of mu opioid receptors. In addition these regions are involved in aversive signaling, which is inhibited by mu opioids. The results demonstrate that buprenorphine mediated phMRI responses comprise characteristic features that allow a supervised differentiation from placebo treated rats as well as the proper allocation to the respective drug dose group using the RF method, a method that has been successfully applied in clinical studies. PMID:27014046
NASA Astrophysics Data System (ADS)
Pan, Xin; Cao, Chen; Yang, Yingbao; Li, Xiaolong; Shan, Liangliang; Zhu, Xi
2018-04-01
The land surface temperature (LST) derived from thermal infrared satellite images is a meaningful variable in many remote sensing applications. However, at present, the spatial resolution of the satellite thermal infrared remote sensing sensor is coarser, which cannot meet the needs. In this study, LST image was downscaled by a random forest model between LST and multiple predictors in an arid region with an oasis-desert ecotone. The proposed downscaling approach was evaluated using LST derived from the MODIS LST product of Zhangye City in Heihe Basin. The primary result of LST downscaling has been shown that the distribution of downscaled LST matched with that of the ecosystem of oasis and desert. By the way of sensitivity analysis, the most sensitive factors to LST downscaling were modified normalized difference water index (MNDWI)/normalized multi-band drought index (NMDI), soil adjusted vegetation index (SAVI)/ shortwave infrared reflectance (SWIR)/normalized difference vegetation index (NDVI), normalized difference building index (NDBI)/SAVI and SWIR/NDBI/MNDWI/NDWI for the region of water, vegetation, building and desert, with LST variation (at most) of 0.20/-0.22 K, 0.92/0.62/0.46 K, 0.28/-0.29 K and 3.87/-1.53/-0.64/-0.25 K in the situation of +/-0.02 predictor perturbances, respectively.
Random forest regression for magnetic resonance image synthesis.
Jog, Amod; Carass, Aaron; Roy, Snehashis; Pham, Dzung L; Prince, Jerry L
2017-01-01
By choosing different pulse sequences and their parameters, magnetic resonance imaging (MRI) can generate a large variety of tissue contrasts. This very flexibility, however, can yield inconsistencies with MRI acquisitions across datasets or scanning sessions that can in turn cause inconsistent automated image analysis. Although image synthesis of MR images has been shown to be helpful in addressing this problem, an inability to synthesize both T 2 -weighted brain images that include the skull and FLuid Attenuated Inversion Recovery (FLAIR) images has been reported. The method described herein, called REPLICA, addresses these limitations. REPLICA is a supervised random forest image synthesis approach that learns a nonlinear regression to predict intensities of alternate tissue contrasts given specific input tissue contrasts. Experimental results include direct image comparisons between synthetic and real images, results from image analysis tasks on both synthetic and real images, and comparison against other state-of-the-art image synthesis methods. REPLICA is computationally fast, and is shown to be comparable to other methods on tasks they are able to perform. Additionally REPLICA has the capability to synthesize both T 2 -weighted images of the full head and FLAIR images, and perform intensity standardization between different imaging datasets. Copyright © 2016 Elsevier B.V. All rights reserved.
Hu, Xisheng; Wu, Zhilong; Wu, Chengzhen; Ye, Limin; Lan, Chaofeng; Tang, Kun; Xu, Lu; Qiu, Rongzu
2016-09-15
Forest cover changes are of global concern due to their roles in global warming and biodiversity. However, many previous studies have ignored the fact that forest loss and forest gain are different processes that may respond to distinct factors by stressing forest loss more than gain or viewing forest cover change as a whole. It behooves us to carefully examine the patterns and drivers of the change by subdividing it into several categories. Our study includes areas of forest loss (4.8% of the study area), forest gain (1.3% of the study area) and forest loss and gain (2.0% of the study area) from 2000 to 2012 in Fujian Province, China. In the study area, approximately 65% and 90% of these changes occurred within 2000m of the nearest road and under road densities of 0.6km/km(2), respectively. We compared two sampling techniques (systematic sampling and random sampling) and four intensities for each technique to investigate the driving patterns underlying the changes using multinomial logistic regression. The results indicated the lack of pronounced differences in the regressions between the two sampling designs, although the sample size had a great impact on the regression outcome. The application of multi-model inference indicated that the low level road density had a negative significant association with forest loss and forest loss and gain, the expressway density had a positive significant impact on forest loss, and the road network was insignificantly related to forest gain. The model including socioeconomic and biophysical variables illuminated potentially different predictors of the different forest change categories. Moreover, the multiple comparisons tested by Fisher's least significant difference (LSD) were a good compensation for the multinomial logistic model to enrich the interpretation of the regression results. Copyright © 2016 Elsevier B.V. All rights reserved.
Semantic segmentation of 3D textured meshes for urban scene analysis
NASA Astrophysics Data System (ADS)
Rouhani, Mohammad; Lafarge, Florent; Alliez, Pierre
2017-01-01
Classifying 3D measurement data has become a core problem in photogrammetry and 3D computer vision, since the rise of modern multiview geometry techniques, combined with affordable range sensors. We introduce a Markov Random Field-based approach for segmenting textured meshes generated via multi-view stereo into urban classes of interest. The input mesh is first partitioned into small clusters, referred to as superfacets, from which geometric and photometric features are computed. A random forest is then trained to predict the class of each superfacet as well as its similarity with the neighboring superfacets. Similarity is used to assign the weights of the Markov Random Field pairwise-potential and to account for contextual information between the classes. The experimental results illustrate the efficacy and accuracy of the proposed framework.
Michael G. Shelton
1995-01-01
Five forest floor weights (0, 10, 20, 30, and 40 MgJha), three forest floor compositions (pine, pine-hardwood, and hardwood), and two seed placements (forest floor and soil surface) were tested in a three-factorial. split-plot design with four incomplete, randomized blocks. The experiment was conducted in a nursery setting and used wooden frames to define 0.145-m
Disentangling factors that control the vulnerability of forests to catastrophic wind damage
NASA Astrophysics Data System (ADS)
Dracup, E.; Taylor, A.; MacLean, D.; Boulanger, Y.
2017-12-01
Wind is an important driver of forest dynamics along North America's north-eastern coastal forests, but also damages many commercially managed forests which society relies as an important source of wood fiber. Although the influence of wind on north-eastern forests is well recognized, knowledge of factors predisposing trees to wind damage is less known, especially in the context of large, powerful wind storm events. This is of particular concern as climate change is expected to alter the frequency and severity of strong wind storms affecting this region. On 29 September 2003, Hurricane Juan made landfall over Nova Scotia, Canada as a Category 2 hurricane with sustained winds of 158 km/h, and gusts of up to 185 km/h. Hurricane Juan variously damaged a swath of over 600,000 ha of forest. The damaged forest area was surveyed using aerial photography and LandSAT imagery and categorized according to level of wind damage sustained (none, low, moderate, severe) at a resolution of 15 x 15 m square cells. We used Random Forest to analyze and compare level of wind damage in each cell with a myriad of abiotic (exposure, depth to water table, soil composition, etc.) and biotic (tree species composition, canopy closure, canopy height, etc.) factors known or expected to predispose trees to windthrow. From our analysis, we identified topographic exposure, precipitation, and maximum gust speed as the top predictors of windthrow during Hurricane Juan. To our surprise, forest stand factors, such as tree species composition and height, had minimal effects on level of windthrow. These results can be used to construct predictive risk maps which can help society to assess the vulnerability of forests to future wind storm events.
Randall J. Wilk; Timothy B. Harrington; Robert A. Gitzen; Chris C. Maguire
2015-01-01
We evaluated the two-year effects of variable-retention harvest on chipmunk (Tamias spp.) abundance (N^) and habitat in mature coniferous forests in western Oregon and Washington because wildlife responses to density/pattern of retained trees remain largely unknown. In a randomized complete-block design, six...
Highlights of the national evaluation of the Forest Stewardship Planning Program
R.J. Moulton; J.D. Esseks
2001-01-01
In 1998 and 1999, a nationwide random sample of 1238 nonindustrial private (NIPF) landowners with approved multiple resource Forest Stewardship Plans were interviewed to determine if this program is meeting its Congressional mandate of promoting sustainable management of forest resources on NIPF ownerships. It was found that two-thirds of program participants had never...
Ownership and ecosystem as sources of spatial heterogeneity in a forested landscape, Wisconsin, USA
Thomas R. Crow; George E. Host; David J. Mladenoff
1999-01-01
The interaction between physical environment and land ownership in creating spatial heterogeneity was studied in largely forested landscapes of northern Wisconsin, USA. A stratified random approach was used in which 2500-ha plots representing two ownerships (National Forest and private non-industrial) were located within two regional ecosystems (extremely well-drained...
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
Probability machines: consistent probability estimation using nonparametric learning machines.
Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A
2012-01-01
Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
A random forest algorithm for nowcasting of intense precipitation events
NASA Astrophysics Data System (ADS)
Das, Saurabh; Chakraborty, Rohit; Maitra, Animesh
2017-09-01
Automatic nowcasting of convective initiation and thunderstorms has potential applications in several sectors including aviation planning and disaster management. In this paper, random forest based machine learning algorithm is tested for nowcasting of convective rain with a ground based radiometer. Brightness temperatures measured at 14 frequencies (7 frequencies in 22-31 GHz band and 7 frequencies in 51-58 GHz bands) are utilized as the inputs of the model. The lower frequency band is associated to the water vapor absorption whereas the upper frequency band relates to the oxygen absorption and hence, provide information on the temperature and humidity of the atmosphere. Synthetic minority over-sampling technique is used to balance the data set and 10-fold cross validation is used to assess the performance of the model. Results indicate that random forest algorithm with fixed alarm generation time of 30 min and 60 min performs quite well (probability of detection of all types of weather condition ∼90%) with low false alarms. It is, however, also observed that reducing the alarm generation time improves the threat score significantly and also decreases false alarms. The proposed model is found to be very sensitive to the boundary layer instability as indicated by the variable importance measure. The study shows the suitability of a random forest algorithm for nowcasting application utilizing a large number of input parameters from diverse sources and can be utilized in other forecasting problems.
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
A multi-sites analysis on the ozone effects on Gross Primary Production of European forests.
Proietti, C; Anav, A; De Marco, A; Sicard, P; Vitale, M
2016-06-15
Ozone (O3) is both a greenhouse gas and a secondary air pollutant causing adverse impacts on forests ecosystems at different scales, from cellular to ecosystem level. Specifically, the phytotoxic nature of O3 can impair CO2 assimilation that, in turn affects forest productivity. This study aims to evaluate the effects of tropospheric O3 on Gross Primary Production (GPP) at 37 European forest sites during the time period 2000-2010. Due to the lack of carbon assimilation data at O3 monitoring stations (and vice-versa) this study makes a first attempt to combine high resolution MODIS Gross Primary Production (GPP) estimates and O3 measurement data. Partial Correlations, Anomalies Analysis and the Random Forests Analysis (RFA) were used to quantify the effects of tropospheric O3 concentration and its uptake on GPP and to evaluate the most important factors affecting inter-annual GPP changes. Our results showed, along a North-West/South-East European transect, a negative impact of O3 on GPP ranging from 0.4% to 30%, although a key role of meteorological parameters respect to pollutant variables in affecting GPP was found. In particular, meteorological parameters, namely air temperature (T), soil water content (SWC) and relative humidity (RH) are the most important predictors at 81% of test sites. Moreover, it is interesting to highlight a key role of SWC in the Mediterranean areas (Spanish, Italian and French test sites) confirming that, soil moisture and soil water availability affect vegetation growth and photosynthesis especially in arid or semi-arid ecosystems such as the Mediterranean climate regions. Considering the pivotal role of GPP in the global carbon balance and the O3 ability to reduce primary productivity of the forests, this study can help in assessing the O3 impacts on ecosystem services, including wood production and carbon sequestration. Copyright © 2016 Elsevier B.V. All rights reserved.
The contribution of competition to tree mortality in old-growth coniferous forests
Das, A.; Battles, J.; Stephenson, N.L.; van Mantgem, P.J.
2011-01-01
Competition is a well-documented contributor to tree mortality in temperate forests, with numerous studies documenting a relationship between tree death and the competitive environment. Models frequently rely on competition as the only non-random mechanism affecting tree mortality. However, for mature forests, competition may cease to be the primary driver of mortality.We use a large, long-term dataset to study the importance of competition in determining tree mortality in old-growth forests on the western slope of the Sierra Nevada of California, U.S.A. We make use of the comparative spatial configuration of dead and live trees, changes in tree spatial pattern through time, and field assessments of contributors to an individual tree's death to quantify competitive effects.Competition was apparently a significant contributor to tree mortality in these forests. Trees that died tended to be in more competitive environments than trees that survived, and suppression frequently appeared as a factor contributing to mortality. On the other hand, based on spatial pattern analyses, only three of 14 plots demonstrated compelling evidence that competition was dominating mortality. Most of the rest of the plots fell within the expectation for random mortality, and three fit neither the random nor the competition model. These results suggest that while competition is often playing a significant role in tree mortality processes in these forests it only infrequently governs those processes. In addition, the field assessments indicated a substantial presence of biotic mortality agents in trees that died.While competition is almost certainly important, demographics in these forests cannot accurately be characterized without a better grasp of other mortality processes. In particular, we likely need a better understanding of biotic agents and their interactions with one another and with competition. ?? 2011.
NASA Astrophysics Data System (ADS)
Gao, Yan; Marpu, Prashanth; Morales Manila, Luis M.
2014-11-01
This paper assesses the suitability of 8-band Worldview-2 (WV2) satellite data and object-based random forest algorithm for the classification of avocado growth stages in Mexico. We tested both pixel-based with minimum distance (MD) and maximum likelihood (MLC) and object-based with Random Forest (RF) algorithm for this task. Training samples and verification data were selected by visual interpreting the WV2 images for seven thematic classes: fully grown, middle stage, and early stage of avocado crops, bare land, two types of natural forests, and water body. To examine the contribution of the four new spectral bands of WV2 sensor, all the tested classifications were carried out with and without the four new spectral bands. Classification accuracy assessment results show that object-based classification with RF algorithm obtained higher overall higher accuracy (93.06%) than pixel-based MD (69.37%) and MLC (64.03%) method. For both pixel-based and object-based methods, the classifications with the four new spectral bands (overall accuracy obtained higher accuracy than those without: overall accuracy of object-based RF classification with vs without: 93.06% vs 83.59%, pixel-based MD: 69.37% vs 67.2%, pixel-based MLC: 64.03% vs 36.05%, suggesting that the four new spectral bands in WV2 sensor contributed to the increase of the classification accuracy.
NASA Astrophysics Data System (ADS)
Co, Noelle Easter C.; Brown, Donald E.; Burns, James T.
2018-05-01
This study applies data science approaches (random forest and logistic regression) to determine the extent to which macro-scale corrosion damage features govern the crack formation behavior in AA7050-T7451. Each corrosion morphology has a set of corresponding predictor variables (pit depth, volume, area, diameter, pit density, total fissure length, surface roughness metrics, etc.) describing the shape of the corrosion damage. The values of the predictor variables are obtained from white light interferometry, x-ray tomography, and scanning electron microscope imaging of the corrosion damage. A permutation test is employed to assess the significance of the logistic and random forest model predictions. Results indicate minimal relationship between the macro-scale corrosion feature predictor variables and fatigue crack initiation. These findings suggest that the macro-scale corrosion features and their interactions do not solely govern the crack formation behavior. While these results do not imply that the macro-features have no impact, they do suggest that additional parameters must be considered to rigorously inform the crack formation location.
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md
2015-05-01
Using 108 petroleum contaminated soil samples, this pilot study proposed a new analytical approach of combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model where VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon followed by PXRF elemental data-based random forest regression was used to model the PSR residuals, it outperformed (R(2)=0.78, residual prediction deviation (RPD)=2.19) all other models tested, even producing better generalization than using VisNIR DRS alone (RPD's of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
Classification of Hyperspectral Data Based on Guided Filtering and Random Forest
NASA Astrophysics Data System (ADS)
Ma, H.; Feng, W.; Cao, X.; Wang, L.
2017-09-01
Hyperspectral images usually consist of more than one hundred spectral bands, which have potentials to provide rich spatial and spectral information. However, the application of hyperspectral data is still challengeable due to "the curse of dimensionality". In this context, many techniques, which aim to make full use of both the spatial and spectral information, are investigated. In order to preserve the geometrical information, meanwhile, with less spectral bands, we propose a novel method, which combines principal components analysis (PCA), guided image filtering and the random forest classifier (RF). In detail, PCA is firstly employed to reduce the dimension of spectral bands. Secondly, the guided image filtering technique is introduced to smooth land object, meanwhile preserving the edge of objects. Finally, the features are fed into RF classifier. To illustrate the effectiveness of the method, we carry out experiments over the popular Indian Pines data set, which is collected by Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. By comparing the proposed method with the method of only using PCA or guided image filter, we find that effect of the proposed method is better.
NASA Astrophysics Data System (ADS)
Sheet, Debdoot; Karamalis, Athanasios; Kraft, Silvan; Noël, Peter B.; Vag, Tibor; Sadhu, Anup; Katouzian, Amin; Navab, Nassir; Chatterjee, Jyotirmoy; Ray, Ajoy K.
2013-03-01
Breast cancer is the most common form of cancer in women. Early diagnosis can significantly improve lifeexpectancy and allow different treatment options. Clinicians favor 2D ultrasonography for breast tissue abnormality screening due to high sensitivity and specificity compared to competing technologies. However, inter- and intra-observer variability in visual assessment and reporting of lesions often handicaps its performance. Existing Computer Assisted Diagnosis (CAD) systems though being able to detect solid lesions are often restricted in performance. These restrictions are inability to (1) detect lesion of multiple sizes and shapes, and (2) differentiate between hypo-echoic lesions from their posterior acoustic shadowing. In this work we present a completely automatic system for detection and segmentation of breast lesions in 2D ultrasound images. We employ random forests for learning of tissue specific primal to discriminate breast lesions from surrounding normal tissues. This enables it to detect lesions of multiple shapes and sizes, as well as discriminate between hypo-echoic lesion from associated posterior acoustic shadowing. The primal comprises of (i) multiscale estimated ultrasonic statistical physics and (ii) scale-space characteristics. The random forest learns lesion vs. background primal from a database of 2D ultrasound images with labeled lesions. For segmentation, the posterior probabilities of lesion pixels estimated by the learnt random forest are hard thresholded to provide a random walks segmentation stage with starting seeds. Our method achieves detection with 99.19% accuracy and segmentation with mean contour-to-contour error < 3 pixels on a set of 40 images with 49 lesions.
NASA Astrophysics Data System (ADS)
Jeuck, James A.
This dissertation consists of research projects related to forest land use / land cover (LULC): (1) factors predicting LULC change and (2) methodology to predict particular forest use, or "potential working timberland" (PWT), from current forms of land data. The first project resulted in a published paper, a meta-analysis of 64 econometric models from 47 studies predicting forest land use changes. The response variables, representing some form of forest land change, were organized into four groups: forest conversion to agriculture (F2A), forestland to development (F2D), forestland to non-forested (F2NF) and undeveloped (including forestland) to developed (U2D) land. Over 250 independent econometric variables were identified, from 21 F2A models, 21 F2D models, 12 F2NF models, and 10 U2D models. These variables were organized into a hierarchy of 119 independent variable groups, 15 categories, and 4 econometric drivers suitable for conducting simple vote count statistics. Vote counts were summarized at the independent variable group level and formed into ratios estimating the predictive success of each variable group. Two ratio estimates were developed based on (1) proportion of times independent variables successfully achieved statistical significance (p ≤0.10), and (2) proportion of times independent variables successfully met the original researchers'expectations. In F2D models, popular independent variables such as population, income, and urban proximity often achieved statistical significance. In F2A models, popular independent variables such as forest and agricultural rents and costs, governmental programs, and site quality often achieved statistical significance. In U2D models, successful independent variables included urban rents and costs, zoning issues concerning forestland loss, site quality, urban proximity, population, and income. F2NF models high success variables were found to be agricultural rents, site quality, population, and income. This meta-analysis provides insight into the general success of econometric independent variables for future forest use or cover change research. The second part of this dissertation developed a method for predicting area estimates and spatial distribution of PWT in the US South. This technique determined land use from USFS Forest Inventory and Analysis (FIA) and land cover from the National Land Cover Database (NLCD). Three dependent variable forms (DV Forms) were derived from the FIA data: DV Form 1, timberland, other; DV Form 2, short timberland, tall timberland, agriculture, other; and DV Form 3, short hardwood (HW) timberland, tall HW timberland, short softwood (SW) timberland, tall SW timberland, agriculture, other. The prediction accuracy of each DV Form was investigated using both random forest model and logistic regression model specifications and data optimization techniques. Model verification employing a "leave-group-out" Monte Carlo simulation determined the selection of a stratified version of the random forest model using one-year NLCD observations with an overall accuracy of 0.53-0.94. The lower accuracy side of the range was when predictions were made from an aggregated NLCD land cover class "grass_shrub". The selected model specification was run using 2011 NLCD and the other predictor variables to produce three levels of timberland prediction and probability maps for the US South. Spatial masks removed areas unlikely to be working forests (protected and urbanized lands) resulting in PWT maps. The area of the resulting maps compared well with USFS area estimates and masked PWT maps and had an 8-11% reduction of the USFS timberland estimate for the US South compared to the DV Form. Change analysis of the 2011 NLCD to PWT showed (1) the majority of the short timberland came from NLCD grass_shrub; (2) the majority of NLCD grass_shrub predicted into tall timberland, and (3) NLCD grass_shrub was more strongly associated with timberland in the Coastal Plain. Resulting map products provide practical analytical tools for those interested in studying the area and distribution of PWT in the US South.
Towards large-scale FAME-based bacterial species identification using machine learning techniques.
Slabbinck, Bram; De Baets, Bernard; Dawyndt, Peter; De Vos, Paul
2009-05-01
In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values, respectively, 0.847, 0.901 and 0.708. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.
Nagao, Chioko; Nagano, Nozomi; Mizuguchi, Kenji
2014-01-01
Determining enzyme functions is essential for a thorough understanding of cellular processes. Although many prediction methods have been developed, it remains a significant challenge to predict enzyme functions at the fourth-digit level of the Enzyme Commission numbers. Functional specificity of enzymes often changes drastically by mutations of a small number of residues and therefore, information about these critical residues can potentially help discriminate detailed functions. However, because these residues must be identified by mutagenesis experiments, the available information is limited, and the lack of experimentally verified specificity determining residues (SDRs) has hindered the development of detailed function prediction methods and computational identification of SDRs. Here we present a novel method for predicting enzyme functions by random forests, EFPrf, along with a set of putative SDRs, the random forests derived SDRs (rf-SDRs). EFPrf consists of a set of binary predictors for enzymes in each CATH superfamily and the rf-SDRs are the residue positions corresponding to the most highly contributing attributes obtained from each predictor. EFPrf showed a precision of 0.98 and a recall of 0.89 in a cross-validated benchmark assessment. The rf-SDRs included many residues, whose importance for specificity had been validated experimentally. The analysis of the rf-SDRs revealed both a general tendency that functionally diverged superfamilies tend to include more active site residues in their rf-SDRs than in less diverged superfamilies, and superfamily-specific conservation patterns of each functional residue. EFPrf and the rf-SDRs will be an effective tool for annotating enzyme functions and for understanding how enzyme functions have diverged within each superfamily. PMID:24416252
Corcoran, Jennifer M.; Knight, Joseph F.; Gallant, Alisa L.
2013-01-01
Wetland mapping at the landscape scale using remotely sensed data requires both affordable data and an efficient accurate classification method. Random forest classification offers several advantages over traditional land cover classification techniques, including a bootstrapping technique to generate robust estimations of outliers in the training data, as well as the capability of measuring classification confidence. Though the random forest classifier can generate complex decision trees with a multitude of input data and still not run a high risk of over fitting, there is a great need to reduce computational and operational costs by including only key input data sets without sacrificing a significant level of accuracy. Our main questions for this study site in Northern Minnesota were: (1) how does classification accuracy and confidence of mapping wetlands compare using different remote sensing platforms and sets of input data; (2) what are the key input variables for accurate differentiation of upland, water, and wetlands, including wetland type; and (3) which datasets and seasonal imagery yield the best accuracy for wetland classification. Our results show the key input variables include terrain (elevation and curvature) and soils descriptors (hydric), along with an assortment of remotely sensed data collected in the spring (satellite visible, near infrared, and thermal bands; satellite normalized vegetation index and Tasseled Cap greenness and wetness; and horizontal-horizontal (HH) and horizontal-vertical (HV) polarization using L-band satellite radar). We undertook this exploratory analysis to inform decisions by natural resource managers charged with monitoring wetland ecosystems and to aid in designing a system for consistent operational mapping of wetlands across landscapes similar to those found in Northern Minnesota.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn
2016-07-15
Purpose: To develop an automated system for mediastinal lymph node detection and station mapping for chest CT. Methods: The contextual organs, trachea, lungs, and spine are first automatically identified to locate the region of interest (ROI) (mediastinum). The authors employ shape features derived from Hessian analysis, local object scale, and circular transformation that are computed per voxel in the ROI. Eight more anatomical structures are simultaneously segmented by multiatlas label fusion. Spatial priors are defined as the relative multidimensional distance vectors corresponding to each structure. Intensity, shape, and spatial prior features are integrated and parsed by a random forest classifiermore » for lymph node detection. The detected candidates are then segmented by the following curve evolution process. Texture features are computed on the segmented lymph nodes and a support vector machine committee is used for final classification. For lymph node station labeling, based on the segmentation results of the above anatomical structures, the textual definitions of mediastinal lymph node map according to the International Association for the Study of Lung Cancer are converted into patient-specific color-coded CT image, where the lymph node station can be automatically assigned for each detected node. Results: The chest CT volumes from 70 patients with 316 enlarged mediastinal lymph nodes are used for validation. For lymph node detection, their system achieves 88% sensitivity at eight false positives per patient. For lymph node station labeling, 84.5% of lymph nodes are correctly assigned to their stations. Conclusions: Multiple-channel shape, intensity, and spatial prior features aggregated by a random forest classifier improve mediastinal lymph node detection on chest CT. Using the location information of segmented anatomic structures from the multiatlas formulation enables accurate identification of lymph node stations.« less
Characterizing Forest Change Using Community-Based Monitoring Data and Landsat Time Series
DeVries, Ben; Pratihast, Arun Kumar; Verbesselt, Jan; Kooistra, Lammert; Herold, Martin
2016-01-01
Increasing awareness of the issue of deforestation and degradation in the tropics has resulted in efforts to monitor forest resources in tropical countries. Advances in satellite-based remote sensing and ground-based technologies have allowed for monitoring of forests with high spatial, temporal and thematic detail. Despite these advances, there is a need to engage communities in monitoring activities and include these stakeholders in national forest monitoring systems. In this study, we analyzed activity data (deforestation and forest degradation) collected by local forest experts over a 3-year period in an Afro-montane forest area in southwestern Ethiopia and corresponding Landsat Time Series (LTS). Local expert data included forest change attributes, geo-location and photo evidence recorded using mobile phones with integrated GPS and photo capabilities. We also assembled LTS using all available data from all spectral bands and a suite of additional indices and temporal metrics based on time series trajectory analysis. We predicted deforestation, degradation or stable forests using random forest models trained with data from local experts and LTS spectral-temporal metrics as model covariates. Resulting models predicted deforestation and degradation with an out of bag (OOB) error estimate of 29% overall, and 26% and 31% for the deforestation and degradation classes, respectively. By dividing the local expert data into training and operational phases corresponding to local monitoring activities, we found that forest change models improved as more local expert data were used. Finally, we produced maps of deforestation and degradation using the most important spectral bands. The results in this study represent some of the first to combine local expert based forest change data and dense LTS, demonstrating the complementary value of both continuous data streams. Our results underpin the utility of both datasets and provide a useful foundation for integrated forest monitoring systems relying on data streams from diverse sources. PMID:27018852
The influence of negative training set size on machine learning-based virtual screening.
Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J
2014-01-01
The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
The influence of negative training set size on machine learning-based virtual screening
2014-01-01
Background The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. Results The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. Conclusions In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening. PMID:24976867
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers–Logistic Regression, Naïve Bayes and Random Forest–with a range of social network measures and the necessary databases to model the verdicts in two real–world cases: the U.S. Watergate Conspiracy of the 1970’s and the now–defunct Canada–based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures. PMID:26824351
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.
Polarimetric signatures of a coniferous forest canopy based on vector radiative transfer theory
NASA Technical Reports Server (NTRS)
Karam, M. A.; Fung, A. K.; Amar, F.; Mougin, E.; Lopes, A.; Beaudoin, A.
1992-01-01
Complete polarization signatures of a coniferous forest canopy are studied by the iterative solution of the vector radiative transfer equations up to the second order. The forest canopy constituents (leaves, branches, stems, and trunk) are embedded in a multi-layered medium over a rough interface. The branches, stems and trunk scatterers are modeled as finite randomly oriented cylinders. The leaves are modeled as randomly oriented needles. For a plane wave exciting the canopy, the average Mueller matrix is formulated in terms of the iterative solution of the radiative transfer solution and used to determine the linearly polarized backscattering coefficients, the co-polarized and cross-polarized power returns, and the phase difference statistics. Numerical results are presented to investigate the effect of transmitting and receiving antenna configurations on the polarimetric signature of a pine forest. Comparison is made with measurements.
Foliar and woody materials discriminated using terrestrial LiDAR in a mixed natural forest
NASA Astrophysics Data System (ADS)
Zhu, Xi; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Niemann, K. Olaf; Liu, Jing; Shi, Yifang; Wang, Tiejun
2018-02-01
Separation of foliar and woody materials using remotely sensed data is crucial for the accurate estimation of leaf area index (LAI) and woody biomass across forest stands. In this paper, we present a new method to accurately separate foliar and woody materials using terrestrial LiDAR point clouds obtained from ten test sites in a mixed forest in Bavarian Forest National Park, Germany. Firstly, we applied and compared an adaptive radius near-neighbor search algorithm with a fixed radius near-neighbor search method in order to obtain both radiometric and geometric features derived from terrestrial LiDAR point clouds. Secondly, we used a random forest machine learning algorithm to classify foliar and woody materials and examined the impact of understory and slope on the classification accuracy. An average overall accuracy of 84.4% (Kappa = 0.75) was achieved across all experimental plots. The adaptive radius near-neighbor search method outperformed the fixed radius near-neighbor search method. The classification accuracy was significantly higher when the combination of both radiometric and geometric features was utilized. The analysis showed that increasing slope and understory coverage had a significant negative effect on the overall classification accuracy. Our results suggest that the utilization of the adaptive radius near-neighbor search method coupling both radiometric and geometric features has the potential to accurately discriminate foliar and woody materials from terrestrial LiDAR data in a mixed natural forest.
Field evaluation of a random forest activity classifier for wrist-worn accelerometer data.
Pavey, Toby G; Gilson, Nicholas D; Gomersall, Sjaan R; Clark, Bronwyn; Trost, Stewart G
2017-01-01
Wrist-worn accelerometers are convenient to wear and associated with greater wear-time compliance. Previous work has generally relied on choreographed activity trials to train and test classification models. However, validity in free-living contexts is starting to emerge. Study aims were: (1) train and test a random forest activity classifier for wrist accelerometer data; and (2) determine if models trained on laboratory data perform well under free-living conditions. Twenty-one participants (mean age=27.6±6.2) completed seven lab-based activity trials and a 24h free-living trial (N=16). Participants wore a GENEActiv monitor on the non-dominant wrist. Classification models recognising four activity classes (sedentary, stationary+, walking, and running) were trained using time and frequency domain features extracted from 10-s non-overlapping windows. Model performance was evaluated using leave-one-out-cross-validation. Models were implemented using the randomForest package within R. Classifier accuracy during the 24h free living trial was evaluated by calculating agreement with concurrently worn activPAL monitors. Overall classification accuracy for the random forest algorithm was 92.7%. Recognition accuracy for sedentary, stationary+, walking, and running was 80.1%, 95.7%, 91.7%, and 93.7%, respectively for the laboratory protocol. Agreement with the activPAL data (stepping vs. non-stepping) during the 24h free-living trial was excellent and, on average, exceeded 90%. The ICC for stepping time was 0.92 (95% CI=0.75-0.97). However, sensitivity and positive predictive values were modest. Mean bias was 10.3min/d (95% LOA=-46.0 to 25.4min/d). The random forest classifier for wrist accelerometer data yielded accurate group-level predictions under controlled conditions, but was less accurate at identifying stepping verse non-stepping behaviour in free living conditions Future studies should conduct more rigorous field-based evaluations using observation as a criterion measure. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Benchmarking dairy herd health status using routinely recorded herd summary data.
Parker Gaddis, K L; Cole, J B; Clay, J S; Maltecca, C
2016-02-01
Genetic improvement of dairy cattle health through the use of producer-recorded data has been determined to be feasible. Low estimated heritabilities indicate that genetic progress will be slow. Variation observed in lowly heritable traits can largely be attributed to nongenetic factors, such as the environment. More rapid improvement of dairy cattle health may be attainable if herd health programs incorporate environmental and managerial aspects. More than 1,100 herd characteristics are regularly recorded on farm test-days. We combined these data with producer-recorded health event data, and parametric and nonparametric models were used to benchmark herd and cow health status. Health events were grouped into 3 categories for analyses: mastitis, reproductive, and metabolic. Both herd incidence and individual incidence were used as dependent variables. Models implemented included stepwise logistic regression, support vector machines, and random forests. At both the herd and individual levels, random forest models attained the highest accuracy for predicting health status in all health event categories when evaluated with 10-fold cross-validation. Accuracy (SD) ranged from 0.61 (0.04) to 0.63 (0.04) when using random forest models at the herd level. Accuracy of prediction (SD) at the individual cow level ranged from 0.87 (0.06) to 0.93 (0.001) with random forest models. Highly significant variables and key words from logistic regression and random forest models were also investigated. All models identified several of the same key factors for each health event category, including movement out of the herd, size of the herd, and weather-related variables. We concluded that benchmarking health status using routinely collected herd data is feasible. Nonparametric models were better suited to handle this complex data with numerous variables. These data mining techniques were able to perform prediction of health status and could add evidence to personal experience in herd management. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Mi, Xiangcheng; Swenson, Nathan G; Jia, Qi; Rao, Mide; Feng, Gang; Ren, Haibao; Bebber, Daniel P; Ma, Keping
2016-09-07
Deterministic and stochastic processes jointly determine the community dynamics of forest succession. However, it has been widely held in previous studies that deterministic processes dominate forest succession. Furthermore, inference of mechanisms for community assembly may be misleading if based on a single axis of diversity alone. In this study, we evaluated the relative roles of deterministic and stochastic processes along a disturbance gradient by integrating species, functional, and phylogenetic beta diversity in a subtropical forest chronosequence in Southeastern China. We found a general pattern of increasing species turnover, but little-to-no change in phylogenetic and functional turnover over succession at two spatial scales. Meanwhile, the phylogenetic and functional beta diversity were not significantly different from random expectation. This result suggested a dominance of stochastic assembly, contrary to the general expectation that deterministic processes dominate forest succession. On the other hand, we found significant interactions of environment and disturbance and limited evidence for significant deviations of phylogenetic or functional turnover from random expectations for different size classes. This result provided weak evidence of deterministic processes over succession. Stochastic assembly of forest succession suggests that post-disturbance restoration may be largely unpredictable and difficult to control in subtropical forests.
Ostashev, Vladimir E; Wilson, D Keith; Muhlestein, Michael B; Attenborough, Keith
2018-02-01
Although sound propagation in a forest is important in several applications, there are currently no rigorous yet computationally tractable prediction methods. Due to the complexity of sound scattering in a forest, it is natural to formulate the problem stochastically. In this paper, it is demonstrated that the equations for the statistical moments of the sound field propagating in a forest have the same form as those for sound propagation in a turbulent atmosphere if the scattering properties of the two media are expressed in terms of the differential scattering and total cross sections. Using the existing theories for sound propagation in a turbulent atmosphere, this analogy enables the derivation of several results for predicting forest acoustics. In particular, the second-moment parabolic equation is formulated for the spatial correlation function of the sound field propagating above an impedance ground in a forest with micrometeorology. Effective numerical techniques for solving this equation have been developed in atmospheric acoustics. In another example, formulas are obtained that describe the effect of a forest on the interference between the direct and ground-reflected waves. The formulated correspondence between wave propagation in discrete and continuous random media can also be used in other fields of physics.
David B. Clark; Paulo C. Olivas; Steven F. Oberbauer; Deborah A. Clark; Michael G. Ryan
2008-01-01
Leaf Area Index (leaf area per unit ground area, LAI) is a key driver of forest productivity but has never previously been measured directly at the landscape scale in tropical rain forest (TRF). We used a modular tower and stratified random sampling to harvest all foliage from forest floor to canopy top in 55 vertical transects (4.6 m2) across 500 ha of old growth in...
NASA Astrophysics Data System (ADS)
Tsangaratos, Paraskevas; Ilia, Ioanna; Loupasakis, Constantinos; Papadakis, Michalis; Karimalis, Antonios
2017-04-01
The main objective of the present study was to apply two machine learning methods for the production of a landslide susceptibility map in the Finikas catchment basin, located in North Peloponnese, Greece and to compare their results. Specifically, Logistic Regression and Random Forest were utilized, based on a database of 40 sites classified into two categories, non-landslide and landslide areas that were separated into a training dataset (70% of the total data) and a validation dataset (remaining 30%). The identification of the areas was established by analyzing airborne imagery, extensive field investigation and the examination of previous research studies. Six landslide related variables were analyzed, namely: lithology, elevation, slope, aspect, distance to rivers and distance to faults. Within the Finikas catchment basin most of the reported landslides were located along the road network and within the residential complexes, classified as rotational and translational slides, and rockfalls, mainly caused due to the physical conditions and the general geotechnical behavior of the geological formation that cover the area. Each landslide susceptibility map was reclassified by applying the Geometric Interval classification technique into five classes, namely: very low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and very high susceptibility. The comparison and validation of the outcomes of each model were achieved using statistical evaluation measures, the receiving operating characteristic and the area under the success and predictive rate curves. The computation process was carried out using RStudio an integrated development environment for R language and ArcGIS 10.1 for compiling the data and producing the landslide susceptibility maps. From the outcomes of the Logistic Regression analysis it was induced that the highest b coefficient is allocated to lithology and slope, which was 2.8423 and 1.5841, respectively. From the estimation of the mean decrease in Gini coefficient performed during the application of Random Forest and the mean decrease in accuracy the most important variable is slope followed by lithology, aspect, elevation, distance from river network, and distance from faults, while the most used variables during the training phase were the variable aspect (21.45%), slope (20.53%) and lithology (19.84%). The outcomes of the analysis are consistent with previous studies concerning the area of research, which have indicated the high influence of lithology and slope in the manifestation of landslides. High percentage of landslide occurrence has been observed in Plio-Pleistocene sediments, flysch formations, and Cretaceous limestone. Also the presences of landslides have been associated with the degree of weathering and fragmentation, the orientation of the discontinuities surfaces and the intense morphological relief. The most accurate model was Random Forest which identified correctly 92.00% of the instances during the training phase, followed by the Logistic Regression 89.00%. The same pattern of accuracy was calculated during the validation phase, in which the Random Forest achieved a classification accuracy of 93.00%, while the Logistic Regression model achieved an accuracy of 91.00%. In conclusion, the outcomes of the study could be a useful cartographic product to local authorities and government agencies during the implementation of successful decision-making and land use planning strategies. Keywords: Landslide Susceptibility, Logistic Regression, Random Forest, GIS, Greece.
Dynamics of Tree Species Diversity in Unlogged and Selectively Logged Malaysian Forests.
Shima, Ken; Yamada, Toshihiro; Okuda, Toshinori; Fletcher, Christine; Kassim, Abdul Rahman
2018-01-18
Selective logging that is commonly conducted in tropical forests may change tree species diversity. In rarely disturbed tropical forests, locally rare species exhibit higher survival rates. If this non-random process occurs in a logged forest, the forest will rapidly recover its tree species diversity. Here we determined whether a forest in the Pasoh Forest Reserve, Malaysia, which was selectively logged 40 years ago, recovered its original species diversity (species richness and composition). To explore this, we compared the dynamics of secies diversity between unlogged forest plot (18.6 ha) and logged forest plot (5.4 ha). We found that 40 years are not sufficient to recover species diversity after logging. Unlike unlogged forests, tree deaths and recruitments did not contribute to increased diversity in the selectively logged forests. Our results predict that selectively logged forests require a longer time at least than our observing period (40 years) to regain their diversity.
A Predictive Analysis of the Department of Defense Distribution System Utilizing Random Forests
2016-06-01
resources capable of meeting both customer and individual resource constraints and goals while also maximizing the global benefit to the supply...and probability rules to determine the optimal red wine distribution network for an Italian-based wine producer. The decision support model for...combinations of factors that will result in delivery of the highest quality wines . The model’s first stage inputs basic logistics information to look
Assessing change in large-scale forest area by visually interpreting Landsat images
Jerry D. Greer; Frederick P. Weber; Raymond L. Czaplewski
2000-01-01
As part of the Forest Resources Assessment 1990, the Food and Agriculture Organization of the United Nations visually interpreted a stratified random sample of 117 Landsat scenes to estimate global status and change in tropical forest area. Images from 1980 and 1990 were interpreted by a group of widely experienced technical people in many different tropical countries...
A ground-based method of assessing urban forest structure and ecosystem services
David J. Nowak; Daniel E. Crane; Jack C. Stevens; Robert E. Hoehn; Jeffrey T. Walton; Jerry Bond
2008-01-01
To properly manage urban forests, it is essential to have data on this important resource. An efficient means to obtain this information is to randomly sample urban areas. To help assess the urban forest structure (e.g., number of trees, species composition, tree sizes, health) and several functions (e.g., air pollution removal, carbon storage and sequestration), the...
Spatially random mortality in old-growth red pine forests of northern Minnesota
Tuomas Aakala; Shawn Fraver; Brian J. Palik; Anthony W. D' Amato
2012-01-01
Characterizing the spatial distribution of tree mortality is critical to understanding forest dynamics, but empirical studies on these patterns under old-growth conditions are rare. This rarity is due in part to low mortality rates in old-growth forests, the study of which necessitates long observation periods, and the confounding influence of tree in-growth during...
Measuring forest landscape patterns in the Cascade Range of Oregon, USA
NASA Technical Reports Server (NTRS)
Ripple, William J.; Bradshaw, G. A.; Spies, Thomas A.
1995-01-01
This paper describes the use of a set of spatial statistics to quantify the landscape pattern caused by the patchwork of clearcuts made over a 15-year period in the western Cascades of Oregon. Fifteen areas were selected at random to represent a diversity of landscape fragmentation patterns. Managed forest stands (patches) were digitized and analyzed to produce both tabular and mapped information describing patch size, shape, abundance and spacing, and matrix characteristics of a given area. In addition, a GIS fragmentation index was developed which was found to be sensitive to patch abundance and to the spatial distribution of patches. Use of the GIS-derived index provides an automated method of determining the level of forest fragmentation and can be used to facilitate spatial analysis of the landscape for later coordination with field and remotely sensed data. A comparison of the spatial statistics calculated for the two years indicates an increase in forest fragmentation as characterized by an increase in mean patch abundance and a decrease in interpatch distance, amount of interior natural forest habitat, and the GIS fragmentation index. Such statistics capable of quantifying patch shape and spatial distribution may prove important in the evaluation of the changing character of interior and edge habitats for wildlife.
NASA Technical Reports Server (NTRS)
Fagan, Matthew E.; Defries, Ruth S.; Sesnie, Steven E.; Arroyo-Mora, J. Pablo; Soto, Carlomagno; Singh, Aditya; Townsend, Philip A.; Chazdon, Robin L.
2015-01-01
An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p less than 0.0001) of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; adding multitemporal spectral data increased accuracy only for two species with dense canopies. Non-native tree species had higher classification accuracy overall and made up the majority of tree plantations in this landscape. Our results indicate that combining occasionally acquired hyperspectral data with widely available multitemporal satellite imagery enhances mapping and monitoring of reforestation in tropical landscapes.
Satellite and in situ monitoring data used for modeling of forest vegetation reflectance
NASA Astrophysics Data System (ADS)
Zoran, M. A.; Savastru, R. S.; Savastru, D. M.; Miclos, S. I.; Tautan, M. N.; Baschir, L.
2010-10-01
As climatic variability and anthropogenic stressors are growing up continuously, must be defined the proper criteria for forest vegetation assessment. In order to characterize current and future state of forest vegetation satellite imagery is a very useful tool. Vegetation can be distinguished using remote sensing data from most other (mainly inorganic) materials by virtue of its notable absorption in the red and blue segments of the visible spectrum, its higher green reflectance and, especially, its very strong reflectance in the near-IR. Vegetation reflectance has variations with sun zenith angle, view zenith angle, and terrain slope angle. To provide corrections of these effects, for visible and near-infrared light, was used a developed a simple physical model of vegetation reflectance, by assuming homogeneous and closed vegetation canopy with randomly oriented leaves. A simple physical model of forest vegetation reflectance was applied and validated for Cernica forested area, near Bucharest town through two ASTER satellite data , acquired within minutes from one another ,a nadir and off-nadir for band 3 lying in the near infra red, most radiance differences between the two scenes can be attributed to the BRDF effect. Other satellite data MODIS, Landsat TM and ETM as well as, IKONOS have been used for different NDVI and classification analysis.
Epidemiology of forest malaria in central Vietnam: a large scale cross-sectional survey.
Erhart, Annette; Ngo, Duc Thang; Phan, Van Ky; Ta, Thi Tinh; Van Overmeir, Chantal; Speybroeck, Niko; Obsomer, Valerie; Le, Xuan Hung; Le, Khanh Thuan; Coosemans, Marc; D'alessandro, Umberto
2005-12-08
In Vietnam, a large proportion of all malaria cases and deaths occurs in the central mountainous and forested part of the country. Indeed, forest malaria, despite intensive control activities, is still a major problem which raises several questions about its dynamics.A large-scale malaria morbidity survey to measure malaria endemicity and identify important risk factors was carried out in 43 villages situated in a forested area of Ninh Thuan province, south central Vietnam. Four thousand three hundred and six randomly selected individuals, aged 10-60 years, participated in the survey. Rag Lays (86%), traditionally living in the forest and practising "slash and burn" cultivation represented the most common ethnic group. The overall parasite rate was 13.3% (range [0-42.3] while Plasmodium falciparum seroprevalence was 25.5% (range [2.1-75.6]). Mapping of these two variables showed a patchy distribution, suggesting that risk factors other than remoteness and forest proximity modulated the human-vector interactions. This was confirmed by the results of the multivariate-adjusted analysis, showing that forest work was a significant risk factor for malaria infection, further increased by staying in the forest overnight (OR= 2.86; 95%CI [1.62; 5.07]). Rag Lays had a higher risk of malaria infection, which inversely related to education level and socio-economic status. Women were less at risk than men (OR = 0.71; 95%CI [0.59; 0.86]), a possible consequence of different behaviour. This study confirms that malaria endemicity is still relatively high in this area and that the dynamics of transmission is constantly modulated by the behaviour of both humans and vectors. A well-targeted intervention reducing the "vector/forest worker" interaction, based on long-lasting insecticidal material, could be appropriate in this environment.
Epidemiology of forest malaria in central Vietnam: a large scale cross-sectional survey
Erhart, Annette; Thang, Ngo Duc; Van Ky, Phan; Tinh, Ta Thi; Van Overmeir, Chantal; Speybroeck, Niko; Obsomer, Valerie; Hung, Le Xuan; Thuan, Le Khanh; Coosemans, Marc; D'alessandro, Umberto
2005-01-01
In Vietnam, a large proportion of all malaria cases and deaths occurs in the central mountainous and forested part of the country. Indeed, forest malaria, despite intensive control activities, is still a major problem which raises several questions about its dynamics. A large-scale malaria morbidity survey to measure malaria endemicity and identify important risk factors was carried out in 43 villages situated in a forested area of Ninh Thuan province, south central Vietnam. Four thousand three hundred and six randomly selected individuals, aged 10–60 years, participated in the survey. Rag Lays (86%), traditionally living in the forest and practising "slash and burn" cultivation represented the most common ethnic group. The overall parasite rate was 13.3% (range [0–42.3] while Plasmodium falciparum seroprevalence was 25.5% (range [2.1–75.6]). Mapping of these two variables showed a patchy distribution, suggesting that risk factors other than remoteness and forest proximity modulated the human-vector interactions. This was confirmed by the results of the multivariate-adjusted analysis, showing that forest work was a significant risk factor for malaria infection, further increased by staying in the forest overnight (OR= 2.86; 95%CI [1.62; 5.07]). Rag Lays had a higher risk of malaria infection, which inversely related to education level and socio-economic status. Women were less at risk than men (OR = 0.71; 95%CI [0.59; 0.86]), a possible consequence of different behaviour. This study confirms that malaria endemicity is still relatively high in this area and that the dynamics of transmission is constantly modulated by the behaviour of both humans and vectors. A well-targeted intervention reducing the "vector/forest worker" interaction, based on long-lasting insecticidal material, could be appropriate in this environment. PMID:16336671
Impacts of Landscape Context on Patterns of Wind Downfall Damage in a Fragmented Amazonian Landscape
NASA Astrophysics Data System (ADS)
Schwartz, N.; Uriarte, M.; DeFries, R. S.; Gutierrez-Velez, V. H.; Fernandes, K.; Pinedo-Vasquez, M.
2015-12-01
Wind is a major disturbance in the Amazon and has both short-term impacts and lasting legacies in tropical forests. Observed patterns of damage across landscapes result from differences in wind exposure and stand characteristics, such as tree stature, species traits, successional age, and fragmentation. Wind disturbance has important consequences for biomass dynamics in Amazonian forests, and understanding the spatial distribution and size of impacts is necessary to quantify the effects on carbon dynamics. In November 2013, a mesoscale convective system was observed over the study area in Ucayali, Peru, a highly human modified and fragmented forest landscape. We mapped downfall damage associated with the storm in order to ask: how does the severity of damage vary within forest patches, and across forest patches of different sizes and successional ages? We applied spectral mixture analysis to Landsat images from 2013 and 2014 to calculate the change in non-photosynthetic vegetation fraction after the storm, and combined it with C-band SAR data from the Sentinel-1 satellite to predict downfall damage measured in 30 field plots using random forest regression. We then applied this model to map damage in forests across the study area. Using a land cover classification developed in a previous study, we mapped secondary and mature forest, and compared the severity of damage in the two. We found that damage was on average higher in secondary forests, but patterns varied spatially. This study demonstrates the utility of using multiple sources of satellite data for mapping wind disturbance, and adds to our understanding of the sources of variation in wind-related damage. Ultimately, an improved ability to map wind impacts and a better understanding of their spatial patterns can contribute to better quantification of carbon dynamics in Amazonian landscapes.
Lund, Jensen A; Brown, Paula N; Shipley, Paul R
2017-09-01
For compliance with US Current Good Manufacturing Practice regulations for dietary supplements, manufacturers must provide identity of source plant material. Despite the popularity of hawthorn as a dietary supplement, relatively little is known about the comparative phytochemistry of different hawthorn species, and in particular North American hawthorns. The combination of NMR spectrometry with chemometric analyses offers an innovative approach to differentiating hawthorn species and exploring the phytochemistry. Two European and two North American species, harvested from a farm trial in late summer 2008, were analyzed by standard 1D 1 H and J-resolved (JRES) experiments. The data were preprocessed and modelled by principal component analysis (PCA). A supervised model was then generated by partial least squares-discriminant analysis (PLS-DA) for classification and evaluated by cross validation. Supervised random forests models were constructed from the dataset to explore the potential of machine learning for identification of unique patterns across species. 1D 1 H NMR data yielded increased differentiation over the JRES data. The random forests results correlated with PLS-DA results and outperformed PLS-DA in classification accuracy. In all of these analyses differentiation of the Crataegus spp. was best achieved by focusing on the NMR spectral region that contains signals unique to plant phenolic compounds. Identification of potentially significant metabolites for differentiation between species was approached using univariate techniques including significance analysis of microarrays and Kruskall-Wallis tests. Copyright © 2017 Elsevier Ltd. All rights reserved.
Pyne, Matthew I.; Carlisle, Daren M.; Konrad, Christopher P.; Stein, Eric D.
2017-01-01
Regional classification of streams is an early step in the Ecological Limits of Hydrologic Alteration framework. Many stream classifications are based on an inductive approach using hydrologic data from minimally disturbed basins, but this approach may underrepresent streams from heavily disturbed basins or sparsely gaged arid regions. An alternative is a deductive approach, using watershed climate, land use, and geomorphology to classify streams, but this approach may miss important hydrological characteristics of streams. We classified all stream reaches in California using both approaches. First, we used Bayesian and hierarchical clustering to classify reaches according to watershed characteristics. Streams were clustered into seven classes according to elevation, sedimentary rock, and winter precipitation. Permutation-based analysis of variance and random forest analyses were used to determine which hydrologic variables best separate streams into their respective classes. Stream typology (i.e., the class that a stream reach is assigned to) is shaped mainly by patterns of high and mean flow behavior within the stream's landscape context. Additionally, random forest was used to determine which hydrologic variables best separate minimally disturbed reference streams from non-reference streams in each of the seven classes. In contrast to stream typology, deviation from reference conditions is more difficult to detect and is largely defined by changes in low-flow variables, average daily flow, and duration of flow. Our combined deductive/inductive approach allows us to estimate flow under minimally disturbed conditions based on the deductive analysis and compare to measured flow based on the inductive analysis in order to estimate hydrologic change.
Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei
2010-08-01
During Raman spectroscopy analysis, the organic molecules and contaminations will obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tables, which are acquired from the BWTek i-Raman spectrometer. The background is corrected by R package baselineWavelet. Then principle component analysis and random forests are used to perform clustering analysis. Through analyzing the Raman spectra of two medicines, the accurate and validity of this background-correction algorithm is checked and the influences of fluorescence background on Raman spectra clustering analysis is discussed. Thus, it is concluded that it is important to correct fluorescence background for further analysis, and an effective background correction solution is provided for clustering or other analysis.
CRF: detection of CRISPR arrays using random forest.
Wang, Kai; Liang, Chun
2017-01-01
CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in wide range of bacteria and archaea genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Different from other CRISPR detection tools, a random forest classifier was used in CRF to filter out invalid CRISPR arrays from all putative candidates and accordingly enhanced detection accuracy. In CRF, particularly, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
Do bioclimate variables improve performance of climate envelope models?
Watling, James I.; Romañach, Stephanie S.; Bucklin, David N.; Speroterra, Carolina; Brandt, Laura A.; Pearlstine, Leonard G.; Mazzotti, Frank J.
2012-01-01
Climate envelope models are widely used to forecast potential effects of climate change on species distributions. A key issue in climate envelope modeling is the selection of predictor variables that most directly influence species. To determine whether model performance and spatial predictions were related to the selection of predictor variables, we compared models using bioclimate variables with models constructed from monthly climate data for twelve terrestrial vertebrate species in the southeastern USA using two different algorithms (random forests or generalized linear models), and two model selection techniques (using uncorrelated predictors or a subset of user-defined biologically relevant predictor variables). There were no differences in performance between models created with bioclimate or monthly variables, but one metric of model performance was significantly greater using the random forest algorithm compared with generalized linear models. Spatial predictions between maps using bioclimate and monthly variables were very consistent using the random forest algorithm with uncorrelated predictors, whereas we observed greater variability in predictions using generalized linear models.
Clustering Single-Cell Expression Data Using Random Forest Graphs.
Pouyan, Maziyar Baran; Nourani, Mehrdad
2017-07-01
Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue where those cells are a part of. In this paper, we propose the use of random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique utilizes random forests to capture cell marker dependences and model the cellular populations using the cell network concept. This cellular network helps us discover what cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.
NASA Astrophysics Data System (ADS)
de Santana, Felipe Bachion; de Souza, André Marcelo; Poppi, Ronei Jesus
2018-02-01
This study evaluates the use of visible and near infrared spectroscopy (Vis-NIRS) combined with multivariate regression based on random forest to quantify some quality soil parameters. The parameters analyzed were soil cation exchange capacity (CEC), sum of exchange bases (SB), organic matter (OM), clay and sand present in the soils of several regions of Brazil. Current methods for evaluating these parameters are laborious, timely and require various wet analytical methods that are not adequate for use in precision agriculture, where faster and automatic responses are required. The random forest regression models were statistically better than PLS regression models for CEC, OM, clay and sand, demonstrating resistance to overfitting, attenuating the effect of outlier samples and indicating the most important variables for the model. The methodology demonstrates the potential of the Vis-NIR as an alternative for determination of CEC, SB, OM, sand and clay, making possible to develop a fast and automatic analytical procedure.
Modelling Variable Fire Severity in Boreal Forests: Effects of Fire Intensity and Stand Structure
Miquelajauregui, Yosune; Cumming, Steven G.; Gauthier, Sylvie
2016-01-01
It is becoming clear that fires in boreal forests are not uniformly stand-replacing. On the contrary, marked variation in fire severity, measured as tree mortality, has been found both within and among individual fires. It is important to understand the conditions under which this variation can arise. We integrated forest sample plot data, tree allometries and historical forest fire records within a diameter class-structured model of 1.0 ha patches of mono-specific black spruce and jack pine stands in northern Québec, Canada. The model accounts for crown fire initiation and vertical spread into the canopy. It uses empirical relations between fire intensity, scorch height, the percent of crown scorched and tree mortality to simulate fire severity, specifically the percent reduction in patch basal area due to fire-caused mortality. A random forest and a regression tree analysis of a large random sample of simulated fires were used to test for an effect of fireline intensity, stand structure, species composition and pyrogeographic regions on resultant severity. Severity increased with intensity and was lower for jack pine stands. The proportion of simulated fires that burned at high severity (e.g. >75% reduction in patch basal area) was 0.80 for black spruce and 0.11 for jack pine. We identified thresholds in intensity below which there was a marked sensitivity of simulated fire severity to stand structure, and to interactions between intensity and structure. We found no evidence for a residual effect of pyrogeographic region on simulated severity, after the effects of stand structure and species composition were accounted for. The model presented here was able to produce variation in fire severity under a range of fire intensity conditions. This suggests that variation in stand structure is one of the factors causing the observed variation in boreal fire severity. PMID:26919456
Del-Rio, Glaucia; Rêgo, Marco Antonio; Silveira, Luís Fábio
2015-01-01
Over the last 200 years the wetlands of the Upper Tietê and Upper Paraíba do Sul basins, in the southeastern Atlantic Forest, Brazil, have been almost-completely transformed by urbanization, agriculture and mining. Endemic to these river basins, the São Paulo Marsh Antwren (Formicivora paludicola) survived these impacts, but remained unknown to science until its discovery in 2005. Its population status was cause for immediate concern. In order to understand the factors imperiling the species, and provide guidelines for its conservation, we investigated both the species’ distribution and the distribution of areas of suitable habitat using a multiscale approach encompassing species distribution modeling, fieldwork surveys and occupancy models. Of six species distribution models methods used (Generalized Linear Models, Generalized Additive Models, Multivariate Adaptive Regression Splines, Classification Tree Analysis, Artificial Neural Networks and Random Forest), Random Forest showed the best fit and was utilized to guide field validation. After surveying 59 sites, our results indicated that Formicivora paludicola occurred in only 13 sites, having narrow habitat specificity, and restricted habitat availability. Additionally, historic maps, distribution models and satellite imagery showed that human occupation has resulted in a loss of more than 346 km2 of suitable habitat for this species since the early twentieth century, so that it now only occupies a severely fragmented area (area of occupancy) of 1.42 km2, and it should be considered Critically Endangered according to IUCN criteria. Furthermore, averaged occupancy models showed that marshes with lower cattail (Typha dominguensis) densities have higher probabilities of being occupied. Thus, these areas should be prioritized in future conservation efforts to protect the species, and to restore a portion of Atlantic Forest wetlands, in times of unprecedented regional water supply problems. PMID:25798608
Modelling Variable Fire Severity in Boreal Forests: Effects of Fire Intensity and Stand Structure.
Miquelajauregui, Yosune; Cumming, Steven G; Gauthier, Sylvie
2016-01-01
It is becoming clear that fires in boreal forests are not uniformly stand-replacing. On the contrary, marked variation in fire severity, measured as tree mortality, has been found both within and among individual fires. It is important to understand the conditions under which this variation can arise. We integrated forest sample plot data, tree allometries and historical forest fire records within a diameter class-structured model of 1.0 ha patches of mono-specific black spruce and jack pine stands in northern Québec, Canada. The model accounts for crown fire initiation and vertical spread into the canopy. It uses empirical relations between fire intensity, scorch height, the percent of crown scorched and tree mortality to simulate fire severity, specifically the percent reduction in patch basal area due to fire-caused mortality. A random forest and a regression tree analysis of a large random sample of simulated fires were used to test for an effect of fireline intensity, stand structure, species composition and pyrogeographic regions on resultant severity. Severity increased with intensity and was lower for jack pine stands. The proportion of simulated fires that burned at high severity (e.g. >75% reduction in patch basal area) was 0.80 for black spruce and 0.11 for jack pine. We identified thresholds in intensity below which there was a marked sensitivity of simulated fire severity to stand structure, and to interactions between intensity and structure. We found no evidence for a residual effect of pyrogeographic region on simulated severity, after the effects of stand structure and species composition were accounted for. The model presented here was able to produce variation in fire severity under a range of fire intensity conditions. This suggests that variation in stand structure is one of the factors causing the observed variation in boreal fire severity.
Del-Rio, Glaucia; Rêgo, Marco Antonio; Silveira, Luís Fábio
2015-01-01
Over the last 200 years the wetlands of the Upper Tietê and Upper Paraíba do Sul basins, in the southeastern Atlantic Forest, Brazil, have been almost-completely transformed by urbanization, agriculture and mining. Endemic to these river basins, the São Paulo Marsh Antwren (Formicivora paludicola) survived these impacts, but remained unknown to science until its discovery in 2005. Its population status was cause for immediate concern. In order to understand the factors imperiling the species, and provide guidelines for its conservation, we investigated both the species' distribution and the distribution of areas of suitable habitat using a multiscale approach encompassing species distribution modeling, fieldwork surveys and occupancy models. Of six species distribution models methods used (Generalized Linear Models, Generalized Additive Models, Multivariate Adaptive Regression Splines, Classification Tree Analysis, Artificial Neural Networks and Random Forest), Random Forest showed the best fit and was utilized to guide field validation. After surveying 59 sites, our results indicated that Formicivora paludicola occurred in only 13 sites, having narrow habitat specificity, and restricted habitat availability. Additionally, historic maps, distribution models and satellite imagery showed that human occupation has resulted in a loss of more than 346 km2 of suitable habitat for this species since the early twentieth century, so that it now only occupies a severely fragmented area (area of occupancy) of 1.42 km2, and it should be considered Critically Endangered according to IUCN criteria. Furthermore, averaged occupancy models showed that marshes with lower cattail (Typha dominguensis) densities have higher probabilities of being occupied. Thus, these areas should be prioritized in future conservation efforts to protect the species, and to restore a portion of Atlantic Forest wetlands, in times of unprecedented regional water supply problems.
NASA Technical Reports Server (NTRS)
Rejmankova, E.; Pope, K. O.; Roberts, D. R.; Lege, M. G.; Andre, R.; Greico, J.; Alonzo, Y.
1998-01-01
Surveys of larval habitats of Anopheles vestitipennis and Anopheles punctimacula were conducted in Belize, Central America. Habitat analysis and classification resulted in delineation of eight habitat types defined by dominant life forms and hydrology. Percent cover of tall dense macrophytes, shrubs, open water, and pH were significantly different between sites with and without An. vestitipennis. For An. punctimacula, percent cover of tall dense macrophytes, trees, detritus, open water, and water depth were significantly different between larvae positive and negative sites. The discriminant function for An. vestitipennis correctly predicted the presence of larvae in 65% of sites and correctly predicted the absence of larvae in 88% of sites. The discriminant function for An. punctimacula correctly predicted 81% of sites for the presence of larvae and 45% for the absence of larvae. Canonical discriminant analysis of the three groups of habitats (An. vestitipennis positive; An. punctimacula positive; all negative) confirmed that while larval habitats of An. punctimacula are clustered in the tree dominated area, larval habitats of An. vestitipennis were found in both tree dominated and tall dense macrophyte dominated environments. The forest larval habitats of An. vestitipennis and An. punctimacula seem to be randomly distributed among different forest types. Both species tend to occur in denser forests with more detritus, shallower water, and slightly higher pH. Classification of dry season (February) SPOT multispectral satellite imagery produced 10 land cover types with the swamp forest and tall dense marsh classes being of particular interest. The accuracy assessment showed that commission errors for the tall, dense marsh and swamp forest appeared to be minor; but omission errors were significant, especially for the swamp forest (perhaps because no swamp forests are flooded in February). This means that where the classification indicates there are An. vestitipennis breeding sites, they probably do exist; but breeding sites in many locations are not identified and could be more abundant than indicated.
Predicting Coastal Flood Severity using Random Forest Algorithm
NASA Astrophysics Data System (ADS)
Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.
2017-12-01
Coastal floods have become more common recently and are predicted to further increase in frequency and severity due to sea level rise. Predicting floods in coastal cities can be difficult due to the number of environmental and geographic factors which can influence flooding events. Built stormwater infrastructure and irregular urban landscapes add further complexity. This paper demonstrates the use of machine learning algorithms in predicting street flood occurrence in an urban coastal setting. The model is trained and evaluated using data from Norfolk, Virginia USA from September 2010 - October 2016. Rainfall, tide levels, water table levels, and wind conditions are used as input variables. Street flooding reports made by city workers after named and unnamed storm events, ranging from 1-159 reports per event, are the model output. Results show that Random Forest provides predictive power in estimating the number of flood occurrences given a set of environmental conditions with an out-of-bag root mean squared error of 4.3 flood reports and a mean absolute error of 0.82 flood reports. The Random Forest algorithm performed much better than Poisson regression. From the Random Forest model, total daily rainfall was by far the most important factor in flood occurrence prediction, followed by daily low tide and daily higher high tide. The model demonstrated here could be used to predict flood severity based on forecast rainfall and tide conditions and could be further enhanced using more complete street flooding data for model training.
Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification
NASA Astrophysics Data System (ADS)
Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua
2016-03-01
There are many diseases that affect the distribution of muscles, including Duchenne and fascioscapulohumeral dystrophy among other myopathies. In these disease cases, it is important to quantify both the muscle and fat volumes to track the disease progression. There has also been evidence that abnormal signal intensity on the MR images, which often is an indication of edema or inflammation can be a good predictor for muscle deterioration. We present a fully-automated method that examines magnetic resonance (MR) images of the thigh and identifies the fat, muscle, and edema using a random forest classifier. First the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts were corrected using the N3 technique. The T1 and STIR (short tau inverse recovery) images are then aligned using landmark based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. This method was evaluated on MR images of 9 patients. The precision values are 0.91+/-0.06, 0.98+/-0.01 and 0.50+/-0.29 for muscle, fat, and edema, respectively. The recall values are 0.95+/-0.02, 0.96+/-0.03 and 0.43+/-0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of utilizing information from multiple MR sequences for the accurate quantification of fat, muscle and edema.
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au
In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of amore » random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.« less
Prediction of survival with multi-scale radiomic analysis in glioblastoma patients.
Chaddad, Ahmad; Sabri, Siham; Niazi, Tamim; Abdulkarim, Bassam
2018-06-19
We propose a multiscale texture features based on Laplacian-of Gaussian (LoG) filter to predict progression free (PFS) and overall survival (OS) in patients newly diagnosed with glioblastoma (GBM). Experiments use the extracted features derived from 40 patients of GBM with T1-weighted imaging (T1-WI) and Fluid-attenuated inversion recovery (FLAIR) images that were segmented manually into areas of active tumor, necrosis, and edema. Multiscale texture features were extracted locally from each of these areas of interest using a LoG filter and the relation between features to OS and PFS was investigated using univariate (i.e., Spearman's rank correlation coefficient, log-rank test and Kaplan-Meier estimator) and multivariate analyses (i.e., Random Forest classifier). Three and seven features were statistically correlated with PFS and OS, respectively, with absolute correlation values between 0.32 and 0.36 and p < 0.05. Three features derived from active tumor regions only were associated with OS (p < 0.05) with hazard ratios (HR) of 2.9, 3, and 3.24, respectively. Combined features showed an AUC value of 85.37 and 85.54% for predicting the PFS and OS of GBM patients, respectively, using the random forest (RF) classifier. We presented a multiscale texture features to characterize the GBM regions and predict he PFS and OS. The efficiency achievable suggests that this technique can be developed into a GBM MR analysis system suitable for clinical use after a thorough validation involving more patients. Graphical abstract Scheme of the proposed model for characterizing the heterogeneity of GBM regions and predicting the overall survival and progression free survival of GBM patients. (1) Acquisition of pretreatment MRI images; (2) Affine registration of T1-WI image with its corresponding FLAIR images, and GBM subtype (phenotypes) labelling; (3) Extraction of nine texture features from the three texture scales fine, medium, and coarse derived from each of GBM regions; (4) Comparing heterogeneity between GBM regions by ANOVA test; Survival analysis using Univariate (Spearman rank correlation between features and survival (i.e., PFS and OS) based on each of the GBM regions, Kaplan-Meier estimator and log-rank test to predict the PFS and OS of patient groups that grouped based on median of feature), and multivariate (random forest model) for predicting the PFS and OS of patients groups that grouped based on median of PFS and OS.
Saliency-Guided Change Detection of Remotely Sensed Images Using Random Forest
NASA Astrophysics Data System (ADS)
Feng, W.; Sui, H.; Chen, X.
2018-04-01
Studies based on object-based image analysis (OBIA) representing the paradigm shift in change detection (CD) have achieved remarkable progress in the last decade. Their aim has been developing more intelligent interpretation analysis methods in the future. The prediction effect and performance stability of random forest (RF), as a new kind of machine learning algorithm, are better than many single predictors and integrated forecasting method. In this paper, we present a novel CD approach for high-resolution remote sensing images, which incorporates visual saliency and RF. First, highly homogeneous and compact image super-pixels are generated using super-pixel segmentation, and the optimal segmentation result is obtained through image superimposition and principal component analysis (PCA). Second, saliency detection is used to guide the search of interest regions in the initial difference image obtained via the improved robust change vector analysis (RCVA) algorithm. The salient regions within the difference image that correspond to the binarized saliency map are extracted, and the regions are subject to the fuzzy c-means (FCM) clustering to obtain the pixel-level pre-classification result, which can be used as a prerequisite for superpixel-based analysis. Third, on the basis of the optimal segmentation and pixel-level pre-classification results, different super-pixel change possibilities are calculated. Furthermore, the changed and unchanged super-pixels that serve as the training samples are automatically selected. The spectral features and Gabor features of each super-pixel are extracted. Finally, superpixel-based CD is implemented by applying RF based on these samples. Experimental results on Ziyuan 3 (ZY3) multi-spectral images show that the proposed method outperforms the compared methods in the accuracy of CD, and also confirm the feasibility and effectiveness of the proposed approach.
Furusawa, Takuro; Sirikolo, Myknee Qusa; Sasaoka, Masatoshi; Ohtsuka, Ryutaro
2014-01-27
In Solomon Islands, forests have provided people with ecological services while being affected by human use and protection. This study used a quantitative ethnobotanical analysis to explore the society-forest interaction and its transformation in Roviana, Solomon Islands. We compared local plant and land uses between a rural village and urbanized village. Special attention was paid to how local people depend on biodiversity and how traditional human modifications of forest contribute to biodiversity conservation. After defining locally recognized land-use classes, vegetation surveys were conducted in seven forest classes. For detailed observations of daily plant uses, 15 and 17 households were randomly selected in the rural and urban villages, respectively. We quantitatively documented the plant species that were used as food, medicine, building materials, and tools. The vegetation survey revealed that each local forest class represented a different vegetative community with relatively low similarity between communities. Although commercial logging operations and agriculture were both prohibited in the customary nature reserve, local people were allowed to cut down trees for their personal use and to take several types of non-timber forest products. Useful trees were found at high frequencies in the barrier island's primary forest (68.4%) and the main island's reserve (68.3%). Various useful tree species were found only in the reserve forest and seldom available in the urban village. In the rural village, customary governance and control over the use of forest resources by the local people still functioned. Human modifications of the forest created unique vegetation communities, thus increasing biodiversity overall. Each type of forest had different species that varied in their levels of importance to the local subsistence lifestyle, and the villagers' behaviors, such as respect for forest reserves and the semidomestication of some species, contributed to conserving diversity. Urbanization threatened this human-forest interaction. Although the status of biodiversity in human-modified landscapes is not fully understood, this study suggested that traditional human modifications can positively affect biodiversity and that conservation programs should incorporate traditional uses of landscapes to be successful.
NASA Astrophysics Data System (ADS)
Vogels, M. F. A.; de Jong, S. M.; Sterk, G.; Addink, E. A.
2017-02-01
Land-use and land-cover (LULC) conversions have an important impact on land degradation, erosion and water availability. Information on historical land cover (change) is crucial for studying and modelling land- and ecosystem degradation. During the past decades major LULC conversions occurred in Africa, Southeast Asia and South America as a consequence of a growing population and economy. Most distinct is the conversion of natural vegetation into cropland. Historical LULC information can be derived from satellite imagery, but these only date back until approximately 1972. Before the emergence of satellite imagery, landscapes were monitored by black-and-white (B&W) aerial photography. This photography is often visually interpreted, which is a very time-consuming approach. This study presents an innovative, semi-automated method to map cropland acreage from B&W photography. Cropland acreage was mapped on two study sites in Ethiopia and in The Netherlands. For this purpose we used Geographic Object-Based Image Analysis (GEOBIA) and a Random Forest classification on a set of variables comprising texture, shape, slope, neighbour and spectral information. Overall mapping accuracies attained are 90% and 96% for the two study areas respectively. This mapping method increases the timeline at which historical cropland expansion can be mapped purely from brightness information in B&W photography up to the 1930s, which is beneficial for regions where historical land-use statistics are mostly absent.
Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest
NASA Astrophysics Data System (ADS)
Chen, Hui; Lin, Zan; Wu, Hegang; Wang, Li; Wu, Tong; Tan, Chao
2015-01-01
Near-infrared (NIR) spectroscopy has such advantages as being noninvasive, fast, relatively inexpensive, and no risk of ionizing radiation. Differences in the NIR signals can reflect many physiological changes, which are in turn associated with such factors as vascularization, cellularity, oxygen consumption, or remodeling. NIR spectral differences between colorectal cancer and healthy tissues were investigated. A Fourier transform NIR spectroscopy instrument equipped with a fiber-optic probe was used to mimic in situ clinical measurements. A total of 186 spectra were collected and then underwent the preprocessing of standard normalize variate (SNV) for removing unwanted background variances. All the specimen and spots used for spectral collection were confirmed staining and examination by an experienced pathologist so as to ensure the representative of the pathology. Principal component analysis (PCA) was used to uncover the possible clustering. Several methods including random forest (RF), partial least squares-discriminant analysis (PLSDA), K-nearest neighbor and classification and regression tree (CART) were used to extract spectral features and to construct the diagnostic models. By comparison, it reveals that, even if no obvious difference of misclassified ratio (MCR) was observed between these models, RF is preferable since it is quicker, more convenient and insensitive to over-fitting. The results indicate that NIR spectroscopy coupled with RF model can serve as a potential tool for discriminating the colorectal cancer tissues from normal ones.
GIS based Cadastral level Forest Information System using World View-II data in Bir Hisar (Haryana)
NASA Astrophysics Data System (ADS)
Mothi Kumar, K. E.; Singh, S.; Attri, P.; Kumar, R.; Kumar, A.; Sarika; Hooda, R. S.; Sapra, R. K.; Garg, V.; Kumar, V.; Nivedita
2014-11-01
Identification and demarcation of Forest lands on the ground remains a major challenge in Forest administration and management. Cadastral forest mapping deals with forestlands boundary delineation and their associated characterization (forest/non forest). The present study is an application of high resolution World View-II data for digitization of Protected Forest boundary at cadastral level with integration of Records of Right (ROR) data. Cadastral vector data was generated by digitization of spatial data using scanned mussavies in ArcGIS environment. Ortho-images were created from World View-II digital stereo data with Universal Transverse Mercator coordinate system with WGS 84 datum. Cadastral vector data of Bir Hisar (Hisar district, Haryana) and adjacent villages was spatially adjusted over ortho-image using ArcGIS software. Edge matching of village boundaries was done with respect to khasra boundaries of individual village. The notified forest grids were identified on ortho-image and grid vector data was extracted from georeferenced cadastral data. Cadastral forest boundary vectors were digitized from ortho-images. Accuracy of cadastral data was checked by comparison of randomly selected geo-coordinates points, tie lines and boundary measurements of randomly selected parcels generated from image data set with that of actual field measurements. Area comparison was done between cadastral map area, the image map area and RoR area. The area covered under Protected Forest was compared with ROR data and within an accuracy of less than 1 % from ROR area was accepted. The methodology presented in this paper is useful to update the cadastral forest maps. The produced GIS databases and large-scale Forest Maps may serve as a data foundation towards a land register of forests. The study introduces the use of very high resolution satellite data to develop a method for cadastral surveying through on - screen digitization in a less time as compared to the old fashioned cadastral parcel boundaries surveying method.
NASA Astrophysics Data System (ADS)
Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad
2016-01-01
In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
Predicting healthcare associated infections using patients' experiences
NASA Astrophysics Data System (ADS)
Pratt, Michael A.; Chu, Henry
2016-05-01
Healthcare associated infections (HAI) are a major threat to patient safety and are costly to health systems. Our goal is to predict the HAI performance of a hospital using the patients' experience responses as input. We use four classifiers, viz. random forest, naive Bayes, artificial feedforward neural networks, and the support vector machine, to perform the prediction of six types of HAI. The six types include blood stream, urinary tract, surgical site, and intestinal infections. Experiments show that the random forest and support vector machine perform well across the six types of HAI.
Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark
2015-01-01
Single-nucleotide polymorphisms (SNPs) selection and identification are the most important tasks in Genome-wide association data analysis. The problem is difficult because genome-wide association data is very high dimensional and a large portion of SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in Genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite of performing well in terms of prediction accuracy in some data sets with moderate size, RF still suffers from working in GWAS for selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs in two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weak informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node at a tree. This approach enables one to generate more accurate trees with a lower prediction error, meanwhile possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of Genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing the-state-of-the-art random forests. The top 25 SNPs in Parkinson data set were identified by the proposed model including four interesting genes associated with neurological disorders. The presented approach has shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail. The new RF works well for the data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experiment results demonstrated the effectiveness of the proposed RF model that outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
Andrew T. Hudak; Jeffrey S. Evans; Nicholas L. Crookston; Michael J. Falkowski; Brant K. Steigers; Rob Taylor; Halli Hemingway
2008-01-01
Stand exams are the principal means by which timber companies monitor and manage their forested lands. Airborne LiDAR surveys sample forest stands at much finer spatial resolution and broader spatial extent than is practical on the ground. In this paper, we developed models that leverage spatially intensive and extensive LiDAR data and a stratified random sample of...
John D. Baldridge; James T. Sylvester; William T. Borrie
2005-01-01
Local, state, and national agencies charged with managing wildlands in the United States are now seeking to learn more about the public's preferences for managing forests. For this reason agency wildland managers are making use of survey research to supplement their public input processes. Agency managers often choose random-digit dial telephone surveys because of...
Effect of the federal estate tax on nonindustrial private forest holdings
John L. Greene; Steven H. Bullard; Tamara L. Cushing; Theodore Beauvais
2006-01-01
Data for this study were collected using a questionnaire mailed to randomly selected members of two forest owner organizations. Among the key findings is that 38% of forest estates owed federal estate tax, a rate many times higher than US estates in general. In 28% of the cases where estate tax was due, timber or land was sold because other assets were not adequate. In...
Acorn Production on the Missouri Ozark Forest Ecosystem Project Study Sites: Pre-treatment Data
Larry D. Vangilder
1997-01-01
In the pre-treatment phase of a study to determine if even- and uneven-aged forest management affects the production of acorns on the Missourt Forest Ecosystem Project (MOFEP) study sites, acorn production was measured on the nine study sites by randomly placing from 2 to 6 plots in each of four ecological land type (ELT) groupings (N=130 plots). A split-plot...
On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data
Schwarz, Daniel F.; König, Inke R.; Ziegler, Andreas
2010-01-01
Motivation: Genome-wide association (GWA) studies have proven to be a successful approach for helping unravel the genetic basis of complex genetic diseases. However, the identified associations are not well suited for disease prediction, and only a modest portion of the heritability can be explained for most diseases, such as Type 2 diabetes or Crohn's disease. This may partly be due to the low power of standard statistical approaches to detect gene–gene and gene–environment interactions when small marginal effects are present. A promising alternative is Random Forests, which have already been successfully applied in candidate gene analyses. Important single nucleotide polymorphisms are detected by permutation importance measures. To this day, the application to GWA data was highly cumbersome with existing implementations because of the high computational burden. Results: Here, we present the new freely available software package Random Jungle (RJ), which facilitates the rapid analysis of GWA data. The program yields valid results and computes up to 159 times faster than the fastest alternative implementation, while still maintaining all options of other programs. Specifically, it offers the different permutation importance measures available. It includes new options such as the backward elimination method. We illustrate the application of RJ to a GWA of Crohn's disease. The most important single nucleotide polymorphisms (SNPs) validate recent findings in the literature and reveal potential interactions. Availability: The RJ software package is freely available at http://www.randomjungle.org Contact: inke.koenig@imbs.uni-luebeck.de; ziegler@imbs.uni-luebeck.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20505004
Large-scale distribution patterns of mangrove nematodes: A global meta-analysis.
Brustolin, Marco C; Nagelkerken, Ivan; Fonseca, Gustavo
2018-05-01
Mangroves harbor diverse invertebrate communities, suggesting that macroecological distribution patterns of habitat-forming foundation species drive the associated faunal distribution. Whether these are driven by mangrove biogeography is still ambiguous. For small-bodied taxa, local factors and landscape metrics might be as important as macroecology. We performed a meta-analysis to address the following questions: (1) can richness of mangrove trees explain macroecological patterns of nematode richness? and (2) do local landscape attributes have equal or higher importance than biogeography in structuring nematode richness? Mangrove areas of Caribbean-Southwest Atlantic, Western Indian, Central Indo-Pacific, and Southwest Pacific biogeographic regions. We used random-effects meta-analyses based on natural logarithm of the response ratio (lnRR) to assess the importance of macroecology (i.e., biogeographic regions, latitude, longitude), local factors (i.e., aboveground mangrove biomass and tree richness), and landscape metrics (forest area and shape) in structuring nematode richness from 34 mangroves sites around the world. Latitude, mangrove forest area, and forest shape index explained 19% of the heterogeneity across studies. Richness was higher at low latitudes, closer to the equator. At local scales, richness increased slightly with landscape complexity and decreased with forest shape index. Our results contrast with biogeographic diversity patterns of mangrove-associated taxa. Global-scale nematode diversity may have evolved independently of mangrove tree richness, and diversity of small-bodied metazoans is probably more closely driven by latitude and associated climates, rather than local, landscape, or global biogeographic patterns.
NASA Astrophysics Data System (ADS)
Clough, B.; Russell, M.; Domke, G. M.; Woodall, C. W.
2016-12-01
Uncertainty estimates are needed to establish confidence in national forest carbon stocks and to verify changes reported to the United Nations Framework Convention on Climate Change. Good practice guidance from the Intergovernmental Panel on Climate Change stipulates that uncertainty assessments should neither exaggerate nor underestimate the actual error within carbon stocks, yet methodological guidance for forests has been hampered by limited understanding of how complex dynamics give rise to errors across spatial scales (i.e., individuals to continents). This talk highlights efforts to develop a multi-scale, data-driven framework for assessing uncertainty within the United States (US) forest carbon inventory, and focuses on challenges and opportunities for improving the precision of national forest carbon stock estimates. Central to our approach is the calibration of allometric models with a newly established legacy biomass database for North American tree species, and the use of hierarchical models to link these data with the Forest Inventory and Analysis (FIA) database as well as remote sensing datasets. Our work suggests substantial risk for misestimating key sources of uncertainty including: (1) attributing more confidence in allometric models than what is warranted by the best available data; (2) failing to capture heterogeneity in biomass stocks due to environmental variation at regional scales; and (3) ignoring spatial autocorrelation and other random effects that are characteristic of national forest inventory data. Our results suggest these sources of error may be much higher than is generally assumed, though these results must be understood with the limited scope and availability of appropriate calibration data in mind. In addition to reporting on important sources of uncertainty, this talk will discuss opportunities to improve the precision of national forest carbon stocks that are motivated by our use of data-driven forecasting including: (1) improving the taxonomic and geographic scope of available biomass data; (2) direct attribution of landscape-level heterogeneity in biomass stocks to specific ecological processes; and (3) integration of expert opinion and meta-analysis to lessen the influence of often highly variable datasets on biomass stock forecasts.
Community turnover of wood-inhabiting fungi across hierarchical spatial scales.
Abrego, Nerea; García-Baquero, Gonzalo; Halme, Panu; Ovaskainen, Otso; Salcedo, Isabel
2014-01-01
For efficient use of conservation resources it is important to determine how species diversity changes across spatial scales. In many poorly known species groups little is known about at which spatial scales the conservation efforts should be focused. Here we examined how the community turnover of wood-inhabiting fungi is realised at three hierarchical levels, and how much of community variation is explained by variation in resource composition and spatial proximity. The hierarchical study design consisted of management type (fixed factor), forest site (random factor, nested within management type) and study plots (randomly placed plots within each study site). To examine how species richness varied across the three hierarchical scales, randomized species accumulation curves and additive partitioning of species richness were applied. To analyse variation in wood-inhabiting species and dead wood composition at each scale, linear and Permanova modelling approaches were used. Wood-inhabiting fungal communities were dominated by rare and infrequent species. The similarity of fungal communities was higher within sites and within management categories than among sites or between the two management categories, and it decreased with increasing distance among the sampling plots and with decreasing similarity of dead wood resources. However, only a small part of community variation could be explained by these factors. The species present in managed forests were in a large extent a subset of those species present in natural forests. Our results suggest that in particular the protection of rare species requires a large total area. As managed forests have only little additional value complementing the diversity of natural forests, the conservation of natural forests is the key to ecologically effective conservation. As the dissimilarity of fungal communities increases with distance, the conserved natural forest sites should be broadly distributed in space, yet the individual conserved areas should be large enough to ensure local persistence.
Community Turnover of Wood-Inhabiting Fungi across Hierarchical Spatial Scales
Abrego, Nerea; García-Baquero, Gonzalo; Halme, Panu; Ovaskainen, Otso; Salcedo, Isabel
2014-01-01
For efficient use of conservation resources it is important to determine how species diversity changes across spatial scales. In many poorly known species groups little is known about at which spatial scales the conservation efforts should be focused. Here we examined how the community turnover of wood-inhabiting fungi is realised at three hierarchical levels, and how much of community variation is explained by variation in resource composition and spatial proximity. The hierarchical study design consisted of management type (fixed factor), forest site (random factor, nested within management type) and study plots (randomly placed plots within each study site). To examine how species richness varied across the three hierarchical scales, randomized species accumulation curves and additive partitioning of species richness were applied. To analyse variation in wood-inhabiting species and dead wood composition at each scale, linear and Permanova modelling approaches were used. Wood-inhabiting fungal communities were dominated by rare and infrequent species. The similarity of fungal communities was higher within sites and within management categories than among sites or between the two management categories, and it decreased with increasing distance among the sampling plots and with decreasing similarity of dead wood resources. However, only a small part of community variation could be explained by these factors. The species present in managed forests were in a large extent a subset of those species present in natural forests. Our results suggest that in particular the protection of rare species requires a large total area. As managed forests have only little additional value complementing the diversity of natural forests, the conservation of natural forests is the key to ecologically effective conservation. As the dissimilarity of fungal communities increases with distance, the conserved natural forest sites should be broadly distributed in space, yet the individual conserved areas should be large enough to ensure local persistence. PMID:25058128
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
Kebede, Mihiretu; Zegeye, Desalegn Tigabu; Zeleke, Berihun Megabiaw
2017-12-01
To monitor the progress of therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement is increasing as ART programs expand over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART. A cross-sectional study was conducted at the University of Gondar Hospital from 3,104 adult patients on ART with CD4 counts measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients` charts. Descriptive data were analyzed by SPSS version 20. Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed to undertake the study. WEKA version 3.8 was used to conduct a predictive data mining. Before building the predictive data mining models, information gain values and correlation-based Feature Selection methods were used for attribute selection. Variables were ranked according to their relevance based on their information gain values. J48, Neural Network, and Random Forest algorithms were experimented to assess model accuracies. The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2427 (78.2%) patients had their CD4 counts increased by at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin levels predicted CD4 count changes with predictive accuracies of J48, Neural Network, and Random Forest being 87.1%, 83.5%, and 99.8%, respectively. Random Forest algorithm had a superior performance accuracy level than both J48 and Artificial Neural Network. The precision, sensitivity and recall values of Random Forest were also more than 99%. Nearly accurate prediction results were obtained using Random Forest algorithm. This algorithm could be used in a low-resource setting to build a web-based prediction model for CD4 count changes. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Overstreet, B. T.; Legleiter, C. J.
2012-12-01
The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impacts river dynamics; and 3) how the Snake River will respond to future human and climate driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest derived land cover maps with LiDAR derived topography, bathymetry, and canopy height to explore physical drivers contributing to observed channel changes on the Snake River. In conclusion we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change. Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and a 2011 WorldView2 satellite image. This pair of maps thus captures changes that occurred during the 2011 runoff
NASA Astrophysics Data System (ADS)
Iwamoto, C.; Utsunomiya, H.; Tamii, A.; Akimune, H.; Nakada, H.; Shima, T.; Yamagata, T.; Kawabata, T.; Fujita, Y.; Matsubara, H.; Shimbara, Y.; Nagashima, M.; Suzuki, T.; Fujita, H.; Sakuda, M.; Mori, T.; Izumi, T.; Okamoto, A.; Kondo, T.; Bilgier, B.; Kozer, H. C.; Lui, Y.-W.; Hatanaka, K.
2012-06-01
A high-resolution measurement of inelastic proton scattering off Zr90 near 0° was performed at 295 MeV with a focus on a pronounced strength previously reported in the low-energy tail of giant dipole resonance. A forest of fine structure was observed in the excitation energy region 7-12 MeV. A multipole decomposition analysis of the angular distribution for the forest was carried out using the ECIS95 distorted-wave Born approximation code with the Hartree-Fock plus random-phase approximation model of E1 and M1 transition densities and inclusion of E1 Coulomb excitation. The analysis separated pygmy dipole and M1 resonances in the forest at EPDR=9.15±0.18MeV with ΓPDR=2.91±0.64MeV and at EM1=9.53±0.06MeV with ΓM1=2.70±0.17MeV in the Lorentzian function, respectively. The B(E1)↑ value for pygmy dipole resonance over 7-11 MeV is 0.75±0.08e2fm2, which corresponds to 2.1±0.2% of the Thomas-Reiche-Kuhn sum rule.
A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.
Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan
2017-01-01
Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.
Finch, Kristen; Espinoza, Edgard; Jones, F Andrew; Cronn, Richard
2017-05-01
We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Three annual ring mass spectra were obtained from 188 adult Douglas-fir trees, and these were analyzed using random forest models to determine whether samples could be classified to geographic origin, growth year, or growth year and geographic origin. Specific wood molecules that contributed to geographic discrimination were identified. Douglas-fir mass spectra could be differentiated into two geographic classes with an accuracy between 70% and 76%. Classification models could not accurately classify sample mass spectra based on growth year. Thirty-two molecules were identified as key for classifying western Oregon Douglas-fir wood cores to geographic origin. DART-TOFMS is capable of detecting minute but regionally informative differences in wood molecules over a small geographic scale, and these differences made it possible to predict the geographic origin of Douglas-fir wood with moderate accuracy. Studies involving DART-TOFMS, alone and in combination with other technologies, will be relevant for identifying the geographic origin of illegally harvested wood.
NASA Astrophysics Data System (ADS)
Lai, J.-S.; Tsai, F.; Chiang, S.-H.
2016-06-01
This study implements a data mining-based algorithm, the random forests classifier, with geo-spatial data to construct a regional and rainfall-induced landslide susceptibility model. The developed model also takes account of landslide regions (source, non-occurrence and run-out signatures) from the original landslide inventory in order to increase the reliability of the susceptibility modelling. A total of ten causative factors were collected and used in this study, including aspect, curvature, elevation, slope, faults, geology, NDVI (Normalized Difference Vegetation Index), rivers, roads and soil data. Consequently, this study transforms the landslide inventory and vector-based causative factors into the pixel-based format in order to overlay with other raster data for constructing the random forests based model. This study also uses original and edited topographic data in the analysis to understand their impacts to the susceptibility modeling. Experimental results demonstrate that after identifying the run-out signatures, the overall accuracy and Kappa coefficient have been reached to be become more than 85 % and 0.8, respectively. In addition, correcting unreasonable topographic feature of the digital terrain model also produces more reliable modelling results.
Beccaria, Marco; Mellors, Theodore R; Petion, Jacky S; Rees, Christiaan A; Nasir, Mavra; Systrom, Hannah K; Sairistil, Jean W; Jean-Juste, Marc-Antoine; Rivera, Vanessa; Lavoile, Kerline; Severe, Patrice; Pape, Jean W; Wright, Peter F; Hill, Jane E
2018-02-01
Tuberculosis (TB) remains a global public health malady that claims almost 1.8 million lives annually. Diagnosis of TB represents perhaps one of the most challenging aspects of tuberculosis control. Gold standards for diagnosis of active TB (culture and nucleic acid amplification) are sputum-dependent, however, in up to a third of TB cases, an adequate biological sputum sample is not readily available. The analysis of exhaled breath, as an alternative to sputum-dependent tests, has the potential to provide a simple, fast, and non-invasive, and ready-available diagnostic service that could positively change TB detection. Human breath has been evaluated in the setting of active tuberculosis using thermal desorption-comprehensive two-dimensional gas chromatography-time of flight mass spectrometry methodology. From the entire spectrum of volatile metabolites in breath, three random forest machine learning models were applied leading to the generation of a panel of 46 breath features. The twenty-two common features within each random forest model used were selected as a set that could distinguish subjects with confirmed pulmonary M. tuberculosis infection and people with other pathologies than TB. Copyright © 2018 Elsevier B.V. All rights reserved.
Ishwaran, Hemant; Lu, Min
2018-06-04
Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.
Automatic detection of atrial fibrillation in cardiac vibration signals.
Brueser, C; Diesel, J; Zink, M D H; Winter, S; Schauerte, P; Leonhardt, S
2013-01-01
We present a study on the feasibility of the automatic detection of atrial fibrillation (AF) from cardiac vibration signals (ballistocardiograms/BCGs) recorded by unobtrusive bedmounted sensors. The proposed system is intended as a screening and monitoring tool in home-healthcare applications and not as a replacement for ECG-based methods used in clinical environments. Based on BCG data recorded in a study with 10 AF patients, we evaluate and rank seven popular machine learning algorithms (naive Bayes, linear and quadratic discriminant analysis, support vector machines, random forests as well as bagged and boosted trees) for their performance in separating 30 s long BCG epochs into one of three classes: sinus rhythm, atrial fibrillation, and artifact. For each algorithm, feature subsets of a set of statistical time-frequency-domain and time-domain features were selected based on the mutual information between features and class labels as well as first- and second-order interactions among features. The classifiers were evaluated on a set of 856 epochs by means of 10-fold cross-validation. The best algorithm (random forests) achieved a Matthews correlation coefficient, mean sensitivity, and mean specificity of 0.921, 0.938, and 0.982, respectively.
A machine learning system to improve heart failure patient assistance.
Guidi, Gabriele; Pettenati, Maria Chiara; Melillo, Paolo; Iadanza, Ernesto
2014-11-01
In this paper, we present a clinical decision support system (CDSS) for the analysis of heart failure (HF) patients, providing various outputs such as an HF severity evaluation, HF-type prediction, as well as a management interface that compares the different patients' follow-ups. The whole system is composed of a part of intelligent core and of an HF special-purpose management tool also providing the function to act as interface for the artificial intelligence training and use. To implement the smart intelligent functions, we adopted a machine learning approach. In this paper, we compare the performance of a neural network (NN), a support vector machine, a system with fuzzy rules genetically produced, and a classification and regression tree and its direct evolution, which is the random forest, in analyzing our database. Best performances in both HF severity evaluation and HF-type prediction functions are obtained by using the random forest algorithm. The management tool allows the cardiologist to populate a "supervised database" suitable for machine learning during his or her regular outpatient consultations. The idea comes from the fact that in literature there are a few databases of this type, and they are not scalable to our case.
NASA Astrophysics Data System (ADS)
Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.
2018-04-01
Sea level rise has already caused more frequent and severe coastal flooding and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems however, do not always lend themselves easily to physically-based flood prediction approaches. This paper presents a method for using a data-driven approach to estimate flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia USA from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. From the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.
Hu, Chen; Steingrimsson, Jon Arni
2018-01-01
A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
J. D. Shaw
2006-01-01
Benefits of a strategic national forest inventory to science and society: the USDA Forest Service Forest Inventory and Analysis program. Forest Inventory and Analysis, previously known as Forest Survey, is one of the oldest research and development programs in the USDA Forest Service. Statistically-based inventory efforts that started in Scandinavian countries in the...
Patch forest: a hybrid framework of random forest and patch-based segmentation
NASA Astrophysics Data System (ADS)
Xie, Zhongliu; Gillies, Duncan
2016-03-01
The development of an accurate, robust and fast segmentation algorithm has long been a research focus in medical computer vision. State-of-the-art practices often involve non-rigidly registering a target image with a set of training atlases for label propagation over the target space to perform segmentation, a.k.a. multi-atlas label propagation (MALP). In recent years, the patch-based segmentation (PBS) framework has gained wide attention due to its advantage of relaxing the strict voxel-to-voxel correspondence to a series of pair-wise patch comparisons for contextual pattern matching. Despite a high accuracy reported in many scenarios, computational efficiency has consistently been a major obstacle for both approaches. Inspired by recent work on random forest, in this paper we propose a patch forest approach, which by equipping the conventional PBS with a fast patch search engine, is able to boost segmentation speed significantly while retaining an equal level of accuracy. In addition, a fast forest training mechanism is also proposed, with the use of a dynamic grid framework to efficiently approximate data compactness computation and a 3D integral image technique for fast box feature retrieval.
NASA Astrophysics Data System (ADS)
Freeman, Mary Pyott
ABSTRACT An Analysis of Tree Mortality Using High Resolution Remotely-Sensed Data for Mixed-Conifer Forests in San Diego County by Mary Pyott Freeman The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, which is defined as dieback where whole stands are affected. This mortality is likely the result of the complex interaction of many variables, such as altered fire regimes, climatic conditions such as drought, as well as forest pathogens and past management strategies. Conifer tree mortality and its spatial pattern and change over time were examined in three components. In component 1, two remote sensing approaches were compared for their effectiveness in delineating dead trees, a spatial contextual approach and an OBIA (object based image analysis) approach, utilizing various dates and spatial resolutions of airborne image data. For each approach transforms and masking techniques were explored, which were found to improve classifications, and an object-based assessment approach was tested. In component 2, dead tree maps produced by the most effective techniques derived from component 1 were utilized for point pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 for three study areas: Palomar, Volcan and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter class sizes. In component 3, multiple environmental variables were used in Generalized Linear Model (GLM-logistic regression) and decision tree classifier model development, revealing the importance of climate and topographic factors such as precipitation and elevation, in being able to predict areas of high risk for tree mortality. The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.
Roysden, Nathaniel; Wright, Adam
2015-01-01
Mental health problems are an independent predictor of increased healthcare utilization. We created random forest classifiers for predicting two outcomes following a patient's first behavioral health encounter: decreased utilization by any amount (AUROC 0.74) and ultra-high absolute utilization (AUROC 0.88). These models may be used for clinical decision support by referring providers, to automatically detect patients who may benefit from referral, for cost management, or for risk/protection factor analysis.
2014-09-18
Converter AES Advance Encryption Standard ANN Artificial Neural Network APS Application Support AUC Area Under the Curve CPA Correlation Power Analysis ...Importance WGN White Gaussian Noise WPAN Wireless Personal Area Networks XEnv Cross-Environment XRx Cross-Receiver xxi ADVANCES IN SCA AND RF-DNA...based tool called KillerBee was released in 2009 that increases the exposure of ZigBee and other IEEE 802.15.4-based Wireless Personal Area Networks
Multi-label spacecraft electrical signal classification method based on DBN and random forest
Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng
2017-01-01
In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then, classification is applied. Firstly, we use the method of wavelet denoising, which was used to pre-process the data. Secondly, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we used the random forest algorithm to classify the data and comparing it with other algorithms. The experimental results show that compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data. PMID:28486479
Multi-label spacecraft electrical signal classification method based on DBN and random forest.
Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng
2017-01-01
In spacecraft electrical signal characteristic data, there exists a large amount of data with high-dimensional features, a high computational complexity degree, and a low rate of identification problems, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; The proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then, classification is applied. Firstly, we use the method of wavelet denoising, which was used to pre-process the data. Secondly, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, we used the random forest algorithm to classify the data and comparing it with other algorithms. The experimental results show that compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data.
Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest
Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan
2018-01-01
Mechanical faults of high-voltage circuit breakers (HVCBs) always happen over long-term operation, so extracting the fault features and identifying the fault type have become a key issue for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variable and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method with the origin feature space reached 93.33% and reached up to 95.56% with optimized input feature vector of classifier. This indicates that feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548
RandomForest4Life: a Random Forest for predicting ALS disease progression.
Hothorn, Torsten; Jung, Hans H
2014-09-01
We describe a method for predicting disease progression in amyotrophic lateral sclerosis (ALS) patients. The method was developed as a submission to the DREAM Phil Bowen ALS Prediction Prize4Life Challenge of summer 2012. Based on repeated patient examinations over a three- month period, we used a random forest algorithm to predict future disease progression. The procedure was set up and internally evaluated using data from 1197 ALS patients. External validation by an expert jury was based on undisclosed information of an additional 625 patients; all patient data were obtained from the PRO-ACT database. In terms of prediction accuracy, the approach described here ranked third best. Our interpretation of the prediction model confirmed previous reports suggesting that past disease progression is a strong predictor of future disease progression measured on the ALS functional rating scale (ALSFRS). We also found that larger variability in initial ALSFRS scores is linked to faster future disease progression. The results reported here furthermore suggested that approaches taking the multidimensionality of the ALSFRS into account promise some potential for improved ALS disease prediction.
RAQ–A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems
Yu, Ruiyun; Yang, Yu; Yang, Leyou; Han, Guangjie; Move, Oguti Ann
2016-01-01
Air quality information such as the concentration of PM2.5 is of great significance for human health and city management. It affects the way of traveling, urban planning, government policies and so on. However, in major cities there is typically only a limited number of air quality monitoring stations. In the meantime, air quality varies in the urban areas and there can be large differences, even between closely neighboring regions. In this paper, a random forest approach for predicting air quality (RAQ) is proposed for urban sensing systems. The data generated by urban sensing includes meteorology data, road information, real-time traffic status and point of interest (POI) distribution. The random forest algorithm is exploited for data training and prediction. The performance of RAQ is evaluated with real city data. Compared with three other algorithms, this approach achieves better prediction precision. Exciting results are observed from the experiments that the air quality can be inferred with amazingly high accuracy from the data which are obtained from urban sensing. PMID:26761008
PET-CT image fusion using random forest and à-trous wavelet transform.
Seal, Ayan; Bhattacharjee, Debotosh; Nasipuri, Mita; Rodríguez-Esparragón, Dionisio; Menasalvas, Ernestina; Gonzalo-Martin, Consuelo
2018-03-01
New image fusion rules for multimodal medical images are proposed in this work. Image fusion rules are defined by random forest learning algorithm and a translation-invariant à-trous wavelet transform (AWT). The proposed method is threefold. First, source images are decomposed into approximation and detail coefficients using AWT. Second, random forest is used to choose pixels from the approximation and detail coefficients for forming the approximation and detail coefficients of the fused image. Lastly, inverse AWT is applied to reconstruct fused image. All experiments have been performed on 198 slices of both computed tomography and positron emission tomography images of a patient. A traditional fusion method based on Mallat wavelet transform has also been implemented on these slices. A new image fusion performance measure along with 4 existing measures has been presented, which helps to compare the performance of 2 pixel level fusion methods. The experimental results clearly indicate that the proposed method outperforms the traditional method in terms of visual and quantitative qualities and the new measure is meaningful. Copyright © 2017 John Wiley & Sons, Ltd.
GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.
Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A
2016-01-01
In-silico methods are an integral part of modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large scale computations, making it a highly time consuming task. This process can be speeded up by implementing parallelized algorithms on a Graphical Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in the virtual screening. A ligand based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems has been proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of results produced by our tool on GPU is same as that on a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in training and prediction phases.
The Microbiome in Posttraumatic Stress Disorder and Trauma-Exposed Controls: An Exploratory Study.
Hemmings, Sian M J; Malan-Müller, Stefanie; van den Heuvel, Leigh L; Demmitt, Brittany A; Stanislawski, Maggie A; Smith, David G; Bohr, Adam D; Stamper, Christopher E; Hyde, Embriette R; Morton, James T; Marotz, Clarisse A; Siebler, Philip H; Braspenning, Maarten; Van Criekinge, Wim; Hoisington, Andrew J; Brenner, Lisa A; Postolache, Teodor T; McQueen, Matthew B; Krauter, Kenneth S; Knight, Rob; Seedat, Soraya; Lowry, Christopher A
2017-10-01
Inadequate immunoregulation and elevated inflammation may be risk factors for posttraumatic stress disorder (PTSD), and microbial inputs are important determinants of immunoregulation; however, the association between the gut microbiota and PTSD is unknown. This study investigated the gut microbiome in a South African sample of PTSD-affected individuals and trauma-exposed (TE) controls to identify potential differences in microbial diversity or microbial community structure. The Clinician-Administered PTSD Scale for DSM-5 was used to diagnose PTSD according to Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition criteria. Microbial DNA was extracted from stool samples obtained from 18 individuals with PTSD and 12 TE control participants. Bacterial 16S ribosomal RNA gene V3/V4 amplicons were generated and sequenced. Microbial community structure, α-diversity, and β-diversity were analyzed; random forest analysis was used to identify associations between bacterial taxa and PTSD. There were no differences between PTSD and TE control groups in α- or β-diversity measures (e.g., α-diversity: Shannon index, t = 0.386, p = .70; β-diversity, on the basis of analysis of similarities: Bray-Curtis test statistic = -0.033, p = .70); however, random forest analysis highlighted three phyla as important to distinguish PTSD status: Actinobacteria, Lentisphaerae, and Verrucomicrobia. Decreased total abundance of these taxa was associated with higher Clinician-Administered PTSD Scale scores (r = -0.387, p = .035). In this exploratory study, measures of overall microbial diversity were similar among individuals with PTSD and TE controls; however, decreased total abundance of Actinobacteria, Lentisphaerae, and Verrucomicrobia was associated with PTSD status.
What variables are important in predicting bovine viral diarrhea virus? A random forest approach.
Machado, Gustavo; Mendoza, Mariana Recamonde; Corbellini, Luis Gustavo
2015-07-24
Bovine viral diarrhea virus (BVDV) causes one of the most economically important diseases in cattle, and the virus is found worldwide. A better understanding of the disease associated factors is a crucial step towards the definition of strategies for control and eradication. In this study we trained a random forest (RF) prediction model and performed variable importance analysis to identify factors associated with BVDV occurrence. In addition, we assessed the influence of features selection on RF performance and evaluated its predictive power relative to other popular classifiers and to logistic regression. We found that RF classification model resulted in an average error rate of 32.03% for the negative class (negative for BVDV) and 36.78% for the positive class (positive for BVDV).The RF model presented area under the ROC curve equal to 0.702. Variable importance analysis revealed that important predictors of BVDV occurrence were: a) who inseminates the animals, b) number of neighboring farms that have cattle and c) rectal palpation performed routinely. Our results suggest that the use of machine learning algorithms, especially RF, is a promising methodology for the analysis of cross-sectional studies, presenting a satisfactory predictive power and the ability to identify predictors that represent potential risk factors for BVDV investigation. We examined classical predictors and found some new and hard to control practices that may lead to the spread of this disease within and among farms, mainly regarding poor or neglected reproduction management, which should be considered for disease control and eradication.
Multiscale habitat use and selection in cooperatively breeding Micronesian kingfishers
Kesler, D.C.; Haig, S.M.
2007-01-01
Information about the interaction between behavior and landscape resources is key to directing conservation management for endangered species. We studied multi-scale occurrence, habitat use, and selection in a cooperatively breeding population of Micronesian kingfishers (Todiramphus cinnamominus) on the island of Pohnpei, Federated States of Micronesia. At the landscape level, point-transect surveys resulted in kingfisher detection frequencies that were higher than those reported in 1994, although they remained 15-40% lower than 1983 indices. Integration of spatially explicit vegetation information with survey results indicated that kingfisher detections were positively associated with the amount of wet forest and grass-urban vegetative cover, and they were negatively associated with agricultural forest, secondary vegetation, and upland forest cover types. We used radiotelemetry and remote sensing to evaluate habitat use by individual kingfishers at the home-range scale. A comparison of habitats in Micronesian kingfisher home ranges with those in randomly placed polygons illustrated that birds used more forested areas than were randomly available in the immediate surrounding area. Further, members of cooperatively breeding groups included more forest in their home ranges than birds in pair-breeding territories, and forested portions of study areas appeared to be saturated with territories. Together, these results suggested that forest habitats were limited for Micronesian kingfishers. Thus, protecting and managing forests is important for the restoration of Micronesian kingfishers to the island of Guam (United States Territory), where they are currently extirpated, as well as to maintaining kingfisher populations on the islands of Pohnpei and Palau. Results further indicated that limited forest resources may restrict dispersal opportunities and, therefore, play a role in delayed dispersal and cooperative behaviors in Micronesian kingfishers.
Modelling above Ground Biomass of Mangrove Forest Using SENTINEL-1 Imagery
NASA Astrophysics Data System (ADS)
Labadisos Argamosa, Reginald Jay; Conferido Blanco, Ariel; Balidoy Baloloy, Alvin; Gumbao Candido, Christian; Lovern Caboboy Dumalag, John Bart; Carandang Dimapilis, Lee, , Lady; Camero Paringit, Enrico
2018-04-01
Many studies have been conducted in the estimation of forest above ground biomass (AGB) using features from synthetic aperture radar (SAR). Specifically, L-band ALOS/PALSAR (wavelength 23 cm) data is often used. However, few studies have been made on the use of shorter wavelengths (e.g., C-band, 3.75 cm to 7.5 cm) for forest mapping especially in tropical forests since higher attenuation is observed for volumetric objects where energy propagated is absorbed. This study aims to model AGB estimates of mangrove forest using information derived from Sentinel-1 C-band SAR data. Combinations of polarisations (VV, VH), its derivatives, grey level co-occurrence matrix (GLCM), and its principal components were used as features for modelling AGB. Five models were tested with varying combinations of features; a) sigma nought polarisations and its derivatives; b) GLCM textures; c) the first five principal components; d) combination of models a-c; and e) the identified important features by Random Forest variable importance algorithm. Random Forest was used as regressor to compute for the AGB estimates to avoid over fitting caused by the introduction of too many features in the model. Model e obtained the highest r2 of 0.79 and an RMSE of 0.44 Mg using only four features, namely, σ°VH GLCM variance, σ°VH GLCM contrast, PC1, and PC2. This study shows that Sentinel-1 C-band SAR data could be used to produce acceptable AGB estimates in mangrove forest to compensate for the unavailability of longer wavelength SAR.
Use of DNA markers in forest tree improvement research
D.B. Neale; M.E. Devey; K.D. Jermstad; M.R. Ahuja; M.C. Alosi; K.A. Marshall
1992-01-01
DNA markers are rapidly being developed for forest trees. The most important markers are restriction fragment length polymorphisms (RFLPs), polymerase chain reaction- (PCR) based markers such as random amplified polymorphic DNA (RAPD), and fingerprinting markers. DNA markers can supplement isozyme markers for monitoring tree improvement activities such as; estimating...
Chen, Xuexia; Liu, Shuguang; Zhu, Zhiliang; Vogelmann, James E.; Li, Zhengpeng; Ohlen, Donald O.
2011-01-01
The concentrations of CO2 and other greenhouse gases in the atmosphere have been increasing and greatly affecting global climate and socio-economic systems. Actively growing forests are generally considered to be a major carbon sink, but forest wildfires lead to large releases of biomass carbon into the atmosphere. Aboveground forest biomass carbon (AFBC), an important ecological indicator, and fire-induced carbon emissions at regional scales are highly relevant to forest sustainable management and climate change. It is challenging to accurately estimate the spatial distribution of AFBC across large areas because of the spatial heterogeneity of forest cover types and canopy structure. In this study, Forest Inventory and Analysis (FIA) data, Landsat, and Landscape Fire and Resource Management Planning Tools Project (LANDFIRE) data were integrated in a regression tree model for estimating AFBC at a 30-m resolution in the Utah High Plateaus. AFBC were calculated from 225 FIA field plots and used as the dependent variable in the model. Of these plots, 10% were held out for model evaluation with stratified random sampling, and the other 90% were used as training data to develop the regression tree model. Independent variable layers included Landsat imagery and the derived spectral indicators, digital elevation model (DEM) data and derivatives, biophysical gradient data, existing vegetation cover type and vegetation structure. The cross-validation correlation coefficient (r value) was 0.81 for the training model. Independent validation using withheld plot data was similar with r value of 0.82. This validated regression tree model was applied to map AFBC in the Utah High Plateaus and then combined with burn severity information to estimate loss of AFBC in the Longston fire of Zion National Park in 2001. The final dataset represented 24 forest cover types for a 4 million ha forested area. We estimated a total of 353 Tg AFBC with an average of 87 MgC/ha in the Utah High Plateaus. We also estimated that 8054 Mg AFBC were released from 2.24 km2 burned forest area in the Longston fire. These results demonstrate that an AFBC spatial map and estimated biomass carbon consumption can readily be generated using existing database. The methodology provides a consistent, practical, and inexpensive way for estimating AFBC at 30-m resolution over large areas throughout the United States.
Detection of variations in aspen forest habitat from LANDSAT digital data: Bear River Range, Utah
NASA Technical Reports Server (NTRS)
Merola, J. A.; Jaynes, R. A. (Principal Investigator)
1982-01-01
The aspen forests of the Bear River Range were analyzed and mapped using data recorded on July 2, 1979 by the LANDSAT III satellite; study efforts yielded sixty-seven light signatures for the study area, of which three groups were identified as aspen and mapped at a scale of 1:24,000. Analysis and verification of the three groups were accomplished by random location of twenty-six field study plots within the LANDSAT-defined aspen areas. All study plots are included within the Cache portion of the Wasatch-Cache National Forest. The following selected site characteristics were recorded for each study plot: a list of understory species present; average percent cover density for understory species; aspen canopy cover estimates and stem measurements; and general site topographic characteristics. The study plot data were then analyzed with respect to corresponding Landsat spectral signatures. Field studies show that all twenty-six study plots are associated with one of the three aspen groups. Further study efforts concentration on characterizing the differences between the site characteristics of plots falling into each of the three aspen groups.
Holliday, Jason A; Wang, Tongli; Aitken, Sally
2012-09-01
Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits--autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.
Nelson, Andrew; Chomitz, Kenneth M.
2011-01-01
Protected areas (PAs) cover a quarter of the tropical forest estate. Yet there is debate over the effectiveness of PAs in reducing deforestation, especially when local people have rights to use the forest. A key analytic problem is the likely placement of PAs on marginal lands with low pressure for deforestation, biasing comparisons between protected and unprotected areas. Using matching techniques to control for this bias, this paper analyzes the global tropical forest biome using forest fires as a high resolution proxy for deforestation; disaggregates impacts by remoteness, a proxy for deforestation pressure; and compares strictly protected vs. multiple use PAs vs indigenous areas. Fire activity was overlaid on a 1 km map of tropical forest extent in 2000; land use change was inferred for any point experiencing one or more fires. Sampled points in pre-2000 PAs were matched with randomly selected never-protected points in the same country. Matching criteria included distance to road network, distance to major cities, elevation and slope, and rainfall. In Latin America and Asia, strict PAs substantially reduced fire incidence, but multi-use PAs were even more effective. In Latin America, where there is data on indigenous areas, these areas reduce forest fire incidence by 16 percentage points, over two and a half times as much as naïve (unmatched) comparison with unprotected areas would suggest. In Africa, more recently established strict PAs appear to be effective, but multi-use tropical forest protected areas yield few sample points, and their impacts are not robustly estimated. These results suggest that forest protection can contribute both to biodiversity conservation and CO2 mitigation goals, with particular relevance to the REDD agenda. Encouragingly, indigenous areas and multi-use protected areas can help to accomplish these goals, suggesting some compatibility between global environmental goals and support for local livelihoods. PMID:21857950
The use of crop rotation for mapping soil organic content in farmland
NASA Astrophysics Data System (ADS)
Yang, Lin; Song, Min; Zhu, A.-Xing; Qin, Chengzhi
2017-04-01
Most of the current digital soil mapping uses natural environmental covariates. However, human activities have significantly impacted the development of soil properties since half a century, and therefore become an important factor affecting soil spatial variability. Many researches have done field experiments to show how soil properties are impacted and changed by human activities, however, spatial variation data of human activities as environmental covariates have been rarely used in digital soil mapping. In this paper, we took crop rotation as an example of agricultural activities, and explored its effectiveness in characterizing and mapping the spatial variability of soil. The cultivated area of Xuanzhou city and Langxi County in Anhui Province was chosen as the study area. Three main crop rotations,including double-rice, wheat-rice,and oilseed rape-cotton were observed through field investigation in 2010. The spatial distribution of the three crop rotations in the study area was obtained by multi-phase remote sensing image interpretation using a supervised classification method. One-way analysis of variance (ANOVA) for topsoil organic content in the three crop rotation groups was performed. Factor importance of seven natural environmental covariates, crop rotation, Land use and NDVI were generated by variable importance criterion of Random Forest. Different combinations of environmental covariates were selected according to the importance rankings of environmental covariates for predicting SOC using Random Forest and Soil Landscape Inference Model (SOLIM). A cross validation was generated to evaluated the mapping accuracies. The results showed that there were siginificant differences of topsoil organic content among the three crop rotation groups. The crop rotation is more important than parent material, land use or NDVI according to the importance ranking calculated by Random Forest. In addition, crop rotation improved the mapping accuracy, especially for the flat clutivated area. This study demonstrates the usefulness of human activities in digital soil mapping and thus indicates the necessity for human activity factors in digital soil mapping studies.
Tree species classification in subtropical forests using small-footprint full-waveform LiDAR data
NASA Astrophysics Data System (ADS)
Cao, Lin; Coops, Nicholas C.; Innes, John L.; Dai, Jinsong; Ruan, Honghua; She, Guanghui
2016-07-01
The accurate classification of tree species is critical for the management of forest ecosystems, particularly subtropical forests, which are highly diverse and complex ecosystems. While airborne Light Detection and Ranging (LiDAR) technology offers significant potential to estimate forest structural attributes, the capacity of this new tool to classify species is less well known. In this research, full-waveform metrics were extracted by a voxel-based composite waveform approach and examined with a Random Forests classifier to discriminate six subtropical tree species (i.e., Masson pine (Pinus massoniana Lamb.)), Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), Slash pines (Pinus elliottii Engelm.), Sawtooth oak (Quercus acutissima Carruth.) and Chinese holly (Ilex chinensis Sims.) at three levels of discrimination. As part of the analysis, the optimal voxel size for modelling the composite waveforms was investigated, the most important predictor metrics for species classification assessed and the effect of scan angle on species discrimination examined. Results demonstrate that all tree species were classified with relatively high accuracy (68.6% for six classes, 75.8% for four main species and 86.2% for conifers and broadleaved trees). Full-waveform metrics (based on height of median energy, waveform distance and number of waveform peaks) demonstrated high classification importance and were stable among various voxel sizes. The results also suggest that the voxel based approach can alleviate some of the issues associated with large scan angles. In summary, the results indicate that full-waveform LIDAR data have significant potential for tree species classification in the subtropical forests.
Forest resources of the Forest resources of the Apache-Sitgreaves National Forest
Paul Rogers
2008-01-01
The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Apache-Sitgreaves National Forest...
Aspen, climate, and sudden decline in western USA
Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston
2009-01-01
A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...
Variation in Local-Scale Edge Effects: Mechanisms and landscape Context
Therese M. Donovan; Peter W. Jones; Elizabeth M. Annand; Frank R. Thompson III
1997-01-01
Ecological processes near habitat edges often differ from processes away from edges. Yet, the generality of "edge effects" has been hotly debated because results vary tremendously. To understand the factors responsible for this variation, we described nest predation and cowbird distribution patterns in forest edge and forest core habitats on 36 randomly...
Mitigating budget constraints on visitation volume surveys: the case of U.S. National forests
Ashley E. Askew; Donald B.K. English; Stanley J. Zarnoch; Neelam C. Poudyal; J.M. Bowker
2014-01-01
Stratified random sampling (SRS) provides a scientifically based estimate of a population comprising mutually exclusive, homogenous subgroups. In the National Visitor Use Monitoring (NVUM) program, SRS is used to estimate recreation visitation and visitor characteristics across activities on National forests. However, with rising costs and declining budgets, carrying...
Jerry J. Vaske; Maureen P. Donnelly; Daniel R. Williams; Sandra Jonker
2001-01-01
Using the cognitive hierarchy as the theoretical foundation, this article examines the predictive influence of individuals' demographic characteristics on environmental value orientations and normative beliefs about national forest management. Data for this investigation were obtained from a random sample of Colorado residents (n = 960). As predicted by theory, a...
Subtyping cognitive profiles in Autism Spectrum Disorder using a Functional Random Forest algorithm.
Feczko, E; Balba, N M; Miranda-Dominguez, O; Cordova, M; Karalunas, S L; Irwin, L; Demeter, D V; Hill, A P; Langhorst, B H; Grieser Painter, J; Van Santen, J; Fombonne, E J; Nigg, J T; Fair, D A
2018-05-15
DSM-5 Autism Spectrum Disorder (ASD) comprises a set of neurodevelopmental disorders characterized by deficits in social communication and interaction and repetitive behaviors or restricted interests, and may both affect and be affected by multiple cognitive mechanisms. This study attempts to identify and characterize cognitive subtypes within the ASD population using our Functional Random Forest (FRF) machine learning classification model. This model trained a traditional random forest model on measures from seven tasks that reflect multiple levels of information processing. 47 ASD diagnosed and 58 typically developing (TD) children between the ages of 9 and 13 participated in this study. Our RF model was 72.7% accurate, with 80.7% specificity and 63.1% sensitivity. Using the random forest model, the FRF then measures the proximity of each subject to every other subject, generating a distance matrix between participants. This matrix is then used in a community detection algorithm to identify subgroups within the ASD and TD groups, and revealed 3 ASD and 4 TD putative subgroups with unique behavioral profiles. We then examined differences in functional brain systems between diagnostic groups and putative subgroups using resting-state functional connectivity magnetic resonance imaging (rsfcMRI). Chi-square tests revealed a significantly greater number of between group differences (p < .05) within the cingulo-opercular, visual, and default systems as well as differences in inter-system connections in the somato-motor, dorsal attention, and subcortical systems. Many of these differences were primarily driven by specific subgroups suggesting that our method could potentially parse the variation in brain mechanisms affected by ASD. Copyright © 2017. Published by Elsevier Inc.
Simple to complex modeling of breathing volume using a motion sensor.
John, Dinesh; Staudenmayer, John; Freedson, Patty
2013-06-01
To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical axis ActiGraph™ GT1M activity counts, oxygen consumption and VE were measured during treadmill walking and running, sports, household chores and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min) and high (>35.4 l/min) VEs were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs and vigorous >6.0 METs). We examined the accuracy of two simple techniques (multiple regression and activity count cut-point analyses) and one complex (random forest technique) modeling technique in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted the high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). Actigraph™ cut-points for light, medium and high VEs were <1381, 1381 to 3660 and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Chemura, Abel; Mutanga, Onisimo; Dube, Timothy
2017-08-01
Water management is an important component in agriculture, particularly for perennial tree crops such as coffee. Proper detection and monitoring of water stress therefore plays an important role not only in mitigating the associated adverse impacts on crop growth and productivity but also in reducing expensive and environmentally unsustainable irrigation practices. Current methods for water stress detection in coffee production mainly involve monitoring plant physiological characteristics and soil conditions. In this study, we tested the ability of selected wavebands in the VIS/NIR range to predict plant water content (PWC) in coffee using the random forest algorithm. An experiment was set up such that coffee plants were exposed to different levels of water stress and reflectance and plant water content measured. In selecting appropriate parameters, cross-correlation identified 11 wavebands, reflectance difference identified 16 and reflectance sensitivity identified 22 variables related to PWC. Only three wavebands (485 nm, 670 nm and 885 nm) were identified by at least two methods as significant. The selected wavebands were trained (n = 36) and tested on independent data (n = 24) after being integrated into the random forest algorithm to predict coffee PWC. The results showed that the reflectance sensitivity selected bands performed the best in water stress detection (r = 0.87, RMSE = 4.91% and pBias = 0.9%), when compared to reflectance difference (r = 0.79, RMSE = 6.19 and pBias = 2.5%) and cross-correlation selected wavebands (r = 0.75, RMSE = 6.52 and pBias = 1.6). These results indicate that it is possible to reliably predict PWC using wavebands in the VIS/NIR range that correspond with many of the available multispectral scanners using random forests and further research at field and landscape scale is required to operationalize these findings.
Sarica, Alessia; Cerasa, Antonio; Quattrone, Aldo
2017-01-01
Objective: Machine learning classification has been the most important computational development in the last years to satisfy the primary need of clinicians for automatic early diagnosis and prognosis. Nowadays, Random Forest (RF) algorithm has been successfully applied for reducing high dimensional and multi-source data in many scientific realms. Our aim was to explore the state of the art of the application of RF on single and multi-modal neuroimaging data for the prediction of Alzheimer's disease. Methods: A systematic review following PRISMA guidelines was conducted on this field of study. In particular, we constructed an advanced query using boolean operators as follows: ("random forest" OR "random forests") AND neuroimaging AND ("alzheimer's disease" OR alzheimer's OR alzheimer) AND (prediction OR classification) . The query was then searched in four well-known scientific databases: Pubmed, Scopus, Google Scholar and Web of Science. Results: Twelve articles-published between the 2007 and 2017-have been included in this systematic review after a quantitative and qualitative selection. The lesson learnt from these works suggest that when RF was applied on multi-modal data for prediction of Alzheimer's disease (AD) conversion from the Mild Cognitive Impairment (MCI), it produces one of the best accuracies to date. Moreover, the RF has important advantages in terms of robustness to overfitting, ability to handle highly non-linear data, stability in the presence of outliers and opportunity for efficient parallel processing mainly when applied on multi-modality neuroimaging data, such as, MRI morphometric, diffusion tensor imaging, and PET images. Conclusions: We discussed the strengths of RF, considering also possible limitations and by encouraging further studies on the comparisons of this algorithm with other commonly used classification approaches, particularly in the early prediction of the progression from MCI to AD.
Wearn, Oliver R.; Rowcliffe, J. Marcus; Carbone, Chris; Bernard, Henry; Ewers, Robert M.
2013-01-01
The proliferation of camera-trapping studies has led to a spate of extensions in the known distributions of many wild cat species, not least in Borneo. However, we still do not have a clear picture of the spatial patterns of felid abundance in Southeast Asia, particularly with respect to the large areas of highly-disturbed habitat. An important obstacle to increasing the usefulness of camera trap data is the widespread practice of setting cameras at non-random locations. Non-random deployment interacts with non-random space-use by animals, causing biases in our inferences about relative abundance from detection frequencies alone. This may be a particular problem if surveys do not adequately sample the full range of habitat features present in a study region. Using camera-trapping records and incidental sightings from the Kalabakan Forest Reserve, Sabah, Malaysian Borneo, we aimed to assess the relative abundance of felid species in highly-disturbed forest, as well as investigate felid space-use and the potential for biases resulting from non-random sampling. Although the area has been intensively logged over three decades, it was found to still retain the full complement of Bornean felids, including the bay cat Pardofelis badia, a poorly known Bornean endemic. Camera-trapping using strictly random locations detected four of the five Bornean felid species and revealed inter- and intra-specific differences in space-use. We compare our results with an extensive dataset of >1,200 felid records from previous camera-trapping studies and show that the relative abundance of the bay cat, in particular, may have previously been underestimated due to the use of non-random survey locations. Further surveys for this species using random locations will be crucial in determining its conservation status. We advocate the more wide-spread use of random survey locations in future camera-trapping surveys in order to increase the robustness and generality of inferences that can be made. PMID:24223717
Forest resources of the Tonto National Forest
John D. Shaw
2004-01-01
The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Tonto National Forest 1996 inventory...
Forest resources of the Prescott National Forest
Paul Rogers
2003-01-01
The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Prescott National Forest 1996...
Forest resources of the Lincoln National Forest
John D. Shaw
2006-01-01
The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Lincoln National Forest 1997 inventory...
Grietens, Koen Peeters; Xuan, Xa Nguyen; Ribera, Joan; Duc, Thang Ngo; Bortel, Wim van; Ba, Nhat Truong; Van, Ky Pham; Xuan, Hung Le; D'Alessandro, Umberto; Erhart, Annette
2012-01-01
Long-lasting insecticidal hammocks (LLIHs) are being evaluated as an additional malaria prevention tool in settings where standard control strategies have a limited impact. This is the case among the Ra-glai ethnic minority communities of Ninh Thuan, one of the forested and mountainous provinces of Central Vietnam where malaria morbidity persist due to the sylvatic nature of the main malaria vector An. dirus and the dependence of the population on the forest for subsistence--as is the case for many impoverished ethnic minorities in Southeast Asia. A social science study was carried out ancillary to a community-based cluster randomized trial on the effectiveness of LLIHs to control forest malaria. The social science research strategy consisted of a mixed methods study triangulating qualitative data from focused ethnography and quantitative data collected during a malariometric cross-sectional survey on a random sample of 2,045 study participants. To meet work requirements during the labor intensive malaria transmission and rainy season, Ra-glai slash and burn farmers combine living in government supported villages along the road with a second home at their fields located in the forest. LLIH use was evaluated in both locations. During daytime, LLIH use at village level was reported by 69.3% of all respondents, and in forest fields this was 73.2%. In the evening, 54.1% used the LLIHs in the villages, while at the fields this was 20.7%. At night, LLIH use was minimal, regardless of the location (village 4.4%; forest 6.4%). Despite the free distribution of insecticide-treated nets (ITNs) and LLIHs, around half the local population remains largely unprotected when sleeping in their forest plot huts. In order to tackle forest malaria more effectively, control policies should explicitly target forest fields where ethnic minority farmers are more vulnerable to malaria.
Muela Ribera, Joan; Ngo Duc, Thang; van Bortel, Wim; Truong Ba, Nhat; Van, Ky Pham; Le Xuan, Hung; D'Alessandro, Umberto; Erhart, Annette
2012-01-01
Background Long-lasting insecticidal hammocks (LLIHs) are being evaluated as an additional malaria prevention tool in settings where standard control strategies have a limited impact. This is the case among the Ra-glai ethnic minority communities of Ninh Thuan, one of the forested and mountainous provinces of Central Vietnam where malaria morbidity persist due to the sylvatic nature of the main malaria vector An. dirus and the dependence of the population on the forest for subsistence - as is the case for many impoverished ethnic minorities in Southeast Asia. Methods A social science study was carried out ancillary to a community-based cluster randomized trial on the effectiveness of LLIHs to control forest malaria. The social science research strategy consisted of a mixed methods study triangulating qualitative data from focused ethnography and quantitative data collected during a malariometric cross-sectional survey on a random sample of 2,045 study participants. Results To meet work requirements during the labor intensive malaria transmission and rainy season, Ra-glai slash and burn farmers combine living in government supported villages along the road with a second home at their fields located in the forest. LLIH use was evaluated in both locations. During daytime, LLIH use at village level was reported by 69.3% of all respondents, and in forest fields this was 73.2%. In the evening, 54.1% used the LLIHs in the villages, while at the fields this was 20.7%. At night, LLIH use was minimal, regardless of the location (village 4.4%; forest 6.4%). Discussion Despite the free distribution of insecticide-treated nets (ITNs) and LLIHs, around half the local population remains largely unprotected when sleeping in their forest plot huts. In order to tackle forest malaria more effectively, control policies should explicitly target forest fields where ethnic minority farmers are more vulnerable to malaria. PMID:22253852
Landscape variability of vegetation change across the forest to tundra transition of central Canada
NASA Astrophysics Data System (ADS)
Bonney, Mitchell Thurston
Widespread vegetation productivity increases in tundra ecosystems and stagnation, or even productivity decreases, in boreal forest ecosystems have been detected from coarse-scale remote sensing observations over the last few decades. However, finer-scale Landsat studies have shown that these changes are heterogeneous and may be related to landscape and regional variability in climate, land cover, topography and moisture. In this study, a Landsat Normalized Difference Vegetation Index (NDVI) time-series (1984-2016) was examined for a study area spanning the entirety of the sub-Arctic boreal forest to Low Arctic tundra transition of central Canada (i.e., Yellowknife to the Arctic Ocean). NDVI trend analysis indicated that 27% of un-masked pixels in the study area exhibited a significant (p < 0.05) trend and virtually all (99.3%) of those pixels were greening. Greening pixels were most common in the northern tundra zone and the southern forest-tundra ecotone zone. NDVI trends were positive throughout the study area, but were smallest in the forest zone and largest in the northern tundra zone. These results were supported by ground validation, which found a strong relationship (R2 = 0.81) between bulk vegetation volume (BVV) and NDVI for non-tree functional groups in the North Slave region of Northwest Territories. Field observations indicate that alder (Alnus spp.) shrublands and open woodland sites with shrubby understories were most likely to exhibit greening in that area. Random Forest (RF) modelling of the relationship between NDVI trends and environmental variables found that the magnitude and direction of trends differed across the forest to tundra transition. Increased summer temperatures, shrubland and forest land cover, closer proximity to major drainage systems, longer distances from major lakes and lower elevations were generally more important and associated with larger positive NDVI trends. These findings indicate that the largest positive NDVI trends were primarily associated with the increased productivity of shrubby environments, especially at, and north of the forest-tundra ecotone in areas with more favorable growing conditions. Smaller and less significant NDVI trends in boreal forest environments south of the forest-tundra ecotone were likely associated with long-term recovery from fire disturbance rather than the variables analyzed here.
Mark D. Nelson; John Vissage
2007-01-01
The Forest Inventory and Analysis (FIA) program produces area estimates of forest land use within three subcategories: timberland, reserved forest land, and other forest land. Mapping these subcategories of forest land requires the ability to spatially distinguish productive from unproductive land, and reserved from nonreserved land. FIA field data were spatially...
EDITORIAL: Special section on foliage penetration
NASA Astrophysics Data System (ADS)
Fiddy, M. A.; Lang, R.; McGahan, R. V.
2004-04-01
Waves in Random Media was founded in 1991 to provide a forum for papers dealing with electromagnetic and acoustic waves as they propagate and scatter through media or objects having some degree of randomness. This is a broad charter since, in practice, all scattering obstacles and structures have roughness or randomness, often on the scale of the wavelength being used to probe them. Including this random component leads to some quite different methods for describing propagation effects, for example, when propagating through the atmosphere or the ground. This special section on foliage penetration (FOPEN) focuses on the problems arising from microwave propagation through foliage and vegetation. Applications of such studies include the estimation for forest biomass and the moisture of the underlying soil, as well as detecting objects hidden therein. In addition to the so-called `direct problem' of trying to describe energy propagating through such media, the complementary inverse problem is of great interest and much harder to solve. The development of theoretical models and associated numerical algorithms for identifying objects concealed by foliage has applications in surveillance, ranging from monitoring drug trafficking to targeting military vehicles. FOPEN can be employed to map the earth's surface in cases when it is under a forest canopy, permitting the identification of objects or targets on that surface, but the process for doing so is not straightforward. There has been an increasing interest in foliage penetration synthetic aperture radar (FOPEN or FOPENSAR) over the last 10 years and this special section provides a broad overview of many of the issues involved. The detection, identification, and geographical location of targets under foliage or otherwise obscured by poor visibility conditions remains a challenge. In particular, a trade-off often needs to be appreciated, namely that diminishing the deleterious effects of multiple scattering from leaves is typically associated with a significant loss in target resolution. Foliage is more or less transparent to some radar frequencies, but longer wavelengths found in the VHF (30 to 300 MHz) and UHF (300 MHz to 3 GHz) portions of the microwave spectrum have more chance of penetrating foliage than do wavelengths at the X band (8 to 12 GHz). Reflection and multiple scattering occur for some other frequencies and models of the processes involved are crucial. Two topical reviews can be found in this issue, one on the microwave radiometry of forests (page S275) and another describing ionospheric effects on space-based radar (page S189). Subsequent papers present new results on modelling coherent backscatter from forests (page S299), modelling forests as discrete random media over a random interface (page S359) and interpreting ranging scatterometer data from forests (page S317). Cloude et al present research on identifying targets beneath foliage using polarimetric SAR interferometry (page S393) while Treuhaft and Siqueira use interferometric radar to describe forest structure and biomass (page S345). Vechhia et al model scattering from leaves (page S333) and Semichaevsky et al address the problem of the trade-off between increasing wavelength, reduction in multiple scattering, and target resolution (page S415).
Spatio-temporal Change Patterns of Tropical Forests from 2000 to 2014 Using MOD09A1 Dataset
NASA Astrophysics Data System (ADS)
Qin, Y.; Xiao, X.; Dong, J.
2016-12-01
Large-scale deforestation and forest degradation in the tropical region have resulted in extensive carbon emissions and biodiversity loss. However, restricted by the availability of good-quality observations, large uncertainty exists in mapping the spatial distribution of forests and their spatio-temporal changes. In this study, we proposed a pixel- and phenology-based algorithm to identify and map annual tropical forests from 2000 to 2014, using the 8-day, 500-m MOD09A1 (v005) product, under the support of Google cloud computing (Google Earth Engine). A temporal filter was applied to reduce the random noises and to identify the spatio-temporal changes of forests. We then built up a confusion matrix and assessed the accuracy of the annual forest maps based on the ground reference interpreted from high spatial resolution images in Google Earth. The resultant forest maps showed the consistent forest/non-forest, forest loss, and forest gain in the pan-tropical zone during 2000 - 2014. The proposed algorithm showed the potential for tropical forest mapping and the resultant forest maps are important for the estimation of carbon emission and biodiversity loss.
Exploring prediction uncertainty of spatial data in geostatistical and machine learning Approaches
NASA Astrophysics Data System (ADS)
Klump, J. F.; Fouedjio, F.
2017-12-01
Geostatistical methods such as kriging with external drift as well as machine learning techniques such as quantile regression forest have been intensively used for modelling spatial data. In addition to providing predictions for target variables, both approaches are able to deliver a quantification of the uncertainty associated with the prediction at a target location. Geostatistical approaches are, by essence, adequate for providing such prediction uncertainties and their behaviour is well understood. However, they often require significant data pre-processing and rely on assumptions that are rarely met in practice. Machine learning algorithms such as random forest regression, on the other hand, require less data pre-processing and are non-parametric. This makes the application of machine learning algorithms to geostatistical problems an attractive proposition. The objective of this study is to compare kriging with external drift and quantile regression forest with respect to their ability to deliver reliable prediction uncertainties of spatial data. In our comparison we use both simulated and real world datasets. Apart from classical performance indicators, comparisons make use of accuracy plots, probability interval width plots, and the visual examinations of the uncertainty maps provided by the two approaches. By comparing random forest regression to kriging we found that both methods produced comparable maps of estimated values for our variables of interest. However, the measure of uncertainty provided by random forest seems to be quite different to the measure of uncertainty provided by kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. These preliminary results raise questions about assessing the risks associated with decisions based on the predictions from geostatistical and machine learning algorithms in a spatial context, e.g. mineral exploration.
Gretchen C. Smith; John W. Coulston; Barbara M. O' Connell
2008-01-01
In 1994, the Forest Inventory and Analysis (FIA) and Forest Health Monitoring programs of the U.S. Forest Service implemented a national ozone (O3) biomonitoring program designed to address specific questions about the area and percent of forest land subject to levels of O3 pollution that may negatively affect the forest...
Xu, Ge Xi; Shi, Zuo Min; Tang, Jing Chao; Liu, Shun; Ma, Fan Qiang; Xu, Han; Liu, Shi Rong; Li, Yi de
2016-11-18
Based on three 1-hm 2 plots of Jianfengling tropical montane rainforest on Hainan Island, 11 commom used functional traits of canopy trees were measured. After combining with topographical factors and trees census data of these three plots, we compared the impacts of weighted species abundance on two functional dispersion indices, mean pairwise distance (MPD) and mean nearest taxon distance (MNTD), by using single- and multi-dimensional traits, respectively. The relationship between functional richness of the forest canopies and species abundance was analyzed. We used a null model approach to explore the variations in standardized size effects of MPD and MNTD, which were weighted by species abundance and eliminated the influences of species richness diffe-rences among communities, and assessed functional diversity patterns of the forest canopies and their responses to local habitat heterogeneity at community's level. The results showed that variation in MPD was greatly dependent on the dimensionalities of functional traits as well as species abundance. The correlations between weighted and non-weighted MPD based on different dimensional traits were relatively weak (R=0.359-0.628). On the contrary, functional traits and species abundance had relatively weak effects on MNTD, which brought stronger correlations between weighted and non-weighted MNTD based on different dimensional traits (R=0.746-0.820). Functional dispersion of the forest canopies were generally overestimated when using non-weighted MPD and MNTD. Functional richness of the forest canopies showed an exponential relationship with species abundance (F=128.20; R 2 =0.632; AIC=97.72; P<0.001), which might exist a species abundance threshold value. Patterns of functional diversity of the forest canopies based on different dimensional functional traits and their habitat responses showed variations in some degree. Forest canopies in the valley usually had relatively stronger biological competition, and functional diversity was higher than expected functional diversity randomized by null model, which indicated dispersed distribution of functional traits among canopy tree species in this habitat. However, the functional diversity of the forest canopies tended to be close or lower than randomization in the other habitat types, which demonstrated random or clustered distribution of the functional traits among canopy tree species.
Löfgren, Stefan; Fröberg, Mats; Yu, Jun; Nisell, Jakob; Ranneby, Bo
2014-12-01
From a policy perspective, it is important to understand forestry effects on surface waters from a landscape perspective. The EU Water Framework Directive demands remedial actions if not achieving good ecological status. In Sweden, 44 % of the surface water bodies have moderate ecological status or worse. Many of these drain catchments with a mosaic of managed forests. It is important for the forestry sector and water authorities to be able to identify where, in the forested landscape, special precautions are necessary. The aim of this study was to quantify the relations between forestry parameters and headwater stream concentrations of nutrients, organic matter and acid-base chemistry. The results are put into the context of regional climate, sulphur and nitrogen deposition, as well as marine influences. Water chemistry was measured in 179 randomly selected headwater streams from two regions in southwest and central Sweden, corresponding to 10 % of the Swedish land area. Forest status was determined from satellite images and Swedish National Forest Inventory data using the probabilistic classifier method, which was used to model stream water chemistry with Bayesian model averaging. The results indicate that concentrations of e.g. nitrogen, phosphorus and organic matter are related to factors associated with forest production but that it is not forestry per se that causes the excess losses. Instead, factors simultaneously affecting forest production and stream water chemistry, such as climate, extensive soil pools and nitrogen deposition, are the most likely candidates The relationships with clear-felled and wetland areas are likely to be direct effects.
Influence of matrix type on tree community assemblages along tropical dry forest edges.
Benítez-Malvido, Julieta; Gallardo-Vásquez, Julio César; Alvarez-Añorve, Mariana Y; Avila-Cabadilla, Luis Daniel
2014-05-01
• Anthropogenic habitat edges have strong negative consequences for the functioning of tropical ecosystems. However, edge effects on tropical dry forest tree communities have been barely documented.• In Chamela, Mexico, we investigated the phylogenetic composition and structure of tree assemblages (≥5 cm dbh) along edges abutting different matrices: (1) disturbed vegetation with cattle, (2) pastures with cattle and, (3) pastures without cattle. Additionally, we sampled preserved forest interiors.• All edge types exhibited similar tree density, basal area and diversity to interior forests, but differed in species composition. A nonmetric multidimensional scaling ordination showed that the presence of cattle influenced species composition more strongly than the vegetation structure of the matrix; tree assemblages abutting matrices with cattle had lower scores in the ordination. The phylogenetic composition of tree assemblages followed the same pattern. The principal plant families and genera were associated according to disturbance regimes as follows: pastures and disturbed vegetation (1) with cattle and (2) without cattle, and (3) pastures without cattle and interior forests. All habitats showed random phylogenetic structures, suggesting that tree communities are assembled mainly by stochastic processes. Long-lived species persisting after edge creation could have important implications in the phylogenetic structure of tree assemblages.• Edge creation exerts a stronger influence on TDF vegetation pathways than previously documented, leading to new ecological communities. Phylogenetic analysis may, however, be needed to detect such changes. © 2014 Botanical Society of America, Inc.
Forest resources of the Santa Fe National Forest
Dana Lambert
2004-01-01
The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Santa Fe National Forest 1998...
Forest resources of the Gila National Forest
John D. Shaw
2008-01-01
The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Gila National Forest 1994 inventory including...
Survival analysis for a large scale forest health issue: Missouri oak decline
C.W. Woodall; P.L. Grambsch; W. Thomas; W.K. Moser
2005-01-01
Survival analysis methodologies provide novel approaches for forest mortality analysis that may aid in detecting, monitoring, and mitigating of large-scale forest health issues. This study examined survivor analysis for evaluating a regional forest health issue - Missouri oak decline. With a statewide Missouri forest inventory, log-rank tests of the effects of...
NASA Astrophysics Data System (ADS)
Rahmani, K.; Mayer, H.
2018-05-01
In this paper we present a pipeline for high quality semantic segmentation of building facades using Structured Random Forest (SRF), Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN) as well as rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF.We empirically show that this is very effective especially for doors and windows. Our pipeline is evaluated on two datasets where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization on the accuracy of the result.
Predicting fecal indicator organism contamination in Oregon coastal streams.
Pettus, Paul; Foster, Eugene; Pan, Yangdong
2015-12-01
In this study, we used publicly available GIS layers and statistical tree-based modeling (CART and Random Forest) to predict pathogen indicator counts at a regional scale using 88 spatially explicit landscape predictors and 6657 samples from non-estuarine streams in the Oregon Coast Range. A total of 532 frequently sampled sites were parsed down to 93 pathogen sampling sites to control for spatial and temporal biases. This model's 56.5% explanation of variance, was comparable to other regional models, while still including a large number of variables. Analysis showed the most important predictors on bacteria counts to be: forest and natural riparian zones, cattle related activities, and urban land uses. This research confirmed linkages to anthropogenic activities, with the research prediction mapping showing increased bacteria counts in agricultural and urban land use areas and lower counts with more natural riparian conditions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy.
Katić, Darko; Schuck, Jürgen; Wekerle, Anna-Laura; Kenngott, Hannes; Müller-Stich, Beat Peter; Dillmann, Rüdiger; Speidel, Stefanie
2016-06-01
Computer assistance is increasingly common in surgery. However, the amount of information is bound to overload processing abilities of surgeons. We propose methods to recognize the current phase of a surgery for context-aware information filtering. The purpose is to select the most suitable subset of information for surgical situations which require special assistance. We combine formal knowledge, represented by an ontology, and experience-based knowledge, represented by training samples, to recognize phases. For this purpose, we have developed two different methods. Firstly, we use formal knowledge about possible phase transitions to create a composition of random forests. Secondly, we propose a method based on cultural optimization to infer formal rules from experience to recognize phases. The proposed methods are compared with a purely formal knowledge-based approach using rules and a purely experience-based one using regular random forests. The comparative evaluation on laparoscopic pancreas resections and adrenalectomies employs a consistent set of quality criteria on clean and noisy input. The rule-based approaches proved best with noisefree data. The random forest-based ones were more robust in the presence of noise. Formal and experience-based knowledge can be successfully combined for robust phase recognition.
Stevens, Forrest R; Gaughan, Andrea E; Linard, Catherine; Tatem, Andrew J
2015-01-01
High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, "Random Forest" estimation technique. We outline the combination of widely available, remotely-sensed and geospatial data that contribute to the modeled dasymetric weights and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and increases over the accuracy and flexibility of those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America.
Studies of the DIII-D disruption database using Machine Learning algorithms
NASA Astrophysics Data System (ADS)
Rea, Cristina; Granetz, Robert; Meneghini, Orso
2017-10-01
A Random Forests Machine Learning algorithm, trained on a large database of both disruptive and non-disruptive DIII-D discharges, predicts disruptive behavior in DIII-D with about 90% of accuracy. Several algorithms have been tested and Random Forests was found superior in performances for this particular task. Over 40 plasma parameters are included in the database, with data for each of the parameters taken from 500k time slices. We focused on a subset of non-dimensional plasma parameters, deemed to be good predictors based on physics considerations. Both binary (disruptive/non-disruptive) and multi-label (label based on the elapsed time before disruption) classification problems are investigated. The Random Forests algorithm provides insight on the available dataset by ranking the relative importance of the input features. It is found that q95 and Greenwald density fraction (n/nG) are the most relevant parameters for discriminating between DIII-D disruptive and non-disruptive discharges. A comparison with the Gradient Boosted Trees algorithm is shown and the first results coming from the application of regression algorithms are presented. Work supported by the US Department of Energy under DE-FC02-04ER54698, DE-SC0014264 and DE-FG02-95ER54309.
On the information content of hydrological signatures and their relationship to catchment attributes
NASA Astrophysics Data System (ADS)
Addor, Nans; Clark, Martyn P.; Prieto, Cristina; Newman, Andrew J.; Mizukami, Naoki; Nearing, Grey; Le Vine, Nataliya
2017-04-01
Hydrological signatures, which are indices characterizing hydrologic behavior, are increasingly used for the evaluation, calibration and selection of hydrological models. Their key advantage is to provide more direct insights into specific hydrological processes than aggregated metrics (e.g., the Nash-Sutcliffe efficiency). A plethora of signatures now exists, which enable characterizing a variety of hydrograph features, but also makes the selection of signatures for new studies challenging. Here we propose that the selection of signatures should be based on their information content, which we estimated using several approaches, all leading to similar conclusions. To explore the relationship between hydrological signatures and the landscape, we extended a previously published data set of hydrometeorological time series for 671 catchments in the contiguous United States, by characterizing the climatic conditions, topography, soil, vegetation and stream network of each catchment. This new catchment attributes data set will soon be in open access, and we are looking forward to introducing it to the community. We used this data set in a data-learning algorithm (random forests) to explore whether hydrological signatures could be inferred from catchment attributes alone. We find that some signatures can be predicted remarkably well by random forests and, interestingly, the same signatures are well captured when simulating discharge using a conceptual hydrological model. We discuss what this result reveals about our understanding of hydrological processes shaping hydrological signatures. We also identify which catchment attributes exert the strongest control on catchment behavior, in particular during extreme hydrological events. Overall, climatic attributes have the most significant influence, and strongly condition how well hydrological signatures can be predicted by random forests and simulated by the hydrological model. In contrast, soil characteristics at the catchment scale are not found to be significant predictors by random forests, which raises questions on how to best use soil data for hydrological modeling, for instance for parameter estimation. We finally demonstrate that signatures with high spatial variability are poorly captured by random forests and model simulations, which makes their regionalization delicate. We conclude with a ranking of signatures based on their information content, and propose that the signatures with high information content are best suited for model calibration, model selection and understanding hydrologic similarity.
Proceedings of the eighth annual forest inventory and analysis symposium
Ronald E. McRoberts; Gregory A. Reams; Paul C. Van Deusen; William H., eds. McWilliams
2009-01-01
Documents contributions to forest inventory in the areas of sampling, remote sensing, modeling, information management and analysis for the Forest Inventory and Analysis program of the Forest Service.
Dale D. Gormanson; Scott A. Pugh; Charles J. Barnett; Patrick D. Miles; Randall S. Morin; Paul A. Sowers; Jim Westfall
2017-01-01
The U.S. Forest Service Forest Inventory and Analysis (FIA) program collects sample plot data on all forest ownerships across the United States. FIA's primary objective is to determine the extent, condition, volume, growth, and use of trees on the Nation's forest land through a comprehensive inventory and analysis of the Nation's forest resources. The...
Roubeix, Vincent; Danis, Pierre-Alain; Feret, Thibaut; Baudoin, Jean-Marc
2016-04-01
In aquatic ecosystems, the identification of ecological thresholds may be useful for managers as it can help to diagnose ecosystem health and to identify key levers to enable the success of preservation and restoration measures. A recent statistical method, gradient forest, based on random forests, was used to detect thresholds of phytoplankton community change in lakes along different environmental gradients. It performs exploratory analyses of multivariate biological and environmental data to estimate the location and importance of community thresholds along gradients. The method was applied to a data set of 224 French lakes which were characterized by 29 environmental variables and the mean abundances of 196 phytoplankton species. Results showed the high importance of geographic variables for the prediction of species abundances at the scale of the study. A second analysis was performed on a subset of lakes defined by geographic thresholds and presenting a higher biological homogeneity. Community thresholds were identified for the most important physico-chemical variables including water transparency, total phosphorus, ammonia, nitrates, and dissolved organic carbon. Gradient forest appeared as a powerful method at a first exploratory step, to detect ecological thresholds at large spatial scale. The thresholds that were identified here must be reinforced by the separate analysis of other aquatic communities and may be used then to set protective environmental standards after consideration of natural variability among lakes.
Proceedings of the fourth annual Forest Inventory and Analysis symposium
Ronald E. McRoberts; Gregory A Reams; Paul C. Van Deusen; William H. McWilliams; Chris J. Cieszewski; Chris J., eds. Cieszewski
2005-01-01
Documents contributions to forest inventory in the areas of sampling, remote sensing, modeling, information management and analysis for the Forest Inventory and Analysis program of the USDA Forest Service.
Proceedings of the seventh annual forest inventory and analysis symposium
Ronald E. McRoberts; Gregory A. Reams; Paul C. Van Deusen; William H., eds. McWilliams
2007-01-01
Documents contributions to forest inventory in the areas of sampling, remote sensing, modeling, information management and analysis for the Forest Inventory and Analysis program of the USDA Forest Service.
Proceedings of the sixth annual forest inventory and analysis symposium
Ronald E. McRoberts; Gregory A. Reams; Paul C. Van Duesen; William H., eds. McWilliams
2006-01-01
Documents contributions to forest inventory in the area of sampling, remote sensing, modeling, information management, and analysis for the Forest Inventory and Analysis program of the USDA Forest Service.
Using classified Landsat Thematic Mapper data for stratification in a statewide forest inventory
Mark H. Hansen; Daniel G. Wendt
2000-01-01
The 1998 Indiana/Illinois forest inventory (USDA Forest Service, Forest Inventory and Analysis (FIA)) used Landsat Thematic Mapper (TM) data for stratification. Classified images made by the National Gap Analysis Program (GAP) stratified FIA plots into four classes (nonforest, nonforest/ forest, forest/nonforest, and forest) based on a two pixel forest edge buffer zone...
Using Classified Landsat Thematic Mapper Data for Stratification in a Statewide Forest Inventory
Mark H. Hansen; Daniel G. Wendt
2000-01-01
The 1998 Indiana/Illinois forest inventory (USDA Forest Service, Forest Inventory and Analysis (FIA)) used Landsat Thematic Mapper (TM} data for stratification. Classified images made by the National Gap Analysis Program (GAP) stratified FIA plots into four classes (nonforest, nonforest/forest, forest/nonforest, and forest) based on a two pixel forest edge buffer zone...
Marcus V. Warwell; Gerald E. Rehfeldt; Nicholas L. Crookston
2010-01-01
The Random Forests multiple regression tree was used to develop an empirically based bioclimatic model of the presence-absence of species occupying small geographic distributions in western North America. The species assessed were subalpine larch (Larix lyallii), smooth Arizona cypress (Cupressus arizonica ssp. glabra...
Determining soil erosion from roads in coastal plain of Alabama
McFero Grace; W.J. Elliot
2008-01-01
This paper reports soil losses and observed sediment deposition for 16 randomly selected forest road sections in the National Forests of Alabama. Visible sediment deposition zones were tracked along the stormwater flow path to the most remote location as a means of quantifying soil loss from road sections. Volumes of sediment in deposition zones were determined by...
Quantifying the abundance of co-occurring conifers along Inland Northwest (USA) climate gradients
Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston
2008-01-01
The occurrence and abundance of conifers along climate gradients in the Inland Northwest (USA) was assessed using data from 5082 field plots, 81% of which were forested. Analyses using the Random Forests classification tree revealed that the sequential distribution of species along an altitudinal gradient could be predicted with reasonable accuracy from a single...
Susan J. Crocker; Dacia M. Meneguzzo; Greg C. Liknes
2010-01-01
Landscape metrics, including host abundance and population density, were calculated using forest inventory and land cover data to assess the relationship between landscape pattern and the presence or absence of the emerald ash borer (EAB) (Agrilus planipennis Fairmaire). The Random Forests classification algorithm in the R statistical environment was...
Quantitative Trait Inheritance in a Forty-Year-Old Longleaf Pine Partial Diallel Test
Michael Stine; Jim Roberds; C. Dana Nelson; David P. Gwaze; Todd Shupe; Les Groom
2002-01-01
A longleaf pine (Pinus palustris Mill.) 13 parent partial diallel field experiment was established at two locations on the Harrison Experimental Forest in 1960. Parent trees were randomly selected from a natural population growing on the Harrison Experimental Forest, near Gulfport, Miss. Distance between trees chosen as parents ranged from 13 to 357...
Random forest meteorological normalisation models for Swiss PM10 trend analysis
NASA Astrophysics Data System (ADS)
Grange, Stuart K.; Carslaw, David C.; Lewis, Alastair C.; Boleti, Eirini; Hueglin, Christoph
2018-05-01
Meteorological normalisation is a technique which accounts for changes in meteorology over time in an air quality time series. Controlling for such changes helps support robust trend analysis because there is more certainty that the observed trends are due to changes in emissions or chemistry, not changes in meteorology. Predictive random forest models (RF; a decision tree machine learning technique) were grown for 31 air quality monitoring sites in Switzerland using surface meteorological, synoptic scale, boundary layer height, and time variables to explain daily PM10 concentrations. The RF models were used to calculate meteorologically normalised trends which were formally tested and evaluated using the Theil-Sen estimator. Between 1997 and 2016, significantly decreasing normalised PM10 trends ranged between -0.09 and -1.16 µg m-3 yr-1 with urban traffic sites experiencing the greatest mean decrease in PM10 concentrations at -0.77 µg m-3 yr-1. Similar magnitudes have been reported for normalised PM10 trends for earlier time periods in Switzerland which indicates PM10 concentrations are continuing to decrease at similar rates as in the past. The ability for RF models to be interpreted was leveraged using partial dependence plots to explain the observed trends and relevant physical and chemical processes influencing PM10 concentrations. Notably, two regimes were suggested by the models which cause elevated PM10 concentrations in Switzerland: one related to poor dispersion conditions and a second resulting from high rates of secondary PM generation in deep, photochemically active boundary layers. The RF meteorological normalisation process was found to be robust, user friendly and simple to implement, and readily interpretable which suggests the technique could be useful in many air quality exploratory data analysis situations.
Ranjith, G; Parvathy, R; Vikas, V; Chandrasekharan, Kesavadas; Nair, Suresh
2015-04-01
With the advent of new imaging modalities, radiologists are faced with handling increasing volumes of data for diagnosis and treatment planning. The use of automated and intelligent systems is becoming essential in such a scenario. Machine learning, a branch of artificial intelligence, is increasingly being used in medical image analysis applications such as image segmentation, registration and computer-aided diagnosis and detection. Histopathological analysis is currently the gold standard for classification of brain tumors. The use of machine learning algorithms along with extraction of relevant features from magnetic resonance imaging (MRI) holds promise of replacing conventional invasive methods of tumor classification. The aim of the study is to classify gliomas into benign and malignant types using MRI data. Retrospective data from 28 patients who were diagnosed with glioma were used for the analysis. WHO Grade II (low-grade astrocytoma) was classified as benign while Grade III (anaplastic astrocytoma) and Grade IV (glioblastoma multiforme) were classified as malignant. Features were extracted from MR spectroscopy. The classification was done using four machine learning algorithms: multilayer perceptrons, support vector machine, random forest and locally weighted learning. Three of the four machine learning algorithms gave an area under ROC curve in excess of 0.80. Random forest gave the best performance in terms of AUC (0.911) while sensitivity was best for locally weighted learning (86.1%). The performance of different machine learning algorithms in the classification of gliomas is promising. An even better performance may be expected by integrating features extracted from other MR sequences. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.
2015-01-01
Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
Mo, Xiao-Xue; Shi, Ling-Ling; Zhang, Yong-Jiang; Zhu, Hua; Slik, J W Ferry
2013-01-01
Tropical rainforests in Southeast Asia are facing increasing and ever more intense human disturbance that often negatively affects biodiversity. The aim of this study was to determine how tree species phylogenetic diversity is affected by traditional forest management types and to understand the change in community phylogenetic structure during succession. Four types of forests with different management histories were selected for this purpose: old growth forests, understorey planted old growth forests, old secondary forests (∼200-years after slash and burn), and young secondary forests (15-50-years after slash and burn). We found that tree phylogenetic community structure changed from clustering to over-dispersion from early to late successional forests and finally became random in old-growth forest. We also found that the phylogenetic structure of the tree overstorey and understorey responded differentially to change in environmental conditions during succession. In addition, we show that slash and burn agriculture (swidden cultivation) can increase landscape level plant community evolutionary information content.
Mo, Xiao-Xue; Shi, Ling-Ling; Zhang, Yong-Jiang; Zhu, Hua; Slik, J. W. Ferry
2013-01-01
Tropical rainforests in Southeast Asia are facing increasing and ever more intense human disturbance that often negatively affects biodiversity. The aim of this study was to determine how tree species phylogenetic diversity is affected by traditional forest management types and to understand the change in community phylogenetic structure during succession. Four types of forests with different management histories were selected for this purpose: old growth forests, understorey planted old growth forests, old secondary forests (∼200-years after slash and burn), and young secondary forests (15–50-years after slash and burn). We found that tree phylogenetic community structure changed from clustering to over-dispersion from early to late successional forests and finally became random in old-growth forest. We also found that the phylogenetic structure of the tree overstorey and understorey responded differentially to change in environmental conditions during succession. In addition, we show that slash and burn agriculture (swidden cultivation) can increase landscape level plant community evolutionary information content. PMID:23936268
The structure of tropical forests and sphere packings
Jahn, Markus Wilhelm; Dobner, Hans-Jürgen; Wiegand, Thorsten; Huth, Andreas
2015-01-01
The search for simple principles underlying the complex architecture of ecological communities such as forests still challenges ecological theorists. We use tree diameter distributions—fundamental for deriving other forest attributes—to describe the structure of tropical forests. Here we argue that tree diameter distributions of natural tropical forests can be explained by stochastic packing of tree crowns representing a forest crown packing system: a method usually used in physics or chemistry. We demonstrate that tree diameter distributions emerge accurately from a surprisingly simple set of principles that include site-specific tree allometries, random placement of trees, competition for space, and mortality. The simple static model also successfully predicted the canopy structure, revealing that most trees in our two studied forests grow up to 30–50 m in height and that the highest packing density of about 60% is reached between the 25- and 40-m height layer. Our approach is an important step toward identifying a minimal set of processes responsible for generating the spatial structure of tropical forests. PMID:26598678
Ponnusamy, K; Jose, S; Savarimuthu, I; Michael, G P; Redenbach, M
2011-09-01
Chromobacterium are saprophytes that cause highly fatal opportunistic infections. Identification and strain differentiation were performed to identify the strain variability among the environmental samples. We have evaluated the suitability of individual and combined methods to detect the strain variations of the samples collected in different seasons. Amplified ribosomal DNA restriction analysis (ARDRA) and random amplified polymorphic DNA (RAPD) profiles were obtained using four different restriction enzyme digestions (AluI, HaeIII, MspI and RsaI) and five random primers. A matrix of dice similarity coefficients was calculated and used to compare these restriction patterns. ARDRA showed rapid differentiation of strains based on 16S rDNA, but the combined RAPD and ARDRA gave a more reliable differentiation than when either of them was analysed individually. A high level of genetic diversity was observed, which indicates that the Kolli Hills' C. violaceum isolates would fall into at least three new clusters. Results showed a noteworthy bacterial variation and genetic diversity of C. violaceum in the unexplored, virgin forest area. © 2011 The Authors. Letters in Applied Microbiology © 2011 The Society for Applied Microbiology.
Das, Arundhati; Nagendra, Harini; Anand, Madhur; Bunyan, Milind
2015-01-01
The objective of this analysis was to identify topographic and bioclimatic factors that predict occurrence of forest and grassland patches within tropical montane forest-grassland mosaics. We further investigated whether interactions between topography and bioclimate are important in determining vegetation pattern, and assessed the role of spatial scale in determining the relative importance of specific topographic features. Finally, we assessed the role of elevation in determining the relative importance of diverse explanatory factors. The study area consists of the central and southern regions of the Western Ghats of Southern India, a global biodiversity hotspot. Random forests were used to assess prediction accuracy and predictor importance. Conditional inference classification trees were used to interpret predictor effects and examine potential interactions between predictors. GLMs were used to confirm predictor importance and assess the strength of interaction terms. Overall, topographic and bioclimatic predictors classified vegetation pattern with approximately 70% accuracy. Prediction accuracy was higher for grassland than forest, and for mosaics at higher elevations. Elevation was the most important predictor, with mosaics above 2000m dominated largely by grassland. Relative topographic position measured at a local scale (within a 300m neighbourhood) was another important predictor of vegetation pattern. In high elevation mosaics, northness and concave land surface curvature were important predictors of forest occurrence. Important bioclimatic predictors were: dry quarter precipitation, annual temperature range and the interaction between the two. The results indicate complex interactions between topography and bioclimate and among topographic variables. Elevation and topography have a strong influence on vegetation pattern in these mosaics. There were marked regional differences in the roles of various topographic and bioclimatic predictors across the range of study mosaics, indicating that the same pattern of grass and forest seems to be generated by different sets of mechanisms across the region, depending on spatial scale and elevation. PMID:26121353
Das, Arundhati; Nagendra, Harini; Anand, Madhur; Bunyan, Milind
2015-01-01
The objective of this analysis was to identify topographic and bioclimatic factors that predict occurrence of forest and grassland patches within tropical montane forest-grassland mosaics. We further investigated whether interactions between topography and bioclimate are important in determining vegetation pattern, and assessed the role of spatial scale in determining the relative importance of specific topographic features. Finally, we assessed the role of elevation in determining the relative importance of diverse explanatory factors. The study area consists of the central and southern regions of the Western Ghats of Southern India, a global biodiversity hotspot. Random forests were used to assess prediction accuracy and predictor importance. Conditional inference classification trees were used to interpret predictor effects and examine potential interactions between predictors. GLMs were used to confirm predictor importance and assess the strength of interaction terms. Overall, topographic and bioclimatic predictors classified vegetation pattern with approximately 70% accuracy. Prediction accuracy was higher for grassland than forest, and for mosaics at higher elevations. Elevation was the most important predictor, with mosaics above 2000 m dominated largely by grassland. Relative topographic position measured at a local scale (within a 300 m neighbourhood) was another important predictor of vegetation pattern. In high elevation mosaics, northness and concave land surface curvature were important predictors of forest occurrence. Important bioclimatic predictors were: dry quarter precipitation, annual temperature range and the interaction between the two. The results indicate complex interactions between topography and bioclimate and among topographic variables. Elevation and topography have a strong influence on vegetation pattern in these mosaics. There were marked regional differences in the roles of various topographic and bioclimatic predictors across the range of study mosaics, indicating that the same pattern of grass and forest seems to be generated by different sets of mechanisms across the region, depending on spatial scale and elevation.
Wells, Konstans; Pfeiffer, Martin; Lakim, Maklarin B; Kalko, Elisabeth K V
2006-09-01
1. Non-volant animals in tropical rain forests differ in their ability to exploit the habitat above the forest floor and also in their response to habitat variability. It is predicted that specific movement trajectories are determined both by intrinsic factors such as ecological specialization, morphology and body size and by structural features of the surrounding habitat such as undergrowth and availability of supportive structures. 2. We applied spool-and-line tracking in order to describe movement trajectories and habitat segregation of eight species of small mammals from an assemblage of Muridae, Tupaiidae and Sciuridae in the rain forest of Borneo where we followed a total of 13,525 m path. We also analysed specific changes in the movement patterns of the small mammals in relation to habitat stratification between logged and unlogged forests. Variables related to climbing activity of the tracked species as well as the supportive structures of the vegetation and undergrowth density were measured along their tracks. 3. Movement patterns of the small mammals differed significantly between species. Most similarities were found in congeneric species that converged strongly in body size and morphology. All species were affected in their movement patterns by the altered forest structure in logged forests with most differences found in Leopoldamys sabanus. However, the large proportions of short step lengths found in all species for both forest types and similar path tortuosity suggest that the main movement strategies of the small mammals were not influenced by logging but comprised generally a response to the heterogeneous habitat as opposed to random movement strategies predicted for homogeneous environments. 4. Overall shifts in microhabitat use showed no coherent trend among species. Multivariate (principal component) analysis revealed contrasting trends for convergent species, in particular for Maxomys rajah and M. surifer as well as for Tupaia longipes and T. tana, suggesting that each species was uniquely affected in its movement trajectories by a multiple set of environmental and intrinsic features.
NASA Astrophysics Data System (ADS)
Othman, Arsalan A.; Gloaguen, Richard
2017-09-01
Lithological mapping in mountainous regions is often impeded by limited accessibility due to relief. This study aims to evaluate (1) the performance of different supervised classification approaches using remote sensing data and (2) the use of additional information such as geomorphology. We exemplify the methodology in the Bardi-Zard area in NE Iraq, a part of the Zagros Fold - Thrust Belt, known for its chromite deposits. We highlighted the improvement of remote sensing geological classification by integrating geomorphic features and spatial information in the classification scheme. We performed a Maximum Likelihood (ML) classification method besides two Machine Learning Algorithms (MLA): Support Vector Machine (SVM) and Random Forest (RF) to allow the joint use of geomorphic features, Band Ratio (BR), Principal Component Analysis (PCA), spatial information (spatial coordinates) and multispectral data of the Advanced Space-borne Thermal Emission and Reflection radiometer (ASTER) satellite. The RF algorithm showed reliable results and discriminated serpentinite, talus and terrace deposits, red argillites with conglomerates and limestone, limy conglomerates and limestone conglomerates, tuffites interbedded with basic lavas, limestone and Metamorphosed limestone and reddish green shales. The best overall accuracy (∼80%) was achieved by Random Forest (RF) algorithms in the majority of the sixteen tested combination datasets.
Aggregation of Sentinel-2 time series classifications as a solution for multitemporal analysis
NASA Astrophysics Data System (ADS)
Lewiński, Stanislaw; Nowakowski, Artur; Malinowski, Radek; Rybicki, Marcin; Kukawska, Ewa; Krupiński, Michał
2017-10-01
The general aim of this work was to elaborate efficient and reliable aggregation method that could be used for creating a land cover map at a global scale from multitemporal satellite imagery. The study described in this paper presents methods for combining results of land cover/land use classifications performed on single-date Sentinel-2 images acquired at different time periods. For that purpose different aggregation methods were proposed and tested on study sites spread on different continents. The initial classifications were performed with Random Forest classifier on individual Sentinel-2 images from a time series. In the following step the resulting land cover maps were aggregated pixel by pixel using three different combinations of information on the number of occurrences of a certain land cover class within a time series and the posterior probability of particular classes resulting from the Random Forest classification. From the proposed methods two are shown superior and in most cases were able to reach or outperform the accuracy of the best individual classifications of single-date images. Moreover, the aggregations results are very stable when used on data with varying cloudiness. They also enable to reduce considerably the number of cloudy pixels in the resulting land cover map what is significant advantage for mapping areas with frequent cloud coverage.
Automatic co-segmentation of lung tumor based on random forest in PET-CT images
NASA Astrophysics Data System (ADS)
Jiang, Xueqing; Xiang, Dehui; Zhang, Bin; Zhu, Weifang; Shi, Fei; Chen, Xinjian
2016-03-01
In this paper, a fully automatic method is proposed to segment the lung tumor in clinical 3D PET-CT images. The proposed method effectively combines PET and CT information to make full use of the high contrast of PET images and superior spatial resolution of CT images. Our approach consists of three main parts: (1) initial segmentation, in which spines are removed in CT images and initial connected regions achieved by thresholding based segmentation in PET images; (2) coarse segmentation, in which monotonic downhill function is applied to rule out structures which have similar standardized uptake values (SUV) to the lung tumor but do not satisfy a monotonic property in PET images; (3) fine segmentation, random forests method is applied to accurately segment the lung tumor by extracting effective features from PET and CT images simultaneously. We validated our algorithm on a dataset which consists of 24 3D PET-CT images from different patients with non-small cell lung cancer (NSCLC). The average TPVF, FPVF and accuracy rate (ACC) were 83.65%, 0.05% and 99.93%, respectively. The correlation analysis shows our segmented lung tumor volumes has strong correlation ( average 0.985) with the ground truth 1 and ground truth 2 labeled by a clinical expert.
Building rooftop classification using random forests for large-scale PV deployment
NASA Astrophysics Data System (ADS)
Assouline, Dan; Mohajeri, Nahid; Scartezzini, Jean-Louis
2017-10-01
Large scale solar Photovoltaic (PV) deployment on existing building rooftops has proven to be one of the most efficient and viable sources of renewable energy in urban areas. As it usually requires a potential analysis over the area of interest, a crucial step is to estimate the geometric characteristics of the building rooftops. In this paper, we introduce a multi-layer machine learning methodology to classify 6 roof types, 9 aspect (azimuth) classes and 5 slope (tilt) classes for all building rooftops in Switzerland, using GIS processing. We train Random Forests (RF), an ensemble learning algorithm, to build the classifiers. We use (2 × 2) [m2 ] LiDAR data (considering buildings and vegetation) to extract several rooftop features, and a generalised footprint polygon data to localize buildings. The roof classifier is trained and tested with 1252 labeled roofs from three different urban areas, namely Baden, Luzern, and Winterthur. The results for roof type classification show an average accuracy of 67%. The aspect and slope classifiers are trained and tested with 11449 labeled roofs in the Zurich periphery area. The results for aspect and slope classification show different accuracies depending on the classes: while some classes are well identified, other under-represented classes remain challenging to detect.
Random Forests for Global and Regional Crop Yield Predictions.
Jeong, Jig Han; Resop, Jonathan P; Mueller, Nathaniel D; Fleisher, David H; Yun, Kyungdahm; Butler, Ethan E; Timlin, Dennis J; Shim, Kyo-Moon; Gerber, James S; Reddy, Vangimalla R; Kim, Soo-Hyung
2016-01-01
Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data.
Salivary flow rate and pH in patients with oral pathologies.
Foglio-Bonda, P L; Brilli, K; Pattarino, F; Foglio-Bonda, A
2017-01-01
Determine salivary pH and flow rate (FR) in a sample of 164 patients who came to Oral Pathology ambulatory, 84 suffering from oral lesions and 80 without oral lesions. Another aim was to evaluate factors that influence salivary flow rate. Subjects underwent clinical examination and completed an anamnestic questionnaire in order to obtain useful information that was used to classify participants in different groups. Unstimulated whole saliva (UWS) was collected using the spitting method at 11:00 am. The FR was evaluated with the weighing technique and a portable pHmeter, equipped with a microelectrode, was used to measure pH. Both univariate and classification (single and Random Forest) analyses were performed. The data analysis showed that FR and pH showed significant differences (p < 0.001) between patients with oral lesions (FR = 0.336 mL/min, pH = 6.69) and the ones without oral lesions (FR = 0.492 mL/min, pH = 6.96). By Random Forest, oral lesions and antihypertensive drugs were ranked in the top two among the evaluated variables to discretize subjects with FR = 0.16 mL/min. Our study shows that there is a relationship between oral lesions, antihypertensive drugs and alteration of pH and FR.
Kronholm, Scott C.; Capel, Paul D.; Terziotti, Silvia
2016-01-01
Accurate estimation of total nitrogen loads is essential for evaluating conditions in the aquatic environment. Extrapolation of estimates beyond measured streams will greatly expand our understanding of total nitrogen loading to streams. Recursive partitioning and random forest regression were used to assess 85 geospatial, environmental, and watershed variables across 636 small (<585 km2) watersheds to determine which variables are fundamentally important to the estimation of annual loads of total nitrogen. Initial analysis led to the splitting of watersheds into three groups based on predominant land use (agricultural, developed, and undeveloped). Nitrogen application, agricultural and developed land area, and impervious or developed land in the 100-m stream buffer were commonly extracted variables by both recursive partitioning and random forest regression. A series of multiple linear regression equations utilizing the extracted variables were created and applied to the watersheds. As few as three variables explained as much as 76 % of the variability in total nitrogen loads for watersheds with predominantly agricultural land use. Catchment-scale national maps were generated to visualize the total nitrogen loads and yields across the USA. The estimates provided by these models can inform water managers and help identify areas where more in-depth monitoring may be beneficial.
Steyrl, David; Scherer, Reinhold; Faller, Josef; Müller-Putz, Gernot R
2016-02-01
There is general agreement in the brain-computer interface (BCI) community that although non-linear classifiers can provide better results in some cases, linear classifiers are preferable. Particularly, as non-linear classifiers often involve a number of parameters that must be carefully chosen. However, new non-linear classifiers were developed over the last decade. One of them is the random forest (RF) classifier. Although popular in other fields of science, RFs are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared to regularized linear discriminant analysis (LDA). We found that the performance of RF is constant over a large range of parameter values. We demonstrate - for the first time - that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3%. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. Taking into account further properties of RFs, such as independence from feature distributions, maximum margin behavior, multiclass and advanced data mining capabilities, we argue that RFs should be taken into consideration for future BCIs.
Zhang, Kai; Li, Yun; Schwartz, Joel D.; O'Neill, Marie S.
2014-01-01
Hot weather increases risk of mortality. Previous studies used different sets of weather variables to characterize heat stress, resulting in variation in heat-mortality- associations depending on the metric used. We employed a statistical learning method – random forests – to examine which of various weather variables had the greatest impact on heat-related mortality. We compiled a summertime daily weather and mortality counts dataset from four U.S. cities (Chicago, IL; Detroit, MI; Philadelphia, PA; and Phoenix, AZ) from 1998 to 2006. A variety of weather variables were ranked in predicting deviation from typical daily all-cause and cause-specific death counts. Ranks of weather variables varied with city and health outcome. Apparent temperature appeared to be the most important predictor of heat-related mortality for all-cause mortality. Absolute humidity was, on average, most frequently selected one of the top variables for all-cause mortality and seven cause-specific mortality categories. Our analysis affirms that apparent temperature is a reasonable variable for activating heat alerts and warnings, which are commonly based on predictions of total mortality in next few days. Additionally, absolute humidity should be included in future heat-health studies. Finally, random forests can be used to guide choice of weather variables in heat epidemiology studies. PMID:24834832
Finch, Kristen; Espinoza, Edgard; Jones, F. Andrew; Cronn, Richard
2017-01-01
Premise of the study: We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Methods: Three annual ring mass spectra were obtained from 188 adult Douglas-fir trees, and these were analyzed using random forest models to determine whether samples could be classified to geographic origin, growth year, or growth year and geographic origin. Specific wood molecules that contributed to geographic discrimination were identified. Results: Douglas-fir mass spectra could be differentiated into two geographic classes with an accuracy between 70% and 76%. Classification models could not accurately classify sample mass spectra based on growth year. Thirty-two molecules were identified as key for classifying western Oregon Douglas-fir wood cores to geographic origin. Discussion: DART-TOFMS is capable of detecting minute but regionally informative differences in wood molecules over a small geographic scale, and these differences made it possible to predict the geographic origin of Douglas-fir wood with moderate accuracy. Studies involving DART-TOFMS, alone and in combination with other technologies, will be relevant for identifying the geographic origin of illegally harvested wood. PMID:28529831
Relevant Feature Set Estimation with a Knock-out Strategy and Random Forests
Ganz, Melanie; Greve, Douglas N.; Fischl, Bruce; Konukoglu, Ender
2015-01-01
Group analysis of neuroimaging data is a vital tool for identifying anatomical and functional variations related to diseases as well as normal biological processes. The analyses are often performed on a large number of highly correlated measurements using a relatively smaller number of samples. Despite the correlation structure, the most widely used approach is to analyze the data using univariate methods followed by post-hoc corrections that try to account for the data’s multivariate nature. Although widely used, this approach may fail to recover from the adverse effects of the initial analysis when local effects are not strong. Multivariate pattern analysis (MVPA) is a powerful alternative to the univariate approach for identifying relevant variations. Jointly analyzing all the measures, MVPA techniques can detect global effects even when individual local effects are too weak to detect with univariate analysis. Current approaches are successful in identifying variations that yield highly predictive and compact models. However, they suffer from lessened sensitivity and instabilities in identification of relevant variations. Furthermore, current methods’ user-defined parameters are often unintuitive and difficult to determine. In this article, we propose a novel MVPA method for group analysis of high-dimensional data that overcomes the drawbacks of the current techniques. Our approach explicitly aims to identify all relevant variations using a “knock-out” strategy and the Random Forest algorithm. In evaluations with synthetic datasets the proposed method achieved substantially higher sensitivity and accuracy than the state-of-the-art MVPA methods, and outperformed the univariate approach when the effect size is low. In experiments with real datasets the proposed method identified regions beyond the univariate approach, while other MVPA methods failed to replicate the univariate results. More importantly, in a reproducibility study with the well-known ADNI dataset the proposed method yielded higher stability and power than the univariate approach. PMID:26272728
Liu, Quan; Ma, Li; Fan, Shou-Zen; Abbod, Maysam F; Shieh, Jiann-Shing
2018-01-01
Estimating the depth of anaesthesia (DoA) in operations has always been a challenging issue due to the underlying complexity of the brain mechanisms. Electroencephalogram (EEG) signals are undoubtedly the most widely used signals for measuring DoA. In this paper, a novel EEG-based index is proposed to evaluate DoA for 24 patients receiving general anaesthesia with different levels of unconsciousness. Sample Entropy (SampEn) algorithm was utilised in order to acquire the chaotic features of the signals. After calculating the SampEn from the EEG signals, Random Forest was utilised for developing learning regression models with Bispectral index (BIS) as the target. Correlation coefficient, mean absolute error, and area under the curve (AUC) were used to verify the perioperative performance of the proposed method. Validation comparisons with typical nonstationary signal analysis methods (i.e., recurrence analysis and permutation entropy) and regression methods (i.e., neural network and support vector machine) were conducted. To further verify the accuracy and validity of the proposed methodology, the data is divided into four unconsciousness-level groups on the basis of BIS levels. Subsequently, analysis of variance (ANOVA) was applied to the corresponding index (i.e., regression output). Results indicate that the correlation coefficient improved to 0.72 ± 0.09 after filtering and to 0.90 ± 0.05 after regression from the initial values of 0.51 ± 0.17. Similarly, the final mean absolute error dramatically declined to 5.22 ± 2.12. In addition, the ultimate AUC increased to 0.98 ± 0.02, and the ANOVA analysis indicates that each of the four groups of different anaesthetic levels demonstrated significant difference from the nearest levels. Furthermore, the Random Forest output was extensively linear in relation to BIS, thus with better DoA prediction accuracy. In conclusion, the proposed method provides a concrete basis for monitoring patients' anaesthetic level during surgeries.
The soil indicator of forest health in the Forest Inventory and Analysis Program
Michael C. Amacher; Charles H. Perry
2010-01-01
Montreal Process Criteria and Indicators (MPCI) were established to monitor forest conditions and trends to promote sustainable forest management. The Soil Indicator of forest health was developed and implemented within the USFS Forest Inventory and Analysis (FIA) program to assess condition and trends in forest soil quality in U.S. forests regardless of ownership. The...
NASA Astrophysics Data System (ADS)
Huesca Martinez, M.; Garcia, M.; Roth, K. L.; Casas, A.; Ustin, S.
2015-12-01
There is a well-established need within the remote sensing community for improved estimation of canopy structure and understanding of its influence on the retrieval of leaf biochemical properties. The aim of this project was to evaluate the estimation of structural properties directly from hyperspectral data, with the broader goal that these might be used to constrain retrievals of canopy chemistry. We used NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) to discriminate different canopy structural types, defined in terms of biomass, canopy height and vegetation complexity, and compared them to estimates of these properties measured by LiDAR data. We tested a large number of optical metrics, including single narrow band reflectance and 1st derivative, sub-pixel cover fractions, narrow-band indices, spectral absorption features, and Principal Component Analysis components. Canopy structural types were identified and classified from different forest types by integrating structural traits measured by optical metrics using the Random Forest (RF) classifier. The classification accuracy was above 70% in most of the vegetation scenarios. The best overall accuracy was achieved for hardwood forest (>80% accuracy) and the lowest accuracy was found in mixed forest (~70% accuracy). Furthermore, similarly high accuracy was found when the RF classifier was applied to a spatially independent dataset, showing significant portability for the method used. Results show that all spectral regions played a role in canopy structure assessment, thus the whole spectrum is required. Furthermore, optical metrics derived from AVIRIS proved to be a powerful technique for structural attribute mapping. This research illustrates the potential for using optical properties to distinguish several canopy structural types in different forest types, and these may be used to constrain quantitative measurements of absorbing properties in future research.
A technique for conducting point pattern analysis of cluster plot stem-maps
C.W. Woodall; J.M. Graham
2004-01-01
Point pattern analysis of forest inventory stem-maps may aid interpretation and inventory estimation of forest attributes. To evaluate the techniques and benefits of conducting point pattern analysis of forest inventory stem-maps, Ripley`s K(t) was calculated for simulated tree spatial distributions and for over 600 USDA Forest Service Forest...
Species Composition of Down Dead and Standing Live Trees: Implications for Forest Inventory Analysis
Christopher W. Woodall; Linda Nagel
2005-01-01
The assessment of species composition in most forest inventory analysis relies solely on standing live tree information characterized by current forest type. With the implementation of the third phase of the U.S. Department of Agriculture Forest Service's Forest Inventory and Analysis program, the species composition of down dead trees, otherwise termed coarse...
NASA Technical Reports Server (NTRS)
Neumann, Maxim; Hensley, Scott; Lavalle, Marco; Ahmed, Razi
2013-01-01
This paper concerns forest remote sensing using JPL's multi-baseline polarimetric interferometric UAVSAR data. It presents exemplary results and analyzes the possibilities and limitations of using SAR Tomography and Polarimetric SAR Interferometry (PolInSAR) techniques for the estimation of forest structure. Performance and error indicators for the applicability and reliability of the used multi-baseline (MB) multi-temporal (MT) PolInSAR random volume over ground (RVoG) model are discussed. Experimental results are presented based on JPL's L-band repeat-pass polarimetric interferometric UAVSAR data over temperate and tropical forest biomes in the Harvard Forest, Massachusetts, and in the La Amistad Park, Panama and Costa Rica. The results are partially compared with ground field measurements and with air-borne LVIS lidar data.
NASA Technical Reports Server (NTRS)
Neumann, Maxim; Hensley, Scott; Lavalle, Marco; Ahmed, Razi
2013-01-01
This paper concerns forest remote sensing using JPL's multi-baseline polarimetric interferometric UAVSAR data. It presents exemplary results and analyzes the possibilities and limitations of using SAR Tomography and Polarimetric SAR Interferometry (PolInSAR) techniques for the estimation of forest structure. Performance and error indicators for the applicability and reliability of the used multi-baseline (MB) multi-temporal (MT) PolInSAR random volume over ground (RVoG) model are discussed. Experimental results are presented based on JPL's L-band repeat-pass polarimetric interferometric UAVSAR data over temperate and tropical forest biomes in the Harvard Forest, Massachusetts, and in the La Amistad Park, Panama and Costa Rica. The results are partially compared with ground field measurements and with air-borne LVIS lidar data.
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction
Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip
2015-01-01
Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
Nagasawa, Shinji; Al-Naamani, Eman; Saeki, Akinori
2018-05-17
Owing to the diverse chemical structures, organic photovoltaic (OPV) applications with a bulk heterojunction framework have greatly evolved over the last two decades, which has produced numerous organic semiconductors exhibiting improved power conversion efficiencies (PCEs). Despite the recent fast progress in materials informatics and data science, data-driven molecular design of OPV materials remains challenging. We report a screening of conjugated molecules for polymer-fullerene OPV applications by supervised learning methods (artificial neural network (ANN) and random forest (RF)). Approximately 1000 experimental parameters including PCE, molecular weight, and electronic properties are manually collected from the literature and subjected to machine learning with digitized chemical structures. Contrary to the low correlation coefficient in ANN, RF yields an acceptable accuracy, which is twice that of random classification. We demonstrate the application of RF screening for the design, synthesis, and characterization of a conjugated polymer, which facilitates a rapid development of optoelectronic materials.
Quantifying and mapping spatial variability in simulated forest plots
Gavin R. Corral; Harold E. Burkhart
2016-01-01
We used computer simulations to test the efficacy of multivariate statistical methods to detect, quantify, and map spatial variability of forest stands. Simulated stands were developed of regularly-spaced plantations of loblolly pine (Pinus taeda L.). We assumed no affects of competition or mortality, but random variability was added to individual tree characteristics...
Courtney Flint; Hua Qin; Michael Daab
2008-01-01
The US Forest Service, Pacific Northwest Research Station funded research to assess community responses to forest disturbance by mountain pine beetles (Dendroctonus ponderosae) and public reaction to invasive plants in north central Colorado. In the Spring of2007, 4,027 16-page questionnaires were mailed to randomly selected households with addresses in Breckenridge,...
D. Jordan; F., Jr. Ponder; V. C. Hubbard
2003-01-01
A greenhouse study examined the effects of soil compaction and forest leaf litter on the growth and nitrogen (N) uptake and recovery of red oak (Quercus rubra L.) and scarlet oak (Quercus coccinea Muencch) seedlings and selected microbial activity over a 6-month period. The experiment had a randomized complete block design with...
Stemflow estimation in a redwood forest using model-based stratified random sampling
Jack Lewis
2003-01-01
Model-based stratified sampling is illustrated by a case study of stemflow volume in a redwood forest. The approach is actually a model-assisted sampling design in which auxiliary information (tree diameter) is utilized in the design of stratum boundaries to optimize the efficiency of a regression or ratio estimator. The auxiliary information is utilized in both the...
Sample-based estimation of tree species richness in a wet tropical forest compartment
Steen Magnussen; Raphael Pelissier
2007-01-01
Petersen's capture-recapture ratio estimator and the well-known bootstrap estimator are compared across a range of simulated low-intensity simple random sampling with fixed-area plots of 100 m? in a rich wet tropical forest compartment with 93 tree species in the Western Ghats of India. Petersen's ratio estimator was uniformly superior to the bootstrap...
Rates and Implications of Rainfall Interception in a Coastal Redwood Forest
Leslie M. Reid; Jack Lewis
2007-01-01
Throughfall was measured for a year at five-min intervals in 11 collectors randomly located on two plots in a second-growth redwood forest at the Caspar Creek Experimental Watersheds. Monitoring at one plot continued two more years, during which stemflow from 24 trees was also measured. Comparison of throughfall and stemflow to rainfall measured in adjacent clearings...
Jonatha L. Horton; Barton D. Clinton; John F. Walker; Colin M. Beir; Erik T. Nilsen
2009-01-01
Ericaceous shrubs can influence soil properties in many ecosystems. In this study, we examined how soil and forest floor properties vary among sites with different ericaceous evergreen shrub basal area in the southern Appalachian mountains. We randomly located plots along transects that included open understories and understories with varying amounts of Rhododendron...
Shannon L. Savage; Rick L. Lawrence; John R. Squires
2015-01-01
Ecological and land management applications would often benefit from maps of relative canopy cover of each species present within a pixel, instead of traditional remote-sensing based maps of either dominant species or percent canopy cover without regard to species composition. Widely used statistical models for remote sensing, such as randomForest (RF),...
'Pygmy' old-growth redwood characteristics on an edaphic ecotone in Mendocino County, California
Will Russell; Suzie. Woolhouse
2012-01-01
The 'pygmy forest' is a specialized community that is adapted to highly acidic, hydrophobic, nutrient deprived soils, and exists in pockets within the coast redwood forest in Mendocino County. While coast redwood is known as an exceptionally tall tree, stunted trees exhibit unusual growth-forms on pygmy soils. We used a stratified random sampling procedure to...
Ecological impacts and management strategies for western larch in the face of climate-change
Gerald E. Rehfeldt; Barry C. Jaquish
2010-01-01
Approximately 185,000 forest inventory and ecological plots from both USA and Canada were used to predict the contemporary distribution of western larch (Larix occidentalis Nutt.) from climate variables. The random forests algorithm, using an 8-variable model, produced an overall error rate of about 2.9 %, nearly all of which consisted of predicting presence at...
Simulation of long-term landscape-level fuel treatment effects on large wildfires
Mark A. Finney; Rob C. Seli; Charles W. McHugh; Alan A. Ager; Bernhard Bahro; James K. Agee
2008-01-01
A simulation system was developed to explore how fuel treatments placed in topologically random and optimal spatial patterns affect the growth and behaviour of large fires when implemented at different rates over the course of five decades. The system consisted of a forest and fuel dynamics simulation module (Forest Vegetation Simulator, FVS), logic for deriving fuel...
Foster, Jane R.; D'Amato, Anthony W.; Bradford, John B.
2014-01-01
Forest biomass growth is almost universally assumed to peak early in stand development, near canopy closure, after which it will plateau or decline. The chronosequence and plot remeasurement approaches used to establish the decline pattern suffer from limitations and coarse temporal detail. We combined annual tree ring measurements and mortality models to address two questions: first, how do assumptions about tree growth and mortality influence reconstructions of biomass growth? Second, under what circumstances does biomass production follow the model that peaks early, then declines? We integrated three stochastic mortality models with a census tree-ring data set from eight temperate forest types to reconstruct stand-level biomass increments (in Minnesota, USA). We compared growth patterns among mortality models, forest types and stands. Timing of peak biomass growth varied significantly among mortality models, peaking 20–30 years earlier when mortality was random with respect to tree growth and size, than when mortality favored slow-growing individuals. Random or u-shaped mortality (highest in small or large trees) produced peak growth 25–30 % higher than the surviving tree sample alone. Growth trends for even-aged, monospecific Pinus banksiana or Acer saccharum forests were similar to the early peak and decline expectation. However, we observed continually increasing biomass growth in older, low-productivity forests of Quercus rubra, Fraxinus nigra, and Thuja occidentalis. Tree-ring reconstructions estimated annual changes in live biomass growth and identified more diverse development patterns than previous methods. These detailed, long-term patterns of biomass development are crucial for detecting recent growth responses to global change and modeling future forest dynamics.
NASA Astrophysics Data System (ADS)
Dirilgen, Tara; Juceviča, Edite; Melecis, Viesturs; Querner, Pascal; Bolger, Thomas
2018-01-01
The relative importance of niche separation, non-equilibrial and neutral models of community assembly has been a theme in community ecology for many decades with none appearing to be applicable under all circumstances. In this study, Collembola species abundances were recorded over eleven consecutive years in a spatially explicit grid and used to examine (i) whether observed beta diversity differed from that expected under conditions of neutrality, (ii) whether sampling points differed in their relative contributions to overall beta diversity, and (iii) the number of samples required to provide comparable estimates of species richness across three forest sites. Neutrality could not be rejected for 26 of the forest by year combinations. However, there is a trend toward greater structure in the oldest forest, where beta diversity was greater than predicted by neutrality on five of the eleven sampling dates. The lack of difference in individual- and sample-based rarefaction curves also suggests randomness in the system at this particular scale of investigation. It seems that Collembola communities are not spatially aggregated and assembly is driven primarily by neutral processes particularly in the younger two sites. Whether this finding is due to small sample size or unaccounted for environmental variables cannot be determined. Variability between dates and sites illustrates the potential of drawing incorrect conclusions if data are collected at a single site and a single point in time.
Dale D. Gormanson; Scott A. Pugh; Charles J. Barnett; Patrick D. Miles; Randall S. Morin; Paul A. Sowers; James A. Westfall
2018-01-01
The U.S. Forest Service Forest Inventory and Analysis (FIA) program collects sample plot data on all forest ownerships across the United States. FIAâs primary objective is to determine the extent, condition, volume, growth, and use of trees on the Nationâs forest land through a comprehensive inventory and analysis of the Nationâs forest resources. The FIA program...
California Drought Effects on Sierra Trees Mapped by NASA
2016-06-27
California, reveals the devastating effect of California's ongoing drought on Sierra Nevada conifer forests. The map will be used to help the U.S. Forest Service assess and respond to the impacts of increased tree mortality caused by the drought, particularly where wildlands meet urban areas within the Sierra National Forest. After several years of extreme drought, the highly stressed conifers (trees or bushes that produce cones and are usually green year-round) of the Sierra Nevada are now more susceptible to bark beetles (Dendroctonus spp.). While bark beetles killing trees in the Sierra Nevada is a natural phenomenon, the scale of mortality in the last couple of years is far greater than previously observed. The U.S. Forest Service is using recent airborne spectroscopic measurements from NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) instrument aboard NASA's ER-2 aircraft, together with new advanced algorithms, to quantify this impact over this large region of rugged terrain. The high-altitude ER-2 aircraft is based at NASA's Armstrong Flight Research Center, Edwards, California. The image was created by scientists at the USFS's Pacific Southwest Region Remote Sensing Lab, McClellan, California, by performing a time series analysis of AVIRIS images. Scientists evaluated baseline tree mortality on public lands in the summer of 2015 using a machine learning algorithm called "random forest." This algorithm classifies the AVIRIS measurements as dominated by either shrubs, healthy trees or newly dead conifer trees. To quantify how much the amount of dead vegetation increased during the fall of 2015, the Forest Service scientists conducted an advanced spectral mixture analysis. This analysis evaluates each spectrum to determine the fraction of green vegetation, dead vegetation and soil. The full spectral range of AVIRIS is important to separate the signatures of soil and dead vegetation. To produce this comprehensive Sierra National Forest tree mortality map, the result from the summer of 2015 was evaluated to look for increases of more than 10 percent in dead vegetation during the fall of 2015. AVIRIS measures spectra of the Earth system to conduct advanced science research. These western U.S. AVIRIS measurements were acquired as part of NASA's Hyperspectral Infrared Imager (HyspIRI) preparatory airborne campaign. HyspIRI was one of the space missions suggested to NASA by the National Academy of Sciences in its 2007 decadal survey for Earth Science. In the future, HyspIRI could provide spectral and thermal measurements of this type globally for ecosystem research and additional science objectives. http://photojournal.jpl.nasa.gov/catalog/PIA20717
Akamani, Kofi; Hall, Troy Elizabeth
2015-01-01
This study tested a proposed community resilience model by investigating the role of institutions, capital assets, community and socio-demographic variables as determinants of households' participation in Ghana's collaborative forest management (CFM) program and outcomes of the program. Quantitative survey data were gathered from 209 randomly selected households from two forest-dependent communities. Regression analysis shows that households' participation in the CFM program was predicted by community location, past connections with institutions, and past bonding social capital. Community location and past capitals were the strongest predictors of the outcomes of the CFM program as judged by current levels of capitals. Participation in the CFM program also had a positive effect on human capital but had minimal impact on the other capitals influencing household well-being and resilience, suggesting that the impact of co-management on household resilience may be modest. In all, the findings highlight the need for co-management policies to pay attention to the historical context of community interaction processes influencing access to capital assets and local institutions to successfully promote equitable resilience. Copyright © 2014 Elsevier Ltd. All rights reserved.
Selection of forest canopy gaps by male Cerulean Warblers in West Virginia
Perkins, Kelly A.; Wood, Petra Bohall
2014-01-01
Forest openings, or canopy gaps, are an important resource for many forest songbirds, such as Cerulean Warblers (Setophaga cerulea). We examined canopy gap selection by this declining species to determine if male Cerulean Warblers selected particular sizes, vegetative heights, or types of gaps. We tested whether these parameters differed among territories, territory core areas, and randomly-placed sample plots. We used enhanced territory mapping techniques (burst sampling) to define habitat use within the territory. Canopy gap densities were higher within core areas of territories than within territories or random plots, indicating that Cerulean Warblers selected habitat within their territories with the highest gap densities. Selection of regenerating gaps with woody vegetation >12 m within the gap, and canopy heights >24 m surrounding the gap, occurred within territory core areas. These findings differed between two sites indicating that gap selection may vary based on forest structure. Differences were also found regarding the placement of territories with respect to gaps. Larger gaps, such as wildlife food plots, were located on the periphery of territories more often than other types and sizes of gaps, while smaller gaps, such as treefalls, were located within territory boundaries more often than expected. The creations of smaller canopy gaps, <100 m2, within dense stands are likely compatible with forest management for this species.
Cluster ensemble based on Random Forests for genetic data.
Alhusain, Luluah; Hafez, Alaaeldin M
2017-01-01
Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
NASA Astrophysics Data System (ADS)
Ahmed, Zia U.; Woodbury, Peter B.; Sanderman, Jonathan; Hawke, Bruce; Jauss, Verena; Solomon, Dawit; Lehmann, Johannes
2017-02-01
To predict how land management practices and climate change will affect soil carbon cycling, improved understanding of factors controlling soil organic carbon fractions at large spatial scales is needed. We analyzed total soil organic (SOC) as well as pyrogenic (PyC), particulate (POC), and other soil organic carbon (OOC) fractions in surface layers from 650 stratified-sampling locations throughout Colorado, Kansas, New Mexico, and Wyoming. PyC varied from 0.29 to 18.0 mg C g-1 soil with a mean of 4.05 mg C g-1 soil. The mean PyC was 34.6% of the SOC and ranged from 11.8 to 96.6%. Both POC and PyC were highest in forests and canyon bottoms. In the best random forest regression model, normalized vegetation index (NDVI), mean annual precipitation (MAP), mean annual temperature (MAT), and elevation were ranked as the top four important variables determining PyC and POC variability. Random forests regression kriging (RFK) with environmental covariables improved predictions over ordinary kriging by 20 and 7% for PyC and POC, respectively. Based on RFK, 8% of the study area was dominated (≥50% of SOC) by PyC and less than 1% was dominated by POC. Furthermore, based on spatial analysis of the ratio of POC to PyC, we estimated that about 16% of the study area is medium to highly vulnerable to SOC mineralization in surface soil. These are the first results to characterize PyC and POC stocks geospatially using stratified sampling scheme at the scale of 1,000,000 km2, and the methods are scalable to other regions.
Geometric Accuracy Analysis of Worlddem in Relation to AW3D30, Srtm and Aster GDEM2
NASA Astrophysics Data System (ADS)
Bayburt, S.; Kurtak, A. B.; Büyüksalih, G.; Jacobsen, K.
2017-05-01
In a project area close to Istanbul the quality of WorldDEM, AW3D30, SRTM DSM and ASTER GDEM2 have been analyzed in relation to a reference aerial LiDAR DEM and to each other. The random and the systematic height errors have been separated. The absolute offset for all height models in X, Y and Z is within the expectation. The shifts have been respected in advance for a satisfying estimation of the random error component. All height models are influenced by some tilts, different in size. In addition systematic deformations can be seen not influencing the standard deviation too much. The delivery of WorldDEM includes information about the height error map which is based on the interferometric phase errors, and the number and location of coverage's from different orbits. A dependency of the height accuracy from the height error map information and the number of coverage's can be seen, but it is smaller as expected. WorldDEM is more accurate as the other investigated height models and with 10 m point spacing it includes more morphologic details, visible at contour lines. The morphologic details are close to the details based on the LiDAR digital surface model (DSM). As usual a dependency of the accuracy from the terrain slope can be seen. In forest areas the canopy definition of InSAR X- and C-band height models as well as for the height models based on optical satellite images is not the same as the height definition by LiDAR. In addition the interferometric phase uncertainty over forest areas is larger. Both effects lead to lower height accuracy in forest areas, also visible in the height error map.
The effect of IDH1 mutation on the structural connectome in malignant astrocytoma.
Kesler, Shelli R; Noll, Kyle; Cahill, Daniel P; Rao, Ganesh; Wefel, Jeffrey S
2017-02-01
Mutation of the IDH1 gene is associated with differences in malignant astrocytoma growth characteristics that impact phenotypic severity, including cognitive impairment. We previously demonstrated greater cognitive impairment in patients with IDH1 wild type tumor compared to those with IDH1 mutant, and therefore we hypothesized that brain network organization would be lower in patients with wild type tumors. Volumetric, T1-weighted MRI scans were obtained retrospectively from 35 patients with IDH1 mutant and 32 patients with wild type malignant astrocytoma (mean age = 45 ± 14 years) and used to extract individual level, gray matter connectomes. Graph theoretical analysis was then applied to measure efficiency and other connectome properties for each patient. Cognitive performance was categorized as impaired or not and random forest classification was used to explore factors associated with cognitive impairment. Patients with wild type tumor demonstrated significantly lower network efficiency in several medial frontal, posterior parietal and subcortical regions (p < 0.05, corrected for multiple comparisons). Patients with wild type tumor also demonstrated significantly higher incidence of cognitive impairment (p = 0.03). Random forest analysis indicated that network efficiency was inversely, though nonlinearly associated with cognitive impairment in both groups (p < 0.0001). Cognitive reserve appeared to mediate this relationship in patients with mutant tumor suggesting greater neuroplasticity and/or benefit from neuroprotective factors. Tumor volume was the greatest contributor to cognitive impairment in patients with wild type tumor, supporting our hypothesis that greater lesion momentum between grades may cause more disconnection of core neurocircuitry and consequently lower efficiency of information processing.
Evaluation of variable selection methods for random forests and omics data sets.
Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke
2017-10-16
Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta.In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.
Zampieri, Fernando G; Póvoa, Pedro; Salluh, Jorge I; Rodriguez, Alejandro; Valade, Sandrine; Andrade Gomes, José; Reignier, Jean; Molinos, Elena; Almirall, Jordi; Boussekey, Nicolas; Socias, Lorenzo; Ramirez, Paula; Viana, William N; Rouzé, Anahita; Nseir, Saad; Martin-Loeches, Ignacio
2018-01-01
To assess whether ventilator-associated lower respiratory tract infections (VA-LRTIs) are associated with mortality in critically ill patients with acute respiratory distress syndrome (ARDS). Post hoc analysis of prospective cohort study including mechanically ventilated patients from a multicenter prospective observational study (TAVeM study); VA-LRTI was defined as either ventilator-associated tracheobronchitis (VAT) or ventilator-associated pneumonia (VAP) based on clinical criteria and microbiological confirmation. Association between intensive care unit (ICU) mortality in patients having ARDS with and without VA-LRTI was assessed through logistic regression controlling for relevant confounders. Association between VA-LRTI and duration of mechanical ventilation and ICU stay was assessed through competing risk analysis. Contribution of VA-LRTI to a mortality model over time was assessed through sequential random forest models. The cohort included 2960 patients of which 524 fulfilled criteria for ARDS; 21% had VA-LRTI (VAT = 10.3% and VAP = 10.7%). After controlling for illness severity and baseline health status, we could not find an association between VA-LRTI and ICU mortality (odds ratio: 1.07; 95% confidence interval: 0.62-1.83; P = .796); VA-LRTI was also not associated with prolonged ICU length of stay or duration of mechanical ventilation. The relative contribution of VA-LRTI to the random forest mortality model remained constant during time. The attributable VA-LRTI mortality for ARDS was higher than the attributable mortality for VA-LRTI alone. After controlling for relevant confounders, we could not find an association between occurrence of VA-LRTI and ICU mortality in patients with ARDS.
Sando, Roy; Chase, Katherine J.
2017-03-23
A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods.Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.
Diversity of Medicinal Plants among Different Forest-use Types of the Pakistani Himalaya.
Adnan, Muhammad; Hölscher, Dirk
2012-12-01
Diversity of Medicinal Plants among Different Forest-use Types of the Pakistani Himalaya Medicinal plants collected in Himalayan forests play a vital role in the livelihoods of regional rural societies and are also increasingly recognized at the international level. However, these forests are being heavily transformed by logging. Here we ask how forest transformation influences the diversity and composition of medicinal plants in northwestern Pakistan, where we studied old-growth forests, forests degraded by logging, and regrowth forests. First, an approximate map indicating these forest types was established and then 15 study plots per forest type were randomly selected. We found a total of 59 medicinal plant species consisting of herbs and ferns, most of which occurred in the old-growth forest. Species number was lowest in forest degraded by logging and intermediate in regrowth forest. The most valuable economic species, including six Himalayan endemics, occurred almost exclusively in old-growth forest. Species composition and abundance of forest degraded by logging differed markedly from that of old-growth forest, while regrowth forest was more similar to old-growth forest. The density of medicinal plants positively correlated with tree canopy cover in old-growth forest and negatively in degraded forest, which indicates that species adapted to open conditions dominate in logged forest. Thus, old-growth forests are important as refuge for vulnerable endemics. Forest degraded by logging has the lowest diversity of relatively common medicinal plants. Forest regrowth may foster the reappearance of certain medicinal species valuable to local livelihoods and as such promote acceptance of forest expansion and medicinal plants conservation in the region. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12231-012-9213-4) contains supplementary material, which is available to authorized users.
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference
Maples, Brian K.; Gravel, Simon; Kenny, Eimear E.; Bustamante, Carlos D.
2013-01-01
Local-ancestry inference is an important step in the genetic analysis of fully sequenced human genomes. Current methods can only detect continental-level ancestry (i.e., European versus African versus Asian) accurately even when using millions of markers. Here, we present RFMix, a powerful discriminative modeling approach that is faster (∼30×) and more accurate than existing methods. We accomplish this by using a conditional random field parameterized by random forests trained on reference panels. RFMix is capable of learning from the admixed samples themselves to boost performance and autocorrect phasing errors. RFMix shows high sensitivity and specificity in simulated Hispanics/Latinos and African Americans and admixed Europeans, Africans, and Asians. Finally, we demonstrate that African Americans in HapMap contain modest (but nonzero) levels of Native American ancestry (∼0.4%). PMID:23910464
NASA Astrophysics Data System (ADS)
Kim, S. J.; Lim, C. H.; Kim, G. S.; Lee, W. K.
2017-12-01
Analysis of forest fire risk is important in disaster risk reduction (DRR) since it provides a way to manage forest fires. Climate and socio-economic factors are important in the cause of forest fires, and the role of the socio-economic factors in prevention and preparedness of forest fires is increasing. As most of the forest fires in the Republic of Korea are highly related to human activities, both environmental factors and socio-economic factors were considered into the analysis of forest fire risk. In this study, the Maximum Entropy (MaxEnt) model was used to predict the potential geographical distribution and probability of forest fire occurrence spatially and temporally from 1980s to the 2010s in the Republic of Korea by multi-temporal analysis and analyze the relationship between forest fires and the factors. As a result of the risk analysis, there was an overall increasing trend in forest fire risk from the 1980s to the 2000s, and socio-economic factors were highly correlated with the occurrence of forest fires. The study demonstrates that the socio-economic factors considered as human activities can increase the occurrence of forest fires. The result implies that managing human activities are significant to prevent forest fire occurrence. In addition, timely forest fire prevention and control is necessary as drought index such as Standardized Precipitation Index (SPI) also affected forest fires.
NASA Astrophysics Data System (ADS)
Suiter, Ashley Elizabeth
Multi-spectral imagery provides a robust and low-cost dataset for assessing wetland extent and quality over broad regions and is frequently used for wetland inventories. However in forested wetlands, hydrology is obscured by tree canopy making it difficult to detect with multi-spectral imagery alone. Because of this, classification of forested wetlands often includes greater errors than that of other wetlands types. Elevation and terrain derivatives have been shown to be useful for modelling wetland hydrology. But, few studies have addressed the use of LiDAR intensity data detecting hydrology in forested wetlands. Due the tendency of LiDAR signal to be attenuated by water, this research proposed the fusion of LiDAR intensity data with LiDAR elevation, terrain data, and aerial imagery, for the detection of forested wetland hydrology. We examined the utility of LiDAR intensity data and determined whether the fusion of Lidar derived data with multispectral imagery increased the accuracy of forested wetland classification compared with a classification performed with only multi-spectral image. Four classifications were performed: Classification A -- All Imagery, Classification B -- All LiDAR, Classification C -- LiDAR without Intensity, and Classification D -- Fusion of All Data. These classifications were performed using random forest and each resulted in a 3-foot resolution thematic raster of forested upland and forested wetland locations in Vermilion County, Illinois. The accuracies of these classifications were compared using Kappa Coefficient of Agreement. Importance statistics produced within the random forest classifier were evaluated in order to understand the contribution of individual datasets. Classification D, which used the fusion of LiDAR and multi-spectral imagery as input variables, had moderate to strong agreement between reference data and classification results. It was found that Classification A performed using all the LiDAR data and its derivatives (intensity, elevation, slope, aspect, curvatures, and Topographic Wetness Index) was the most accurate classification with Kappa: 78.04%, indicating moderate to strong agreement. However, Classification C, performed with LiDAR derivative without intensity data had less agreement than would be expected by chance, indicating that LiDAR contributed significantly to the accuracy of Classification B.
Cadogan, Beresford L; Scharbach, Roger D
2003-04-01
A field trial using true replicates was conducted successfully in a boreal forest in 1996 to evaluate the efficacy of two aerially applied Bacillus thuringiensis formulations, ABG 6429 and ABG 6430. A complete randomized design with four replicates per treatment was chosen. Twelve to 15 balsam fir (Abies balsamea [L.] Mill.) per plot were randomly selected as sample trees. Interplot buffer zones, > or = 200 m wide, adequately prevented cross contamination from sprays that were atomized with four rotary atomizers (volume median diameters ranging from 64.6 to 139.4 microm) and released approximately 30 m above the ground. The B. thuringiensis formulations were not significantly different (P > 0.05) from each other in reducing spruce budworm (Choristoneura fumiferana [Clem.]) populations and protecting balsam trees from defoliation but both formulations were significantly more efficacious than the controls. The results suggest that true replicates are a feasible alternative to pseudoreplication in experimental forest aerial applications.
Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests.
Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling; Chen, Shu-Ching; Ishwaran, Hemant
2016-07-01
Recent developments in social media and cloud storage lead to an exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches lag from successfully bridging the semantic gap. To solve this problem, a multi-stage random forest framework is proposed to generate predictor variables based on multivariate regressions using variable importance (VIMP). By fine tuning the forests and significantly reducing the predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., having little collaboration with other high level concepts. Using classical multivariate statistics, estimating the value of one coordinate using other coordinates standardizes the covariates and it depends upon the variance of the correlations instead of the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms those approaches in the comparison in terms of the Mean Average Precision (MAP) values.
Cai, Tianxi; Karlson, Elizabeth W.
2013-01-01
Objectives To test whether data extracted from full text patient visit notes from an electronic medical record (EMR) would improve the classification of PsA compared to an algorithm based on codified data. Methods From the > 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and three random forest algorithms trained using coded, narrative, and combined predictors. The receiver operator curve (ROC) was used to identify the optimal algorithm and a cut point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results The PPV of a single PsA code was 57% (95%CI 55%–58%). Using a combination of coded data and NLP the random forest algorithm reached a PPV of 90% (95%CI 86%–93%) at sensitivity of 87% (95% CI 83% – 91%) in the training data. The PPV was 93% (95%CI 89%–96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC (p < 0.001). Conclusions Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. PMID:20701955
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
Ozçift, Akin
2011-05-01
Supervised classification algorithms are commonly used in the designing of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the class's output by individual trees. In this way, an RF ensemble classifier performs better than a single tree from classification performance point of view. In general, multiclass datasets having unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset that has multiple classes with small sample sizes and it is therefore adequate to test our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from cardiac arrhythmia dataset. (ii) RF machine learning algorithm is used to evaluate the performance of selected features with and without simple random sampling to evaluate the efficiency of proposed training strategy. The resultant accuracy of the classifier is found to be 90.0% and this is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of random sampling strategy in training RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
A spatially explicit decision support model for restoration of forest bird habitat
Twedt, D.J.; Uihlein, W.B.; Elliott, A.B.
2006-01-01
The historical area of bottomland hardwood forest in the Mississippi Alluvial Valley has been reduced by >75%. Agricultural production was the primary motivator for deforestation; hence, clearing deliberately targeted higher and drier sites. Remaining forests are highly fragmented and hydrologically altered, with larger forest fragments subject to greater inundation, which has negatively affected many forest bird populations. We developed a spatially explicit decision support model, based on a Partners in Flight plan for forest bird conservation, that prioritizes forest restoration to reduce forest fragmentation and increase the area of forest core (interior forest >1 km from 'hostile' edge). Our primary objective was to increase the number of forest patches that harbor >2000 ha of forest core, but we also sought to increase the number and area of forest cores >5000 ha. Concurrently, we targeted restoration within local (320 km2) landscapes to achieve >60% forest cover. Finally, we emphasized restoration of higher-elevation bottomland hardwood forests in areas where restoration would not increase forest fragmentation. Reforestation of 10% of restorable land in the Mississippi Alluvial Valley (approximately 880,000 ha) targeted at priorities established by this decision support model resulted in approximately 824,000 ha of new forest core. This is more than 32 times the amount of core forest added through reforestation of randomly located fields (approximately 25,000 ha). The total area of forest core (1.6 million ha) that resulted from targeted restoration exceeded habitat objectives identified in the Partners in Flight Bird Conservation Plan and approached the area of forest core present in the 1950s.
Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin
2017-02-16
The estimation of heart rate (HR) based on wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to estimate HR due to low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust approach based on random forest is proposed for accurately estimating HR from the photoplethysmography signal contaminated by intense motion artifacts, consisting of two stages. Stage 1 proposes a hybrid method to effectively remove MA with a low computation complexity, where two MA removal algorithms are combined by an accurate binary decision algorithm whose aim is to decide whether or not to adopt the second MA removal algorithm. Stage 2 proposes a random forest-based spectral peak-tracking algorithm, whose aim is to locate the spectral peak corresponding to HR, formulating the problem of spectral peak tracking into a pattern classification problem. Experiments on the PPG datasets including 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved the average absolute error of 1.65 beats per minute (BPM) on the 22 PPG datasets. Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking.
Comparing spatial regression to random forests for large ...
Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and po
Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M
2015-01-01
Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (2.7% versus 75.2%, p<0.001). This method also showed statistically significantly improved recall rate compared to a 2-cue baseline method using fewer vessel cues (30.7% versus 67.7%, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse. Published by Elsevier B.V.
A scattering model for forested area
NASA Technical Reports Server (NTRS)
Karam, M. A.; Fung, A. K.
1988-01-01
A forested area is modeled as a volume of randomly oriented and distributed disc-shaped, or needle-shaped leaves shading a distribution of branches modeled as randomly oriented finite-length, dielectric cylinders above an irregular soil surface. Since the radii of branches have a wide range of sizes, the model only requires the length of a branch to be large compared with its radius which may be any size relative to the incident wavelength. In addition, the model also assumes the thickness of a disc-shaped leaf or the radius of a needle-shaped leaf is much smaller than the electromagnetic wavelength. The scattering phase matrices for disc, needle, and cylinder are developed in terms of the scattering amplitudes of the corresponding fields which are computed by the forward scattering theorem. These quantities along with the Kirchoff scattering model for a randomly rough surface are used in the standard radiative transfer formulation to compute the backscattering coefficient. Numerical illustrations for the backscattering coefficient are given as a function of the shading factor, incidence angle, leaf orientation distribution, branch orientation distribution, and the number density of leaves. Also illustrated are the properties of the extinction coefficient as a function of leaf and branch orientation distributions. Comparisons are made with measured backscattering coefficients from forested areas reported in the literature.
Mateen, Bilal Akhter; Bussas, Matthias; Doogan, Catherine; Waller, Denise; Saverino, Alessia; Király, Franz J; Playford, E Diane
2018-05-01
To determine whether tests of cognitive function and patient-reported outcome measures of motor function can be used to create a machine learning-based predictive tool for falls. Prospective cohort study. Tertiary neurological and neurosurgical center. In all, 337 in-patients receiving neurosurgical, neurological, or neurorehabilitation-based care. Binary (Y/N) for falling during the in-patient episode, the Trail Making Test (a measure of attention and executive function) and the Walk-12 (a patient-reported measure of physical function). The principal outcome was a fall during the in-patient stay ( n = 54). The Trail test was identified as the best predictor of falls. Moreover, addition of other variables, did not improve the prediction (Wilcoxon signed-rank P < 0.001). Classical linear statistical modeling methods were then compared with more recent machine learning based strategies, for example, random forests, neural networks, support vector machines. The random forest was the best modeling strategy when utilizing just the Trail Making Test data (Wilcoxon signed-rank P < 0.001) with 68% (± 7.7) sensitivity, and 90% (± 2.3) specificity. This study identifies a simple yet powerful machine learning (Random Forest) based predictive model for an in-patient neurological population, utilizing a single neuropsychological test of cognitive function, the Trail Making test.
Rexhepaj, Elton; Brennan, Donal J; Holloway, Peter; Kay, Elaine W; McCann, Amanda H; Landberg, Goran; Duffy, Michael J; Jirstrom, Karin; Gallagher, William M
2008-01-01
Manual interpretation of immunohistochemistry (IHC) is a subjective, time-consuming and variable process, with an inherent intra-observer and inter-observer variability. Automated image analysis approaches offer the possibility of developing rapid, uniform indicators of IHC staining. In the present article we describe the development of a novel approach for automatically quantifying oestrogen receptor (ER) and progesterone receptor (PR) protein expression assessed by IHC in primary breast cancer. Two cohorts of breast cancer patients (n = 743) were used in the study. Digital images of breast cancer tissue microarrays were captured using the Aperio ScanScope XT slide scanner (Aperio Technologies, Vista, CA, USA). Image analysis algorithms were developed using MatLab 7 (MathWorks, Apple Hill Drive, MA, USA). A fully automated nuclear algorithm was developed to discriminate tumour from normal tissue and to quantify ER and PR expression in both cohorts. Random forest clustering was employed to identify optimum thresholds for survival analysis. The accuracy of the nuclear algorithm was initially confirmed by a histopathologist, who validated the output in 18 representative images. In these 18 samples, an excellent correlation was evident between the results obtained by manual and automated analysis (Spearman's rho = 0.9, P < 0.001). Optimum thresholds for survival analysis were identified using random forest clustering. This revealed 7% positive tumour cells as the optimum threshold for the ER and 5% positive tumour cells for the PR. Moreover, a 7% cutoff level for the ER predicted a better response to tamoxifen than the currently used 10% threshold. Finally, linear regression was employed to demonstrate a more homogeneous pattern of expression for the ER (R = 0.860) than for the PR (R = 0.681). In summary, we present data on the automated quantification of the ER and the PR in 743 primary breast tumours using a novel unsupervised image analysis algorithm. This novel approach provides a useful tool for the quantification of biomarkers on tissue specimens, as well as for objective identification of appropriate cutoff thresholds for biomarker positivity. It also offers the potential to identify proteins with a homogeneous pattern of expression.
John D. Shaw
2008-01-01
(Please note, this is an abstract only) Widespread mortality in several forest types is associated with several years of drought in the Southwest. Implementation of USDA Forest Service Forest Inventory and Analysis (FIA) annual inventory in several states coincided with the onset of elevated mortality rates. Analysis of data collected 2000-2004 reveals the status and...
Ronald E. McRoberts; Gregory A. Reams; Paul C. Van Deusen; John W. Moser
2003-01-01
Documents contributions to forest inventory in the areas of sampling, remote sensing, modeling, information management, and analysis with emphasis on implementation of the annual inventory system of the Forest Inventory and Analysis program of the USDA Forest Service.
Microscopic saw mark analysis: an empirical approach.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2015-01-01
Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
California's forest resources, 2001-2005: five-year Forest Inventory and Analysis Report.
Glenn A. Christensen; Sally J. Campbell; Jeremy S. Fried
2008-01-01
This report highlights key findings from the most recent (2001-2005) data collected by the Forest Inventory and Analysis Program across all forest land in California. We summarize and interpret basic resource information such as forest area, ownership, volume, biomass, and carbon stocks; structure and function topics such as biodiversity, forest age, dead wood, and...
Stevens, Adam; Murray, Philip; Wojcik, Jerome; Raelson, John; Koledova, Ekaterina; Chatelain, Pierre
2016-01-01
Objective Single-nucleotide polymorphisms (SNPs) associated with the response to recombinant human growth hormone (r-hGH) have previously been identified in growth hormone deficiency (GHD) and Turner syndrome (TS) children in the PREDICT long-term follow-up (LTFU) study (Nbib699855). Here, we describe the PREDICT validation (VAL) study (Nbib1419249), which aimed to confirm these genetic associations. Design and methods Children with GHD (n = 293) or TS (n = 132) were recruited retrospectively from 29 sites in nine countries. All children had completed 1 year of r-hGH therapy. 48 SNPs previously identified as associated with first year growth response to r-hGH were genotyped. Regression analysis was used to assess the association between genotype and growth response using clinical/auxological variables as covariates. Further analysis was undertaken using random forest classification. Results The children were younger, and the growth response was higher in VAL study. Direct genotype analysis did not replicate what was found in the LTFU study. However, using exploratory regression models with covariates, a consistent relationship with growth response in both VAL and LTFU was shown for four genes – SOS1 and INPPL1 in GHD and ESR1 and PTPN1 in TS. The random forest analysis demonstrated that only clinical covariates were important in the prediction of growth response in mild GHD (>4 to <10 μg/L on GH stimulation test), however, in severe GHD (≤4 μg/L) several SNPs contributed (in IGF2, GRB10, FOS, IGFBP3 and GHRHR). Conclusions The PREDICT validation study supports, in an independent cohort, the association of four of 48 genetic markers with growth response to r-hGH treatment in both pre-pubertal GHD and TS children after controlling for clinical/auxological covariates. However, the contribution of these SNPs in a prediction model of first-year response is not sufficient for routine clinical use. PMID:27651465
Chaze, Thibault; Hornez, Louis; Chambon, Christophe; Haddad, Iman; Vinh, Joelle; Peyrat, Jean-Philippe; Benderitter, Marc; Guipaud, Olivier
2013-07-10
The finding of new diagnostic and prognostic markers of local radiation injury, and particularly of the cutaneous radiation syndrome, is crucial for its medical management, in the case of both accidental exposure and radiotherapy side effects. Especially, a fast high-throughput method is still needed for triage of people accidentally exposed to ionizing radiation. In this study, we investigated the impact of localized irradiation of the skin on the early alteration of the serum proteome of mice in an effort to discover markers associated with the exposure and severity of impending damage. Using two different large-scale quantitative proteomic approaches, 2D-DIGE-MS and SELDI-TOF-MS, we performed global analyses of serum proteins collected in the clinical latency phase (days 3 and 7) from non-irradiated and locally irradiated mice exposed to high doses of 20, 40 and 80 Gy which will develop respectively erythema, moist desquamation and necrosis. Unsupervised and supervised multivariate statistical analyses (principal component analysis, partial-least square discriminant analysis and Random Forest analysis) using 2D-DIGE quantitative protein data allowed us to discriminate early between non-irradiated and irradiated animals, and between uninjured/slightly injured animals and animals that will develop severe lesions. On the other hand, despite a high number of animal replicates, PLS-DA and Random Forest analyses of SELDI-TOF-MS data failed to reveal sets of MS peaks able to discriminate between the different groups of animals. Our results show that, unlike SELDI-TOF-MS, the 2D-DIGE approach remains a powerful and promising method for the discovery of sets of proteins that could be used for the development of clinical tests for triage and the prognosis of the severity of radiation-induced skin lesions. We propose a list of 15 proteins which constitutes a set of candidate proteins for triage and prognosis of skin lesion outcomes.
Chaze, Thibault; Hornez, Louis; Chambon, Christophe; Haddad, Iman; Vinh, Joelle; Peyrat, Jean-Philippe; Benderitter, Marc; Guipaud, Olivier
2013-01-01
The finding of new diagnostic and prognostic markers of local radiation injury, and particularly of the cutaneous radiation syndrome, is crucial for its medical management, in the case of both accidental exposure and radiotherapy side effects. Especially, a fast high-throughput method is still needed for triage of people accidentally exposed to ionizing radiation. In this study, we investigated the impact of localized irradiation of the skin on the early alteration of the serum proteome of mice in an effort to discover markers associated with the exposure and severity of impending damage. Using two different large-scale quantitative proteomic approaches, 2D-DIGE-MS and SELDI-TOF-MS, we performed global analyses of serum proteins collected in the clinical latency phase (days 3 and 7) from non-irradiated and locally irradiated mice exposed to high doses of 20, 40 and 80 Gy which will develop respectively erythema, moist desquamation and necrosis. Unsupervised and supervised multivariate statistical analyses (principal component analysis, partial-least square discriminant analysis and Random Forest analysis) using 2D-DIGE quantitative protein data allowed us to discriminate early between non-irradiated and irradiated animals, and between uninjured/slightly injured animals and animals that will develop severe lesions. On the other hand, despite a high number of animal replicates, PLS-DA and Random Forest analyses of SELDI-TOF-MS data failed to reveal sets of MS peaks able to discriminate between the different groups of animals. Our results show that, unlike SELDI-TOF-MS, the 2D-DIGE approach remains a powerful and promising method for the discovery of sets of proteins that could be used for the development of clinical tests for triage and the prognosis of the severity of radiation-induced skin lesions. We propose a list of 15 proteins which constitutes a set of candidate proteins for triage and prognosis of skin lesion outcomes. PMID:28250398
Stevens, Adam; Murray, Philip; Wojcik, Jerome; Raelson, John; Koledova, Ekaterina; Chatelain, Pierre; Clayton, Peter
2016-12-01
Single-nucleotide polymorphisms (SNPs) associated with the response to recombinant human growth hormone (r-hGH) have previously been identified in growth hormone deficiency (GHD) and Turner syndrome (TS) children in the PREDICT long-term follow-up (LTFU) study (Nbib699855). Here, we describe the PREDICT validation (VAL) study (Nbib1419249), which aimed to confirm these genetic associations. Children with GHD (n = 293) or TS (n = 132) were recruited retrospectively from 29 sites in nine countries. All children had completed 1 year of r-hGH therapy. 48 SNPs previously identified as associated with first year growth response to r-hGH were genotyped. Regression analysis was used to assess the association between genotype and growth response using clinical/auxological variables as covariates. Further analysis was undertaken using random forest classification. The children were younger, and the growth response was higher in VAL study. Direct genotype analysis did not replicate what was found in the LTFU study. However, using exploratory regression models with covariates, a consistent relationship with growth response in both VAL and LTFU was shown for four genes - SOS1 and INPPL1 in GHD and ESR1 and PTPN1 in TS. The random forest analysis demonstrated that only clinical covariates were important in the prediction of growth response in mild GHD (>4 to <10 μg/L on GH stimulation test), however, in severe GHD (≤4 μg/L) several SNPs contributed (in IGF2, GRB10, FOS, IGFBP3 and GHRHR). The PREDICT validation study supports, in an independent cohort, the association of four of 48 genetic markers with growth response to r-hGH treatment in both pre-pubertal GHD and TS children after controlling for clinical/auxological covariates. However, the contribution of these SNPs in a prediction model of first-year response is not sufficient for routine clinical use. © 2016 European Society of Endocrinology.
Influence of socioeconomic status on the whole blood transcriptome in African Americans.
Gaye, Amadou; Gibbons, Gary H; Barry, Charles; Quarells, Rakale; Davis, Sharon K
2017-01-01
The correlation between low socioeconomic status (SES) and poor health outcome or higher risk of disease has been consistently reported by many epidemiological studies across various race/ancestry groups. However, the biological mechanisms linking low SES to disease and/or disease risk factors are not well understood and remain relatively under-studied. The analysis of the blood transcriptome is a promising window for elucidating how social and environmental factors influence the molecular networks governing health and disease. To further define the mechanistic pathways between social determinants and health, this study examined the impact of SES on the blood transcriptome in a sample of African-Americans. An integrative approach leveraging three complementary methods (Weighted Gene Co-expression Network Analysis, Random Forest and Differential Expression) was adopted to identify the most predictive and robust transcriptome pathways associated with SES. We analyzed the expression of 15079 genes (RNA-seq) from whole blood across 36 samples. The results revealed a cluster of 141 co-expressed genes over-expressed in the low SES group. Three pro-inflammatory pathways (IL-8 Signaling, NF-κB Signaling and Dendritic Cell Maturation) are activated in this module and over-expressed in low SES. Random Forest analysis revealed 55 of the 141 genes that, collectively, predict SES with an area under the curve of 0.85. One third of the 141 genes are significantly over-expressed in the low SES group. Lower SES has consistently been linked to many social and environmental conditions acting as stressors and known to be correlated with vulnerability to chronic illnesses (e.g. asthma, diabetes) associated with a chronic inflammatory state. Our unbiased analysis of the blood transcriptome in African-Americans revealed evidence of a robust molecular signature of increased inflammation associated with low SES. The results provide a plausible link between the social factors and chronic inflammation.
Recognizing pedestrian's unsafe behaviors in far-infrared imagery at night
NASA Astrophysics Data System (ADS)
Lee, Eun Ju; Ko, Byoung Chul; Nam, Jae-Yeal
2016-05-01
Pedestrian behavior recognition is important work for early accident prevention in advanced driver assistance system (ADAS). In particular, because most pedestrian-vehicle crashes are occurred from late of night to early of dawn, our study focus on recognizing unsafe behavior of pedestrians using thermal image captured from moving vehicle at night. For recognizing unsafe behavior, this study uses convolutional neural network (CNN) which shows high quality of recognition performance. However, because traditional CNN requires the very expensive training time and memory, we design the light CNN consisted of two convolutional layers and two subsampling layers for real-time processing of vehicle applications. In addition, we combine light CNN with boosted random forest (Boosted RF) classifier so that the output of CNN is not fully connected with the classifier but randomly connected with Boosted random forest. We named this CNN as randomly connected CNN (RC-CNN). The proposed method was successfully applied to the pedestrian unsafe behavior (PUB) dataset captured from far-infrared camera at night and its behavior recognition accuracy is confirmed to be higher than that of some algorithms related to CNNs, with a shorter processing time.
Gerald E. Rehfeldt; Barry C. Jaquish; Javier Lopez-Upton; Cuauhtemoc Saenz-Romero; J. Bradley St Clair; Laura P. Leites; Dennis G. Joyce
2014-01-01
The Random Forests classification algorithm was used to predict the occurrence of the realized climate niche for two sub-specific varieties of Pinus ponderosa and three varieties of Pseudotsuga menziesii from presence-absence data in forest inventory ground plots. Analyses were based on ca. 271,000 observations for P. ponderosa and ca. 426,000 observations for P....
Don C. Bragg; Jeffrey L. Kershner
2004-01-01
Riparian large woody debris (LWD) recruitment simulations have traditionally applied a random angle of tree fall from two well-forested stream banks. We used a riparian LWD recruitment model (CWD, version 1.4) to test the validity these assumptions. Both the number of contributing forest banks and predominant tree fall direction significantly influenced simulated...
Scott H. Stoleson; Todd E. Ristau; David S. deCalesta; Stephen B. Horsley
2011-01-01
Use of herbicides in forestry to direct successional trajectories has raised concerns over possible direct or indirect effects on non-target organisms. We studied the response of forest birds to an operational application of glyphosate and sulfometuron methyl herbicides, using a randomized block design in which half of each 8 ha block received herbicide and the other...
Jill M. Wick; Yong Wang
2010-01-01
We evaluated habitat use and home range size of hooded warblers (Wilsonia citrine) and worm-eating warblers (Helmitheros vermivorus) in six treated mixed oak-pine stands on the Bankhead National Forest in north-central AL. Study design is a randomized complete block with a factorial arrangement of three thinning levels (no thin, 11...
R. B. Foltz
2012-01-01
This study tested the erosion mitigation effectiveness of agricultural straw and two wood-based mulches for four years on decommissioned forest roads. Plots were installed on the loosely consolidated, bare soil to measure sediment production, mulch cover, and plant regrowth. The experimental design was a repeated measures, randomized block on two soil types common in...
Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; Leenaars, Johan G. B.; Walsh, Markus G.; Shepherd, Keith D.; Sila, Andrew; MacMillan, Robert A.; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E.
2015-01-01
80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data. PMID:26110833
Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.
Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold
2016-12-07
Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N
2018-05-15
In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, 3 we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 CE were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
Automated segmentation of dental CBCT image with prior-guided sequential random forests
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Li; Gao, Yaozong; Shi, Feng
Purpose: Cone-beam computed tomography (CBCT) is an increasingly utilized imaging modality for the diagnosis and treatment planning of the patients with craniomaxillofacial (CMF) deformities. Accurate segmentation of CBCT image is an essential step to generate 3D models for the diagnosis and treatment planning of the patients with CMF deformities. However, due to the image artifacts caused by beam hardening, imaging noise, inhomogeneity, truncation, and maximal intercuspation, it is difficult to segment the CBCT. Methods: In this paper, the authors present a new automatic segmentation method to address these problems. Specifically, the authors first employ a majority voting method to estimatemore » the initial segmentation probability maps of both mandible and maxilla based on multiple aligned expert-segmented CBCT images. These probability maps provide an important prior guidance for CBCT segmentation. The authors then extract both the appearance features from CBCTs and the context features from the initial probability maps to train the first-layer of random forest classifier that can select discriminative features for segmentation. Based on the first-layer of trained classifier, the probability maps are updated, which will be employed to further train the next layer of random forest classifier. By iteratively training the subsequent random forest classifier using both the original CBCT features and the updated segmentation probability maps, a sequence of classifiers can be derived for accurate segmentation of CBCT images. Results: Segmentation results on CBCTs of 30 subjects were both quantitatively and qualitatively validated based on manually labeled ground truth. The average Dice ratios of mandible and maxilla by the authors’ method were 0.94 and 0.91, respectively, which are significantly better than the state-of-the-art method based on sparse representation (p-value < 0.001). Conclusions: The authors have developed and validated a novel fully automated method for CBCT segmentation.« less
Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia
2017-01-01
Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane ( Grus monacha , n = 33), White-naped Crane ( Grus vipio , n = 40), and Black-necked Crane ( Grus nigricollis , n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging predicted probability of the above four models results. Commonly used model performance metrics (Area under ROC (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found Random Forest demonstrated the best performance for the most assessment method, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
Mi, Chunrong; Huettmann, Falk; Han, Xuesong; Wen, Lijia
2017-01-01
Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging predicted probability of the above four models results. Commonly used model performance metrics (Area under ROC (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found Random Forest demonstrated the best performance for the most assessment method, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation. PMID:28097060
NASA Astrophysics Data System (ADS)
Ramoelo, Abel; Cho, M. A.; Mathieu, R.; Madonsela, S.; van de Kerchove, R.; Kaszta, Z.; Wolff, E.
2015-12-01
Land use and climate change could have huge impacts on food security and the health of various ecosystems. Leaf nitrogen (N) and above-ground biomass are some of the key factors limiting agricultural production and ecosystem functioning. Leaf N and biomass can be used as indicators of rangeland quality and quantity. Conventional methods for assessing these vegetation parameters at landscape scale level are time consuming and tedious. Remote sensing provides a bird-eye view of the landscape, which creates an opportunity to assess these vegetation parameters over wider rangeland areas. Estimation of leaf N has been successful during peak productivity or high biomass and limited studies estimated leaf N in dry season. The estimation of above-ground biomass has been hindered by the signal saturation problems using conventional vegetation indices. The objective of this study is to monitor leaf N and above-ground biomass as an indicator of rangeland quality and quantity using WorldView-2 satellite images and random forest technique in the north-eastern part of South Africa. Series of field work to collect samples for leaf N and biomass were undertaken in March 2013, April or May 2012 (end of wet season) and July 2012 (dry season). Several conventional and red edge based vegetation indices were computed. Overall results indicate that random forest and vegetation indices explained over 89% of leaf N concentrations for grass and trees, and less than 89% for all the years of assessment. The red edge based vegetation indices were among the important variables for predicting leaf N. For the biomass, random forest model explained over 84% of biomass variation in all years, and visible bands including red edge based vegetation indices were found to be important. The study demonstrated that leaf N could be monitored using high spatial resolution with the red edge band capability, and is important for rangeland assessment and monitoring.
Hengl, Tomislav; Heuvelink, Gerard B M; Kempen, Bas; Leenaars, Johan G B; Walsh, Markus G; Shepherd, Keith D; Sila, Andrew; MacMillan, Robert A; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E
2015-01-01
80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management--organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data.
Coeli M. Hoover; James E. Smith
2012-01-01
The documented role of United States forests in sequestering carbon, the relatively low cost of forest-based mitigation, and the many co-benefits of increasing forest carbon stocks all contribute to the ongoing trend in the establishment of forest-based carbon offset projects. We present a broad analysis of forest inventory data using site quality indicators to provide...
Raymond L. Czaplewski
2004-01-01
This report discusses valid use of data produced by the Forest Service?s Forest Inventory and Analysis (FIA) program and used by the Northern Region of the National Forest System to analyze the compliance of individual National Forests with their Standards and Guidelines. It emphasizes use of FIA data on snag density and the percentage of forest area that meets the...
Guo, Yao-xin; Kang, Bing; Li, Gang; Wang, De-xiang; Yang, Gai-he; Wang, Da-wei
2011-10-01
An investigation was conducted on the species composition and population diameter-class structure of a typical secondary Betula albo-sinensis forest in Xiaolongshan of west Qinling Mountains, and the spatial distribution pattern and interspecific correlations of the main populations were analyzed at multiple scales by the O-ring functions of single variable and double variables. In the test forest, B. albo-sinensis was obviously dominant, but from the analysis of DBH class distribution, the B. albo-sinensis seedlings were short of, and the natural regeneration was very poor. O the contrary, the regeneration of Abies fargesii and Populus davidianas was fine. B. albo-sinensis and Salix matsudana had a random distribution at almost all scales, while A. fargesii and P. davidianas were significantly clumped at small scale. B. albo-sinensis had positive correlations with A. fargesii and P. davidianas at medium scale, whereas S. matsudana had negative correlations with B. albo-sinensis, A. fargesii, and P. davidianas at small scale. No significant correlations were observed between other species. The findings suggested that the spatial distribution patterns of the tree species depended on their biological characteristics at small scale, but on the environmental heterogeneity at larger scales. In a period of future time, B. albo-sinensis would still be dominant, but from a long-term view, it was necessary to take some artificial measures to improve the regeneratio of B. albo-sinensis.
T. J. Brandeis
2003-01-01
Rapid Changes in vegetation over short distances, high species diversity, and fragmented landscape challege the implementation of the Forest service's Forest inventory and Analysis (FIA)program on Puerto Rico. Applying the hexagonal FIA grid as used on the continental United States, the Forest service is installing a new forest sampling and monitoring framework...
UFORE (i-Tree Eco) Analysis of Chicago
Cherie LeBlanc Fisher; David Nowak
2010-01-01
The USDA Forest Service and City of Chicago conducted a UFORE (now called i-Tree Eco) analysis of Chicago's urban forest in the summer of 2007. The UFORE (Urban FORest Effects) model developed by the Forest Service uses on-the-ground sampling data to understand the composition of on urban forest and calculate the forest's impacts on air pollution and energy...
Forest Inventory and Analysis: What it Tells Us About Water Quality in Arkansas
Edwin L. Miller; Hal O. Liechty
2001-01-01
Forests and forest activities have a significant impact on the amount and quality of surface water in Arkansas. Recognizing this important relationship between forests and water quality, we utilized the Forest Inventory and Analysis (FIA) data from Arkansas to better understand how forest land use in Arkansas has likely influenced the water quality in the State during...
Modelling Biophysical Parameters of Maize Using Landsat 8 Time Series
NASA Astrophysics Data System (ADS)
Dahms, Thorsten; Seissiger, Sylvia; Conrad, Christopher; Borg, Erik
2016-06-01
Open and free access to multi-frequent high-resolution data (e.g. Sentinel - 2) will fortify agricultural applications based on satellite data. The temporal and spatial resolution of these remote sensing datasets directly affects the applicability of remote sensing methods, for instance a robust retrieving of biophysical parameters over the entire growing season with very high geometric resolution. In this study we use machine learning methods to predict biophysical parameters, namely the fraction of absorbed photosynthetic radiation (FPAR), the leaf area index (LAI) and the chlorophyll content, from high resolution remote sensing. 30 Landsat 8 OLI scenes were available in our study region in Mecklenburg-Western Pomerania, Germany. In-situ data were weekly to bi-weekly collected on 18 maize plots throughout the summer season 2015. The study aims at an optimized prediction of biophysical parameters and the identification of the best explaining spectral bands and vegetation indices. For this purpose, we used the entire in-situ dataset from 24.03.2015 to 15.10.2015. Random forest and conditional inference forests were used because of their explicit strong exploratory and predictive character. Variable importance measures allowed for analysing the relation between the biophysical parameters with respect to the spectral response, and the performance of the two approaches over the plant stock evolvement. Classical random forest regression outreached the performance of conditional inference forests, in particular when modelling the biophysical parameters over the entire growing period. For example, modelling biophysical parameters of maize for the entire vegetation period using random forests yielded: FPAR: R² = 0.85; RMSE = 0.11; LAI: R² = 0.64; RMSE = 0.9 and chlorophyll content (SPAD): R² = 0.80; RMSE=4.9. Our results demonstrate the great potential in using machine-learning methods for the interpretation of long-term multi-frequent remote sensing datasets to model biophysical parameters.
Todd A. Schroeder; Sean P. Healey; Gretchen G. Moisen; Tracey S. Frescino; Warren B. Cohen; Chengquan Huang; Robert E. Kennedy; Zhiqiang Yang
2014-01-01
With earth's surface temperature and human population both on the rise a new emphasis has been placed on monitoring changes to forested ecosystems the world over. In the United States the U.S. Forest Service Forest Inventory and Analysis (FIA) program monitors the forested land base with field data collected over a permanent network of sample plots. Although these...
Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M
2006-04-21
Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.
Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N
2017-06-21
Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k -Nearest Neighbors ( k -NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k -NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k -NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend of better performance of SVM and Random Forest classifier performance was observed.
FOCIS: A forest classification and inventory system using LANDSAT and digital terrain data
NASA Technical Reports Server (NTRS)
Strahler, A. H.; Franklin, J.; Woodcook, C. E.; Logan, T. L.
1981-01-01
Accurate, cost-effective stratification of forest vegetation and timber inventory is the primary goal of a Forest Classification and Inventory System (FOCIS). Conventional timber stratification using photointerpretation can be time-consuming, costly, and inconsistent from analyst to analyst. FOCIS was designed to overcome these problems by using machine processing techniques to extract and process tonal, textural, and terrain information from registered LANDSAT multispectral and digital terrain data. Comparison of samples from timber strata identified by conventional procedures showed that both have about the same potential to reduce the variance of timber volume estimates over simple random sampling.
Olesen, Alexander Neergaard; Christensen, Julie A E; Sorensen, Helge B D; Jennum, Poul J
2016-08-01
Reducing the number of recording modalities for sleep staging research can benefit both researchers and patients, under the condition that they provide as accurate results as conventional systems. This paper investigates the possibility of exploiting the multisource nature of the electrooculography (EOG) signals by presenting a method for automatic sleep staging using the complete ensemble empirical mode decomposition with adaptive noise algorithm, and a random forest classifier. It achieves a high overall accuracy of 82% and a Cohen's kappa of 0.74 indicating substantial agreement between automatic and manual scoring.
Spatial Analysis for Monitoring Forest Health
Francis A. Roesch
1994-01-01
A plan for the spatial analysis for the sample design for the detection monitoring phase in the joint USDA Forest Service/EPA Forest Health Monitoring Program (FHM) in the United States is discussed. The spatial analysis procedure is intended to more quickly identify changes in forest health by providing increased sensitivity to localized changes. The procedure is...
Global sensitivity analysis of DRAINMOD-FOREST, an integrated forest ecosystem model
Shiying Tian; Mohamed A. Youssef; Devendra M. Amatya; Eric D. Vance
2014-01-01
Global sensitivity analysis is a useful tool to understand process-based ecosystem models by identifying key parameters and processes controlling model predictions. This study reported a comprehensive global sensitivity analysis for DRAINMOD-FOREST, an integrated model for simulating water, carbon (C), and nitrogen (N) cycles and plant growth in lowland forests. The...
NASA Astrophysics Data System (ADS)
Hänsch, Ronny; Hellwich, Olaf
2018-04-01
Random Forests have continuously proven to be one of the most accurate, robust, as well as efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focus on improving classification accuracy, we aim for accelerating the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches we mainly consider algorithmic changes to stay as much as possible independent of platform and programming language. The final model achieves an approximately 60 times faster training and a 500 times faster prediction, while the accuracy is only marginally decreased by roughly 1 %.
Underwater image enhancement through depth estimation based on random forest
NASA Astrophysics Data System (ADS)
Tai, Shen-Chuan; Tsai, Ting-Chou; Huang, Jyun-Han
2017-11-01
Light absorption and scattering in underwater environments can result in low-contrast images with a distinct color cast. This paper proposes a systematic framework for the enhancement of underwater images. Light transmission is estimated using the random forest algorithm. RGB values, luminance, color difference, blurriness, and the dark channel are treated as features in training and estimation. Transmission is calculated using an ensemble machine learning algorithm to deal with a variety of conditions encountered in underwater environments. A color compensation and contrast enhancement algorithm based on depth information was also developed with the aim of improving the visual quality of underwater images. Experimental results demonstrate that the proposed scheme outperforms existing methods with regard to subjective visual quality as well as objective measurements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee Spangler; Lee A. Vierling; Eva K. Stand
2012-04-01
Sound policy recommendations relating to the role of forest management in mitigating atmospheric carbon dioxide (CO{sub 2}) depend upon establishing accurate methodologies for quantifying forest carbon pools for large tracts of land that can be dynamically updated over time. Light Detection and Ranging (LiDAR) remote sensing is a promising technology for achieving accurate estimates of aboveground biomass and thereby carbon pools; however, not much is known about the accuracy of estimating biomass change and carbon flux from repeat LiDAR acquisitions containing different data sampling characteristics. In this study, discrete return airborne LiDAR data was collected in 2003 and 2009 acrossmore » {approx}20,000 hectares (ha) of an actively managed, mixed conifer forest landscape in northern Idaho, USA. Forest inventory plots, established via a random stratified sampling design, were established and sampled in 2003 and 2009. The Random Forest machine learning algorithm was used to establish statistical relationships between inventory data and forest structural metrics derived from the LiDAR acquisitions. Aboveground biomass maps were created for the study area based on statistical relationships developed at the plot level. Over this 6-year period, we found that the mean increase in biomass due to forest growth across the non-harvested portions of the study area was 4.8 metric ton/hectare (Mg/ha). In these non-harvested areas, we found a significant difference in biomass increase among forest successional stages, with a higher biomass increase in mature and old forest compared to stand initiation and young forest. Approximately 20% of the landscape had been disturbed by harvest activities during the six-year time period, representing a biomass loss of >70 Mg/ha in these areas. During the study period, these harvest activities outweighed growth at the landscape scale, resulting in an overall loss in aboveground carbon at this site. The 30-fold increase in sampling density between the 2003 and 2009 did not affect the biomass estimates. Overall, LiDAR data coupled with field reference data offer a powerful method for calculating pools and changes in aboveground carbon in forested systems. The results of our study suggest that multitemporal LiDAR-based approaches are likely to be useful for high quality estimates of aboveground carbon change in conifer forest systems.« less
Invasive plants found in east Texas forests, 2009 forest inventory and analysis factsheet
Sonja N. Oswalt; Christopher M. Oswalt
2011-01-01
This science update provides information on the presence and cover of nonnative invasive plants found in forests of the eastern region of the State of Texas based on an annual inventory conducted by the Forest Inventory and Analysis (FIA) Program at the Southern Research Station of the U.S. Department of Agriculture Forest Service in cooperation with the Texas Forest...
Use of ancillary data to improve the analysis of forest health indicators
Dave Gartner
2013-01-01
In addition to its standard suite of mensuration variables, the Forest Inventory and Analysis (FIA) program of the U.S. Forest Service also collects data on forest health variables formerly measured by the Forest Health Monitoring program. FIA obtains forest health information on a subset of the base sample plots. Due to the sample size differences, the two sets of...
Conterminous U.S. and Alaska Forest Type Mapping Using Forest Inventory and Analysis Data
B. Ruefenacht; M.V. Finco; M.D. Nelson; R. Czaplewski; E.H. Helmer; J. A. Blackard; G.R. Holden; A.J. Lister; D. Salajanu; D. Weyermann; K. Winterberger
2008-01-01
Classification-trees were used to model forest type groups and forest types for the conterminous United States and Alaska. The predictor data were a geospatial data set with a spatial resolution of 250 m developed by the U.S. Department of Agriculture Forest Service (USFS). The response data were plot data from the USFS Forest Inventory and Analysis program. Overall...
Invasive plants found in North Carolina’s forests, 2010—forest inventory and analysis factsheet
Christopher M. Oswalt; Sonja N. Oswalt
2014-01-01
This publication provides an overview of invasive plants found in forests of the State of North Carolina based on an annual inventory conducted by the Forest Inventory and Analysis (FIA) Program at the Southern Research Station (SRS) of the Forest Service, U.S. Department of Agriculture in cooperation with the North Carolina Forest Service. These estimates and coverage...