multivariate random forest: Topics by Science.gov

Sample records for multivariate random forest

Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

USGS Publications Warehouse

Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

2013-01-01

In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.
Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

Treesearch

Anantha M. Prasad; Louis R. Iverson; Andy Liaw

2006-01-01

We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
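As a rough illustration of how bagging differs from a single regression tree, the sketch below fits one-split regression trees (stumps) to bootstrap resamples and averages their predictions. It uses only the Python standard library and made-up data; the function names and the stump learner are illustrative assumptions, not the models evaluated in the study. A random forest would additionally restrict each split to a random subset of predictors, which makes no difference with the single predictor used here.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: predict the overall mean
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Bagged prediction: the average over all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Hypothetical data: a noisy step from about 1.0 to about 3.0 at x = 2.0.
data_rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [(1.0 if x < 2.0 else 3.0) + data_rng.gauss(0, 0.2) for x in xs]

stumps = fit_bagged(xs, ys)
low = predict_bagged(stumps, 0.5)
high = predict_bagged(stumps, 3.5)
```

Averaging over resamples smooths the prediction near the step, because different bootstrap samples place the split at slightly different thresholds.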
Multivariate stochastic simulation with subjective multivariate normal distributions

Treesearch

P. J. Ince; J. Buongiorno

1991-01-01

In many applications of Monte Carlo simulation in forestry or forest products, it may be known that some variables are correlated. However, for simplicity, in most simulations it has been assumed that random variables are independently distributed. This report describes an alternative Monte Carlo simulation technique for subjectively assessed multivariate normal...
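To make the contrast with independent sampling concrete, the sketch below draws correlated pairs from a bivariate normal distribution by mixing two independent standard normal draws (a two-variable Cholesky construction). The means, standard deviations, and correlation are hypothetical values of the kind an analyst might assess subjectively; this is a minimal standard-library sketch, not the procedure specified in the report.

```python
import math
import random

def sample_bivariate_normal(mean, sd, rho, n, seed=0):
    """Draw n pairs (x1, x2) with the given means, standard deviations,
    and correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = mean[0] + sd[0] * z1
        # Mixing z1 into the second variable gives corr(x1, x2) = rho.
        x2 = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x1, x2))
    return pairs

def correlation(pairs):
    """Sample Pearson correlation of a list of (x1, x2) pairs."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
    v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
    return cov / math.sqrt(v1 * v2)

# Hypothetical inputs: two correlated quantities with assessed correlation 0.7.
draws = sample_bivariate_normal(mean=(100.0, 50.0), sd=(10.0, 5.0), rho=0.7, n=20000)
sample_rho = correlation(draws)
```

Setting rho to 0.0 recovers the independent-variables simulation, so the same function can be used to compare the two assumptions.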
Identification by random forest method of HLA class I amino acid substitutions associated with lower survival at day 100 in unrelated donor hematopoietic cell transplantation.

PubMed

Marino, S R; Lin, S; Maiers, M; Haagenson, M; Spellman, S; Klein, J P; Binkowski, T A; Lee, S J; van Besien, K

2012-02-01

The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good or intermediate risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions in 33 positions that were associated with reduced 100 day survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all, 13 had been previously reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.
Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study

PubMed Central

Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry

2014-01-01

Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914
Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

PubMed

Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

2014-03-15

Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests.

PubMed

Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling; Chen, Shu-Ching; Ishwaran, Hemant

2016-07-01

Recent developments in social media and cloud storage have led to exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches fall short of bridging the semantic gap. To solve this problem, a multi-stage random forest framework is proposed to generate predictor variables based on multivariate regressions using variable importance (VIMP). By fine tuning the forests and significantly reducing the predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., having little collaboration with other high level concepts. Using classical multivariate statistics, estimating the value of one coordinate using other coordinates standardizes the covariates and it depends upon the variance of the correlations instead of the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms the compared approaches in terms of the Mean Average Precision (MAP) values.
Quantifying and mapping spatial variability in simulated forest plots

Treesearch

Gavin R. Corral; Harold E. Burkhart

2016-01-01

We used computer simulations to test the efficacy of multivariate statistical methods to detect, quantify, and map spatial variability of forest stands. Simulated stands were developed of regularly-spaced plantations of loblolly pine (Pinus taeda L.). We assumed no effects of competition or mortality, but random variability was added to individual tree characteristics...
Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

PubMed Central

2013-01-01

Motivation: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forest (RF) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. Results: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. Availability: The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana. PMID:24564704
Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes.

PubMed

Wang, Yue; Goh, Wilson; Wong, Limsoon; Montana, Giovanni

2013-01-01

Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forest (RF) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana.
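The abstract does not spell out the distance-based splitting criterion, so the following is only a generic sketch of how a tree node can be split when the response is a vector instead of a single number. It scores a candidate split on a SNP genotype (coded 0, 1, 2) by the reduction in summed squared Euclidean distance to the node centroid. The criterion, function names, and toy data are assumptions for illustration and should not be read as PaRFR's actual implementation.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length trait vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def within_ss(rows):
    """Sum of squared Euclidean distances from each trait vector to the centroid."""
    if not rows:
        return 0.0
    c = centroid(rows)
    return sum(sum((v - cj) ** 2 for v, cj in zip(r, c)) for r in rows)

def split_gain(genotypes, traits, threshold):
    """Impurity reduction from sending samples with genotype <= threshold
    to the left child and the rest to the right child."""
    left = [t for g, t in zip(genotypes, traits) if g <= threshold]
    right = [t for g, t in zip(genotypes, traits) if g > threshold]
    return within_ss(traits) - within_ss(left) - within_ss(right)

# Hypothetical node: six subjects, one SNP, and a two-dimensional trait.
genotypes = [0, 0, 0, 1, 2, 2]
traits = [[1.0, 0.0], [1.0, 2.0], [1.0, 1.0],
          [5.0, 5.0], [5.0, 4.0], [5.0, 6.0]]

gain_at_0 = split_gain(genotypes, traits, threshold=0)  # carriers vs. non-carriers
gain_at_1 = split_gain(genotypes, traits, threshold=1)
best_threshold = 0 if gain_at_0 >= gain_at_1 else 1
```

In a forest, a gain of this kind would be evaluated for a random subset of SNPs at every node, which is the step that becomes expensive with hundreds of thousands of SNPs and motivates distributing the trees across workers.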
Visible and near infrared spectroscopy coupled to random forest to quantify some soil quality parameters

NASA Astrophysics Data System (ADS)

de Santana, Felipe Bachion; de Souza, André Marcelo; Poppi, Ronei Jesus

2018-02-01

This study evaluates the use of visible and near infrared spectroscopy (Vis-NIRS) combined with multivariate regression based on random forest to quantify some soil quality parameters. The parameters analyzed were soil cation exchange capacity (CEC), sum of exchange bases (SB), organic matter (OM), clay and sand present in the soils of several regions of Brazil. Current methods for evaluating these parameters are laborious, time-consuming and require various wet analytical methods that are not adequate for use in precision agriculture, where faster and automatic responses are required. The random forest regression models were statistically better than PLS regression models for CEC, OM, clay and sand, demonstrating resistance to overfitting, attenuating the effect of outlier samples and indicating the most important variables for the model. The methodology demonstrates the potential of the Vis-NIR as an alternative for determination of CEC, SB, OM, sand and clay, making it possible to develop a fast and automatic analytical procedure.
Random forests as cumulative effects models: A case study of lakes and rivers in Muskoka, Canada.

PubMed

Jones, F Chris; Plewes, Rachel; Murison, Lorna; MacDougall, Mark J; Sinclair, Sarah; Davies, Christie; Bailey, John L; Richardson, Murray; Gunn, John

2017-10-01

Cumulative effects assessment (CEA) - a type of environmental appraisal - lacks effective methods for modeling cumulative effects, evaluating indicators of ecosystem condition, and exploring the likely outcomes of development scenarios. Random forests are an extension of classification and regression trees, which model response variables by recursive partitioning. Random forests were used to model a series of candidate ecological indicators that described lakes and rivers from a case study watershed (The Muskoka River Watershed, Canada). Suitability of the candidate indicators for use in cumulative effects assessment and watershed monitoring was assessed according to how well they could be predicted from natural habitat features and how sensitive they were to human land-use. The best models explained 75% of the variation in a multivariate descriptor of lake benthic-macroinvertebrate community structure, and 76% of the variation in the conductivity of river water. Similar results were obtained by cross-validation. Several candidate indicators detected a simulated doubling of urban land-use in their catchments, and a few were able to detect a simulated doubling of agricultural land-use. The paper demonstrates that random forests can be used to describe the combined and singular effects of multiple stressors and natural environmental factors, and furthermore, that random forests can be used to evaluate the performance of monitoring indicators. The numerical methods presented are applicable to any ecosystem and indicator type, and therefore represent a step forward for CEA. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

NASA Astrophysics Data System (ADS)

Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

2015-11-01

The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
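The evaluation step that relies on the ROC curve, AUC, and contingency tables can be sketched without any modeling library. The code below computes AUC as the fraction of (landslide, stable) cell pairs that a model ranks correctly, and builds a contingency table at a chosen cutoff. The labels, the two sets of susceptibility scores, and the 0.5 cutoff are hypothetical; they are not results from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def contingency_table(labels, scores, cutoff):
    """Counts of true/false positives and negatives when score >= cutoff
    is treated as a predicted landslide."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

# Hypothetical mapping units: 1 = landslide source cell, 0 = stable cell.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores_a = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.1, 0.4]   # susceptibility from model A
scores_b = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.65]  # susceptibility from model B

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
table_a = contingency_table(labels, scores_a, cutoff=0.5)
```

AUC summarizes ranking quality over all cutoffs, while the contingency table shows the errors made at one specific cutoff, which is why the two are usually reported together.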
Multivariate classification with random forests for gravitational wave searches of black hole binary coalescence

NASA Astrophysics Data System (ADS)

Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.

2015-03-01

Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.
Cross-country transferability of multi-variable damage models

NASA Astrophysics Data System (ADS)

Wagenaar, Dennis; Lüdtke, Stefan; Kreibich, Heidi; Bouwer, Laurens

2017-04-01

Flood damage assessment is often done with simple damage curves based only on flood water depth. Additionally, damage models are often transferred in space and time, e.g. from region to region or from one flood event to another. Validation has shown that depth-damage curve estimates are associated with high uncertainties, particularly when applied in regions outside the area where the data for curve development was collected. Recently, progress has been made with multi-variable damage models created with data-mining techniques, i.e. Bayesian Networks and random forest. However, it is still unknown to what extent and under which conditions model transfers are possible and reliable. Model validations in different countries will provide valuable insights into the transferability of multi-variable damage models. In this study we compare multi-variable models developed on the basis of flood damage datasets from Germany as well as from The Netherlands. Data from several German floods were collected using computer aided telephone interviews. Data from the 1993 Meuse flood in the Netherlands are available, based on compensations paid by the government. The Bayesian network and random forest based models are applied and validated in both countries on the basis of the individual datasets. A major challenge was the harmonization of the variables between both datasets due to factors like differences in variable definitions, and regional and temporal differences in flood hazard and exposure characteristics. Results of model validations and comparisons in both countries are discussed, particularly with respect to encountered challenges and possible solutions for an improvement of model transferability.
Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

NASA Astrophysics Data System (ADS)

Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

2016-01-01

In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
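The difference between an orthogonal and an oblique node test, and between the two multiclass combination schemes, can be shown in a few lines. In the sketch below the weights of the oblique test are fixed by hand purely for illustration, whereas in an oblique random forest they would be learned by a supervised model at each node. The band values and class names are hypothetical.

```python
def orthogonal_split(x, feature, threshold):
    """Axis-aligned test of a standard random forest node:
    compare a single feature with a threshold."""
    return x[feature] <= threshold

def oblique_split(x, weights, bias):
    """Oblique (multivariate) test: compare a linear combination
    of several features with zero."""
    return sum(w * v for w, v in zip(weights, x)) + bias <= 0.0

def one_against_one_pairs(classes):
    """Every unordered pair of classes; one binary classifier is trained per pair."""
    return [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]

def one_against_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [d for d in classes if d != c]) for c in classes]

# Hypothetical pixel described by two band reflectances.
pixel = [0.3, 0.9]
goes_left_orthogonal = orthogonal_split(pixel, feature=0, threshold=0.5)
goes_left_oblique = oblique_split(pixel, weights=[1.0, 1.0], bias=-1.0)

# Hypothetical land cover classes.
classes = ["water", "wetland", "grassland", "indigenous forest"]
pairs = one_against_one_pairs(classes)
tasks = one_against_all_tasks(classes)
```

With k classes, one-against-one needs k(k-1)/2 binary models and one-against-all needs k, so the choice affects training cost as well as how the binary votes are combined.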
Kalman filter for statistical monitoring of forest cover across sub-continental regions

Treesearch

Raymond L. Czaplewski

1991-01-01

The Kalman filter is a multivariate generalization of the composite estimator which recursively combines a current direct estimate with a past estimate that is updated for expected change over time with a prediction model. The Kalman filter can estimate proportions of different cover types for sub-continental regions each year. A random sample of high-resolution...
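The recursive combination of a past estimate with a current direct estimate can be sketched for a single cover-type proportion, which is the scalar special case of the filter. The past estimate is first moved forward by an expected change, then blended with the new direct estimate using weights that depend on their variances. All numbers and parameter names below are hypothetical, and the simple additive change model is an assumption made for illustration.

```python
def kalman_step(prev_est, prev_var, change, process_var, obs, obs_var):
    """One scalar Kalman update.

    prev_est, prev_var : last year's estimate and its variance
    change, process_var: expected change from the prediction model and
                         the variance of that prediction error
    obs, obs_var       : this year's direct estimate and its variance
    """
    # Predict: update the past estimate for expected change over time.
    pred = prev_est + change
    pred_var = prev_var + process_var
    # Combine: the gain is the weight given to the new direct estimate.
    gain = pred_var / (pred_var + obs_var)
    est = pred + gain * (obs - pred)
    var = (1.0 - gain) * pred_var
    return est, var, gain

# Hypothetical forest-cover proportion for one region.
est, var, gain = kalman_step(
    prev_est=0.40, prev_var=0.0004,   # last year's composite estimate
    change=-0.01, process_var=0.0001, # model predicts a one-point decline
    obs=0.36, obs_var=0.0005,         # this year's sample-based estimate
)
```

The combined variance is smaller than the variance of either input, which is the practical reason for combining the two sources each year.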
Predictive Utility of Marketed Volumetric Software Tools in Subjects at Risk for Alzheimer's: Do Regions Outside the Hippocampus Matter?

PubMed Central

Tanpitukpongse, Teerath P.; Mazurowski, Maciej A.; Ikhena, John; Petrella, Jeffrey R.

2016-01-01

Background and Purpose: To assess prognostic efficacy of individual versus combined regional volumetrics in two commercially-available brain volumetric software packages for predicting conversion of patients with mild cognitive impairment to Alzheimer's disease. Materials and Methods: Data were obtained through the Alzheimer's Disease Neuroimaging Initiative. 192 subjects (mean age 74.8 years, 39% female) diagnosed with mild cognitive impairment at baseline were studied. All had T1WI MRI sequences at baseline and 3-year clinical follow-up. Analysis was performed with NeuroQuant® and Neuroreader™. Receiver operating characteristic curves assessing the prognostic efficacy of each software package were generated using a univariable approach employing individual regional brain volumes, as well as two multivariable approaches (multiple regression and random forest), combining multiple volumes. Results: On univariable analysis of 11 NeuroQuant® and 11 Neuroreader™ regional volumes, hippocampal volume had the highest area under the curve for both software packages (0.69 NeuroQuant®, 0.68 Neuroreader™), and was not significantly different (p > 0.05) between packages. Multivariable analysis did not increase the area under the curve for either package (0.63 logistic regression, 0.60 random forest NeuroQuant®; 0.65 logistic regression, 0.62 random forest Neuroreader™). Conclusion: Of the multiple regional volume measures available in FDA-cleared brain volumetric software packages, hippocampal volume remains the best single predictor of conversion of mild cognitive impairment to Alzheimer's disease at 3-year follow-up. Combining volumetrics did not add additional prognostic efficacy. Therefore, future prognostic studies in MCI, combining such tools with demographic and other biomarker measures, are justified in using hippocampal volume as the only volumetric biomarker. PMID:28057634
Accuracy assessments and areal estimates using two-phase stratified random sampling, cluster plots, and the multivariate composite estimator

Treesearch

Raymond L. Czaplewski

2000-01-01

Consider the following example of an accuracy assessment. Landsat data are used to build a thematic map of land cover for a multicounty region. The map classifier (e.g., a supervised classification algorithm) assigns each pixel into one category of land cover. The classification system includes 12 different types of forest and land cover: black spruce, balsam fir,...
Predictive Utility of Marketed Volumetric Software Tools in Subjects at Risk for Alzheimer Disease: Do Regions Outside the Hippocampus Matter?

PubMed

Tanpitukpongse, T P; Mazurowski, M A; Ikhena, J; Petrella, J R

2017-03-01

Alzheimer disease is a prevalent neurodegenerative disease. Computer assessment of brain atrophy patterns can help predict conversion to Alzheimer disease. Our aim was to assess the prognostic efficacy of individual-versus-combined regional volumetrics in 2 commercially available brain volumetric software packages for predicting conversion of patients with mild cognitive impairment to Alzheimer disease. Data were obtained through the Alzheimer's Disease Neuroimaging Initiative. One hundred ninety-two subjects (mean age, 74.8 years; 39% female) diagnosed with mild cognitive impairment at baseline were studied. All had T1-weighted MR imaging sequences at baseline and 3-year clinical follow-up. Analysis was performed with NeuroQuant and Neuroreader. Receiver operating characteristic curves assessing the prognostic efficacy of each software package were generated by using a univariable approach using individual regional brain volumes and 2 multivariable approaches (multiple regression and random forest), combining multiple volumes. On univariable analysis of 11 NeuroQuant and 11 Neuroreader regional volumes, hippocampal volume had the highest area under the curve for both software packages (0.69, NeuroQuant; 0.68, Neuroreader) and was not significantly different (P > .05) between packages. Multivariable analysis did not increase the area under the curve for either package (0.63, logistic regression; 0.60, random forest NeuroQuant; 0.65, logistic regression; 0.62, random forest Neuroreader). Of the multiple regional volume measures available in FDA-cleared brain volumetric software packages, hippocampal volume remains the best single predictor of conversion of mild cognitive impairment to Alzheimer disease at 3-year follow-up. Combining volumetrics did not add additional prognostic efficacy. Therefore, future prognostic studies in mild cognitive impairment, combining such tools with demographic and other biomarker measures, are justified in using hippocampal volume as the only volumetric biomarker. © 2017 by American Journal of Neuroradiology.

Grey matter volume patterns in thalamic nuclei are associated with familial risk for schizophrenia.

PubMed

Pergola, Giulio; Trizio, Silvestro; Di Carlo, Pasquale; Taurisano, Paolo; Mancini, Marina; Amoroso, Nicola; Nettis, Maria Antonietta; Andriola, Ileana; Caforio, Grazia; Popolizio, Teresa; Rampino, Antonio; Di Giorgio, Annabella; Bertolino, Alessandro; Blasi, Giuseppe

2017-02-01

Previous evidence suggests reduced thalamic grey matter volume (GMV) in patients with schizophrenia (SCZ). However, it is not considered an intermediate phenotype for schizophrenia, possibly because previous studies did not assess the contribution of individual thalamic nuclei and employed univariate statistics. Here, we hypothesized that multivariate statistics would reveal an association of GMV in different thalamic nuclei with familial risk for schizophrenia. We also hypothesized that accounting for the heterogeneity of thalamic GMV in healthy controls (HC) would improve the detection of subjects at familial risk for the disorder. We acquired MRI scans for 96 clinically stable SCZ, 55 non-affected siblings of patients with schizophrenia (SIB), and 249 HC. The thalamus was parceled into seven regions of interest (ROIs). After a canonical univariate analysis, we used GMV estimates of thalamic ROIs, together with total thalamic GMV and premorbid intelligence, as features in Random Forests to classify HC, SIB, and SCZ. Then, we computed a Misclassification Index for each individual and tested the improvement in SIB detection after excluding a subsample of HC misclassified as patients. Random Forests discriminated SCZ from HC (accuracy=81%) and SIB from HC (accuracy=75%). Left anteromedial thalamic volumes were significantly associated with both multivariate classifications (p<0.05). Excluding HC misclassified as SCZ greatly improved HC vs. SIB classification (Cohen's d=1.39). These findings suggest that multivariate statistics identify a familial background associated with thalamic GMV reduction in SCZ. They also suggest the relevance of inter-individual variability of GMV patterns for the discrimination of individuals at familial risk for the disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis

PubMed Central

Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.

2017-01-01

Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571
Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis.

PubMed

Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L

2017-02-14

Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.
Statistical Approaches to Type Determination of the Ejector Marks on Cartridge Cases.

PubMed

Warren, Eric M; Sheets, H David

2018-03-01

While type determination on bullets has been performed for over a century, type determination on cartridge cases is often overlooked. Presented here is an example of type determination of ejector marks on cartridge cases from Glock and Smith & Wesson Sigma series pistols using Naïve Bayes and Random Forest classification methods. The shapes of ejector marks were captured from images of test-fired cartridge cases and subjected to multivariate analysis. Naïve Bayes and Random Forest methods were used to assign the ejector shapes to the correct class of firearm with success rates as high as 98%. This method is easily implemented with equipment already available in crime laboratories and can serve as an investigative lead in the form of a list of firearms that could have fired the evidence. Paired with the FBI's General Rifling Characteristics (GRC) database, this could be an invaluable resource for firearm evidence at crime scenes. © 2017 American Academy of Forensic Sciences.
An application of quantile random forests for predictive mapping of forest attributes

Treesearch

E.A. Freeman; G.G. Moisen

2015-01-01

Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...
Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models

NASA Astrophysics Data System (ADS)

Hong, Haoyuan; Pourghasemi, Hamid Reza; Pourtaghi, Zohre Sadat

2016-04-01

Landslides are an important natural hazard that causes a great amount of damage around the world every year, especially during the rainy season. The Lianhua area is located in the middle of China's southern mountainous area, west of Jiangxi Province, and is known to be an area prone to landslides. The aim of this study was to evaluate and compare landslide susceptibility maps produced using the random forest (RF) data mining technique with those produced by bivariate (evidential belief function and frequency ratio) and multivariate (logistic regression) statistical models for Lianhua County, China. First, a landslide inventory map was prepared using aerial photograph interpretation, satellite images, and extensive field surveys. In total, 163 landslide events were recognized in the study area, with 114 landslides (70%) used for training and 49 landslides (30%) used for validation. Next, the landslide conditioning factors (including the slope angle, altitude, slope aspect, topographic wetness index (TWI), slope-length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, distance to roads, annual precipitation, land use, normalized difference vegetation index (NDVI), and lithology) were derived from the spatial database. Finally, the landslide susceptibility maps of Lianhua County were generated in ArcGIS 10.1 based on the random forest (RF), evidential belief function (EBF), frequency ratio (FR), and logistic regression (LR) approaches and were validated using a receiver operating characteristic (ROC) curve. The ROC plot assessment results showed that for landslide susceptibility maps produced using the EBF, FR, LR, and RF models, the area under the curve (AUC) values were 0.8122, 0.8134, 0.7751, and 0.7172, respectively. Therefore, we can conclude that all four models have an AUC of more than 0.70 and can be used in landslide susceptibility mapping in the study area; meanwhile, the EBF and FR models had the best performance for Lianhua County, China. Thus, the resultant susceptibility maps will be useful for land use planning and hazard mitigation aims.
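The validation described above rests on a 70/30 split of the inventory and on the area under the ROC curve. The sketch below shows both. It assumes the AUC is computed in its rank form, as the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half. The function name and the toy labels and scores are hypothetical, and the study itself worked in ArcGIS rather than with this code.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Split sizes reported in the abstract: 163 landslides, 70% for training.
n_total = 163
n_train = round(0.7 * n_total)
n_valid = n_total - n_train

# Hypothetical susceptibility scores for three landslide and three
# non-landslide locations.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the abstract treats values above 0.70 as usable.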
Factors influencing consumption of nutrient rich forest foods in rural Cameroon.

PubMed

Fungo, Robert; Muyonga, John H; Kabahenda, Margaret; Okia, Clement A; Snook, Laura

2016-02-01

Studies show that a number of forest foods consumed in Cameroon are highly nutritious and rich in health boosting bioactive compounds. This study assessed the knowledge and perceptions towards the nutritional and health promoting properties of forest foods among forest dependent communities. The relationship between knowledge, perceptions and socio-demographic attributes on consumption of forest foods was also determined. A total of 279 females in charge of decision making with respect to food preparation were randomly selected from 12 villages in southern and eastern Cameroon and interviewed using researcher administered questionnaires. Multivariate logistic regression analysis was used to identify the factors affecting consumption of forest foods. Baillonella toxisperma (98%) and Irvingia gabonensis (81%) were the nutrient rich forest foods best known by the respondents. About 31% of the respondents were aware of the nutritional value and health benefits of forest foods. About 10%-61% of the respondents expressed positive attitudes to questions related to health benefits of specific forest foods. Consumption of forest foods was found to be higher among polygamous families and was positively related to length of stay in the forest area and to age of respondent. Education had an inverse relationship with use of forest foods. Knowledge and positive attitude towards the nutritional value of forest foods were also found to positively influence consumption of forest foods. Since knowledge was found to influence attitude and consumption, there is a need to invest in awareness campaigns to strengthen the current knowledge levels among the study population. This should positively influence the attitudes and perceptions towards increased consumption of forest foods. Copyright © 2015 Elsevier Ltd. All rights reserved.
Minimizing effects of methodological decisions on interpretation and prediction in species distribution studies: An example with background selection

USGS Publications Warehouse

Jarnevich, Catherine S.; Talbert, Marian; Morisette, Jeffrey T.; Aldridge, Cameron L.; Brown, Cynthia; Kumar, Sunil; Manier, Daniel; Talbert, Colin; Holcombe, Tracy R.

2017-01-01

Evaluating the conditions where a species can persist is an important question in ecology both to understand tolerances of organisms and to predict distributions across landscapes. Presence data combined with background or pseudo-absence locations are commonly used with species distribution modeling to develop these relationships. However, there is not a standard method to generate background or pseudo-absence locations, and method choice affects model outcomes. We evaluated combinations of both model algorithms (simple and complex generalized linear models, multivariate adaptive regression splines, Maxent, boosted regression trees, and random forest) and background methods (random, minimum convex polygon, and continuous and binary kernel density estimator (KDE)) to assess the sensitivity of model outcomes to choices made. We evaluated six questions related to model results, including five beyond the common comparison of model accuracy assessment metrics (biological interpretability of response curves, cross-validation robustness, independent data accuracy and robustness, and prediction consistency). For our case study with cheatgrass in the western US, random forest was least sensitive to background choice and the binary KDE method was least sensitive to model algorithm choice. While this outcome may not hold for other locations or species, the methods we used can be implemented to help determine appropriate methodologies for particular research questions.
Finding structure in data using multivariate tree boosting

PubMed Central

Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.

2016-01-01

Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
Multivariate geomorphic analysis of forest streams: Implications for assessment of land use impacts on channel condition

Treesearch

Richard D. Wood-Smith; John M. Buffington

1996-01-01

Multivariate statistical analyses of geomorphic variables from 23 forest stream reaches in southeast Alaska result in successful discrimination between pristine streams and those disturbed by land management, specifically timber harvesting and associated road building. Results of discriminant function analysis indicate that a three-variable model discriminates 10...
Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests

DTIC Science & Technology

2014-10-02

Kogalur, Blackstone, & Lauer, 2008; Ishwaran & Kogalur, 2010). Random survival forest is a survival analysis extension of Random Forests (Breiman, 2001 ... Statistics & Probability Letters, 80(13), 1056–1064. Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The
A multivariate decision tree analysis of biophysical factors in tropical forest fire occurrence

Treesearch

Rey S. Ofren; Edward Harvey

2000-01-01

A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...
A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.

PubMed

Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana

2014-01-01

Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including (in the latter case) x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called "out-of-bag"), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha−1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
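The abstract does not say which baseline the "60% improvement in explained variation" refers to. A quick arithmetic check, using only the percentages and RMSE values quoted above, suggests it is the relative gain over stratification (37% to 59%). The helper name is hypothetical, and this is a reading of the reported numbers, not the authors' stated calculation.

```python
def relative_change(new, old):
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


# Explained variation: Random Forest with spatial context (59%) against
# the two alternatives reported in the abstract.
gain_vs_stratification = relative_change(59.0, 37.0)  # roughly 0.59
gain_vs_rf_no_context = relative_change(59.0, 43.0)   # roughly 0.37

# RMSE fell from 33 to 26 Mg C per hectare, a reduction of roughly 21%.
rmse_reduction = -relative_change(26.0, 33.0)
```

Only the comparison against stratification rounds to 60%, so that is the reading consistent with the figures given.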
Prediction of survival with multi-scale radiomic analysis in glioblastoma patients.

PubMed

Chaddad, Ahmad; Sabri, Siham; Niazi, Tamim; Abdulkarim, Bassam

2018-06-19

We propose multiscale texture features based on a Laplacian-of-Gaussian (LoG) filter to predict progression-free survival (PFS) and overall survival (OS) in patients newly diagnosed with glioblastoma (GBM). Experiments use the extracted features derived from 40 patients with GBM with T1-weighted imaging (T1-WI) and fluid-attenuated inversion recovery (FLAIR) images that were segmented manually into areas of active tumor, necrosis, and edema. Multiscale texture features were extracted locally from each of these areas of interest using a LoG filter, and the relation of features to OS and PFS was investigated using univariate (i.e., Spearman's rank correlation coefficient, log-rank test and Kaplan-Meier estimator) and multivariate analyses (i.e., Random Forest classifier). Three and seven features were statistically correlated with PFS and OS, respectively, with absolute correlation values between 0.32 and 0.36 and p < 0.05. Three features derived from active tumor regions only were associated with OS (p < 0.05) with hazard ratios (HR) of 2.9, 3, and 3.24, respectively. Combined features showed AUC values of 85.37% and 85.54% for predicting the PFS and OS of GBM patients, respectively, using the random forest (RF) classifier. We presented multiscale texture features to characterize the GBM regions and predict the PFS and OS. The efficiency achievable suggests that this technique can be developed into a GBM MR analysis system suitable for clinical use after a thorough validation involving more patients. Graphical abstract: Scheme of the proposed model for characterizing the heterogeneity of GBM regions and predicting the overall survival and progression-free survival of GBM patients. (1) Acquisition of pretreatment MRI images; (2) Affine registration of the T1-WI image with its corresponding FLAIR images, and GBM subtype (phenotype) labelling; (3) Extraction of nine texture features from the three texture scales (fine, medium, and coarse) derived from each of the GBM regions; (4) Comparing heterogeneity between GBM regions by ANOVA test; survival analysis using univariate methods (Spearman rank correlation between features and survival (i.e., PFS and OS) based on each of the GBM regions, and Kaplan-Meier estimator and log-rank test to predict the PFS and OS of patient groups split at the median of the feature), and a multivariate method (random forest model) for predicting the PFS and OS of patient groups split at the median of PFS and OS.
Unsupervised learning on scientific ocean drilling datasets from the South China Sea

NASA Astrophysics Data System (ADS)

Tse, Kevin C.; Chiu, Hon-Chim; Tsang, Man-Yin; Li, Yiliang; Lam, Edmund Y.

2018-06-01

Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.
Summer and winter habitat suitability of Marco Polo argali in southeastern Tajikistan: A modeling approach.

PubMed

Salas, Eric Ariel L; Valdez, Raul; Michel, Stefan

2017-11-01

We modeled summer and winter habitat suitability of Marco Polo argali in the Pamir Mountains in southeastern Tajikistan using these statistical algorithms: Generalized Linear Model, Random Forest, Boosted Regression Tree, Maxent, and Multivariate Adaptive Regression Splines. Using sheep occurrence data collected from 2009 to 2015 and a set of selected habitat predictors, we produced summer and winter habitat suitability maps and determined the important habitat suitability predictors for both seasons. Our results demonstrated that argali selected proximity to riparian areas and greenness as the two most relevant variables for summer, and the degree of slope (gentler slopes between 0° to 20°) and Landsat temperature band for winter. The terrain roughness was also among the most important variables in summer and winter models. Aspect was only significant for winter habitat, with argali preferring south-facing mountain slopes. We evaluated various measures of model performance such as the Area Under the Curve (AUC) and the True Skill Statistic (TSS). Comparing the five algorithms, the AUC scored highest for Boosted Regression Tree in summer (AUC = 0.94) and winter model runs (AUC = 0.94). In contrast, Random Forest underperformed in both model runs.
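The True Skill Statistic mentioned above is usually defined as sensitivity plus specificity minus one, computed from a presence/absence confusion matrix at a chosen threshold. A minimal sketch under that standard definition follows. The function name and the counts are hypothetical and are not taken from the argali models.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of presences predicted as present
    specificity = tn / (tn + fp)  # share of absences predicted as absent
    return sensitivity + specificity - 1.0


# Hypothetical counts: 50 presence sites and 50 background sites.
example_tss = true_skill_statistic(tp=45, fn=5, tn=40, fp=10)
```

TSS ranges from -1 to 1, and values at or below zero indicate no skill beyond chance. Unlike AUC, it depends on the threshold used to turn suitability scores into presence or absence.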
A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

USGS Publications Warehouse

Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

2012-01-01

Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. 
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.
A Tale of Two “Forests”: Random Forest Machine Learning Aids Tropical Forest Carbon Mapping

PubMed Central

Mascaro, Joseph; Asner, Gregory P.; Knapp, David E.; Kennedy-Bowdoin, Ty; Martin, Roberta E.; Anderson, Christopher; Higgins, Mark; Chadwick, K. Dana

2014-01-01

Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including (in the latter case) x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called “out-of-bag”), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha−1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation. PMID:24489686
Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS.

PubMed

Golkarian, Ali; Naghibi, Seyed Amir; Kalantar, Bahareh; Pradhan, Biswajeet

2018-02-17

Ever increasing demand for water resources for different purposes makes it essential to have better understanding and knowledge about water resources. As known, groundwater resources are one of the main water resources especially in countries with arid climatic condition. Thus, this study seeks to provide groundwater potential maps (GPMs) employing new algorithms. Accordingly, this study aims to validate the performance of C5.0, random forest (RF), and multivariate adaptive regression splines (MARS) algorithms for generating GPMs in the eastern part of Mashhad Plain, Iran. For this purpose, a dataset was produced consisting of spring locations as indicator and groundwater-conditioning factors (GCFs) as input. In this research, 13 GCFs were selected including altitude, slope aspect, slope angle, plan curvature, profile curvature, topographic wetness index (TWI), slope length, distance from rivers and faults, rivers and faults density, land use, and lithology. The mentioned dataset was divided into two classes of training and validation with 70 and 30% of the springs, respectively. Then, C5.0, RF, and MARS algorithms were employed using R statistical software, and the final values were transformed into GPMs. Finally, two evaluation criteria including Kappa and area under receiver operating characteristics curve (AUC-ROC) were calculated. According to the findings of this research, MARS had the best performance with AUC-ROC of 84.2%, followed by RF and C5.0 algorithms with AUC-ROC values of 79.7 and 77.3%, respectively. The results indicated that AUC-ROC values for the employed models are more than 70% which shows their acceptable performance. As a conclusion, the produced methodology could be used in other geographical areas. GPMs could be used by water resource managers and related organizations to accelerate and facilitate water resource exploitation.
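Of the two evaluation criteria named above, Kappa is the less familiar. The sketch below assumes the standard Cohen's kappa for a two-class confusion matrix, in which observed agreement is corrected for the agreement expected by chance. The function name and the counts are hypothetical, and the abstract does not report the confusion matrices behind its Kappa values.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical validation counts: 50 spring and 50 non-spring locations.
example_kappa = cohens_kappa(tp=40, fn=10, fp=10, tn=40)
```

Here 80% of cases are classified correctly, but half of that agreement would be expected by chance with balanced classes, so kappa is lower than raw accuracy.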
Is laparoscopic sleeve gastrectomy safer than laparoscopic gastric bypass? a comparison of 30-day complications using the MBSAQIP data registry.

PubMed

Kumar, Sandhya B; Hamilton, Barbara C; Wood, Stephanie G; Rogers, Stanley J; Carter, Jonathan T; Lin, Matthew Y

2018-03-01

Laparoscopic sleeve gastrectomy (LSG) has become popular due to its technical ease and excellent short-term results. Understanding the risk profile of LSG compared with the gold standard laparoscopic Roux-en-Y gastric bypass (LRYGB) is critical for patient selection. The objective was to use traditional regression techniques and random forest classification algorithms to compare LSG with LRYGB using the 2015 Metabolic and Bariatric Surgery Accreditation and Quality Improvement Data Registry. The setting was the United States. Outcomes were leak, morbidity, and mortality within 30 days. Variable importance was assessed using random forest algorithms. Multivariate models were created in a training set and evaluated on the testing set with receiver operating characteristic curves. The adjusted odds of each outcome were compared. Of 134,142 patients, 93,062 (69%) underwent LSG and 41,080 (31%) underwent LRYGB. One hundred seventy-eight deaths occurred: 96 (.1%) among LSG patients compared with 82 (.2%) among LRYGB patients (P<.001). Morbidity occurred in 8% (5.8% in LSG versus 11.7% in LRYGB, P<.001). Leaks occurred in 1% (.8% in LSG versus 1.6% in LRYGB, P<.001). The most important predictors of all outcomes were body mass index, albumin, and age. In the adjusted multivariate models, LRYGB had higher odds of all complications (leak: odds ratio 2.10, P<.001; morbidity: odds ratio 2.02, P<.001; death: odds ratio 1.64, P<.01). In the Metabolic and Bariatric Surgery Accreditation and Quality Improvement Data Registry for 2015, LSG had half the risk-adjusted odds of death, serious morbidity, and leak in the first 30 days compared with LRYGB. Copyright © 2018 American Society for Bariatric Surgery. Published by Elsevier Inc. All rights reserved.
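The death counts and group sizes in the abstract are enough to compute a crude (unadjusted) odds ratio, which helps show what the adjusted figure is correcting. The sketch below uses only numbers quoted above, and the function name is hypothetical. The crude value is not expected to match the adjusted odds ratio of 1.64, because the registry models also account for patient covariates.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Crude odds ratio of an event in group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# 30-day deaths reported in the abstract: 82 of 41,080 LRYGB patients
# and 96 of 93,062 LSG patients.
crude_or_death = odds_ratio(82, 41080, 96, 93062)  # roughly 1.94

# Reciprocals of the adjusted odds ratios give the LSG-versus-LRYGB view.
lsg_vs_lrygb = {name: 1.0 / value for name, value in
                {"leak": 2.10, "morbidity": 2.02, "death": 1.64}.items()}
```

The reciprocals come to about 0.48 for leak, 0.50 for morbidity, and 0.61 for death, which is the basis for the abstract's summary that LSG had roughly half the risk-adjusted odds.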

Epidemiology of forest malaria in central Vietnam: a large scale cross-sectional survey.

PubMed

Erhart, Annette; Ngo, Duc Thang; Phan, Van Ky; Ta, Thi Tinh; Van Overmeir, Chantal; Speybroeck, Niko; Obsomer, Valerie; Le, Xuan Hung; Le, Khanh Thuan; Coosemans, Marc; D'alessandro, Umberto

2005-12-08

In Vietnam, a large proportion of all malaria cases and deaths occurs in the central mountainous and forested part of the country. Indeed, forest malaria, despite intensive control activities, is still a major problem which raises several questions about its dynamics. A large-scale malaria morbidity survey to measure malaria endemicity and identify important risk factors was carried out in 43 villages situated in a forested area of Ninh Thuan province, south central Vietnam. Four thousand three hundred and six randomly selected individuals, aged 10-60 years, participated in the survey. Rag Lays (86%), traditionally living in the forest and practising "slash and burn" cultivation represented the most common ethnic group. The overall parasite rate was 13.3% (range [0-42.3]) while Plasmodium falciparum seroprevalence was 25.5% (range [2.1-75.6]). Mapping of these two variables showed a patchy distribution, suggesting that risk factors other than remoteness and forest proximity modulated the human-vector interactions. This was confirmed by the results of the multivariate-adjusted analysis, showing that forest work was a significant risk factor for malaria infection, further increased by staying in the forest overnight (OR= 2.86; 95%CI [1.62; 5.07]). Rag Lays had a higher risk of malaria infection, which inversely related to education level and socio-economic status. Women were less at risk than men (OR = 0.71; 95%CI [0.59; 0.86]), a possible consequence of different behaviour. This study confirms that malaria endemicity is still relatively high in this area and that the dynamics of transmission is constantly modulated by the behaviour of both humans and vectors. A well-targeted intervention reducing the "vector/forest worker" interaction, based on long-lasting insecticidal material, could be appropriate in this environment.
Epidemiology of forest malaria in central Vietnam: a large scale cross-sectional survey

PubMed Central

Erhart, Annette; Thang, Ngo Duc; Van Ky, Phan; Tinh, Ta Thi; Van Overmeir, Chantal; Speybroeck, Niko; Obsomer, Valerie; Hung, Le Xuan; Thuan, Le Khanh; Coosemans, Marc; D'alessandro, Umberto

2005-01-01

In Vietnam, a large proportion of all malaria cases and deaths occurs in the central mountainous and forested part of the country. Indeed, forest malaria, despite intensive control activities, is still a major problem which raises several questions about its dynamics. A large-scale malaria morbidity survey to measure malaria endemicity and identify important risk factors was carried out in 43 villages situated in a forested area of Ninh Thuan province, south central Vietnam. Four thousand three hundred and six randomly selected individuals, aged 10–60 years, participated in the survey. Rag Lays (86%), traditionally living in the forest and practising "slash and burn" cultivation represented the most common ethnic group. The overall parasite rate was 13.3% (range [0–42.3]) while Plasmodium falciparum seroprevalence was 25.5% (range [2.1–75.6]). Mapping of these two variables showed a patchy distribution, suggesting that risk factors other than remoteness and forest proximity modulated the human-vector interactions. This was confirmed by the results of the multivariate-adjusted analysis, showing that forest work was a significant risk factor for malaria infection, further increased by staying in the forest overnight (OR = 2.86; 95%CI [1.62; 5.07]). Rag Lays had a higher risk of malaria infection, which was inversely related to education level and socio-economic status. Women were less at risk than men (OR = 0.71; 95%CI [0.59; 0.86]), a possible consequence of different behaviour. This study confirms that malaria endemicity is still relatively high in this area and that the dynamics of transmission is constantly modulated by the behaviour of both humans and vectors. A well-targeted intervention reducing the "vector/forest worker" interaction, based on long-lasting insecticidal material, could be appropriate in this environment. PMID:16336671
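The odds ratios quoted in this abstract can be sanity-checked with a little arithmetic. The sketch below assumes the intervals are Wald intervals computed on the log-odds scale (the usual output of logistic regression, though the abstract does not say so); under that assumption the point estimate is the geometric mean of the two bounds. The function name `log_scale_summary` is illustrative only.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Recover the point estimate and log-scale standard error from a 95% CI.

    Assumes the interval is symmetric on the log scale (an assumption,
    not something stated in the abstract).
    """
    point = math.sqrt(lower * upper)  # geometric mean of the bounds
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return point, se_log

# Intervals quoted in the abstract
overnight_or, overnight_se = log_scale_summary(1.62, 5.07)  # reported OR = 2.86
women_or, women_se = log_scale_summary(0.59, 0.86)          # reported OR = 0.71
```

Both recovered point estimates agree with the reported odds ratios to within rounding, which is what one would expect if the assumption holds.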
A Multiscale Approach Indicates a Severe Reduction in Atlantic Forest Wetlands and Highlights that São Paulo Marsh Antwren Is on the Brink of Extinction

PubMed Central

Del-Rio, Glaucia; Rêgo, Marco Antonio; Silveira, Luís Fábio

2015-01-01

Over the last 200 years the wetlands of the Upper Tietê and Upper Paraíba do Sul basins, in the southeastern Atlantic Forest, Brazil, have been almost completely transformed by urbanization, agriculture and mining. Endemic to these river basins, the São Paulo Marsh Antwren (Formicivora paludicola) survived these impacts, but remained unknown to science until its discovery in 2005. Its population status was cause for immediate concern. In order to understand the factors imperiling the species, and provide guidelines for its conservation, we investigated both the species’ distribution and the distribution of areas of suitable habitat using a multiscale approach encompassing species distribution modeling, fieldwork surveys and occupancy models. Of six species distribution modeling methods used (Generalized Linear Models, Generalized Additive Models, Multivariate Adaptive Regression Splines, Classification Tree Analysis, Artificial Neural Networks and Random Forest), Random Forest showed the best fit and was utilized to guide field validation. After surveying 59 sites, our results indicated that Formicivora paludicola occurred in only 13 sites, having narrow habitat specificity, and restricted habitat availability. Additionally, historic maps, distribution models and satellite imagery showed that human occupation has resulted in a loss of more than 346 km² of suitable habitat for this species since the early twentieth century, so that it now only occupies a severely fragmented area (area of occupancy) of 1.42 km², and it should be considered Critically Endangered according to IUCN criteria. Furthermore, averaged occupancy models showed that marshes with lower cattail (Typha dominguensis) densities have higher probabilities of being occupied. Thus, these areas should be prioritized in future conservation efforts to protect the species, and to restore a portion of Atlantic Forest wetlands, in times of unprecedented regional water supply problems. PMID:25798608
A multiscale approach indicates a severe reduction in Atlantic Forest wetlands and highlights that São Paulo Marsh Antwren is on the brink of extinction.

PubMed

Del-Rio, Glaucia; Rêgo, Marco Antonio; Silveira, Luís Fábio

2015-01-01

Over the last 200 years the wetlands of the Upper Tietê and Upper Paraíba do Sul basins, in the southeastern Atlantic Forest, Brazil, have been almost completely transformed by urbanization, agriculture and mining. Endemic to these river basins, the São Paulo Marsh Antwren (Formicivora paludicola) survived these impacts, but remained unknown to science until its discovery in 2005. Its population status was cause for immediate concern. In order to understand the factors imperiling the species, and provide guidelines for its conservation, we investigated both the species' distribution and the distribution of areas of suitable habitat using a multiscale approach encompassing species distribution modeling, fieldwork surveys and occupancy models. Of six species distribution modeling methods used (Generalized Linear Models, Generalized Additive Models, Multivariate Adaptive Regression Splines, Classification Tree Analysis, Artificial Neural Networks and Random Forest), Random Forest showed the best fit and was utilized to guide field validation. After surveying 59 sites, our results indicated that Formicivora paludicola occurred in only 13 sites, having narrow habitat specificity, and restricted habitat availability. Additionally, historic maps, distribution models and satellite imagery showed that human occupation has resulted in a loss of more than 346 km² of suitable habitat for this species since the early twentieth century, so that it now only occupies a severely fragmented area (area of occupancy) of 1.42 km², and it should be considered Critically Endangered according to IUCN criteria. Furthermore, averaged occupancy models showed that marshes with lower cattail (Typha dominguensis) densities have higher probabilities of being occupied. Thus, these areas should be prioritized in future conservation efforts to protect the species, and to restore a portion of Atlantic Forest wetlands, in times of unprecedented regional water supply problems.
Predicting Ascospore Release of Monilinia vaccinii-corymbosi of Blueberry with Machine Learning.

PubMed

Harteveld, Dalphy O C; Grant, Michael R; Pscheidt, Jay W; Peever, Tobin L

2017-11-01

Mummy berry, caused by Monilinia vaccinii-corymbosi, causes economic losses of highbush blueberry in the U.S. Pacific Northwest (PNW). Apothecia develop from mummified berries overwintering on soil surfaces and produce ascospores that infect tissue emerging from floral and vegetative buds. Disease control currently relies on fungicides applied on a calendar basis rather than inoculum availability. To establish a prediction model for ascospore release, apothecial development was tracked in three fields, one in western Oregon and two in northwestern Washington in 2015 and 2016. Air and soil temperature, precipitation, soil moisture, leaf wetness, relative humidity and solar radiation were monitored using in-field weather stations and Washington State University's AgWeatherNet stations. Four modeling approaches were compared: logistic regression, multivariate adaptive regression splines, artificial neural networks, and random forest. A supervised learning approach was used, with the data split into two sets: training (70%) and testing (30%). The importance of environmental factors was calculated for each model separately. Soil temperature, soil moisture, and solar radiation were identified as the most important factors influencing ascospore release. Random forest models, with 78% accuracy, showed the best performance compared with the other models. Results of this research help PNW blueberry growers to optimize fungicide use and reduce production costs.
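A 70/30 hold-out evaluation of the kind mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the helper names `train_test_split` and `accuracy` are hypothetical, the shuffling seed is arbitrary, and any classifier could be fitted on the training part.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical usage: 100 observation indices split 70/30
train_rows, test_rows = train_test_split(range(100))
```

Accuracy is computed on the held-out 30% only, so it estimates performance on data the model has not seen.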
Classifier for gravitational-wave inspiral signals in nonideal single-detector data

NASA Astrophysics Data System (ADS)

Kapadia, S. J.; Dent, T.; Dal Canton, T.

2017-11-01

We describe a multivariate classifier for candidate events in a templated search for gravitational-wave (GW) inspiral signals from neutron-star-black-hole (NS-BH) binaries, in data from ground-based detectors where sensitivity is limited by non-Gaussian noise transients. The standard signal-to-noise ratio (SNR) and chi-squared test for inspiral searches use only properties of a single matched filter at the time of an event; instead, we propose a classifier using features derived from a bank of inspiral templates around the time of each event, and also from a search using approximate sine-Gaussian templates. The classifier thus extracts additional information from strain data to discriminate inspiral signals from noise transients. We evaluate a random forest classifier on a set of single-detector events obtained from realistic simulated advanced LIGO data, using simulated NS-BH signals added to the data. The new classifier detects a factor of 1.5-2 more signals at low false positive rates as compared to the standard "reweighted SNR" statistic, and does not require the chi-squared test to be computed. Conversely, if only the SNR and chi-squared values of single-detector events are available, random forest classification performs nearly identically to the reweighted SNR.
The potential use of cuticular hydrocarbons and multivariate analysis to age empty puparial cases of Calliphora vicina and Lucilia sericata.

PubMed

Moore, Hannah E; Pechal, Jennifer L; Benbow, M Eric; Drijfhout, Falko P

2017-05-16

Cuticular hydrocarbons (CHC) have been successfully used in the field of forensic entomology for identifying and ageing forensically important blowfly species, primarily in the larval stages. However, in older scenes where all other entomological evidence is no longer present, Calliphoridae puparial cases can often be all that remains and therefore being able to establish the age could give an indication of the PMI. This paper examined the CHCs present in the lipid wax layer of insects, to determine the age of the cases over a period of nine months. The two forensically important species examined were Calliphora vicina and Lucilia sericata. The hydrocarbons were chemically extracted and analysed using Gas Chromatography - Mass Spectrometry. Statistical analysis was then applied in the form of non-metric multidimensional scaling analysis (NMDS), permutational multivariate analysis of variance (PERMANOVA) and random forest models. This study was successful in determining age differences within the empty cases, which, to date, has not been established by any other technique.
Calibrating random forests for probability estimation.

PubMed

Dankowski, Theresa; Ziegler, Andreas

2016-09-30

Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
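To give a feel for logistic re-calibration, the sketch below fits a calibration intercept and slope on the logit of the predicted probabilities, using outcomes observed at a new center. This is a generic recalibration, swapped in for illustration; it is not the terminal-node translation described in the abstract, nor Elkan's method. All names (`fit_recalibration`, `recalibrate`) and the toy data are hypothetical.

```python
import math

def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # keep away from 0 and 1
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_recalibration(probs, outcomes, steps=5000, lr=0.1):
    """Fit outcome ~ sigmoid(a + b * logit(p)) by gradient descent on log loss."""
    xs = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start from "no recalibration"
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, outcomes):
            err = sigmoid(a + b * x) - y
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def recalibrate(p, a, b):
    """Map an original forest probability to its recalibrated value."""
    return sigmoid(a + b * logit(p))

# Toy data: the forest predicts 0.9 where the new center observes 60% events,
# and 0.2 where it observes 10% events.
probs = [0.9] * 10 + [0.2] * 10
outcomes = [1] * 6 + [0] * 4 + [1] * 1 + [0] * 9
a, b = fit_recalibration(probs, outcomes)
```

With only two distinct predicted probabilities, the two fitted parameters can reproduce the observed event rates exactly; on real data the fit is a compromise across all predicted values.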
Applying a weighted random forests method to extract karst sinkholes from LiDAR data

NASA Astrophysics Data System (ADS)

Zhu, Junfeng; Pierskalla, William P.

2016-02-01

Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface in high-resolution and high-density and has shown great potential to drastically improve locating and delineating sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of the depressions commonly generated from LiDAR data. In this study, we applied random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through a visual inspection and field verification process. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% for the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and make it more tractable to map sinkholes using LiDAR data for large areas. However, the random forests method cannot totally replace manual procedures, such as visual inspection and field verification.
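One common way to weight a random forest for imbalanced classes is to give each class a weight inversely proportional to its frequency and to let those weights enter the split criterion. The abstract does not say which weighting scheme was used, so the sketch below is an assumption for illustration; the function names and the 90/10 class mix are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

def weighted_gini(labels, weights):
    """Gini impurity of a node where each sample counts with its class weight."""
    mass = Counter()
    for y in labels:
        mass[y] += weights[y]
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())

# Hypothetical imbalanced node: 90 non-sinkholes, 10 sinkholes
labels = ["non-sinkhole"] * 90 + ["sinkhole"] * 10
weights = balanced_class_weights(labels)
impurity = weighted_gini(labels, weights)
```

With these weights the rare class carries as much total mass as the common one, so a split that isolates sinkholes reduces impurity far more than it would in an unweighted tree.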
Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max

Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.
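The AUROC used to compare models in this abstract has a simple interpretation: it is the probability that a randomly chosen patient with the symptom receives a higher score than a randomly chosen patient without it, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name and the four-patient example are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels are 0 or 1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four patients, two with the symptom (label 1)
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUROC of 0.5 corresponds to random ranking, which puts values such as the 0.649 reported for the best model into perspective.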
An Alternative Method for Computing Mean and Covariance Matrix of Some Multivariate Distributions

ERIC Educational Resources Information Center

Radhakrishnan, R.; Choudhury, Askar

2009-01-01

Computing the mean and covariance matrix of some multivariate distributions, in particular, multivariate normal distribution and Wishart distribution are considered in this article. It involves a matrix transformation of the normal random vector into a random vector whose components are independent normal random variables, and then integrating…
CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

PubMed

Ma, Li; Fan, Suohai

2017-03-14

The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
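The basic SMOTE step that CURE-SMOTE builds on creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that plain interpolation step under simple assumptions (Euclidean distance, numeric features); the CURE clustering stage described in the abstract is not implemented, and the function names and toy points are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

# Toy minority class: the four corners of the unit square
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10)
```

Because every synthetic point lies on a segment between two real minority points, noisy or outlying minority samples can pull new points into the majority region, which is the weakness a clustering step such as CURE is meant to reduce.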
Relevant Feature Set Estimation with a Knock-out Strategy and Random Forests

PubMed Central

Ganz, Melanie; Greve, Douglas N.; Fischl, Bruce; Konukoglu, Ender

2015-01-01

Group analysis of neuroimaging data is a vital tool for identifying anatomical and functional variations related to diseases as well as normal biological processes. The analyses are often performed on a large number of highly correlated measurements using a relatively smaller number of samples. Despite the correlation structure, the most widely used approach is to analyze the data using univariate methods followed by post-hoc corrections that try to account for the data’s multivariate nature. Although widely used, this approach may fail to recover from the adverse effects of the initial analysis when local effects are not strong. Multivariate pattern analysis (MVPA) is a powerful alternative to the univariate approach for identifying relevant variations. Jointly analyzing all the measures, MVPA techniques can detect global effects even when individual local effects are too weak to detect with univariate analysis. Current approaches are successful in identifying variations that yield highly predictive and compact models. However, they suffer from lessened sensitivity and instabilities in identification of relevant variations. Furthermore, current methods’ user-defined parameters are often unintuitive and difficult to determine. In this article, we propose a novel MVPA method for group analysis of high-dimensional data that overcomes the drawbacks of the current techniques. Our approach explicitly aims to identify all relevant variations using a “knock-out” strategy and the Random Forest algorithm. In evaluations with synthetic datasets the proposed method achieved substantially higher sensitivity and accuracy than the state-of-the-art MVPA methods, and outperformed the univariate approach when the effect size is low. In experiments with real datasets the proposed method identified regions beyond the univariate approach, while other MVPA methods failed to replicate the univariate results. 
More importantly, in a reproducibility study with the well-known ADNI dataset the proposed method yielded higher stability and power than the univariate approach. PMID:26272728
Outcome prediction in patients with glioblastoma by using imaging, clinical, and genomic biomarkers: focus on the nonenhancing component of the tumor.

PubMed

Jain, Rajan; Poisson, Laila M; Gutman, David; Scarpace, Lisa; Hwang, Scott N; Holder, Chad A; Wintermark, Max; Rao, Arvind; Colen, Rivka R; Kirby, Justin; Freymann, John; Jaffe, C Carl; Mikkelsen, Tom; Flanders, Adam

2014-08-01

To correlate patient survival with morphologic imaging features and hemodynamic parameters obtained from the nonenhancing region (NER) of glioblastoma (GBM), along with clinical and genomic markers. An institutional review board waiver was obtained for this HIPAA-compliant retrospective study. Forty-five patients with GBM underwent baseline imaging with contrast material-enhanced magnetic resonance (MR) imaging and dynamic susceptibility contrast-enhanced T2*-weighted perfusion MR imaging. Molecular and clinical predictors of survival were obtained. Single and multivariable models of overall survival (OS) and progression-free survival (PFS) were explored with Kaplan-Meier estimates, Cox regression, and random survival forests. Worsening OS (log-rank test, P = .0103) and PFS (log-rank test, P = .0223) were associated with increasing relative cerebral blood volume of NER (rCBVNER), which was higher with deep white matter involvement (t test, P = .0482) and poor NER margin definition (t test, P = .0147). NER crossing the midline was the only morphologic feature of NER associated with poor survival (log-rank test, P = .0125). Preoperative Karnofsky performance score (KPS) and resection extent (n = 30) were clinically significant OS predictors (log-rank test, P = .0176 and P = .0038, respectively). No genomic alterations were associated with survival, except patients with high rCBVNER and wild-type epidermal growth factor receptor (EGFR) mutation had significantly poor survival (log-rank test, P = .0306; area under the receiver operating characteristic curve = 0.62). Combining resection extent with rCBVNER marginally improved prognostic ability (permutation, P = .084). Random forest models of presurgical predictors indicated rCBVNER as the top predictor; also important were KPS, age at diagnosis, and NER crossing the midline. 
A multivariable model containing rCBVNER, age at diagnosis, and KPS can be used to group patients with more than 1 year of difference in observed median survival (0.49-1.79 years). Patients with high rCBVNER and NER crossing the midline and those with high rCBVNER and wild-type EGFR mutation showed poor survival. In multivariable survival models, however, rCBVNER provided unique prognostic information that went above and beyond the assessment of all NER imaging features, as well as clinical and genomic features.
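The Kaplan-Meier estimates mentioned in this abstract follow the product-limit rule: at each time with at least one death, the running survival probability is multiplied by one minus the fraction of at-risk patients who died. The sketch below illustrates that rule on made-up follow-up data; it is not the study's analysis, and the function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed death.

    events[i] is 1 if the death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases leave the risk set
    return curve

# Made-up follow-up times in years; 0 marks a censored patient
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last follow-up and then drop out without lowering the curve, which is what lets the method use incomplete follow-up.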
Outcome Prediction in Patients with Glioblastoma by Using Imaging, Clinical, and Genomic Biomarkers: Focus on the Nonenhancing Component of the Tumor

PubMed Central

Poisson, Laila M.; Gutman, David; Scarpace, Lisa; Hwang, Scott N.; Holder, Chad A.; Wintermark, Max; Rao, Arvind; Colen, Rivka R.; Kirby, Justin; Freymann, John; Jaffe, C. Carl; Mikkelsen, Tom; Flanders, Adam

2014-01-01

Purpose: To correlate patient survival with morphologic imaging features and hemodynamic parameters obtained from the nonenhancing region (NER) of glioblastoma (GBM), along with clinical and genomic markers. Materials and Methods: An institutional review board waiver was obtained for this HIPAA-compliant retrospective study. Forty-five patients with GBM underwent baseline imaging with contrast material–enhanced magnetic resonance (MR) imaging and dynamic susceptibility contrast-enhanced T2*-weighted perfusion MR imaging. Molecular and clinical predictors of survival were obtained. Single and multivariable models of overall survival (OS) and progression-free survival (PFS) were explored with Kaplan-Meier estimates, Cox regression, and random survival forests. Results: Worsening OS (log-rank test, P = .0103) and PFS (log-rank test, P = .0223) were associated with increasing relative cerebral blood volume of NER (rCBVNER), which was higher with deep white matter involvement (t test, P = .0482) and poor NER margin definition (t test, P = .0147). NER crossing the midline was the only morphologic feature of NER associated with poor survival (log-rank test, P = .0125). Preoperative Karnofsky performance score (KPS) and resection extent (n = 30) were clinically significant OS predictors (log-rank test, P = .0176 and P = .0038, respectively). No genomic alterations were associated with survival, except patients with high rCBVNER and wild-type epidermal growth factor receptor (EGFR) mutation had significantly poor survival (log-rank test, P = .0306; area under the receiver operating characteristic curve = 0.62). Combining resection extent with rCBVNER marginally improved prognostic ability (permutation, P = .084). Random forest models of presurgical predictors indicated rCBVNER as the top predictor; also important were KPS, age at diagnosis, and NER crossing the midline. A multivariable model containing rCBVNER, age at diagnosis, and KPS can be used to group patients with more than 1 year of difference in observed median survival (0.49–1.79 years). Conclusion: Patients with high rCBVNER and NER crossing the midline and those with high rCBVNER and wild-type EGFR mutation showed poor survival. In multivariable survival models, however, rCBVNER provided unique prognostic information that went above and beyond the assessment of all NER imaging features, as well as clinical and genomic features. © RSNA, 2014. Online supplemental material is available for this article. PMID:24646147
Non-random species loss in a forest herbaceous layer following nitrogen addition

Treesearch

Christopher A. Walter; Mary Beth Adams; Frank S. Gilliam; William T. Peterjohn

2017-01-01

Nitrogen (N) additions have decreased species richness (S) in hardwood forest herbaceous layers, yet the functional mechanisms for these decreases have not been explicitly evaluated. We tested two hypothesized mechanisms, random species loss (RSL) and non-random species loss (NRSL), in the hardwood forest herbaceous layer of a long-term, plot-scale...
Chemical Characterization of Young Virgin Queens and Mated Egg-Laying Queens in the Ant Cataglyphis cursor: Random Forest Classification Analysis for Multivariate Datasets.

PubMed

Monnin, Thibaud; Helft, Florence; Leroy, Chloé; d'Ettorre, Patrizia; Doums, Claudie

2018-02-01

Social insects are well known for their extremely rich chemical communication, yet their sex pheromones remain poorly studied. In the thermophilic and thelytokous ant, Cataglyphis cursor, we analyzed the cuticular hydrocarbon profiles and Dufour's gland contents of queens of different age and reproductive status (sexually immature gynes, sexually mature gynes, mated and egg-laying queens) and of workers. Random forest classification analyses showed that the four groups of individuals were well separated for both chemical sources, except mature gynes that clustered with queens for cuticular hydrocarbons and with immature gynes for Dufour's gland secretions. Analyses carried out with two groups of females only allowed identification of candidate chemicals for queen signal and for sexual attractant. In particular, gynes produced more undecane in the Dufour's gland. This chemical is both the sex pheromone and the alarm pheromone of the ant Formica lugubris. It may therefore act as sex pheromone in C. cursor, and/or be involved in the restoration of monogyny that occurs rapidly following colony fission. Indeed, new colonies often start with several gynes and all but one are rapidly culled by workers, and this process likely involves chemical signals between gynes and workers. These findings open novel opportunities for experimental studies of inclusive mate choice and queen choice in C. cursor.
Comparison of partial least squares and random forests for evaluating relationship between phenolics and bioactivities of Neptunia oleracea.

PubMed

Lee, Soo Yee; Mediani, Ahmed; Maulidiani, Maulidiani; Khatib, Alfi; Ismail, Intan Safinar; Zawawi, Norhasnida; Abas, Faridah

2018-01-01

Neptunia oleracea is a plant consumed as a vegetable and which has been used as a folk remedy for several diseases. Herein, two regression models (partial least squares, PLS; and random forest, RF) in a metabolomics approach were compared and applied to the evaluation of the relationship between phenolics and bioactivities of N. oleracea. In addition, the effects of different extraction conditions on the phenolic constituents were assessed by pattern recognition analysis. Comparison of the PLS and RF showed that RF exhibited poorer generalization and hence poorer predictive performance. Both the regression coefficient of PLS and the variable importance of RF revealed that quercetin and kaempferol derivatives, caffeic acid and vitexin-2-O-rhamnoside were significant towards the tested bioactivities. Furthermore, principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) results showed that sonication and absolute ethanol are the preferable extraction method and ethanol ratio, respectively, to produce N. oleracea extracts with high phenolic levels and therefore high DPPH scavenging and α-glucosidase inhibitory activities. Both PLS and RF are useful regression models in metabolomics studies. This work provides insight into the performance of different multivariate data analysis tools and the effects of different extraction conditions on the extraction of desired phenolics from plants. © 2017 Society of Chemical Industry.
An imputed forest composition map for New England screened by species range boundaries

Treesearch

Matthew J. Duveneck; Jonathan R. Thompson; B. Tyler Wilson

2015-01-01

Initializing forest landscape models (FLMs) to simulate changes in tree species composition requires accurate fine-scale forest attribute information mapped continuously over large areas. Nearest-neighbor imputation maps, maps developed from multivariate imputation of field plots, have high potential for use as the initial condition within FLMs, but the tendency for...
Movement trajectories and habitat partitioning of small mammals in logged and unlogged rain forests on Borneo.

PubMed

Wells, Konstans; Pfeiffer, Martin; Lakim, Maklarin B; Kalko, Elisabeth K V

2006-09-01

1. Non-volant animals in tropical rain forests differ in their ability to exploit the habitat above the forest floor and also in their response to habitat variability. It is predicted that specific movement trajectories are determined both by intrinsic factors such as ecological specialization, morphology and body size and by structural features of the surrounding habitat such as undergrowth and availability of supportive structures. 2. We applied spool-and-line tracking in order to describe movement trajectories and habitat segregation of eight species of small mammals from an assemblage of Muridae, Tupaiidae and Sciuridae in the rain forest of Borneo where we followed a total of 13,525 m path. We also analysed specific changes in the movement patterns of the small mammals in relation to habitat stratification between logged and unlogged forests. Variables related to climbing activity of the tracked species as well as the supportive structures of the vegetation and undergrowth density were measured along their tracks. 3. Movement patterns of the small mammals differed significantly between species. Most similarities were found in congeneric species that converged strongly in body size and morphology. All species were affected in their movement patterns by the altered forest structure in logged forests with most differences found in Leopoldamys sabanus. However, the large proportions of short step lengths found in all species for both forest types and similar path tortuosity suggest that the main movement strategies of the small mammals were not influenced by logging but comprised generally a response to the heterogeneous habitat as opposed to random movement strategies predicted for homogeneous environments. 4. Overall shifts in microhabitat use showed no coherent trend among species. Multivariate (principal component) analysis revealed contrasting trends for convergent species, in particular for Maxomys rajah and M. surifer as well as for Tupaia longipes and T. tana, suggesting that each species was uniquely affected in its movement trajectories by a multiple set of environmental and intrinsic features.

Approximating prediction uncertainty for random forest regression models

Treesearch

John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

2016-01-01

Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...
Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

PubMed

Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

2017-08-28

The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
Predicting temperate forest stand types using only structural profiles from discrete return airborne lidar

NASA Astrophysics Data System (ADS)

Fedrigo, Melissa; Newnham, Glenn J.; Coops, Nicholas C.; Culvenor, Darius S.; Bolton, Douglas K.; Nitschke, Craig R.

2018-02-01

Light detection and ranging (lidar) data have been increasingly used for forest classification due to its ability to penetrate the forest canopy and provide detail about the structure of the lower strata. In this study we demonstrate forest classification approaches using airborne lidar data as inputs to random forest and linear unmixing classification algorithms. Our results demonstrated that both random forest and linear unmixing models identified a distribution of rainforest and eucalypt stands that was comparable to existing ecological vegetation class (EVC) maps based primarily on manual interpretation of high resolution aerial imagery. Rainforest stands were also identified in the region that have not previously been identified in the EVC maps. The transition between stand types was better characterised by the random forest modelling approach. In contrast, the linear unmixing model placed greater emphasis on field plots selected as endmembers which may not have captured the variability in stand structure within a single stand type. The random forest model had the highest overall accuracy (84%) and Cohen's kappa coefficient (0.62). However, the classification accuracy was only marginally better than linear unmixing. The random forest model was applied to a region in the Central Highlands of south-eastern Australia to produce maps of stand type probability, including areas of transition (the 'ecotone') between rainforest and eucalypt forest. The resulting map provided a detailed delineation of forest classes, which specifically recognised the coalescing of stand types at the landscape scale. This represents a key step towards mapping the structural and spatial complexity of these ecosystems, which is important for both their management and conservation.
Novel solutions for an old disease: diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks.

PubMed

Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack

2011-01-01

Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.
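The four accuracy measures reported above (sensitivity, specificity, positive and negative predictive value) all follow from the counts in a two-by-two confusion matrix. The sketch below shows that arithmetic only; the function name is illustrative and the counts are hypothetical values chosen to sum to a 45-patient test set, not the study's actual results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # healthy patients correctly cleared
        "ppv": tp / (tp + fp),          # positive calls that are truly diseased
        "npv": tn / (tn + fn),          # negative calls that are truly healthy
    }


# Hypothetical counts for a 45-patient test set (NOT taken from the study).
metrics = diagnostic_metrics(tp=30, fp=1, tn=12, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With a test set this small, one misclassified patient moves each measure by several percentage points, which is worth keeping in mind when comparing the models.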
The experimental design of the Missouri Ozark Forest Ecosystem Project

Treesearch

Steven L. Sheriff; Shuoqiong He

1997-01-01

The Missouri Ozark Forest Ecosystem Project (MOFEP) is an experiment that examines the effects of three forest management practices on the forest community. MOFEP is designed as a randomized complete block design using nine sites divided into three blocks. Treatments of uneven-aged, even-aged, and no-harvest management were randomly assigned to sites within each block...
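A randomized complete block design of this shape (nine sites, three blocks, three treatments) can be sketched as a shuffle of the treatment list within each block, so that every block receives each treatment exactly once. The site and block labels below are hypothetical placeholders, and the fixed seed is only there to make the illustration repeatable; the project's actual randomization procedure is not described in the record.

```python
import random


def assign_treatments(blocks, treatments, seed=None):
    """Randomly assign each treatment to exactly one site within every block."""
    rng = random.Random(seed)
    assignment = {}
    for block, sites in blocks.items():
        if len(sites) != len(treatments):
            raise ValueError(f"{block} needs one site per treatment")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # independent random order for each block
        for site, treatment in zip(sites, shuffled):
            assignment[site] = treatment
    return assignment


# Hypothetical labels: three blocks of three sites each.
blocks = {
    "block_1": ["site_1", "site_2", "site_3"],
    "block_2": ["site_4", "site_5", "site_6"],
    "block_3": ["site_7", "site_8", "site_9"],
}
treatments = ["uneven-aged", "even-aged", "no-harvest"]

plan = assign_treatments(blocks, treatments, seed=42)
for site in sorted(plan):
    print(site, plan[site])
```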
Estimating habitat value using forest inventory data: the fisher (Martes pennanti) in northwestern California

Treesearch

William J. Zielinski; Jeffrey R. Dunk; Andrew N. Gray

2012-01-01

Managing forests for multiple objectives requires balancing timber and vegetation management objectives with needs of sensitive species. Especially challenging is how to retain the habitat elements for species that are typically associated with late-seral forests. We develop a regionally specific, multivariate model describing habitat selection that can be used when...
Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery

Treesearch

Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha

2007-01-01

The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...
Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

ERIC Educational Resources Information Center

Golino, Hudson F.; Gomes, Cristiano M. A.

2016-01-01

This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…
Random Bits Forest: a Strong Classifier/Regressor for Big Data

NASA Astrophysics Data System (ADS)

Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

2016-07-01

Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).
Using Random Forest Models to Predict Organizational Violence

NASA Technical Reports Server (NTRS)

Levine, Burton; Bobashev, Georgly

2012-01-01

We present a methodology to assess the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minority at Risk Organizational Behavior (MAROS) dataset. The MAROS data is longitudinal; so, individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for the violation of the independence assumption. We present the results of the model fit, an example of predicting violence for an organization; and finally, we present a summary of the forest in a "meta-tree,"
The Past, Present and Future of the Meteorological Phenomena Identification Near the Ground (mPING) Project

NASA Astrophysics Data System (ADS)

Elmore, K. L.

2016-12-01

The Meteorological Phenomena Identification Near the Ground (mPING) project is an example of a crowd-sourced, citizen science effort to gather data of sufficient quality and quantity needed by new post processing methods that use machine learning. Transportation and infrastructure are particularly sensitive to precipitation type in winter weather. We extract attributes from operational numerical forecast models and use them in a random forest to generate forecast winter precipitation types. We find that random forests applied to forecast soundings are effective at generating skillful forecasts of surface ptype with considerably more skill than the current algorithms, especially for ice pellets and freezing rain. We also find that three very different forecast models yield similar overall results, showing that random forests are able to extract essentially equivalent information from different forecast models. We also show that the random forest for each model, and each profile type is unique to the particular forecast model and that the random forests developed using a particular model suffer significant degradation when given attributes derived from a different model. This implies that no single algorithm can perform well across all forecast models. Clearly, random forests extract information unavailable to "physically based" methods because the physical information in the models does not appear as we expect. One interesting result is that results from the classic "warm nose" sounding profile are, by far, the most sensitive to the particular forecast model, but this profile is also the one for which random forests are most skillful. Finally, a method for calibrating probabilities for each different ptype using multinomial logistic regression is shown.
Quantitative monitoring of sucrose, reducing sugar and total sugar dynamics for phenotyping of water-deficit stress tolerance in rice through spectroscopy and chemometrics

NASA Astrophysics Data System (ADS)

Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini

2018-03-01

In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in near infrared (NIR) range viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI) sensitive to sucrose, reducing sugar and total sugar content were identified which were subsequently calibrated and validated. The RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67; RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively for validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were found to be, MARS, ANN and MARS, respectively with respect to RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.
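The RPD values quoted above are ratios of performance to deviation. RPD is commonly computed as the standard deviation of the reference measurements divided by the root mean square error of the predictions, so higher values indicate a more useful calibration. The sketch below assumes that definition with the sample standard deviation; the record does not say which variant the authors used, and the numbers are hypothetical.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations / RMSE of predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(observed) / rmse


# Hypothetical validation data (illustrative only, not from the study).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
value = rpd(observed, predicted)
print(f"RPD = {value:.2f}")
```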
Ecological consequences of alternative fuel reduction treatments in seasonally dry forests: the national fire and fire surrogate study

Treesearch

J.D. McIver; C.J. Fettig

2010-01-01

This special issue of Forest Science features the national Fire and Fire Surrogate study (FFS), a multisite, multivariate research project that evaluates the ecological consequences of prescribed fire and its mechanical surrogates in seasonally dry forests of the United States. The need for a comprehensive national FFS study stemmed from concern that information on...
A Random Forest-based ensemble method for activity recognition.

PubMed

Feng, Zengtao; Mo, Lingfei; Li, Meng

2015-01-01

This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition, using random forest. We designed an ensemble learning algorithm, which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from the PAMAP (Physical Activity Monitoring for Aging People), which is a standard, publicly available database, was utilized to train and test. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while the training is faster than others. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast calculation.
Unbiased split variable selection for random survival forests using maximally selected rank statistics.

PubMed

Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

2017-04-15

The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
A comparison of the conditional inference survival forest model to random survival forests based on a simulation study as well as on two applications with time-to-event data.

PubMed

Nasejje, Justine B; Mwambi, Henry; Dheda, Keertan; Lesosky, Maia

2017-07-28

Random survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for the best covariate to split on from that of the best split point search for the selected covariate. In this study, we compare the random survival forest model to the conditional inference model (CIF) using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under-five years of age in Uganda and it consists of categorical covariates with most of them having more than two levels (many split-points). The second dataset is based on the survival of patients with extremely drug resistant tuberculosis (XDR TB) which consists of mainly categorical covariates with two levels (few split-points). The study findings indicate that the conditional inference forest model is superior to random survival forest models in analysing time-to-event data that consists of covariates with many split-points based on the values of the bootstrap cross-validated estimates for integrated Brier scores. However, conditional inference forests perform comparably similar to random survival forests models in analysing time-to-event data consisting of covariates with fewer split-points. Although survival forests are promising methods in analysing time-to-event data, it is important to identify the best forest model for analysis based on the nature of covariates of the dataset in question.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.

PubMed

Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K

2012-09-01

For high dimensional genome-wide association (GWA) case-control data of complex disease, there is usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant to the disease. A simple random sampling method in random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces without informative SNPs. An exhaustive search for an optimal mtry is often required in order to include useful and relevant SNPs and get rid of the vast number of non-informative SNPs. However, this is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it makes sure each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and that it can generate a better random forest, with higher accuracy and a lower error bound, than Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.
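The stratified subspace selection described above can be sketched in two steps: bin features into equal-width intervals of an informativeness score, then draw the same number of features from every bin to form the subspace for one tree. This is a minimal illustration, not the authors' implementation; the function names, the scores, and the choice of three groups are all hypothetical, and the record does not state which informativeness measure the paper uses.

```python
import random


def equal_width_groups(scores, n_groups):
    """Group feature indices into n_groups equal-width bins of their score."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_groups or 1.0  # avoid zero width if all scores match
    groups = [[] for _ in range(n_groups)]
    for index, score in enumerate(scores):
        # Clamp so the maximum score falls in the last bin.
        bin_id = min(int((score - lo) / width), n_groups - 1)
        groups[bin_id].append(index)
    return groups


def stratified_subspace(groups, per_group, rng):
    """Draw up to per_group feature indices from every group, without replacement."""
    subspace = []
    for members in groups:
        k = min(per_group, len(members))  # small groups contribute what they have
        subspace.extend(rng.sample(members, k))
    return subspace


# Hypothetical informativeness scores for eight features (e.g. SNPs).
scores = [0.01, 0.02, 0.05, 0.40, 0.45, 0.90, 0.95, 0.10]
groups = equal_width_groups(scores, n_groups=3)
subspace = stratified_subspace(groups, per_group=2, rng=random.Random(0))
print("groups:", groups)
print("subspace for one tree:", sorted(subspace))
```

Each tree would get its own draw, so highly informative features are always represented while the choice within each group stays random.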
Application of machine learning methods to describe the effects of conjugated equine estrogens therapy on region-specific brain volumes.

PubMed

Casanova, Ramon; Espeland, Mark A; Goveas, Joseph S; Davatzikos, Christos; Gaussoin, Sarah A; Maldjian, Joseph A; Brunner, Robert L; Kuller, Lewis H; Johnson, Karen C; Mysiw, W Jerry; Wagner, Benjamin; Resnick, Susan M

2011-05-01

Use of conjugated equine estrogens (CEE) has been linked to smaller regional brain volumes in women aged ≥65 years; however, it is unknown whether this results in a broad-based characteristic pattern of effects. Structural magnetic resonance imaging was used to assess regional volumes of normal tissue and ischemic lesions among 513 women who had been enrolled in a randomized clinical trial of CEE therapy for an average of 6.6 years, beginning at ages 65-80 years. A multivariate pattern analysis, based on a machine learning technique that combined Random Forest and logistic regression with L(1) penalty, was applied to identify patterns among regional volumes associated with therapy and whether patterns discriminate between treatment groups. The multivariate pattern analysis detected smaller regional volumes of normal tissue within the limbic and temporal lobes among women that had been assigned to CEE therapy. Mean decrements ranged as high as 7% in the left entorhinal cortex and 5% in the left perirhinal cortex, which exceeded the effect sizes reported previously in frontal lobe and hippocampus. Overall accuracy of classification based on these patterns, however, was projected to be only 54.5%. Prescription of CEE therapy for an average of 6.6 years is associated with lower regional brain volumes, but it does not induce a characteristic spatial pattern of changes in brain volumes of sufficient magnitude to discriminate users and nonusers. Copyright © 2011 Elsevier Inc. All rights reserved.
Application of machine learning methods to describe the effects of conjugated equine estrogens therapy on region-specific brain volumes

PubMed Central

Casanova, Ramon; Espeland, Mark A.; Goveas, Joseph S.; Davatzikos, Christos; Gaussoin, Sarah A.; Maldjian, Joseph A.; Brunner, Robert L.; Kuller, Lewis H.; Johnson, Karen C.; Mysiw, W. Jerry; Wagner, Benjamin; Resnick, Susan M.

2011-01-01

Use of conjugated equine estrogens (CEE) has been linked to smaller regional brain volumes in women aged ≥65 years, however it is unknown whether this results in a broad-based characteristic pattern of effects. Structural MRI was used to assess regional volumes of normal tissue and ischemic lesions among 513 women who had been enrolled in a randomized clinical trial of CEE therapy for an average of 6.6 years, beginning at ages 65-80 years. A multivariate pattern analysis, based on a machine learning technique that combined Random Forest and logistic regression with L1 penalty, was applied to identify patterns among regional volumes associated with therapy and whether patterns discriminate between treatment groups. The multivariate pattern analysis detected smaller regional volumes of normal tissue within the limbic and temporal lobes among women that had been assigned to CEE therapy. Mean decrements ranged as high as 7% in the left entorhinal cortex and 5% in the left perirhinal cortex, which exceeded the effect sizes reported previously in frontal lobe and hippocampus. Overall accuracy of classification based on these patterns, however, was projected to be only 54.5%. Prescription of CEE therapy for an average of 6.6 years is associated with lower regional brain volumes, but it does not induce a characteristic spatial pattern of changes in brain volumes of sufficient magnitude to discriminate users and non-users. PMID:21292420
Spectroscopic diagnosis of laryngeal carcinoma using near-infrared Raman spectroscopy and random recursive partitioning ensemble techniques.

PubMed

Teh, Seng Khoon; Zheng, Wei; Lau, David P; Huang, Zhiwei

2009-06-01

In this work, we evaluated the diagnostic ability of near-infrared (NIR) Raman spectroscopy associated with the ensemble recursive partitioning algorithm based on random forests for identifying cancer from normal tissue in the larynx. A rapid-acquisition NIR Raman system was utilized for tissue Raman measurements at 785 nm excitation, and 50 human laryngeal tissue specimens (20 normal; 30 malignant tumors) were used for NIR Raman studies. The random forests method was introduced to develop effective diagnostic algorithms for classification of Raman spectra of different laryngeal tissues. High-quality Raman spectra in the range of 800-1800 cm(-1) can be acquired from laryngeal tissue within 5 seconds. Raman spectra differed significantly between normal and malignant laryngeal tissues. Classification results obtained from the random forests algorithm on tissue Raman spectra yielded a diagnostic sensitivity of 88.0% and specificity of 91.4% for laryngeal malignancy identification. The random forests technique also provided variables importance that facilitates correlation of significant Raman spectral features with cancer transformation. This study shows that NIR Raman spectroscopy in conjunction with random forests algorithm has a great potential for the rapid diagnosis and detection of malignant tumors in the larynx.

Machine learning to predict the occurrence of bisphosphonate-related osteonecrosis of the jaw associated with dental extraction: A preliminary report.

PubMed

Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho

2018-04-23

The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of BRONJ associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. A total of 125 patients, consisting of 41 cases and 84 controls, were selected for the study. Five machine learning prediction algorithms including multivariable logistic regression model, decision tree, support vector machine, artificial neural network, and random forest were implemented. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies. Copyright © 2017. Published by Elsevier Inc.
Identification of ecological thresholds from variations in phytoplankton communities among lakes: contribution to the definition of environmental standards.

PubMed

Roubeix, Vincent; Danis, Pierre-Alain; Feret, Thibaut; Baudoin, Jean-Marc

2016-04-01

In aquatic ecosystems, the identification of ecological thresholds may be useful for managers as it can help to diagnose ecosystem health and to identify key levers to enable the success of preservation and restoration measures. A recent statistical method, gradient forest, based on random forests, was used to detect thresholds of phytoplankton community change in lakes along different environmental gradients. It performs exploratory analyses of multivariate biological and environmental data to estimate the location and importance of community thresholds along gradients. The method was applied to a data set of 224 French lakes which were characterized by 29 environmental variables and the mean abundances of 196 phytoplankton species. Results showed the high importance of geographic variables for the prediction of species abundances at the scale of the study. A second analysis was performed on a subset of lakes defined by geographic thresholds and presenting a higher biological homogeneity. Community thresholds were identified for the most important physico-chemical variables including water transparency, total phosphorus, ammonia, nitrates, and dissolved organic carbon. Gradient forest appeared as a powerful method at a first exploratory step, to detect ecological thresholds at large spatial scale. The thresholds that were identified here must be reinforced by the separate analysis of other aquatic communities and may be used then to set protective environmental standards after consideration of natural variability among lakes.
A gap-filling model for eddy covariance CO2 flux: Estimating carbon assimilated by a subtropical evergreen broad-leaved forest at the Lien-Hua-Chih flux observation site

NASA Astrophysics Data System (ADS)

Lan, C. Y.; Li, M. H.; Chen, Y. Y.

2016-12-01

Appropriate estimation of the gaps that appear in eddy covariance (EC) flux observations is critical to the reliability of long-term EC applications. In this study we present a semi-parametric multivariate gap-filling model for tower-based measurement of CO2 flux. The raw EC data passing QC/QA were separated into two groups: clear sky, having net radiation greater than 50 W/m2, and nighttime/cloudy. For the clear sky conditions, principal component analysis (PCA) was used to resolve the multicollinearity among various environmental variables, including net radiation, wind speed, vapor pressure deficit, soil moisture deficit, leaf area index, and soil temperature, in association with CO2 assimilated by the forest. After the principal domains were determined by the PCA, the relationships between CO2 fluxes and selected PCs (key factors) were built up by nonlinear interpolations to estimate the gap-filled CO2 flux. In view of limited photosynthesis under nighttime/cloudy conditions, the respiration rate of the forest ecosystem was estimated by the Lloyd-Taylor equation. Artificial gaps were randomly selected to examine the applicability of our PCA approach. Based on tower-based measurement of CO2 flux at the Lien-Hua-Chih site, a total of 5.8 ton-C/ha/yr was assimilated in 2012.
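The respiration model the abstract refers to is usually written, following Lloyd and Taylor (1994), as R = R_ref · exp(E0 · (1/(T_ref − T0) − 1/(T − T0))). The sketch below evaluates that form. The constants E0 = 308.56 K, T0 = 227.13 K and T_ref = 283.15 K are the commonly quoted defaults and are an assumption here, since the abstract does not state which parameter values were used at the Lien-Hua-Chih site; the function name is hypothetical.

```python
import math

# Commonly quoted Lloyd and Taylor (1994) constants; assumed, not from the abstract.
E0 = 308.56      # temperature sensitivity, K
T0 = 227.13      # temperature at which respiration goes to zero, K
T_REF = 283.15   # reference temperature (10 degrees C), K


def lloyd_taylor_respiration(temp_k, r_ref):
    """Ecosystem respiration at temperature temp_k (kelvin).

    r_ref is the respiration rate at T_REF, in whatever flux units the
    caller uses; the result is returned in the same units.
    """
    if temp_k <= T0:
        raise ValueError("temperature must be above T0 for the model to apply")
    return r_ref * math.exp(E0 * (1.0 / (T_REF - T0) - 1.0 / (temp_k - T0)))


if __name__ == "__main__":
    # Hypothetical reference respiration of 2.0 flux units at 10 degrees C.
    for temp_c in (5.0, 10.0, 20.0):
        print(temp_c, lloyd_taylor_respiration(temp_c + 273.15, 2.0))
```

In a gap-filling workflow, `r_ref` would typically be fitted to the measured nighttime fluxes before the function is used to fill missing values.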
Adapting GNU random forest program for Unix and Windows

NASA Astrophysics Data System (ADS)

Jirina, Marcel; Krayem, M. Said; Jirina, Marcel, Jr.

2013-10-01

The Random Forest is a well-known method and also a program for data clustering and classification. Unfortunately, the original Random Forest program is rather difficult to use. Here we describe a new version of this program originally written in Fortran 77. The modified program in Fortran 95 needs to be compiled only once and information for different tasks is passed with help of arguments. The program was tested with 24 data sets from UCI MLR and results are available on the net.
Modelling the ecological consequences of whole tree harvest for bioenergy production

NASA Astrophysics Data System (ADS)

Skår, Silje; Lange, Holger; Sogn, Trine

2013-04-01

There is an increasing demand for energy from biomass as a substitute for fossil fuels worldwide, and the Norwegian government plans to double the production of bioenergy to 9% of the national energy production or to 28 TWh per year by 2020. A large part of this increase may come from forests, which have a great potential with respect to biomass supply as forest growth increasingly has exceeded harvest in the last decades. One feasible option is the utilization of forest residues (needles, twigs and branches) in addition to stems, known as Whole Tree Harvest (WTH). As opposed to WTH, the residues are traditionally left in the forest with Conventional Timber Harvesting (CH). However, the residues contain a large share of the tree's nutrients, indicating that WTH may possibly alter the supply of nutrients and organic matter to the soil and the forest ecosystem. This may potentially lead to reduced tree growth. Other implications can be nutrient imbalance, loss of carbon from the soil and changes in species composition and diversity. This study aims to identify key factors and appropriate strategies for ecologically sustainable WTH in Norway spruce (Picea abies) and Scots pine (Pinus sylvestris) forest stands in Norway. We focus on identifying key factors driving soil organic matter, nutrients, biomass, biodiversity etc. Simulations of the effect on the carbon and nitrogen budget with the two harvesting methods will also be conducted. Data from field trials and long-term manipulation experiments are used to obtain a first overview of key variables. The relationships between the variables are hitherto unknown, but it is by no means obvious that they could be assumed as linear; thus, an ordinary multiple linear regression approach is expected to be insufficient. 
Here we apply two advanced and highly flexible modelling frameworks which have hardly been used in the context of tree growth, nutrient balances and biomass removal so far: Generalized Additive Models (GAMs) and Random Forests. Results obtained for GAMs so far show that there are differences between WTH and CH in two directions: both the significance of drivers and the shape of the response functions differ. GAMs turn out to be a flexible and powerful alternative to multivariate linear regression. The restriction to linear relationships seems to be unjustified in the present case. We use Random Forests as a highly efficient classifier which gives reliable estimates for the importance of each driver variable in determining the diameter growth for the two different harvesting treatments. Based on the final results of these two modelling approaches, the study contributes to finding appropriate strategies and suitable regions (in Norway) where WTH may be sustainably performed.
Predicting Survival From Large Echocardiography and Electronic Health Record Datasets: Optimization With Machine Learning.

PubMed

Samad, Manar D; Ulloa, Alvaro; Wehner, Gregory J; Jing, Linyuan; Hartzel, Dustin; Good, Christopher W; Williams, Brent A; Haggerty, Christopher M; Fornwalt, Brandon K

2018-06-09

The goal of this study was to use machine learning to more accurately predict survival after echocardiography. Predicting patient outcomes (e.g., survival) following echocardiography is primarily based on ejection fraction (EF) and comorbidities. However, there may be significant predictive information within additional echocardiography-derived measurements combined with clinical electronic health record data. Mortality was studied in 171,510 unselected patients who underwent 331,317 echocardiograms in a large regional health system. We investigated the predictive performance of nonlinear machine learning models compared with that of linear logistic regression models using 3 different inputs: 1) clinical variables, including 90 cardiovascular-relevant International Classification of Diseases, Tenth Revision, codes, and age, sex, height, weight, heart rate, blood pressures, low-density lipoprotein, high-density lipoprotein, and smoking; 2) clinical variables plus physician-reported EF; and 3) clinical variables and EF, plus 57 additional echocardiographic measurements. Missing data were imputed with a multivariate imputation by using a chained equations algorithm (MICE). We compared models versus each other and baseline clinical scoring systems by using a mean area under the curve (AUC) over 10 cross-validation folds and across 10 survival durations (6 to 60 months). Machine learning models achieved significantly higher prediction accuracy (all AUC >0.82) over common clinical risk scores (AUC = 0.61 to 0.79), with the nonlinear random forest models outperforming logistic regression (p < 0.01). The random forest model including all echocardiographic measurements yielded the highest prediction accuracy (p < 0.01 across all models and survival durations). Only 10 variables were needed to achieve 96% of the maximum prediction accuracy, with 6 of these variables being derived from echocardiography. 
Tricuspid regurgitation velocity was more predictive of survival than LVEF. In a subset of studies with complete data for the top 10 variables, multivariate imputation by chained equations yielded slightly reduced predictive accuracies (difference in AUC of 0.003) compared with the original data. Machine learning can fully utilize large combinations of disparate input variables to predict survival after echocardiography with superior accuracy. Copyright © 2018 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
Mapping Deforestation area in North Korea Using Phenology-based Multi-Index and Random Forest

NASA Astrophysics Data System (ADS)

Jin, Y.; Sung, S.; Lee, D. K.; Jeong, S.

2016-12-01

Forest ecosystems provide ecological benefits to both humans and wildlife. Growing global demand for food and fiber is increasing the pressure that agriculture and logging place on forest ecosystems worldwide. Between 1990 and 2015, North Korea lost almost 40% of its forests to conversion into crop fields for food production and to cutting for fuel wood. This has increased the damage caused by natural disasters, and the country is known to be one of the most forest-degraded areas in the world. The forest landscape in North Korea is complex and heterogeneous; the major landscape types in the forest are hillside farm, unstocked forest, natural forest and plateau vegetation. Remote sensing can be used for forest degradation mapping of a dynamic landscape at a broad scale of detail and spatial distribution. Confusion mostly occurred between hillside farmland and unstocked forest, but also between unstocked forest and forest. Most previous forest degradation studies focused on the classification of broad types such as deforested area and sand from the perspective of land cover classification. The objective of this study is to use random forest to map degraded forest in North Korea with a phenology-based vegetation index derived from MODIS products, which capture various environmental factors such as vegetation, soil and water at a regional scale, to improve accuracy. The model created by random forest achieved an overall accuracy of 91.44%. Class user's accuracies of hillside farmland and unstocked forest, which indicate degraded forest, were 97.2% and 84%, respectively. Unstocked forest had relatively low user's accuracy due to misclassified hillside farmland and forest samples. Producer's accuracies of hillside farmland and unstocked forest were 85.2% and 93.3%, respectively. In this case hillside farmland had lower producer's accuracy mainly due to confusion with field, unstocked forest and forest. 
Such a classification of degraded forest could supply essential information to decide the priority of forest management and restoration in degraded forest area.
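The overall, user's and producer's accuracies quoted in this abstract are all derived from a confusion matrix. The sketch below shows the usual definitions, assuming rows hold the reference (ground truth) class and columns hold the mapped class; the function name, class labels and counts are hypothetical and are not the study's data.

```python
def accuracy_report(matrix, class_names):
    """Overall, producer's and user's accuracy from a square confusion matrix.

    matrix[i][j] = number of samples whose reference class is i and whose
    mapped (predicted) class is j.
    Producer's accuracy = correct / reference total (row sum).
    User's accuracy     = correct / mapped total (column sum).
    """
    n = len(class_names)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"overall": correct / total}

    for i, name in enumerate(class_names):
        reference_total = sum(matrix[i])
        mapped_total = sum(matrix[r][i] for r in range(n))
        report[name] = {
            "producers": matrix[i][i] / reference_total if reference_total else float("nan"),
            "users": matrix[i][i] / mapped_total if mapped_total else float("nan"),
        }
    return report


if __name__ == "__main__":
    # Hypothetical counts for three of the landscape classes.
    names = ["hillside_farm", "unstocked_forest", "forest"]
    counts = [
        [8, 2, 0],
        [1, 7, 2],
        [0, 1, 9],
    ]
    print(accuracy_report(counts, names))
```

A class with high user's accuracy but lower producer's accuracy, as reported for hillside farmland, is one that is rarely mapped wrongly but is often missed.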
Subpixel urban land cover estimation: comparing cubist, random forests, and support vector regression

Treesearch

Jeffrey T. Walton

2008-01-01

Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
On the use of spectra from portable Raman and ATR-IR instruments in synthesis route attribution of a chemical warfare agent by multivariate modeling.

PubMed

Wiktelius, Daniel; Ahlinder, Linnea; Larsson, Andreas; Höjer Holmgren, Karin; Norlin, Rikard; Andersson, Per Ola

2018-08-15

Collecting data under field conditions for forensic investigations of chemical warfare agents calls for the use of portable instruments. In this study, a set of aged, crude preparations of sulfur mustard were characterized spectroscopically without any sample preparation using handheld Raman and portable IR instruments. The spectral data was used to construct Random Forest multivariate models for the attribution of test set samples to the synthetic method used for their production. Colored and fluorescent samples were included in the study, which made Raman spectroscopy challenging although fluorescence was diminished by using an excitation wavelength of 1064 nm. The predictive power of models constructed with IR or Raman data alone, as well as with combined data was investigated. Both techniques gave useful data for attribution. Model performance was enhanced when Raman and IR spectra were combined, allowing correct classification of 19/23 (83%) of test set spectra. The results demonstrate that data obtained with spectroscopy instruments amenable for field deployment can be useful in forensic studies of chemical warfare agents. Copyright © 2018 Elsevier B.V. All rights reserved.
Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors.

PubMed

Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

2016-01-14

Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of the single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR system.
Pseudo CT estimation from MRI using patch-based random forest

NASA Astrophysics Data System (ADS)

Yang, Xiaofeng; Lei, Yang; Shu, Hui-Kuo; Rossi, Peter; Mao, Hui; Shim, Hyunsuk; Curran, Walter J.; Liu, Tian

2017-02-01

MR simulators have recently gained popularity because they avoid the unnecessary radiation exposure of the CT simulators used in radiation therapy planning. We propose a method for pseudo CT estimation from MR images based on a patch-based random forest. Patient-specific anatomical features are extracted from the aligned training images and adopted as signatures for each voxel. The most robust and informative features are identified using feature selection to train the random forest. The well-trained random forest is used to predict the pseudo CT of a new patient. This prediction technique was tested with human brain images and the prediction accuracy was assessed using the original CT images. Peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) indexes were used to quantify the differences between the pseudo and original CT images. The experimental results showed the proposed method could accurately generate pseudo CT images from MR images. In summary, we have developed a new pseudo CT prediction method based on patch-based random forest, demonstrated its clinical feasibility, and validated its prediction accuracy. This pseudo CT prediction technique could be a useful tool for MRI-based radiation treatment planning and attenuation correction in a PET/MRI scanner.
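Of the two image-quality indexes named above, PSNR has a simple closed form: PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared difference between the two images and MAX is the largest possible intensity. A minimal sketch on flattened voxel lists follows; the function name, the toy intensities and the choice of `max_value` are assumptions, since the abstract does not state the intensity range used.

```python
import math


def psnr(reference, estimate, max_value):
    """Peak signal-to-noise ratio in dB between two equal-length intensity lists.

    reference: intensities of the original image (e.g. the real CT), flattened.
    estimate:  intensities of the predicted image (e.g. the pseudo CT), flattened.
    max_value: largest possible intensity for the image representation in use.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 8-bit toy "images" of four voxels each.
    original = [0, 100, 200, 255]
    predicted = [0, 100, 200, 245]
    print(psnr(original, predicted, max_value=255))
```

Higher values indicate a closer match; FSIM is not sketched here because it depends on phase congruency and gradient maps that need an image-processing library.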
A multivariate study of mangrove morphology (Rhizophora mangle) using both above and below-water plant architecture

USGS Publications Warehouse

Brooks, R.A.; Bell, S.S.

2005-01-01

A descriptive study of the architecture of the red mangrove, Rhizophora mangle L., habitat of Tampa Bay, FL, was conducted to assess if plant architecture could be used to discriminate overwash from fringing forest type. Seven above-water (e.g., tree height, diameter at breast height, and leaf area) and 10 below-water (e.g., root density, root complexity, and maximum root order) architectural features were measured in eight mangrove stands. A multivariate technique (discriminant analysis) was used to test the ability of different models comprising above-water, below-water, or whole tree architecture to classify forest type. Root architectural features appear to be better than classical forestry measurements at discriminating between fringing and overwash forests but, regardless of the features loaded into the model, misclassification rates were high as forest type was only correctly classified in 66% of the cases. Based upon habitat architecture, the results of this study do not support a sharp distinction between overwash and fringing red mangrove forests in Tampa Bay but rather indicate that the two are architecturally indistinguishable. Therefore, within this northern portion of the geographic range of red mangroves, a more appropriate classification system based upon architecture may be one in which overwash and fringing forest types are combined into a single, "tide dominated" category. © 2005 Elsevier Ltd. All rights reserved.
Detecting targets hidden in random forests

NASA Astrophysics Data System (ADS)

Kouritzin, Michael A.; Luo, Dandan; Newton, Fraser; Wu, Biao

2009-05-01

Military tanks, cargo or troop carriers, missile carriers or rocket launchers often hide themselves from detection in the forests. This plagues the detection problem of locating these hidden targets. An electro-optic camera mounted on a surveillance aircraft or unmanned aerial vehicle is used to capture the images of the forests with possible hidden targets, e.g., rocket launchers. We consider random forests of longitudinal and latitudinal correlations. Specifically, foliage coverage is encoded with a binary representation (i.e., foliage or no foliage), and is correlated in adjacent regions. We address the detection problem of camouflaged targets hidden in random forests by building memory into the observations. In particular, we propose an efficient algorithm to generate random forests, ground, and camouflage of hidden targets with two dimensional correlations. The observations are a sequence of snapshots consisting of foliage-obscured ground or target. Theoretically, detection is possible because there are subtle differences in the correlations of the ground and camouflage of the rocket launcher. However, these differences are well beyond human perception. To detect the presence of hidden targets automatically, we develop a Markov representation for these sequences and modify the classical filtering equations to allow the Markov chain observation. Particle filters are used to estimate the position of the targets in combination with a novel random weighting technique. Furthermore, we give positive proof-of-concept simulations.
Screening large-scale association study data: exploiting interactions using random forests.

PubMed

Lunetta, Kathryn L; Hayward, L Brooke; Segal, Jonathan; Van Eerdewegh, Paul

2004-12-10

Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction. Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact. 
In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
Application of lifting wavelet and random forest in compound fault diagnosis of gearbox

NASA Astrophysics Data System (ADS)

Chen, Tang; Cui, Yulian; Feng, Fuzhou; Wu, Chunzhi

2018-03-01

To address the weakness of compound fault characteristic signals in the gearbox of an armored vehicle and the difficulty of identifying fault types, a fault diagnosis method based on lifting wavelet and random forest is proposed. First, the method uses the lifting wavelet transform to decompose the original vibration signal into multiple layers and reconstructs the multi-layer low-frequency and high-frequency components obtained by the decomposition to get multiple component signals. Then the time-domain feature parameters are obtained for each component signal to form multiple feature vectors, which are input into the random forest pattern recognition classifier to determine the compound fault type. Finally, the method is verified on a variety of compound fault data from the gearbox fault analog test platform; the results show that the recognition accuracy of the fault diagnosis method combining the lifting wavelet and the random forest is up to 99.99%.
3D Semantic Labeling of ALS Data Based on Domain Adaption by Transferring and Fusing Random Forest Models

NASA Astrophysics Data System (ADS)

Wu, J.; Yao, W.; Zhang, J.; Li, Y.

2018-04-01

Labeling 3D point cloud data with traditional supervised learning methods requires a considerable number of labelled samples, the collection of which is costly and time-consuming. This work focuses on adopting the domain adaption concept to transfer existing trained random forest classifiers (based on the source domain) to new data scenes (the target domain), which aims at reducing the dependence of accurate 3D semantic labeling in point clouds on training samples from the new data scene. First, two random forest classifiers were trained with existing samples previously collected for other data. They differed from each other in using two different decision tree construction algorithms: C4.5 with information gain ratio and CART with Gini index. Second, four random forest classifiers adapted to the target domain were derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain were labelled by fusing the four newly derived random forest classifiers using a weights-of-evidence-based fusion model. To validate our method, experimental analysis was conducted using 3 datasets: one is used as the source domain data (Vaihingen data for 3D Semantic Labelling); the other two are used as the target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5% and 83.3% for 3D labelling were achieved for the Jinmen city and Dunhuang city data respectively, with only 1/3 of the newly labelled samples compared to the cases without domain adaption.
Predicting stem total and assortment volumes in an industrial Pinus taeda L. forest plantation using airborne laser scanning data and random forest

Treesearch

Carlos Alberto Silva; Carine Klauberg; Andrew Thomas Hudak; Lee Alexander Vierling; Wan Shafrina Wan Mohd Jaafar; Midhun Mohan; Mariano Garcia; Antonio Ferraz; Adrian Cardil; Sassan Saatchi

2017-01-01

Improvements in the management of pine plantations result in multiple industrial and environmental benefits. Remote sensing techniques can dramatically increase the efficiency of plantation management by reducing or replacing time-consuming field sampling. We tested the utility and accuracy of combining field and airborne lidar data with Random Forest, a supervised...
Understanding and reaching family forest owners: lessons from social marketing research

Treesearch

Brett J. Butler; Mary Tyrrell; Geoff Feinberg; Scott VanManen; Larry Wiseman; Scott Wallinger

2007-01-01

Social marketing--the use of commercial marketing techniques to effect positive social change--is a promising means by which to develop more effective and efficient outreach, policies, and services for family forest owners. A hierarchical, multivariate analysis based on landowners' attitudes reveals four groups of owners to whom programs can be tailored: woodland...
Development of an ecological classification system for the Wayne National Forest

Treesearch

David M. Hix; Andrea M. Chech

1993-01-01

In 1991, a collaborative research project was initiated to create an ecological classification system for the Wayne National Forest of southeastern Ohio. The work focuses on the ecological land type (ELT) level of ecosystem classification. The most common ELTs are being identified and described using information from intensive field sampling and multivariate data...
Using crown condition variables as indicators of forest health

Treesearch

Stanley J. Zarnoch; William A. Bechtold; K.W. Stolte

2004-01-01

Indicators of forest health used in previous studies have focused on crown variables analyzed individually at the tree level by summarizing over all species. This approach has the virtue of simplicity but does not account for the three-dimensional attributes of a tree crown, the multivariate nature of the crown variables, or variability among species. To alleviate...

Uncertainty in Random Forests: What does it mean in a spatial context?

NASA Astrophysics Data System (ADS)

Klump, Jens; Fouedjio, Francky

2017-04-01

Geochemical surveys are an important part of exploration for mineral resources and in environmental studies. The samples and chemical analyses are often laborious and difficult to obtain and therefore come at a high cost. As a consequence, these surveys are characterised by datasets with large numbers of variables but relatively few data points when compared to conventional big data problems. With more remote sensing platforms and sensor networks being deployed, large volumes of auxiliary data of the surveyed areas are becoming available. The use of these auxiliary data has the potential to improve the prediction of chemical element concentrations over the whole study area. Kriging is a well established geostatistical method for the prediction of spatial data but requires significant pre-processing and makes some basic assumptions about the underlying distribution of the data. Some machine learning algorithms, on the other hand, may require less data pre-processing and are non-parametric. In this study we used a dataset provided by Kirkwood et al. [1] to explore the potential use of Random Forest in geochemical mapping. We chose Random Forest because it is a well understood machine learning method and has the advantage that it provides us with a measure of uncertainty. By comparing Random Forest to Kriging we found that both methods produced comparable maps of estimated values for our variables of interest. Kriging outperformed Random Forest for variables of interest with relatively strong spatial correlation. The measure of uncertainty provided by Random Forest seems to be quite different to the measure of uncertainty provided by Kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. 
In conclusion, our preliminary results show that the model driven approach in geostatistics gives us more reliable estimates for our target variables than Random Forest for variables with relatively strong spatial correlation. However, in cases of weak spatial correlation Random Forest, as a nonparametric method, may give the better results once we have a better understanding of the meaning of its uncertainty measures in a spatial context. References [1] Kirkwood, C., M. Cave, D. Beamish, S. Grebby, and A. Ferreira (2016), A machine learning approach to geochemical mapping, Journal of Geochemical Exploration, 163, 28-40, doi:10.1016/j.gexplo.2016.05.003.
Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data.

PubMed

Sariyar, Murat; Hoffmann, Isabell; Binder, Harald

2014-02-26

Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to incorporate interactions into such prediction models. In this feasibility study, we present building blocks for evaluating and incorporating interaction terms in high-dimensional time-to-event settings, especially for settings in which it is computationally too expensive to check all possible interactions. We use a boosting technique for estimation of effects and the following building blocks for pre-selecting interactions: (1) resampling, (2) random forests and (3) orthogonalization as a data pre-processing step. In a simulation study, the strategy that uses all building blocks is able to detect true main effects and interactions with high sensitivity in different kinds of scenarios. The main challenge is posed by interactions composed of variables that do not represent main effects, but our findings are also promising in this regard. Results on real world data illustrate that effect sizes of interactions frequently may not be large enough to improve prediction performance, even though the interactions are potentially of biological relevance. Screening interactions through random forests is feasible and useful when one is interested in finding relevant two-way interactions. The other building blocks also contribute considerably to an enhanced pre-selection of interactions. We determined the limits of interaction detection in terms of necessary effect sizes. Our study emphasizes the importance of making full use of existing methods in addition to establishing new ones.
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

PubMed Central

Theis, Fabian J.

2017-01-01

Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
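The resampling idea in this abstract can be illustrated with a generic inverse-probability weighted resampler. This is a minimal sketch, not the sambia implementation: the function name, the `selection_prob` callback and the toy selection probabilities (1.0 for cases, 0.1 for controls) are assumptions made for illustration only.

```python
import random

def inverse_probability_oversample(records, selection_prob, n_out, seed=0):
    """Draw n_out records with replacement, weighting each record by the
    inverse of its (assumed known) probability of entering the sample."""
    rng = random.Random(seed)
    weights = [1.0 / selection_prob(r) for r in records]
    return rng.choices(records, weights=weights, k=n_out)

# Hypothetical stratified sample: cases kept with probability 1.0,
# controls kept with probability 0.1, so cases are artificially enriched.
sample = [{"case": True}] * 50 + [{"case": False}] * 50
resampled = inverse_probability_oversample(
    sample, lambda r: 1.0 if r["case"] else 0.1, n_out=10_000
)

# Share of cases after resampling; it should be close to 50 / (50 + 500).
case_share = sum(r["case"] for r in resampled) / len(resampled)
print(round(case_share, 3))
```

A classifier trained on `resampled` sees class proportions closer to those of the unstratified population, which is the motivation the abstract gives for this family of corrections.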
A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes.

PubMed

Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza

2018-04-24

We aimed to identify the risk factors associated with type 2 diabetes mellitus (T2DM) using a data mining approach, namely decision tree and random forest techniques, with data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. This was a cross-sectional study. The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, namely decision trees and random forests, are used for predicting T2DM when some other characteristics are observed on 9528 subjects recruited from the MASHAD database. This paper makes a comparison between these two models in terms of accuracy, sensitivity, specificity and the area under the ROC curve. The prevalence rate of T2DM was 14% among these subjects. The decision tree model has 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model has 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. The random forest model, when used with demographic, clinical, anthropometric and biochemical measurements, can provide a simple tool to identify associated risk factors for type 2 diabetes. Such identification can be of substantial use in managing health policy to reduce the number of subjects with T2DM.
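The area under the ROC curve used to compare the two models can be computed directly from predicted scores. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative); the function name and the toy scores are illustrative assumptions, not data from the MASHAD study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities (hypothetical) and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(auc)  # 8 of the 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of subjects; for a cohort of several thousand a rank-based formulation would be preferable, but the result is the same.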
Applications of random forest feature selection for fine-scale genetic population assignment.

PubMed

Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G

2018-02-01

Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNP) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than F_ST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F_ST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
Do little interactions get lost in dark random forests?

PubMed

Wright, Marvin N; Ziegler, Andreas; König, Inke R

2016-03-31

Random forests have often been claimed to uncover interaction effects. However, if and how interaction effects can be differentiated from marginal effects remains unclear. In extensive simulation studies, we investigate whether random forest variable importance measures capture or detect gene-gene interactions. By capturing an interaction, we mean the ability to identify a variable that acts through an interaction with another one, while detection is the ability to identify an interaction effect as such. Of the single importance measures, the Gini importance captured interaction effects in most of the simulated scenarios; however, they were masked by marginal effects in other variables. With the permutation importance, the proportion of captured interactions was lower in all cases. Pairwise importance measures performed about equally, with a slight advantage for the joint variable importance method. However, the overall fraction of detected interactions was low. In almost all scenarios the detection fraction in a model with only marginal effects was larger than in a model with an interaction effect only. Random forests are generally capable of capturing gene-gene interactions, but current variable importance measures are unable to detect them as interactions. In most of the cases, interactions are masked by marginal effects and interactions cannot be differentiated from marginal effects. Consequently, caution is warranted when claiming that random forests uncover interactions.
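The distinction between capturing and detecting an interaction can be illustrated with a small permutation importance sketch. Everything here is an assumption made for illustration: the "model" is a fixed function standing in for a fitted forest, the data are simulated, and the importance is the mean increase in squared error after shuffling one column.

```python
import random

def mse(model, X, y):
    """Mean squared error of a prediction function on rows X."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, col, n_repeats=20, seed=0):
    """Average increase in MSE when column `col` is randomly shuffled."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - base)
    return sum(increases) / n_repeats

# Simulated pure interaction: y = x0 * x1, with x2 unrelated to y.
data_rng = random.Random(1)
X = [[data_rng.choice([-1, 1]) for _ in range(3)] for _ in range(200)]
y = [row[0] * row[1] for row in X]

def model(row):
    # Stand-in for a fitted forest that has learned the interaction exactly.
    return row[0] * row[1]

importances = [permutation_importance(model, X, y, c) for c in range(3)]
print(importances)
```

Both interacting variables receive a clearly positive importance and the noise variable receives zero, so the interaction is captured. Nothing in the two single-variable scores, however, indicates that x0 and x1 act jointly rather than marginally, which is the detection problem the abstract describes.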
Application of random survival forests in understanding the determinants of under-five child mortality in Uganda in the presence of covariates that satisfy the proportional and non-proportional hazards assumption.

PubMed

Nasejje, Justine B; Mwambi, Henry

2017-09-07

Uganda, like other Sub-Saharan African countries, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to critically identify factors strongly associated with under-five child mortality rates. The Cox proportional hazards model has been a common choice in analysing data to understand factors strongly associated with high child mortality rates, taking age as the time-to-event variable. However, due to its restrictive proportional hazards (PH) assumption, some covariates of interest which do not satisfy the assumption are often excluded from the analysis to avoid mis-specifying the model, since using covariates that clearly violate the assumption would yield invalid results. Survival trees and random survival forests are becoming increasingly popular in analysing survival data, particularly in the case of large survey data, and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests, which have never been used in understanding factors affecting under-five child mortality rates in Uganda, using Demographic and Health Survey data. Thus the first part of the analysis is based on the use of the classical Cox PH model and the second part of the analysis is based on the use of random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption. Random survival forests and the Cox proportional hazards model agree that the sex of the household head, the sex of the child and the number of births in the past 1 year are strongly associated with under-five child mortality in Uganda, given that all three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates that were originally excluded from the earlier analysis due to violation of the PH assumption were important in explaining under-five child mortality rates. These covariates include the number of children under the age of five in a household, number of births in the past 5 years, wealth index, total number of children ever born and the child's birth order. The results further indicated that the predictive performance for random survival forests built using covariates including those that violate the PH assumption was higher than that for random survival forests built using only covariates that satisfy the PH assumption. Random survival forests are appealing methods in analysing public health data to understand factors strongly associated with under-five child mortality rates, especially in the presence of covariates that violate the proportional hazards assumption.
Territorial user rights for fisheries as ancillary instruments for marine coastal conservation in Chile.

PubMed

Gelcich, Stefan; Fernández, Miriam; Godoy, Natalio; Canepa, Antonio; Prado, Luis; Castilla, Juan Carlos

2012-12-01

Territorial user rights for fisheries have been advocated as a way to achieve sustainable resource management. However, few researchers have empirically assessed their potential as ancillary marine conservation instruments by comparing them to no-take marine protected areas. In kelp (Lessonia trabeculata) forests of central Chile, we compared species richness, density, and biomass of macroinvertebrates and reef fishes among territorial-user-right areas with low-level and high-level enforcement, no-take marine protected areas, and open-access areas in 42 100-m subtidal transects. We also assessed structural complexity of the kelp forest and substratum composition. Multivariate randomized permutation tests indicated macroinvertebrate and reef fish communities associated with the different access regimes differed significantly. Substratum composition and structural complexity of kelp forest did not differ among access regimes. Univariate analyses showed species richness, biomass, and density of macroinvertebrates and reef fishes were greater in highly enforced territorial-user-right areas and no-take marine protected areas than in open-access areas. Densities of macroinvertebrates and reef fishes of economic importance were not significantly different between highly enforced territorial-user-right and no-take marine protected areas. Densities of economically important macroinvertebrates in areas with low-level enforcement were significantly lower than those in areas with high-level enforcement and no-take marine protected areas but were significantly higher than in areas with open access. Territorial-user-right areas could be important ancillary conservation instruments if they are well enforced. ©2012 Society for Conservation Biology.
Multivariate regression model for predicting yields of grade lumber from yellow birch sawlogs

Treesearch

Andrew F. Howard; Daniel A. Yaussy

1986-01-01

A multivariate regression model was developed to predict green board-foot yields for the common grades of factory lumber processed from yellow birch factory-grade logs. The model incorporates the standard log measurements of scaling diameter, length, proportion of scalable defects, and the assigned USDA Forest Service log grade. Differences in yields between band and...
Plant Traits Demonstrate That Temperate and Tropical Giant Eucalypt Forests Are Ecologically Convergent with Rainforest Not Savanna

PubMed Central

Tng, David Y. P.; Jordan, Greg J.; Bowman, David M. J. S.

2013-01-01

Ecological theory differentiates rainforest and open vegetation in many regions as functionally divergent alternative stable states with transitional (ecotonal) vegetation between the two forming transient unstable states. This transitional vegetation is of considerable significance, not only as a test case for theories of vegetation dynamics, but also because this type of vegetation is of major economic importance, and is home to a suite of species of conservation significance, including the world’s tallest flowering plants. We therefore created predictions of patterns in plant functional traits that would test the alternative stable states model of these systems. We measured functional traits of 128 trees and shrubs across tropical and temperate rainforest – open vegetation transitions in Australia, with giant eucalypt forests situated between these vegetation types. We analysed a set of functional traits: leaf carbon isotopes, leaf area, leaf mass per area, leaf slenderness, wood density, maximum height and bark thickness, using univariate and multivariate methods. For most traits, giant eucalypt forest was similar to rainforest, while rainforest, particularly tropical rainforest, was significantly different from the open vegetation. In multivariate analyses, tropical and temperate rainforest diverged functionally, and both segregated from open vegetation. Furthermore, the giant eucalypt forests overlapped in function with their respective rainforests. The two types of giant eucalypt forests also exhibited greater overall functional similarity to each other than to any of the open vegetation types. We conclude that tropical and temperate giant eucalypt forests are ecologically and functionally convergent. The lack of clear functional differentiation from rainforest suggests that giant eucalypt forests are unstable states within the basin of attraction of rainforest. Our results have important implications for giant eucalypt forest management. PMID:24358359
Microhabitat and Environmental Relationships of Bryophytes in Blue Oak (Quercus douglasii H. & A.) Woodlands and Forests of Central Coastal California

Treesearch

Mark Borchert; Daniel Norris

1991-01-01

Microhabitat preferences and species-environment patterns were quantified for bryophytes in blue oak woodlands and forests of central coastal California. Presence data for mosses collected from 149 plots of 400 m² each were analyzed using canonical correspondence analysis (CCA), a multivariate direct gradient analysis technique. Separate ordinations were performed for...
Geographic variation in forest composition and precipitation predict the synchrony of forest insect outbreaks

Treesearch

Kyle J. Haynes; Andrew M. Liebhold; Ottar N. Bjørnstad; Andrew J. Allstadt; Randall S. Morin

2018-01-01

Evaluating the causes of spatial synchrony in population dynamics in nature is notoriously difficult due to a lack of data and appropriate statistical methods. Here, we use a recently developed method, a multivariate extension of the local indicators of spatial autocorrelation statistic, to map geographic variation in the synchrony of gypsy moth outbreaks. Regression...
Variation in nutrient characteristics of surface soils from the Luquillo Experimental Forest of Puerto Rico: A multivariate perspective.

Treesearch

S. B. Cox; M. R. Willig; F. N. Scatena

2002-01-01

We assessed the effects of landscape features (vegetation type and topography), season, and spatial hierarchy on the nutrient content of surface soils in the Luquillo Experimental Forest (LEF) of Puerto Rico. Considerable spatial variation characterized the soils of the LEF, and differences between replicate sites within each combination of vegetation type (tabonuco vs...
Using small area estimation and Lidar-derived variables for multivariate prediction of forest attributes

Treesearch

F. Mauro; Vicente Monleon; H. Temesgen

2015-01-01

Small area estimation (SAE) techniques have been successfully applied in forest inventories to provide reliable estimates for domains where the sample size is small (i.e. small areas). Previous studies have explored the use of either Area Level or Unit Level Empirical Best Linear Unbiased Predictors (EBLUPs) in a univariate framework, modeling each variable of interest...
Comparing spatial regression to random forests for large environmental data sets

EPA Science Inventory

Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputatio...
Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

EPA Science Inventory

Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...
Combination of complementary data mining methods for geographical characterization of extra virgin olive oils based on mineral composition.

PubMed

Sayago, Ana; González-Domínguez, Raúl; Beltrán, Rafael; Fernández-Recamales, Ángeles

2018-09-30

This work explores the potential of multi-element fingerprinting in combination with advanced data mining strategies to assess the geographical origin of extra virgin olive oil samples. For this purpose, the concentrations of 55 elements were determined in 125 oil samples from multiple Spanish geographic areas. Several unsupervised and supervised multivariate statistical techniques were used to build classification models and investigate the relationship between mineral composition of olive oils and their provenance. Results showed that Spanish extra virgin olive oils exhibit characteristic element profiles, which can be differentiated on the basis of their origin in accordance with three geographical areas: Atlantic coast (Huelva province), Mediterranean coast and inland regions. Furthermore, statistical modelling yielded high sensitivity and specificity, principally when random forest and support vector machines were employed, thus demonstrating the utility of these techniques in food traceability and authenticity research. Copyright © 2018 Elsevier Ltd. All rights reserved.
Spatial modeling of cutaneous leishmaniasis in the Andean region of Colombia.

PubMed

Pérez-Flórez, Mauricio; Ocampo, Clara Beatriz; Valderrama-Ardila, Carlos; Alexander, Neal

2016-06-27

The objective of this research was to identify environmental risk factors for cutaneous leishmaniasis (CL) in Colombia and map high-risk municipalities. The study area was the Colombian Andean region, comprising 715 rural and urban municipalities. We used 10 years of CL surveillance: 2000-2009. We used spatial-temporal analysis - conditional autoregressive Poisson random effects modelling - in a Bayesian framework to model the dependence of municipality-level incidence on land use, climate, elevation and population density. Bivariable spatial analysis identified rainforests, forests and secondary vegetation, temperature, and annual precipitation as positively associated with CL incidence. By contrast, livestock agroecosystems and temperature seasonality were negatively associated. Multivariable analysis identified land use - rainforests and agro-livestock - and climate - temperature, rainfall and temperature seasonality - as best predictors of CL. We conclude that climate and land use can be used to identify areas at high risk of CL and that this approach is potentially applicable elsewhere in Latin America.
Characterizing stand-level forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the Random Forest algorithm

NASA Astrophysics Data System (ADS)

Ahmed, Oumer S.; Franklin, Steven E.; Wulder, Michael A.; White, Joanne C.

2015-03-01

Many forest management activities, including the development of forest inventories, require spatially detailed forest canopy cover and height data. Among the various remote sensing technologies, LiDAR (Light Detection and Ranging) offers the most accurate and consistent means for obtaining reliable canopy structure measurements. A potential solution to reduce the cost of LiDAR data is to integrate transects (samples) of LiDAR data with frequently acquired and spatially comprehensive optical remotely sensed data. Although multiple regression is commonly used for such modeling, often it does not fully capture the complex relationships between forest structure variables. This study investigates the potential of Random Forest (RF), a machine learning technique, to estimate LiDAR-measured canopy structure using a time series of Landsat imagery. The study is implemented over a 2600 ha area of industrially managed coastal temperate forests on Vancouver Island, British Columbia, Canada. We implemented a trajectory-based approach to time series analysis that generates time since disturbance (TSD) and disturbance intensity information for each pixel, and we used this information to stratify the forest land base into two strata: mature forests and young forests. Canopy cover and height for three forest classes (i.e. mature, young, and mature and young combined) were modeled separately using multiple regression and Random Forest (RF) techniques. For all forest classes, the RF models provided improved estimates relative to the multiple regression models. The lowest validation error was obtained for the mature forest stratum in an RF model (R² = 0.88, RMSE = 2.39 m and bias = -0.16 for canopy height; R² = 0.72, RMSE = 0.068% and bias = -0.0049 for canopy cover). This study demonstrates the value of using disturbance and successional history to inform estimates of canopy structure and obtain improved estimates of forest canopy cover and height using the RF algorithm.
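The validation statistics reported above (R², RMSE and bias) can be computed from paired observed and predicted values as sketched below. The abstract does not state its exact definitions, so two conventions are assumed here: R² is the coefficient of determination (1 minus residual over total sum of squares), and bias is the mean of predicted minus observed. The height values are hypothetical.

```python
import math

def validation_metrics(observed, predicted):
    """Return R^2, RMSE and bias for paired observed/predicted values.
    Assumed conventions: R^2 = 1 - SS_res / SS_tot, bias = mean(pred - obs)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(residuals) / n,
    }

# Hypothetical canopy heights in metres: LiDAR reference vs. model prediction.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 29.0, 41.0]
metrics = validation_metrics(observed, predicted)
print(metrics)
```

Computing the same three numbers for a regression model and for a forest on a held-out set gives a like-for-like comparison of the kind the abstract reports.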

Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

PubMed

Sankari, E Siva; Manimegalai, D

2017-12-21

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to the extensive exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
Prostate cancer prediction using the random forest algorithm that takes into account transrectal ultrasound findings, age, and serum levels of prostate-specific antigen.

PubMed

Xiao, Li-Hong; Chen, Pei-Ran; Gou, Zhong-Ping; Li, Yong-Zhong; Li, Mei; Xiang, Liang-Cheng; Feng, Ping

2017-01-01

The aim of this study is to evaluate the ability of the random forest algorithm that combines data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than either diagnostic indicator on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
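The five diagnostic measures reported above all derive from the four cells of a two-by-two confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and are not the study's data, since the abstract does not give the underlying table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts: 40 true positives, 5 false positives,
# 10 false negatives, 45 true negatives.
result = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
print(result)
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also depend on how common the disease is in the sample, so the latter two do not transfer directly to populations with a different prevalence.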
The Efficiency of Random Forest Method for Shoreline Extraction from LANDSAT-8 and GOKTURK-2 Imageries

NASA Astrophysics Data System (ADS)

Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.

2017-11-01

Coastal monitoring plays a vital role in environmental planning and hazard management related issues. Since shorelines are fundamental data for environment management, disaster management, coastal erosion studies, modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract shorelines. Random Forest is one of these techniques and is used in this study for shoreline extraction. This algorithm is a machine learning method based on decision trees. Decision trees analyse classes of training data and create rules for classification. In this study, the Terkos region has been chosen for the proposed method within the scope of the TUBITAK project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". The Random Forest algorithm has been implemented to extract the shoreline of the Black Sea near the lake from LANDSAT-8 and GOKTURK-2 satellite imagery taken in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method has been applied to NIR bands of LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imagery. Each image has been digitized manually and shorelines obtained for accuracy assessment. According to the accuracy assessment results, the Random Forest method is efficient for both medium and high resolution images for shoreline extraction studies.
A matrix-based method of moments for fitting the multivariate random effects model for meta-analysis and meta-regression

PubMed Central

Jackson, Dan; White, Ian R; Riley, Richard D

2013-01-01

Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213
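The univariate method of moments of DerSimonian and Laird, to which the proposed estimator reduces in one dimension, can be sketched as follows. This shows only the standard univariate case, not the matrix-based multivariate method of the paper; the function name and the example inputs are assumptions.

```python
def dersimonian_laird(effects, variances):
    """Univariate DerSimonian-Laird estimate of the between-study variance
    tau^2 and the resulting random-effects pooled estimate."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    mu_random = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return tau2, mu_random

# Hypothetical study estimates with equal within-study variances.
tau2, mu = dersimonian_laird([0.0, 1.0, 2.0], [0.25, 0.25, 0.25])
print(tau2, mu)
```

For these inputs Q = 8 with 2 degrees of freedom and c = 8, so the estimate is (8 - 2) / 8 = 0.75; when Q falls below its degrees of freedom the estimate is truncated at zero.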
Missouri Ozark Forest Ecosystem Project: the experiment

Treesearch

Steven L. Sheriff

2002-01-01

Missouri Ozark Forest Ecosystem Project (MOFEP) is a unique experiment to learn about the impacts of management practices on a forest system. Three forest management practices (uneven-aged management, even-aged management, and no-harvest management) as practiced by the Missouri Department of Conservation were randomly assigned to nine forest management sites using a...
Multivariate Longitudinal Analysis with Bivariate Correlation Test

PubMed Central

Adjakossa, Eric Houngla; Sadissou, Ibrahim; Hounkonnou, Mahouton Norbert; Nuel, Gregory

2016-01-01

In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions of the model’s parameters estimators. These estimators can be used in the framework of the multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets which are of longitudinal multivariate type and multivariate multilevel type, respectively, the usefulness of the test is illustrated. PMID:27537692
Using Logistic Regression and Random Forests multivariate statistical methods for landslide spatial probability assessment in North-Est Sicily, Italy

NASA Astrophysics Data System (ADS)

Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

2015-04-01

North-East Sicily is strongly exposed to shallow landslide events. On October 1, 2009 a severe rainstorm (225.5 mm of cumulative rainfall in 9 hours) caused flash floods and more than 1000 landslides, which struck several small villages such as Giampilieri, Altolia, Molino, Pezzolo, Scaletta Zanclea and Itala, with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructure. The landslides, mainly earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials that cover the underlying metamorphic bedrock of the Peloritani Mountains. In this area catchments are small (about 10 square kilometres) and elongated, with steep slopes, low-order streams and short times of concentration, and they discharge directly into the sea. In the past, landslides occurred at Altolia in 1613 and 2000, at Molino in 1750, 1805 and 2000, and at Giampilieri in 1791, 1918, 1929, 1932, 2000 and on October 25, 2007. The aim of this work is to define susceptibility models for shallow landslides using multivariate statistical analyses in the Giampilieri area (25 square kilometres). As a first step, a detailed landslide inventory map was produced through field surveys coupled with the interpretation of high-resolution aerial colour orthophotos taken immediately after the event. In total, 1,490 initiation zones were identified; most of them have planimetric dimensions ranging from tens to a few hundred square metres. The spatial hazard assessment focused on the detachment areas. The susceptibility models, built in a GIS environment, took several parameters into account. The morphometric and hydrologic parameters were derived from detailed 1×1 m LiDAR data. Square grid cells of 4×4 m were adopted as mapping units, on the basis of the area-frequency distribution of the detachment zones and the optimal representation of the local morphometric conditions (e.g. slope angle, plan curvature).
The first phase of the work aimed to identify the spatial relationships between the landslide locations and the 13 related factors using the Frequency Ratio bivariate statistical method. The analysis was then carried out with a multivariate statistical approach, using the Logistic Regression and Random Forests techniques that gave the best results in terms of AUC. The models were built and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the marked influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for identifying the depletion zones of extremely rapid shallow landslides.
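The Frequency Ratio method mentioned in this abstract is commonly computed, for each class of a factor, as the share of landslide cells falling in that class divided by the share of all cells in that class. The sketch below assumes that common definition; the abstract does not give the authors' exact formulation, and the function name and toy data are hypothetical.

```python
from collections import Counter

def frequency_ratio(classes, landslide):
    """Frequency Ratio per factor class (common definition, assumed here).

    classes   -- class label of each grid cell (e.g. a slope-angle bin)
    landslide -- True where the cell belongs to a landslide initiation zone
    Returns {class_label: FR}. FR > 1 means landslides are over-represented.
    """
    total_cells = len(classes)
    total_slides = sum(1 for s in landslide if s)
    if total_cells == 0 or total_slides == 0:
        raise ValueError("need at least one cell and one landslide cell")
    cells_per_class = Counter(classes)
    slides_per_class = Counter(c for c, s in zip(classes, landslide) if s)
    ratios = {}
    for label, n_cells in cells_per_class.items():
        share_of_slides = slides_per_class[label] / total_slides
        share_of_area = n_cells / total_cells
        ratios[label] = share_of_slides / share_of_area
    return ratios

# Toy grid: 4 "steep" cells (3 with landslides) and 6 "flat" cells (1 with a landslide)
toy_classes = ["steep"] * 4 + ["flat"] * 6
toy_slides = [True, True, True, False] + [True] + [False] * 5
toy_fr = frequency_ratio(toy_classes, toy_slides)
```

In the toy grid, "steep" holds 75% of the landslide cells but only 40% of the area, so its ratio is well above 1.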
A multivariate model of plant species richness in forested systems: Old-growth montane forests with a long history of fire

USGS Publications Warehouse

Laughlin, D.C.; Grace, J.B.

2006-01-01

Recently, efforts to develop multivariate models of plant species richness have been extended to include systems where trees play important roles as overstory elements mediating the influences of environment and disturbance on understory richness. We used structural equation modeling to examine the relationship of understory vascular plant species richness to understory abundance, forest structure, topographic slope, and surface fire history in lower montane forests on the North Rim of Grand Canyon National Park, USA based on data from eighty-two 0.1 ha plots. The questions of primary interest in this analysis were: (1) to what degree are influences of trees on understory richness mediated by effects on understory abundance? (2) To what degree are influences of fire history on richness mediated by effects on trees and/or understory abundance? (3) Can the influences of fire history on this system be related simply to time-since-fire or are there unique influences associated with long-term fire frequency? The results we obtained are consistent with the following inferences. First, it appears that pine trees had a strong inhibitory effect on the abundance of understory plants, which in turn led to lower understory species richness. Second, richness declined over time since the last fire. This pattern appears to result from several processes, including (1) a post-fire stimulation of germination, (2) a decline in understory abundance, and (3) an increase over time in pine abundance (which indirectly leads to reduced richness). Finally, once time-since-fire was statistically controlled, it was seen that areas with higher fire frequency have lower richness than expected, which appears to result from negative effects on understory abundance, possibly by depletions of soil nutrients from repeated surface fire. 
Overall, it appears that at large temporal and spatial scales, surface fire plays an important and complex role in structuring understory plant communities in old-growth montane forests. These results show how multivariate models of herbaceous richness can be expanded to apply to forested systems. Copyright © Oikos 2006.
Simulation of multivariate stationary stochastic processes using dimension-reduction representation methods

NASA Astrophysics Data System (ADS)

Liu, Zhangjun; Liu, Zenghui; Peng, Yongbo

2018-03-01

In view of the Fourier-Stieltjes integral formula of multivariate stationary stochastic processes, a unified formulation accommodating the spectral representation method (SRM) and proper orthogonal decomposition (POD) is deduced. By introducing random functions as constraints correlating the orthogonal random variables involved in the unified formulation, the dimension-reduction spectral representation method (DR-SRM) and the dimension-reduction proper orthogonal decomposition (DR-POD) are addressed. The proposed schemes are capable of representing the multivariate stationary stochastic process with a few elementary random variables, bypassing the challenges of high-dimensional random variables inherent in conventional Monte Carlo methods. To accelerate the numerical simulation, the Fast Fourier Transform (FFT) is integrated with the proposed schemes. For illustrative purposes, the horizontal wind velocity field along the deck of a large-span bridge is simulated using the proposed methods with 2 and 3 elementary random variables. The numerical simulation demonstrates the usefulness of the dimension-reduction representation methods.
Variable selection with random forest: Balancing stability, performance, and interpretation in ecological and environmental modeling

EPA Science Inventory

Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...
Random Forests for Evaluating Pedagogy and Informing Personalized Learning

ERIC Educational Resources Information Center

Spoon, Kelly; Beemer, Joshua; Whitmer, John C.; Fan, Juanjuan; Frazee, James P.; Stronach, Jeanne; Bohonak, Andrew J.; Levine, Richard A.

2016-01-01

Random forests are presented as an analytics foundation for educational data mining tasks. The focus is on course- and program-level analytics including evaluating pedagogical approaches and interventions and identifying and characterizing at-risk students. As part of this development, the concept of individualized treatment effects (ITE) is…
Employing canopy hyperspectral narrowband data and random forest algorithm to differentiate palmer amaranth from colored cotton

USDA-ARS's Scientific Manuscript database

Palmer amaranth (Amaranthus palmeri S. Wats.) invasion negatively impacts cotton (Gossypium hirsutum L.) production systems throughout the United States. The objective of this study was to evaluate canopy hyperspectral narrowband data as input into the random forest machine learning algorithm to dis...
Old-growth and mature forests near spotted owl nests in western Oregon

NASA Technical Reports Server (NTRS)

Ripple, William J.; Johnson, David H.; Hershey, K. T.; Meslow, E. Charles

1995-01-01

We investigated how the amount of old-growth and mature forest influences the selection of nest sites by northern spotted owls (Strix occidentalis caurina) in the Central Cascade Mountains of Oregon. We used 7 different plot sizes to compare the proportion of mature and old-growth forest between 30 nest sites and 30 random sites. The proportion of old-growth and mature forest was significantly greater at nest sites than at random sites for all plot sizes (P ≤ 0.01). Thus, management of the spotted owl might require setting the percentage of old-growth and mature forest retained from harvesting at least 1 standard deviation above the mean for the 30 nest sites we examined.
Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

NASA Astrophysics Data System (ADS)

Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

2017-02-01

Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of whom were randomly selected as the “derivation cohort” to develop the dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
The Dirichlet-Multinomial Model for Multivariate Randomized Response Data and Small Samples

ERIC Educational Resources Information Center

Avetisyan, Marianna; Fox, Jean-Paul

2012-01-01

In survey sampling the randomized response (RR) technique can be used to obtain truthful answers to sensitive questions. Although the individual answers are masked due to the RR technique, individual (sensitive) response rates can be estimated when observing multivariate response data. The beta-binomial model for binary RR data will be generalized…
Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study

NASA Astrophysics Data System (ADS)

Polan, Daniel F.; Brady, Samuel L.; Kaufman, Robert A.

2016-09-01

There is a need for robust, fully automated whole body organ segmentation for diagnostic CT. This study investigates and optimizes a Random Forest algorithm for automated organ segmentation; explores the limitations of a Random Forest algorithm applied to the CT environment; and demonstrates segmentation accuracy in a feasibility study of pediatric and adult patients. To the best of our knowledge, this is the first study to investigate a trainable Weka segmentation (TWS) implementation using Random Forest machine-learning as a means to develop a fully automated tissue segmentation tool developed specifically for pediatric and adult examinations in a diagnostic CT environment. Current innovation in computed tomography (CT) is focused on radiomics, patient-specific radiation dose calculation, and image quality improvement using iterative reconstruction, all of which require specific knowledge of tissue and organ systems within a CT image. The purpose of this study was to develop a fully automated Random Forest classifier algorithm for segmentation of neck-chest-abdomen-pelvis CT examinations based on pediatric and adult CT protocols. Seven materials were classified: background, lung/internal air or gas, fat, muscle, solid organ parenchyma, blood/contrast enhanced fluid, and bone tissue using Matlab and the TWS plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance evaluated over a voxel radius of 2^n (n from 0 to 4), along with noise reduction and edge preserving filters: Gaussian, bilateral, Kuwahara, and anisotropic diffusion. The Random Forest algorithm used 200 trees with 2 features randomly selected per node. The optimized auto-segmentation algorithm resulted in 16 image features including features derived from maximum, mean, variance, Gaussian and Kuwahara filters.
Dice similarity coefficient (DSC) calculations between manually segmented and Random Forest algorithm segmented images from 21 patient image sections were analyzed. The automated algorithm produced segmentation of seven material classes with a median DSC of 0.86 ± 0.03 for pediatric patient protocols, and 0.85 ± 0.04 for adult patient protocols. Additionally, 100 randomly selected patient examinations were segmented and analyzed, and a mean sensitivity of 0.91 (range: 0.82-0.98), specificity of 0.89 (range: 0.70-0.98), and accuracy of 0.90 (range: 0.76-0.98) were demonstrated. In this study, we demonstrate that this fully automated segmentation tool was able to produce fast and accurate segmentation of the neck and trunk of the body over a wide range of patient habitus and scan parameters.
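For readers unfamiliar with the Dice similarity coefficient used above, it is twice the overlap of two segmentations divided by the sum of their sizes. The sketch below computes it per material class from two flattened label sequences. The helper name and the toy labels are illustrative assumptions, not the study's Matlab/TWS pipeline.

```python
def dice_for_class(manual, auto, label):
    """Dice similarity coefficient for one class: 2|A ∩ B| / (|A| + |B|).

    manual, auto -- equally long sequences of per-voxel class labels
    label        -- the class to score
    Returns 1.0 when the class is absent from both segmentations (a convention
    chosen for this sketch).
    """
    if len(manual) != len(auto):
        raise ValueError("segmentations must have the same number of voxels")
    in_manual = [m == label for m in manual]
    in_auto = [a == label for a in auto]
    overlap = sum(1 for m, a in zip(in_manual, in_auto) if m and a)
    size = sum(in_manual) + sum(in_auto)
    return 2.0 * overlap / size if size else 1.0

# Toy example with four voxels
manual_labels = ["fat", "fat", "bone", "muscle"]
auto_labels = ["fat", "bone", "bone", "muscle"]
dsc_fat = dice_for_class(manual_labels, auto_labels, "fat")
dsc_muscle = dice_for_class(manual_labels, auto_labels, "muscle")
```

Here "fat" overlaps in one voxel out of three labelled voxels in total, giving 2/3, while "muscle" matches exactly.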
Temporal changes in randomness of bird communities across Central Europe.

PubMed

Renner, Swen C; Gossner, Martin M; Kahl, Tiemo; Kalko, Elisabeth K V; Weisser, Wolfgang W; Fischer, Markus; Allan, Eric

2014-01-01

Many studies have examined whether communities are structured by random or deterministic processes, and both are likely to play a role, but relatively few studies have attempted to quantify the degree of randomness in species composition. We quantified, for the first time, the degree of randomness in forest bird communities based on an analysis of spatial autocorrelation in three regions of Germany. The compositional dissimilarity between pairs of forest patches was regressed against the distance between them. We then calculated the y-intercept of the curve, i.e. the 'nugget', which represents the compositional dissimilarity at zero spatial distance. We therefore assume, following similar work on plant communities, that this represents the degree of randomness in species composition. We then analysed how the degree of randomness in community composition varied over time and with forest management intensity, which we expected to reduce the importance of random processes by increasing the strength of environmental drivers. We found that a high portion of the bird community composition could be explained by chance (overall mean of 0.63), implying that most of the variation in local bird community composition is driven by stochastic processes. Forest management intensity did not consistently affect the mean degree of randomness in community composition, perhaps because the bird communities were relatively insensitive to management intensity. We found a high temporal variation in the degree of randomness, which may indicate temporal variation in assembly processes and in the importance of key environmental drivers. We conclude that the degree of randomness in community composition should be considered in bird community studies, and the high values we find may indicate that bird community composition is relatively hard to predict at the regional scale.
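The "nugget" described above is the intercept of the dissimilarity-versus-distance relationship. A minimal sketch follows, assuming an ordinary least-squares straight line; the abstract says only that dissimilarity "was regressed against" distance, so the functional form, the function name and the toy numbers are assumptions.

```python
def nugget(distances, dissimilarities):
    """Intercept of an ordinary least-squares line through (distance, dissimilarity).

    The intercept is read as the compositional dissimilarity at zero distance.
    A straight-line fit is an assumption of this sketch.
    """
    n = len(distances)
    if n < 2 or n != len(dissimilarities):
        raise ValueError("need at least two matching (distance, dissimilarity) pairs")
    mean_x = sum(distances) / n
    mean_y = sum(dissimilarities) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances, dissimilarities))
    slope = sxy / sxx
    return mean_y - slope * mean_x

# Hypothetical pairwise data: distance in km, dissimilarity between forest patches
toy_nugget = nugget([1.0, 2.0, 3.0], [0.65, 0.67, 0.69])
```

With these made-up points the slope is 0.02 per km, so the line meets the y-axis at about 0.63.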
Security authentication with a three-dimensional optical phase code using random forest classifier: an overview

NASA Astrophysics Data System (ADS)

Markman, Adam; Carnicer, Artur; Javidi, Bahram

2017-05-01

We overview our recent work [1] on utilizing three-dimensional (3D) optical phase codes for object authentication using the random forest classifier. A simple 3D optical phase code (OPC) is generated by combining multiple diffusers and glass slides. This tag is then placed on a quick-response (QR) code, which is a barcode capable of storing information and can be scanned under non-uniform illumination conditions, rotation, and slight degradation. A coherent light source illuminates the OPC and the transmitted light is captured by a CCD to record the unique signature. Feature extraction on the signature is performed and inputted into a pre-trained random-forest classifier for authentication.
Origin Discrimination of Osmanthus fragrans var. thunbergii Flowers using GC-MS and UPLC-PDA Combined with Multivariable Analysis Methods.

PubMed

Zhou, Fei; Zhao, Yajing; Peng, Jiyu; Jiang, Yirong; Li, Maiquan; Jiang, Yuan; Lu, Baiyi

2017-07-01

Osmanthus fragrans flowers are used as folk medicine and as additives for teas, beverages and foods. The metabolites of O. fragrans flowers from different geographical origins are inconsistent to some extent. Chromatography and mass spectrometry combined with multivariable analysis methods provide an approach for discriminating the origin of O. fragrans flowers. The objective was to discriminate Osmanthus fragrans var. thunbergii flowers from different origins using the identified metabolites. GC-MS and UPLC-PDA were conducted to analyse the metabolites in O. fragrans var. thunbergii flowers (150 samples in total). Principal component analysis (PCA), soft independent modelling of class analogy analysis (SIMCA) and random forest (RF) analysis were applied to group the GC-MS and UPLC-PDA data. GC-MS identified 32 compounds common to all samples while UPLC-PDA/QTOF-MS identified 16 common compounds. PCA of the UPLC-PDA data generated a better clustering than PCA of the GC-MS data. Ten metabolites (six from GC-MS and four from UPLC-PDA) were selected as effective compounds for discrimination by PCA loadings. SIMCA and RF analysis were used to build classification models, and the RF model, based on the four effective compounds (caffeic acid derivative, acteoside, ligustroside and compound 15), yielded better results, with a classification rate of 100% in the calibration set and 97.8% in the prediction set. GC-MS and UPLC-PDA combined with multivariable analysis methods can discriminate the origin of Osmanthus fragrans var. thunbergii flowers. Copyright © 2017 John Wiley & Sons, Ltd.

Fast image interpolation via random forests.

PubMed

Huang, Jun-Jie; Siu, Wan-Chi; Liu, Tian-Rui

2015-10-01

This paper proposes a two-stage framework for fast image interpolation via random forests (FIRF). The proposed FIRF method gives high accuracy while requiring little computation. The underlying idea is to apply random forests to classify the natural image patch space into numerous subspaces and to learn a linear regression model for each subspace that maps the low-resolution image patch to a high-resolution image patch. The FIRF framework consists of two stages. Stage 1 of the framework removes most of the ringing and aliasing artifacts in the initial bicubic interpolated image, while Stage 2 further refines the Stage 1 interpolated image. By varying the number of decision trees in the random forests and the number of stages applied, the proposed FIRF method can realize computationally scalable image interpolation. Extensive experimental results show that the proposed FIRF(3, 2) method achieves more than 0.3 dB improvement in peak signal-to-noise ratio over the state-of-the-art nonlocal autoregressive modeling (NARM) method. Moreover, the proposed FIRF(1, 1) obtains results similar to or better than those of NARM while taking only 0.3% of its computational time.
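The improvement above is reported in peak signal-to-noise ratio (PSNR). As a reminder of how that metric is defined, here is a minimal sketch for flat sequences of pixel values. The function name and the default peak of 255 (8-bit images) are assumptions, and the paper's evaluation protocol is not reproduced.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    reference, test -- equally long sequences of pixel intensities
    max_value       -- peak intensity (255 assumed for 8-bit images)
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 0.3 dB corresponds to a mean squared error that is roughly 6.7% lower.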
CW-SSIM kernel based random forest for image classification

NASA Astrophysics Data System (ADS)

Fan, Guangzhe; Wang, Zhou; Wang, Jiheng

2010-07-01

The complex wavelet structural similarity (CW-SSIM) index has been proposed as a powerful image similarity metric that is robust to translation, scaling and rotation of images, but how to employ it in image classification applications has not been deeply investigated. In this paper, we incorporate CW-SSIM as a kernel function into a random forest learning algorithm. This leads to a novel image classification approach that does not require a feature extraction or dimension reduction stage at the front end. We use hand-written digit recognition as an example to demonstrate our algorithm. We compare the performance of the proposed approach with random forest learning based on other kernels, including the widely adopted Gaussian and inner product kernels. Empirical evidence shows that the proposed method is superior in its classification power. We also compare our proposed approach with the direct random forest method without a kernel and with the popular kernel-learning method, the support vector machine. Our test results on both simulated and real-world data suggest that the proposed approach outperforms traditional methods without the feature selection procedure.
Improved high-dimensional prediction with Random Forests by the use of co-data.

PubMed

Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A

2017-12-28

Prediction in high-dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables by co-data moderated sampling probabilities. Co-data here are defined as any type of information that is available on the variables of the primary data but does not use its response labels. Inspired by empirical Bayes, these moderated sampling probabilities are learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
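The central mechanism described above is drawing candidate variables with non-uniform probabilities instead of uniformly. The sketch below illustrates only that sampling step, with the weights taken as given. How CoRF learns the weights from co-data is not shown, and the function name, arguments and sequential sampling scheme are assumptions of this example.

```python
import random

def draw_candidates(weights, mtry, rng):
    """Draw `mtry` distinct variable indices with probability proportional to `weights`.

    weights -- non-negative sampling weight per variable (assumed already learned)
    mtry    -- number of candidate variables to draw for one split
    rng     -- a random.Random instance, for reproducibility
    Sampling is sequential without replacement: after each draw the chosen
    variable is removed and the remaining weights are implicitly renormalised.
    """
    positive = sum(1 for w in weights if w > 0)
    if mtry > positive:
        raise ValueError("mtry exceeds the number of variables with positive weight")
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(mtry):
        current = [weights[j] for j in remaining]
        pick = rng.choices(remaining, weights=current, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen

# Uniform weights reproduce the standard Random Forest candidate draw;
# zero weights exclude a variable from ever being a split candidate.
example = draw_candidates([0.0, 0.0, 1.0, 1.0], mtry=2, rng=random.Random(0))
```

Setting all weights equal recovers the uniform draw that the abstract says CoRF replaces.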
Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

PubMed

Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

2017-09-15

Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations (p ≫ n), these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder.
Code available at http://insilico.utulsa.edu/software/privateEC. Contact: brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Estimating future burned areas under changing climate in the EU-Mediterranean countries.

PubMed

Amatulli, Giuseppe; Camia, Andrea; San-Miguel-Ayanz, Jesús

2013-04-15

The impacts of climate change on forest fires have received increased attention in recent years at both continental and local scales. It is widely recognized that weather plays a key role in extreme fire situations. It is therefore of great interest to analyze projected changes in fire danger under climate change scenarios and to assess the consequent impacts of forest fires. In this study we estimated burned areas in the European Mediterranean (EU-Med) countries under past and future climate conditions. Historical (1985-2004) monthly burned areas in EU-Med countries were modeled by using the Canadian Fire Weather Index (CFWI). Monthly averages of the CFWI sub-indices were used as explanatory variables to estimate the monthly burned areas in each of the five most affected countries in Europe using three different modeling approaches (Multiple Linear Regression - MLR, Random Forest - RF, Multivariate Adaptive Regression Splines - MARS). MARS outperformed the other methods. Regression equations and significant coefficients of determination were obtained, although there were noticeable differences from country to country. Climatic conditions at the end of the 21st Century were simulated using results from the runs of the regional climate model HIRHAM in the European project PRUDENCE, considering two IPCC SRES scenarios (A2-B2). The MARS models were applied to both scenarios resulting in projected burned areas in each country and in the EU-Med region. Results showed that significant increases, 66% and 140% of the total burned area, can be expected in the EU-Med region under the A2 and B2 scenarios, respectively. Copyright © 2013 Elsevier B.V. All rights reserved.
Forest Edge Regrowth Typologies in Southern Sweden-Relationship to Environmental Characteristics and Implications for Management.

PubMed

Wiström, Björn; Busse Nielsen, Anders

2017-07-01

After two major storms, the Swedish Transport Administration was granted permission in 2008 to expand the railroad corridor from 10 to 20 m from the rail banks, and to clear the forest edges in the expanded area. In order to evaluate the possibilities for managers to promote and control the species composition of the woody regrowth so that a forest edge with a graded profile develops over time, this study mapped the woody regrowth and environmental variables at 78 random sites along the 610-km railroad between Stockholm and Malmö four growing seasons after the clearing was implemented. Through different clustering approaches, dominant tree species to be controlled and future building block species for management were identified. Using multivariate regression trees, the most decisive environmental variables were identified and used to develop a regrowth typology and to calculate species indicator values. Five regrowth types and ten indicator species were identified along the environmental gradients of soil moisture, soil fertility, and altitude. Six tree species dominated the regrowth across the regrowth types, but clustering showed that if these were controlled by selective thinning, lower tree and shrub species were generally present so they could form the "building blocks" for development of a graded edge. We concluded that selective thinning targeted at controlling a few dominant tree species, here named Functional Species Control, is a simple and easily implemented management concept to promote a wide range of suitable species, because it does not require field staff with specialist taxonomic knowledge.
What does it take to get family forest owners to enroll in a forest stewardship-type program?

Treesearch

Michael A. Kilgore; Stephanie A. Snyder; Joseph Schertz; Steven J. Taff

2008-01-01

We estimated the probability of enrollment and factors influencing participation in a forest stewardship-type program, Minnesota's Sustainable Forest Incentives Act, using data from a mail survey of over 1000 randomly-selected Minnesota family forest owners. Of the 15 variables tested, only five were significant predictors of a landowner's interest in...
Mapping forest vegetation for the western United States using modified random forests imputation of FIA forest plots

Treesearch

Karin Riley; Isaac C. Grenfell; Mark A. Finney

2016-01-01

Maps of the number, size, and species of trees in forests across the western United States are desirable for many applications such as estimating terrestrial carbon resources, predicting tree mortality following wildfires, and for forest inventory. However, detailed mapping of trees for large areas is not feasible with current technologies, but statistical...
Random forest regression modelling for forest aboveground biomass estimation using RISAT-1 PolSAR and terrestrial LiDAR data

NASA Astrophysics Data System (ADS)

Mangla, Rohit; Kumar, Shashi; Nandy, Subrata

2016-05-01

SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. A SAR sensor in fully polarimetric mode can retrieve the scattering properties of the different components of forest structure, and LiDAR can measure structural information with very high accuracy. This study focused on retrieving forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) point clouds and the scattering properties of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data were acquired for 14 plots in the Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees, and random sampling with a plot size of 0.1 ha (31.62 m × 31.62 m) was adopted for TLS and field data collection. RISAT-1 data were processed to retrieve SAR-based variables, and 3D imaging based on TLS point clouds was used to retrieve LiDAR-based variables. Surface scattering, double-bounce scattering, volume scattering, helix scattering and wire scattering were the SAR-based variables retrieved from polarimetric decomposition. Tree heights and stem diameters were the LiDAR-based variables, retrieved with the single tree vertical height and least squares circle fit methods respectively. All the variables obtained for the forest plots were used as input to a machine learning based Random Forest Regression Model, which was developed in this study for forest AGB estimation. The modelled forest AGB showed reliable accuracy (RMSE = 27.68 t/ha), and a good coefficient of determination (0.63) was obtained from the linear regression between modelled AGB and field-estimated AGB. The sensitivity analysis showed that the model was most sensitive to the main contributing variables (stem diameter and volume scattering), which were measured with two different remote sensing techniques. This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
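The two accuracy figures quoted in this abstract (RMSE, and the coefficient of determination from a linear regression between modelled and field-estimated AGB) can be computed as in the minimal sketch below. The function names and the plot values are hypothetical illustrations, not the study's code or data.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def r_squared_linear(predicted, observed):
    """R^2 of a simple linear regression of observed on predicted.

    For a one-predictor linear fit this equals the squared Pearson correlation.
    """
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical plot-level AGB values in t/ha (NOT the study's data).
field_agb = [210.0, 185.0, 250.0, 170.0, 230.0]
modelled_agb = [200.0, 195.0, 240.0, 180.0, 215.0]

print("RMSE (t/ha):", round(rmse(modelled_agb, field_agb), 2))
print("R^2:", round(r_squared_linear(modelled_agb, field_agb), 3))
```

Note that RMSE is in the units of the response (t/ha here), whereas this R^2 only measures how well a straight line relates the two series, so a biased model can still score a high R^2.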
The Random Forests Statistical Technique: An Examination of Its Value for the Study of Reading

ERIC Educational Resources Information Center

Matsuki, Kazunaga; Kuperman, Victor; Van Dyke, Julie A.

2016-01-01

Studies investigating individual differences in reading ability often involve data sets containing a large number of collinear predictors and a small number of observations. In this article, we discuss the method of Random Forests and demonstrate its suitability for addressing the statistical concerns raised by such data sets. The method is…
An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

ERIC Educational Resources Information Center

Strobl, Carolin; Malley, James; Tutz, Gerhard

2009-01-01

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Random location of fuel treatments in wildland community interfaces: a percolation approach

Treesearch

Michael Bevers; Philip N. Omi; John G. Hof

2004-01-01

We explore the use of spatially correlated random treatments to reduce fuels in landscape patterns that appear somewhat natural while forming fully connected fuelbreaks between wildland forests and developed protection zones. From treatment zone maps partitioned into grids of hexagonal forest cells representing potential treatment sites, we selected cells to be treated...
Road Network State Estimation Using Random Forest Ensemble Learning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hou, Yi; Edara, Praveen; Chang, Yohan

Network-scale travel time prediction not only enables traffic management centers (TMC) to proactively implement traffic management strategies, but also allows travelers to make informed decisions about route choices between various origins and destinations. In this paper, a random forest estimator was proposed to predict travel time in a network. The estimator was trained using two years of historical travel time data for a case study network in St. Louis, Missouri. Both temporal and spatial effects were considered in the modeling process. The random forest models predicted travel times accurately during both congested and uncongested traffic conditions. The computational times for the models were low, making them useful for real-time traffic management and traveler information applications.
Analysis of Forest Foliage Using a Multivariate Mixture Model

NASA Technical Reports Server (NTRS)

Hlavka, C. A.; Peterson, David L.; Johnson, L. F.; Ganapol, B.

1997-01-01

Data with wet chemical measurements and near infrared spectra of ground leaf samples were analyzed to test a multivariate regression technique for estimating component spectra which is based on a linear mixture model for absorbance. The resulting unmixed spectra for carbohydrates, lignin, and protein resemble the spectra of extracted plant starches, cellulose, lignin, and protein. The unmixed protein spectrum has prominent absorption spectra at wavelengths which have been associated with nitrogen bonds.
Adaptive economic and ecological forest management under risk

Treesearch

Joseph Buongiorno; Mo Zhou

2015-01-01

Background: Forest managers must deal with inherently stochastic ecological and economic processes. The future growth of trees is uncertain, and so is their value. The randomness of low-impact, high frequency or rare catastrophic shocks in forest growth has significant implications in shaping the mix of tree species and the forest landscape...
Seeing the forest for the trees: utilizing modified random forests imputation of forest plot data for landscape-level analyses

Treesearch

Karin L. Riley; Isaac C. Grenfell; Mark A. Finney

2015-01-01

Mapping the number, size, and species of trees in forests across the western United States has utility for a number of research endeavors, ranging from estimation of terrestrial carbon resources to tree mortality following wildfires. For landscape fire and forest simulations that use the Forest Vegetation Simulator (FVS), a tree-level dataset, or "tree list", is a...
Plant trait-species abundance relationships vary with environmental properties in subtropical forests in eastern china.

PubMed

Yan, En-Rong; Yang, Xiao-Dong; Chang, Scott X; Wang, Xi-Hua

2013-01-01

Understanding how plant trait-species abundance relationships change with a range of single and multivariate environmental properties is crucial for explaining species abundance and rarity. In this study, the abundance of 94 woody plant species was examined and related to 15 plant leaf and wood traits at both local and landscape scales involving 31 plots in subtropical forests in eastern China. Further, plant trait-species abundance relationships were related to a range of single and multivariate (PCA axes) environmental properties such as air humidity, soil moisture content, soil temperature, soil pH, and soil organic matter, nitrogen (N) and phosphorus (P) contents. At the landscape scale, plant maximum height, and twig and stem wood densities were positively correlated, whereas mean leaf area (MLA), leaf N concentration (LN), and total leaf area per twig size (TLA) were negatively correlated with species abundance. At the plot scale, plant maximum height, leaf and twig dry matter contents, twig and stem wood densities were positively correlated, but MLA, specific leaf area, LN, leaf P concentration and TLA were negatively correlated with species abundance. Plant trait-species abundance relationships shifted over the range of seven single environmental properties and along multivariate environmental axes in a similar way. In conclusion, strong relationships between plant traits and species abundance existed among and within communities. Significant shifts in plant trait-species abundance relationships in a range of environmental properties suggest strong environmental filtering processes that influence species abundance and rarity in the studied subtropical forests.
Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies.

PubMed

Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

2015-01-01

Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1-98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.
Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

PubMed Central

Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

2015-01-01

Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681
Multiple filters affect tree species assembly in mid-latitude forest communities.

PubMed

Kubota, Y; Kusumoto, B; Shiono, T; Ulrich, W

2018-05-01

Species assembly patterns of local communities are shaped by the balance between multiple abiotic/biotic filters and dispersal that both select individuals from species pools at the regional scale. Knowledge regarding functional assembly can provide insight into the relative importance of the deterministic and stochastic processes that shape species assembly. We evaluated the hierarchical roles of the α niche and β niches by analyzing the influence of environmental filtering relative to functional traits on geographical patterns of tree species assembly in mid-latitude forests. Using forest plot datasets, we examined the α niche traits (leaf and wood traits) and β niche properties (cold/drought tolerance) of tree species, and tested non-randomness (clustering/over-dispersion) of trait assembly based on null models that assumed two types of species pools related to biogeographical regions. For most plots, species assembly patterns fell within the range of random expectation. However, particularly for cold/drought tolerance-related β niche properties, deviation from randomness was frequently found; non-random clustering was predominant in higher latitudes with harsh climates. Our findings demonstrate that both randomness and non-randomness in trait assembly emerged as a result of the α and β niches, although we suggest the potential role of dispersal processes and/or species equalization through trait similarities in generating the prevalence of randomness. Clustering of β niche traits along latitudinal climatic gradients provides clear evidence of species sorting by filtering particular traits. Our results reveal that multiple filters through functional niches and stochastic processes jointly shape geographical patterns of species assembly across mid-latitude forests.

"L"-Bivariate and "L"-Multivariate Association Coefficients. Research Report. ETS RR-08-40

ERIC Educational Resources Information Center

Kong, Nan; Lewis, Charles

2008-01-01

Given a system of multiple random variables, a new measure called the "L"-multivariate association coefficient is defined using (conditional) entropy. Unlike traditional correlation measures, the L-multivariate association coefficient measures the multiassociations or multirelations among the multiple variables in the given system; that…
Hand pose estimation in depth image using CNN and random forest

NASA Astrophysics Data System (ADS)

Chen, Xi; Cao, Zhiguo; Xiao, Yang; Fang, Zhiwen

2018-03-01

Thanks to the availability of low-cost depth cameras such as Microsoft Kinect, 3D hand pose estimation has attracted special research attention in recent years. Due to the large variations in the hand's viewpoint and the high dimensionality of hand motion, 3D hand pose estimation is still challenging. In this paper we propose a two-stage framework that combines a CNN with a Random Forest to boost the performance of hand pose estimation. First, we use a standard Convolutional Neural Network (CNN) to regress the hand joints' locations. Second, we use a Random Forest to refine the joints from the first stage. In the second stage, we propose a pyramid feature which merges the information flow of the CNN. Specifically, we get the rough joint locations from the first stage, then rotate the convolutional feature maps (and image). After this, for each joint, we first map its location to each feature map (and image), then crop features from each feature map (and image) around its location, and finally pass the extracted features to the Random Forest for refinement. Experimentally, we evaluate our proposed method on the ICVL dataset and obtain a mean error of about 11 mm; our method also runs in real time on a desktop.
Recent drought conditions in the Conterminous United States

Treesearch

Frank H. Koch; William D. Smith; John W. Coulston

2013-01-01

Droughts are common in virtually all U.S. forests, but their frequency and intensity vary widely both between and within forest ecosystems (Hanson and Weltzin 2000). Forests in the Western United States generally exhibit a pattern of annual seasonal droughts. Forests in the Eastern United States tend to exhibit one of two prevailing patterns: random occasional droughts...
Stratifying to reduce bias caused by high nonresponse rates: A case study from New Mexico’s forest inventory

Treesearch

Sara A. Goeking; Paul L. Patterson

2013-01-01

The USDA Forest Service's Forest Inventory and Analysis (FIA) Program applies specific sampling and analysis procedures to estimate a variety of forest attributes. FIA's Interior West region uses post-stratification, where strata consist of forest/nonforest polygons based on MODIS imagery, and assumes that nonresponse plots are distributed at random across each stratum...
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.

PubMed

Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B

2016-01-01

Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

NASA Astrophysics Data System (ADS)

Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

2015-03-01

During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
Artificial Intelligence Procedures for Tree Taper Estimation within a Complex Vegetation Mosaic in Brazil

PubMed Central

Nunes, Matheus Henrique

2016-01-01

Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects. PMID:27187074
Electromagnetic wave extinction within a forested canopy

NASA Technical Reports Server (NTRS)

Karam, M. A.; Fung, A. K.

1989-01-01

A forested canopy is modeled by a collection of randomly oriented finite-length cylinders shaded by randomly oriented and distributed disk- or needle-shaped leaves. For a plane wave exciting the forested canopy, the extinction coefficient is formulated in terms of the extinction cross sections (ECSs) in the local frame of each forest component and the Eulerian angles of orientation (used to describe the orientation of each component). The ECSs in the local frame for the finite-length cylinders used to model the branches are obtained by using the forward-scattering theorem. ECSs in the local frame for the disk- and needle-shaped leaves are obtained by the summation of the absorption and scattering cross-sections. The behavior of the extinction coefficients with the incidence angle is investigated numerically for both deciduous and coniferous forest. The dependencies of the extinction coefficients on the orientation of the leaves are illustrated numerically.
Artificial Intelligence Procedures for Tree Taper Estimation within a Complex Vegetation Mosaic in Brazil.

PubMed

Nunes, Matheus Henrique; Görgens, Eric Bastos

2016-01-01

Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects.
Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

Treesearch

E. Freeman; G. Moisen; J. Coulston; B. Wilson

2014-01-01

Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
Relationship of field and LiDAR estimates of forest canopy cover with snow accumulation and melt

Treesearch

Mariana Dobre; William J. Elliot; Joan Q. Wu; Timothy E. Link; Brandon Glaza; Theresa B. Jain; Andrew T. Hudak

2012-01-01

At the Priest River Experimental Forest in northern Idaho, USA, snow water equivalent (SWE) was recorded over a period of six years on random, equally-spaced plots in ~4.5 ha small watersheds (n=10). Two watersheds were selected as controls and eight as treatments, with two watersheds randomly assigned per treatment as follows: harvest (2007) followed by mastication (...
New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

Treesearch

L.R. Iverson; A.M. Prasad; A. Liaw

2004-01-01

More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

Treesearch

Elizabeth A. Freeman; Gretchen G. Moisen; John W. Coulston; Barry T. (Ty) Wilson

2015-01-01

As part of the development of the 2011 National Land Cover Database (NLCD) tree canopy cover layer, a pilot project was launched to test the use of high-resolution photography coupled with extensive ancillary data to map the distribution of tree canopy cover over four study regions in the conterminous US. Two stochastic modeling techniques, random forests (RF...
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

PubMed Central

Jackson, Dan; White, Ian R; Riley, Richard D

2012-01-01

Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
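For orientation, the univariate quantities that this abstract generalises can be worked through in a few lines. The sketch below computes Cochran's Q, H2 = Q / df and I2 = (H2 - 1) / H2 (truncated at zero) for a univariate fixed-effect meta-analysis; it does not implement the multivariate statistics proposed in the paper, and the function name and input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Univariate Cochran's Q, H^2 and I^2 from study effects and their variances."""
    k = len(effects)
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    h2 = q / df
    # I^2 = (Q - df) / Q, conventionally truncated at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, h2, i2

# Hypothetical study estimates (e.g. log odds ratios) and within-study variances.
q, h2, i2 = heterogeneity([0.10, 0.35, 0.60, 0.20], [0.02, 0.03, 0.025, 0.04])
print(f"Q = {q:.3f}, H^2 = {h2:.3f}, I^2 = {100 * i2:.1f}%")
```

I2 is read as the proportion of total variation in the study estimates that is attributed to between-study heterogeneity rather than sampling error.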
Chapter 4 - Drought patterns in the conterminous United States and Hawaii.

Treesearch

Frank H. Koch; William D. Smith; John W. Coulston

2014-01-01

Droughts are common in virtually all U.S. forests, but their frequency and intensity vary widely both between and within forest ecosystems (Hanson and Weltzin 2000). Forests in the Western United States generally exhibit a pattern of annual seasonal droughts. Forests in the Eastern United States tend to exhibit one of two prevailing patterns: random occasional droughts...
A Prospectus on Restoring Late Successional Forest Structure to Eastside Pine Ecosystems Through Large-Scale, Interdisciplinary Research

Treesearch

Steve Zack; William F. Laudenslayer; Luke George; Carl Skinner; William Oliver

1999-01-01

At two different locations in northeast California, an interdisciplinary team of scientists is initiating long-term studies to quantify the effects of forest manipulations intended to accelerate and/or enhance late-successional structure of eastside pine forest ecosystems. One study, at Blacks Mountain Experimental Forest, uses a split-plot, factorial, randomized block...
Probabilistic risk models for multiple disturbances: an example of forest insects and wildfires

Treesearch

Haiganoush K. Preisler; Alan A. Ager; Jane L. Hayes

2010-01-01

Building probabilistic risk models for highly random forest disturbances like wildfire and forest insect outbreaks is challenging. Modeling the interactions among natural disturbances is even more difficult. In the case of wildfire and forest insects, we looked at the probability of a large fire given an insect outbreak and also the incidence of insect outbreaks...
Utilizing random forests imputation of forest plot data for landscape-level wildfire analyses

Treesearch

Karin L. Riley; Isaac C. Grenfell; Mark A. Finney; Nicholas L. Crookston

2014-01-01

Maps of the number, size, and species of trees in forests across the United States are desirable for a number of applications. For landscape-level fire and forest simulations that use the Forest Vegetation Simulator (FVS), a spatial tree-level dataset, or "tree list", is a necessity. FVS is widely used at the stand level for simulating fire effects on tree mortality,...
Multivariate model of female black bear habitat use for a Geographic Information System

USGS Publications Warehouse

Clark, Joseph D.; Dunn, James E.; Smith, Kimberly G.

1993-01-01

Simple univariate statistical techniques may not adequately assess the multidimensional nature of habitats used by wildlife. Thus, we developed a multivariate method to model habitat-use potential using a set of female black bear (Ursus americanus) radio locations and habitat data consisting of forest cover type, elevation, slope, aspect, distance to roads, distance to streams, and forest cover type diversity score in the Ozark Mountains of Arkansas. The model is based on the Mahalanobis distance statistic coupled with Geographic Information System (GIS) technology. That statistic is a measure of dissimilarity and represents a standardized squared distance between a set of sample variates and an ideal based on the mean of variates associated with animal observations. Calculations were made with the GIS to produce a map containing Mahalanobis distance values within each cell on a 60- × 60-m grid. The model identified areas of high habitat use potential that could not otherwise be identified by independent perusal of any single map layer. This technique avoids many pitfalls that commonly affect typical multivariate analyses of habitat use and is a useful tool for habitat manipulation or mitigation to favor terrestrial vertebrates that use habitats on a landscape scale.
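The Mahalanobis distance statistic described in this abstract can be sketched as follows, assuming continuous habitat variables only (the study also used categorical layers such as forest cover type, which this sketch does not handle). The function name, array layout and numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def mahalanobis_sq(cells, use_locations):
    """Squared Mahalanobis distance of each grid cell from the 'ideal' habitat.

    cells:          (n_cells, n_vars) habitat values for every grid cell
    use_locations:  (n_obs, n_vars) habitat values at animal observations
    """
    mu = use_locations.mean(axis=0)             # mean habitat vector (the "ideal")
    cov = np.cov(use_locations, rowvar=False)   # sample covariance of the variates
    cov_inv = np.linalg.inv(cov)
    diff = cells - mu
    # Row-wise quadratic form diff_i^T * cov_inv * diff_i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Hypothetical example: columns are elevation (m) and distance to roads (m).
observed = np.array([[450.0, 900.0], [520.0, 1200.0], [480.0, 1000.0],
                     [610.0, 1500.0], [550.0, 1100.0]])
grid = np.array([[500.0, 1100.0], [200.0, 100.0]])
print(mahalanobis_sq(grid, observed))
```

Smaller values indicate cells whose combination of habitat variables is closer to the conditions at the animal locations; because the covariance matrix enters the calculation, correlated layers are not double-counted the way they would be with a plain Euclidean distance.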
Agro-forest landscape and the 'fringe' city: a multivariate assessment of land-use changes in a sprawling region and implications for planning.

PubMed

Salvati, Luca

2014-08-15

The present study evaluates the impact of urban expansion on landscape transformations in Rome's metropolitan area (1500 km²) during the last sixty years. Landscape composition, structure and dynamics were assessed for 1949 and 2008 by analyzing the distribution of 26 metrics for nine land-use classes. Changes in landscape structure are analysed by way of a multivariate statistical approach providing a summary measure of rapidity-to-change for each metric and class. Land fragmentation increased during the study period due to urban expansion. Poorly protected or medium-low value added classes (vineyards, arable land, olive groves and pastures) experienced fragmentation processes compared with protected or high-value added classes (e.g. forests, olive groves) showing larger 'core' areas and lower fragmentation. The relationship observed between class area and mean patch size indicates increased fragmentation for all uses of land (both expanding and declining) except for urban areas and forests. Reducing the impact of urban expansion for specific land-use classes is an effective planning strategy to contrast the simplification of Mediterranean landscape in peri-urban areas. Copyright © 2014 Elsevier B.V. All rights reserved.

Random Forest Application for NEXRAD Radar Data Quality Control

NASA Astrophysics Data System (ADS)

Keem, M.; Seo, B. C.; Krajewski, W. F.

2017-12-01

Identification and elimination of non-meteorological radar echoes (e.g., returns from ground, wind turbines, and biological targets) are the basic data quality control steps before radar data are used in quantitative applications (e.g., precipitation estimation). Although WSR-88Ds' recent upgrade to dual-polarization has enhanced this quality control and echo classification, there are still challenges in detecting some non-meteorological echoes that show precipitation-like characteristics (e.g., wind turbine or anomalous propagation clutter embedded in rain). With this in mind, a new quality control method using Random Forest is proposed in this study. This classification algorithm is known to produce reliable results with less uncertainty. The method introduces randomness into sampling and feature selection and integrates the resulting multiple decision trees. The multidimensional structure of the trees can characterize the statistical interactions of the multiple features involved in complex situations. The authors explore the performance of the Random Forest method for NEXRAD radar data quality control. Training datasets are selected using several clear cases of precipitation and non-precipitation (but with some non-meteorological echoes). The model is structured using available candidate features (from the NEXRAD data) such as horizontal reflectivity, differential reflectivity, differential phase shift, copolar correlation coefficient, and their horizontal textures (e.g., local standard deviation). The influence of each feature on classification results is quantified by variable importance measures that are automatically estimated by the Random Forest algorithm. Therefore, the number and types of features in the final forest can be examined based on the classification accuracy. The authors demonstrate the capability of the proposed approach using several cases ranging from distinct to complex rain/no-rain events and compare the performance with the existing algorithms (e.g., MRMS). They also discuss operational feasibility based on the observed strengths and weaknesses of the method.
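The two sources of randomness this record mentions (random sampling of training cases and random feature selection) followed by combining many trees can be sketched as below. This is a simplified illustration, not the authors' model: one-split decision stumps stand in for full decision trees, each stump sees a single randomly chosen feature, and the two features and their values are invented.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that minimises misclassification."""
    values = sorted(set(r[feature] for r in rows))
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(rows) + 1, None, majority, majority)  # errors, threshold, left, right
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    return feature, best[1], best[2], best[3]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging plus random feature choice: one stump per bootstrap sample."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of cases
        feature = rng.randrange(n_features)         # random feature selection
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [l_lab if thr is None or row[feature] <= thr else r_lab
             for feature, thr, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]

# Invented two-feature training data: low values for clutter, high values for rain.
clutter = [(0.0 + 0.3 * i, 3.0 - 0.3 * i) for i in range(10)]
rain = [(7.0 + 0.3 * i, 10.0 - 0.3 * i) for i in range(10)]
rows = clutter + rain
labels = ["clutter"] * 10 + ["rain"] * 10

forest = fit_forest(rows, labels)
```

Calling `predict(forest, (1.0, 1.0))` and `predict(forest, (9.0, 9.0))` then returns the majority vote for each new case. A full Random Forest grows deep trees and draws a fresh random feature subset at every split, and it also estimates variable importance, which this sketch omits.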
Identifying cytokine predictors of cognitive functioning in breast cancer survivors up to 10 years post chemotherapy using machine learning.

PubMed

Henneghan, Ashley M; Palesh, Oxana; Harrison, Michelle; Kesler, Shelli R

2018-07-15

The purpose of this study is to explore 13 cytokine predictors of chemotherapy-related cognitive impairment (CRCI) in breast cancer survivors (BCS) 6 months to 10 years after chemotherapy completion using a multivariate, non-parametric approach. Cross sectional data collection included completion of a survey, cognitive testing, and non-fasting blood from 66 participants. Data were analyzed using random forest regression to identify the most significant predictors for each of the cognitive test scores. A different cytokine profile predicted each cognitive test. Adjusted R² for each model ranged from 0.71-0.77 (p's < 9.50⁻¹⁰). The relationships between all the cytokine predictors and cognitive test scores were non-linear. Our findings are unique to the field of CRCI and suggest non-linear cytokine specificity to neural networks underlying cognitive functions assessed in this study. Copyright © 2018 Elsevier B.V. All rights reserved.
Targeted Proteomics Approach for Precision Plant Breeding.

PubMed

Chawade, Aakash; Alexandersson, Erik; Bengtsson, Therese; Andreasson, Erik; Levander, Fredrik

2016-02-05

Selected reaction monitoring (SRM) is a targeted mass spectrometry technique that enables precise quantitation of hundreds of peptides in a single run. This technique provides new opportunities for multiplexed protein biomarker measurements. For precision plant breeding, DNA-based markers have been used extensively, but the potential of protein biomarkers has not been exploited. In this work, we developed an SRM marker panel with assays for 104 potato (Solanum tuberosum) peptides selected using univariate and multivariate statistics. Thereafter, using random forest classification, the prediction markers were identified for Phytophthora infestans resistance in leaves, P. infestans resistance in tubers, and plant yield in potato leaf secretome samples. The results suggest that the marker panel has predictive potential for three traits, two of which have no commercial DNA markers so far. Furthermore, the marker panel was also tested and found to be applicable to potato clones not used during the marker development. The proposed workflow is thus a proof-of-concept for targeted proteomics as an efficient readout in accelerated breeding for complex and agronomically important traits.
Computer-Assisted Decision Support for Student Admissions Based on Their Predicted Academic Performance.

PubMed

Muratov, Eugene; Lewis, Margaret; Fourches, Denis; Tropsha, Alexander; Cox, Wendy C

2017-04-01

Objective. To develop predictive computational models forecasting the academic performance of students in the didactic-rich portion of a doctor of pharmacy (PharmD) curriculum as admission-assisting tools. Methods. All PharmD candidates over three admission cycles were divided into two groups: those who completed the PharmD program with a GPA ≥ 3; and the remaining candidates. A Random Forest machine learning technique was used to develop a binary classification model based on 11 pre-admission parameters. Results. Robust and externally predictive models were developed that had particularly high overall accuracy of 77% for candidates with high or low academic performance. These multivariate models were more accurate in predicting these groups than those obtained using undergraduate GPA and composite PCAT scores only. Conclusion. The models developed in this study can be used to improve the admission process as preliminary filters and thus quickly identify candidates who are likely to be successful in the PharmD curriculum.
Spatial modeling of cutaneous leishmaniasis in the Andean region of Colombia

PubMed Central

Pérez-Flórez, Mauricio; Ocampo, Clara Beatriz; Valderrama-Ardila, Carlos; Alexander, Neal

2016-01-01

The objective of this research was to identify environmental risk factors for cutaneous leishmaniasis (CL) in Colombia and map high-risk municipalities. The study area was the Colombian Andean region, comprising 715 rural and urban municipalities. We used 10 years of CL surveillance: 2000-2009. We used spatial-temporal analysis - conditional autoregressive Poisson random effects modelling - in a Bayesian framework to model the dependence of municipality-level incidence on land use, climate, elevation and population density. Bivariable spatial analysis identified rainforests, forests and secondary vegetation, temperature, and annual precipitation as positively associated with CL incidence. By contrast, livestock agroecosystems and temperature seasonality were negatively associated. Multivariable analysis identified land use - rainforests and agro-livestock - and climate - temperature, rainfall and temperature seasonality - as best predictors of CL. We conclude that climate and land use can be used to identify areas at high risk of CL and that this approach is potentially applicable elsewhere in Latin America. PMID:27355214
Ensemble habitat mapping of invasive plant species

USGS Publications Warehouse

Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.

2010-01-01

Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.
Plant Trait-Species Abundance Relationships Vary with Environmental Properties in Subtropical Forests in Eastern China

PubMed Central

Yan, En-Rong; Yang, Xiao-Dong; Chang, Scott X.; Wang, Xi-Hua

2013-01-01

Understanding how plant trait-species abundance relationships change with a range of single and multivariate environmental properties is crucial for explaining species abundance and rarity. In this study, the abundance of 94 woody plant species was examined and related to 15 plant leaf and wood traits at both local and landscape scales involving 31 plots in subtropical forests in eastern China. Further, plant trait-species abundance relationships were related to a range of single and multivariate (PCA axes) environmental properties such as air humidity, soil moisture content, soil temperature, soil pH, and soil organic matter, nitrogen (N) and phosphorus (P) contents. At the landscape scale, plant maximum height, and twig and stem wood densities were positively correlated, whereas mean leaf area (MLA), leaf N concentration (LN), and total leaf area per twig size (TLA) were negatively correlated with species abundance. At the plot scale, plant maximum height, leaf and twig dry matter contents, twig and stem wood densities were positively correlated, but MLA, specific leaf area, LN, leaf P concentration and TLA were negatively correlated with species abundance. Plant trait-species abundance relationships shifted over the range of seven single environmental properties and along multivariate environmental axes in a similar way. In conclusion, strong relationships between plant traits and species abundance existed among and within communities. Significant shifts in plant trait-species abundance relationships in a range of environmental properties suggest strong environmental filtering processes that influence species abundance and rarity in the studied subtropical forests. PMID:23560114
Multiple-factor classification of a human-modified forest landscape in the Hsuehshan Mountain Range, Taiwan.

PubMed

Berg, Kevan J; Icyeh, Lahuy; Lin, Yih-Ren; Janz, Arnold; Newmaster, Steven G

2016-12-01

Human actions drive landscape heterogeneity, yet most ecosystem classifications omit the role of human influence. This study explores land use history to inform a classification of forestland of the Tayal Mrqwang indigenous people of Taiwan. Our objective was to determine the extent to which human action drives landscape heterogeneity. We used interviews, field sampling, and multivariate analysis to relate vegetation patterns to environmental gradients and human modification across 76 sites. We identified eleven forest classes. In total, around 70% of plots were at lower elevations and had a history of shifting cultivation, terrace farming, and settlement that resulted in alder, laurel, oak, pine, and bamboo stands. Higher elevation mixed conifer forests were least disturbed. Arboriculture and selective harvesting were drivers of other conspicuous forest patterns. The findings show that past land uses play a key role in shaping forests, which is important to consider when setting targets to guide forest management.
Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure

PubMed Central

Park, Wookje; Jung, Sikhang

2014-01-01

A fault detection algorithm was developed using a similarity measure and the random forest algorithm. The algorithm was applied to an unmanned aerial vehicle (UAV) that we prepared. The similarity measure was designed using distance information, and its usefulness was verified by proof. The fault decision was made by calculating a weighted similarity measure. Twelve available coefficients from the healthy and faulty status data groups were used to reach the decision. The similarity measure weights were obtained through the random forest algorithm (RFA), which provides data priority. To obtain a fast decision response, a limited number of coefficients was also considered. The relation between detection rate and amount of feature data was analyzed and illustrated. By repeated trials of the similarity calculation, the useful amount of data was obtained. PMID:25057508
Improving the Spatial Prediction of Soil Organic Carbon Stocks in a Complex Tropical Mountain Landscape by Methodological Specifications in Machine Learning Approaches

PubMed Central

Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno

2016-01-01

Tropical forests are significant carbon sinks and their soils’ carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms (including the model tuning and predictor selection) were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models’ predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors by their individual performance was outperformed by the two procedures which accounted for predictor interaction. PMID:27128736
Improving the Spatial Prediction of Soil Organic Carbon Stocks in a Complex Tropical Mountain Landscape by Methodological Specifications in Machine Learning Approaches.

PubMed

Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno

2016-01-01

Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms (including the model tuning and predictor selection) were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors by their individual performance was outperformed by the two procedures which accounted for predictor interaction.
A primer on stand and forest inventory designs

Treesearch

H. Gyde Lund; Charles E. Thomas

1989-01-01

Covers designs for the inventory of stands and forests in detail and with worked-out examples. For stands, random sampling, line transects, ricochet plot, systematic sampling, single plot, cluster, subjective sampling and complete enumeration are discussed. For forest inventory, the main categories are subjective sampling, inventories without prior stand mapping,...
Forest community classification of the Porcupine River drainage, interior Alaska, and its application to forest management.

Treesearch

John Yarie

1983-01-01

The forest vegetation of 3,600,000 hectares in northeast interior Alaska was classified. A total of 365 plots located in a stratified random design were run through the ordination programs SIMORD and TWINSPAN. A total of 40 forest communities were described vegetatively and, to a limited extent, environmentally. The area covered by each community was similar, ranging...
Experimental Design Considerations for Establishing an Off-Road, Habitat-Specific Bird Monitoring Program Using Point Counts

Treesearch

JoAnn M. Hanowski; Gerald J. Niemi

1995-01-01

We established bird monitoring programs in two regions of Minnesota: the Chippewa National Forest and the Superior National Forest. The experimental design defined forest cover types as strata in which samples of forest stands were randomly selected. Subsamples (3 point counts) were placed in each stand to maximize field effort and to assess within-stand and between-...
Predicting live and dead tree basal area of bark beetle affected forests from discrete-return lidar

Treesearch

Benjamin C. Bright; Andrew T. Hudak; Robert McGaughey; Hans-Erik Andersen; Jose Negron

2013-01-01

Bark beetle outbreaks have killed large numbers of trees across North America in recent years. Lidar remote sensing can be used to effectively estimate forest biomass, but prediction of both live and dead standing biomass in beetle-affected forests using lidar alone has not been demonstrated. We developed Random Forest (RF) models predicting total, live, dead, and...
Valuing the Recreational Benefits from the Creation of Nature Reserves in Irish Forests

Treesearch

Riccardo Scarpa; Susan M. Chilton; W. George Hutchinson; Joseph Buongiorno

2000-01-01

Data from a large-scale contingent valuation study are used to investigate the effects of forest attributes on willingness to pay for forest recreation in Ireland. In particular, the presence of a nature reserve in the forest is found to significantly increase the visitors' willingness to pay. A random utility model is used to estimate the welfare change associated...
Evaluating effectiveness of down-sampling for stratified designs and unbalanced prevalence in Random Forest models of tree species distributions in Nevada

Treesearch

Elizabeth A. Freeman; Gretchen G. Moisen; Tracy S. Frescino

2012-01-01

Random Forests is frequently used to model species distributions over large geographic areas. Complications arise when data used to train the models have been collected in stratified designs that involve different sampling intensity per stratum. The modeling process is further complicated if some of the target species are relatively rare on the landscape leading to an...
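The abstract above is truncated, so the authors' exact procedure is not shown here. As a generic illustration of what down-sampling means for unbalanced presence/absence data, the sketch below randomly discards records from the more common class until every class is as large as the rarest one. The record layout and the function name `down_sample` are assumptions made for this example.

```python
import random

def down_sample(records, label_of, seed=0):
    """Randomly drop records so each class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))  # sample without replacement
    rng.shuffle(balanced)
    return balanced

# Hypothetical plots: a species present on 10 of 100 plots.
plots = [("plot%d" % i, "absent") for i in range(90)]
plots += [("plot%d" % i, "present") for i in range(90, 100)]

balanced = down_sample(plots, label_of=lambda rec: rec[1])
```

Here `balanced` holds 20 plots, 10 per class. In practice the down-sampling is often repeated inside each tree's bootstrap sample rather than applied once to the whole dataset, which is a design choice this sketch does not cover.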
Unbiased feature selection in learning random forests for high-dimensional data.

PubMed

Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

2015-01-01

Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
A random forest learning assisted "divide and conquer" approach for peptide conformation search.

PubMed

Chen, Xin; Yang, Bing; Lin, Zijing

2018-06-11

Computational determination of peptide conformations is challenging as it is a problem of finding minima in a high-dimensional space. The "divide and conquer" approach is promising for reliably reducing the search space size. A random forest learning model is proposed here to expand the scope of applicability of the "divide and conquer" approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units ("words"). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units ("grammar"). It is found that amino acid residues may be grouped as equivalent "words", while the φ-ψ combinations in low-energy peptide conformations follow a distinct "grammar". The finding of equivalent words empowers the "divide and conquer" method with the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the "divide and conquer" method by removing unfavorable φ-ψ combinations without the need of dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG through assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.
Optimal Symmetric Multimodal Templates and Concatenated Random Forests for Supervised Brain Tumor Segmentation (Simplified) with ANTsR.

PubMed

Tustison, Nicholas J; Shrinidhi, K L; Wintermark, Max; Durst, Christopher R; Kandel, Benjamin M; Gee, James C; Grossman, Murray C; Avants, Brian B

2015-04-01

Segmenting and quantifying gliomas from MRI is an important task for diagnosis, planning intervention, and for tracking tumor changes over time. However, this task is complicated by the lack of prior knowledge concerning tumor location, spatial extent, shape, possible displacement of normal tissue, and intensity signature. To accommodate such complications, we introduce a framework for supervised segmentation based on multiple modality intensity, geometry, and asymmetry feature sets. These features drive a supervised whole-brain and tumor segmentation approach based on random forest-derived probabilities. The asymmetry-related features (based on optimal symmetric multimodal templates) demonstrate excellent discriminative properties within this framework. We also gain performance by generating probability maps from random forest models and using these maps for a refining Markov random field regularized probabilistic segmentation. This strategy allows us to interface the supervised learning capabilities of the random forest model with regularized probabilistic segmentation using the recently developed ANTsR package--a comprehensive statistical and visualization interface between the popular Advanced Normalization Tools (ANTs) and the R statistical project. The reported algorithmic framework was the top-performing entry in the MICCAI 2013 Multimodal Brain Tumor Segmentation challenge. The challenge data were widely varying consisting of both high-grade and low-grade glioma tumor four-modality MRI from five different institutions. Average Dice overlap measures for the final algorithmic assessment were 0.87, 0.78, and 0.74 for "complete", "core", and "enhanced" tumor components, respectively.
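For readers unfamiliar with the Dice overlap measure reported above, the following sketch shows the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two sets of voxel coordinates. The function name and the tiny example masks are illustrative and are not taken from the challenge data.

```python
def dice_overlap(predicted, reference):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))

# Toy 2-D masks given as (row, column) coordinates of labelled voxels.
predicted_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference_mask = {(0, 1), (1, 1), (1, 2), (2, 1)}

score = dice_overlap(predicted_mask, reference_mask)  # 2 * 2 / (4 + 4) = 0.5
```

A value of 1 means the two segmentations coincide exactly and 0 means they share no voxels.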

Tropical secondary forests regenerating after shifting cultivation in the Philippines uplands are important carbon sinks.

PubMed

Mukul, Sharif A; Herbohn, John; Firn, Jennifer

2016-03-08

In the tropics, shifting cultivation has long been attributed to large scale forest degradation, and remains a major source of uncertainty in forest carbon accounting. In the Philippines, shifting cultivation, locally known as kaingin, is a major land-use in upland areas. We measured the distribution and recovery of aboveground biomass carbon along a fallow gradient in post-kaingin secondary forests in an upland area in the Philippines. We found significantly higher carbon in the aboveground total biomass and living woody biomass in old-growth forest, while coarse dead wood biomass carbon was higher in the new fallow sites. For young through to the oldest fallow secondary forests, there was a progressive recovery of biomass carbon evident. Multivariate analysis indicates patch size as an influential factor in explaining the variation in biomass carbon recovery in secondary forests after shifting cultivation. Our study indicates secondary forests after shifting cultivation are substantial carbon sinks and that this capacity to store carbon increases with abandonment age. Large trees contribute most to aboveground biomass. A better understanding of the relative contribution of different biomass sources in aboveground total forest biomass, however, is necessary to fully capture the value of such landscapes from forest management, restoration and conservation perspectives.
Tropical secondary forests regenerating after shifting cultivation in the Philippines uplands are important carbon sinks

PubMed Central

Mukul, Sharif A.; Herbohn, John; Firn, Jennifer

2016-01-01

In the tropics, shifting cultivation has long been attributed to large scale forest degradation, and remains a major source of uncertainty in forest carbon accounting. In the Philippines, shifting cultivation, locally known as kaingin, is a major land-use in upland areas. We measured the distribution and recovery of aboveground biomass carbon along a fallow gradient in post-kaingin secondary forests in an upland area in the Philippines. We found significantly higher carbon in the aboveground total biomass and living woody biomass in old-growth forest, while coarse dead wood biomass carbon was higher in the new fallow sites. For young through to the oldest fallow secondary forests, there was a progressive recovery of biomass carbon evident. Multivariate analysis indicates patch size as an influential factor in explaining the variation in biomass carbon recovery in secondary forests after shifting cultivation. Our study indicates secondary forests after shifting cultivation are substantial carbon sinks and that this capacity to store carbon increases with abandonment age. Large trees contribute most to aboveground biomass. A better understanding of the relative contribution of different biomass sources in aboveground total forest biomass, however, is necessary to fully capture the value of such landscapes from forest management, restoration and conservation perspectives. PMID:26951761
Tropical secondary forests regenerating after shifting cultivation in the Philippines uplands are important carbon sinks

NASA Astrophysics Data System (ADS)

Mukul, Sharif A.; Herbohn, John; Firn, Jennifer

2016-03-01

In the tropics, shifting cultivation has long been attributed to large scale forest degradation, and remains a major source of uncertainty in forest carbon accounting. In the Philippines, shifting cultivation, locally known as kaingin, is a major land-use in upland areas. We measured the distribution and recovery of aboveground biomass carbon along a fallow gradient in post-kaingin secondary forests in an upland area in the Philippines. We found significantly higher carbon in the aboveground total biomass and living woody biomass in old-growth forest, while coarse dead wood biomass carbon was higher in the new fallow sites. For young through to the oldest fallow secondary forests, there was a progressive recovery of biomass carbon evident. Multivariate analysis indicates patch size as an influential factor in explaining the variation in biomass carbon recovery in secondary forests after shifting cultivation. Our study indicates secondary forests after shifting cultivation are substantial carbon sinks and that this capacity to store carbon increases with abandonment age. Large trees contribute most to aboveground biomass. A better understanding of the relative contribution of different biomass sources in aboveground total forest biomass, however, is necessary to fully capture the value of such landscapes from forest management, restoration and conservation perspectives.
Introduction to uses and interpretation of principal component analyses in forest biology.

Treesearch

J. G. Isebrands; Thomas R. Crow

1975-01-01

The application of principal component analysis for interpretation of multivariate data sets is reviewed with emphasis on (1) reduction of the number of variables, (2) ordination of variables, and (3) applications in conjunction with multiple regression.
High resolution satellite remote sensing used in a stratified random sampling scheme to quantify the constituent land cover components of the shifting cultivation mosaic of the Democratic Republic of Congo

NASA Astrophysics Data System (ADS)

Molinario, G.; Hansen, M.; Potapov, P.

2016-12-01

High resolution satellite imagery obtained from the National Geospatial Intelligence Agency through NASA was used to photo-interpret sample areas within the DRC. The area sampled is a stratification of the forest cover loss from circa 2014 that occurred either completely within the previously mapped homogeneous area of the Rural Complex, at its interface with primary forest, or in isolated forest perforations. Previous research resulted in a map of these areas that contextualizes forest loss depending on where it occurs and with what spatial density, leading to a better understanding of the real impacts of livelihood shifting cultivation on forest degradation. The stratified random sampling approach of these areas allows the characterization of the constituent land cover types within these areas, and their variability throughout the DRC. Shifting cultivation has a variable forest degradation footprint in the DRC depending on the many factors that drive it, but its role in forest degradation and deforestation has been disputed, leading us to investigate and quantify the clearing and reuse rates within the strata throughout the country.
Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

NASA Astrophysics Data System (ADS)

Shamsoddini, A.; Aboodi, M. R.; Karami, J.

2017-09-01

Air pollution, as one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to obtain an efficient model for estimating carbon monoxide, nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method performed more accurately than the other models for all pollutants. The estimation accuracy for sulfur dioxide was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.
Research on electricity consumption forecast based on mutual information and random forests algorithm

NASA Astrophysics Data System (ADS)

Shi, Jing; Shi, Yunli; Tan, Jian; Zhu, Lei; Li, Hu

2018-02-01

Traditional power forecasting models cannot efficiently take various factors into account, nor can they identify the relevant factors. In this paper, mutual information from information theory and the random forests algorithm from artificial intelligence are introduced into medium- and long-term electricity demand prediction. Mutual information can identify the highly related factors based on the value of the average mutual information between a variety of variables and electricity demand; different industries may be highly associated with different variables. The random forests algorithm was used to build forecasting models for the different industries according to their respective correlated factors. Electricity consumption data from Jiangsu Province are taken as a practical example, and the above methods are compared with methods that disregard mutual information and industry differences. The simulation results show that the proposed method is effective and can provide higher prediction accuracy.
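As a rough illustration of the screening step described in this abstract, the following Python sketch computes a plug-in estimate of the mutual information between a candidate factor and electricity demand, and ranks the factors by that score. The function name, the pre-binned toy data and the factor labels are assumptions made for illustration only; the abstract does not state which estimator or discretization the authors used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical binned observations: demand follows "gdp" exactly
# and is statistically unrelated to "noise".
gdp = ["low", "low", "high", "high"]
noise = ["a", "b", "a", "b"]
demand = ["low", "low", "high", "high"]

scores = {
    "gdp": mutual_information(gdp, demand),
    "noise": mutual_information(noise, demand),
}
# Factors sorted from most to least informative about demand.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In a workflow like the one the abstract outlines, only the top-ranked factors for each industry would then be passed to that industry's forecasting model.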
Multivariate random-parameters zero-inflated negative binomial regression model: an application to estimate crash frequencies at intersections.

PubMed

Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan

2014-09-01

Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and develop safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is that the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that the MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.
Effects of the amount and composition of the forest floor on emergence and early establishment of loblolly pine seedlings

Treesearch

Michael G. Shelton

1995-01-01

Five forest floor weights (0, 10, 20, 30, and 40 Mg/ha), three forest floor compositions (pine, pine-hardwood, and hardwood), and two seed placements (forest floor and soil surface) were tested in a three-factorial, split-plot design with four incomplete, randomized blocks. The experiment was conducted in a nursery setting and used wooden frames to define 0.145-m
Extrapolating intensified forest inventory data to the surrounding landscape using landsat

Treesearch

Evan B. Brooks; John W. Coulston; Valerie A. Thomas; Randolph H. Wynne

2015-01-01

In 2011, a collection of spatially intensified plots was established on three of the Experimental Forests and Ranges (EFRs) sites with the intent of facilitating FIA program objectives for regional extrapolation. Characteristic coefficients from harmonic regression (HR) analysis of associated Landsat stacks are used as inputs into a conditional random forests model to...
Forest-floor disturbance reduces chipmunk (Tamias spp.) abundance two years after variable-retention harvest of Pacific Northwestern forests

Treesearch

Randall J. Wilk; Timothy B. Harrington; Robert A. Gitzen; Chris C. Maguire

2015-01-01

We evaluated the two-year effects of variable-retention harvest on chipmunk (Tamias spp.) abundance (N^) and habitat in mature coniferous forests in western Oregon and Washington because wildlife responses to density/pattern of retained trees remain largely unknown. In a randomized complete-block design, six...
Highlights of the national evaluation of the Forest Stewardship Planning Program

Treesearch

R.J. Moulton; J.D. Esseks

2001-01-01

In 1998 and 1999, a nationwide random sample of 1238 nonindustrial private (NIPF) landowners with approved multiple resource Forest Stewardship Plans were interviewed to determine if this program is meeting its Congressional mandate of promoting sustainable management of forest resources on NIPF ownerships. It was found that two-thirds of program participants had never...
Ownership and ecosystem as sources of spatial heterogeneity in a forested landscape, Wisconsin, USA

Treesearch

Thomas R. Crow; George E. Host; David J. Mladenoff

1999-01-01

The interaction between physical environment and land ownership in creating spatial heterogeneity was studied in largely forested landscapes of northern Wisconsin, USA. A stratified random approach was used in which 2500-ha plots representing two ownerships (National Forest and private non-industrial) were located within two regional ecosystems (extremely well-drained...
Bird distributional patterns support biogeographical histories and are associated with bioclimatic units in the Atlantic Forest, Brazil.

PubMed

Carvalho, Cristiano DE Santana; Nascimento, Nayla Fábia Ferreira DO; Araujo, Helder F P DE

2017-10-17

Rivers as barriers to dispersal and past forest refugia are two of the hypotheses proposed to explain the patterns of biodiversity in the Atlantic Forest. It has recently been shown that possible past refugia correspond to bioclimatically different regions, so we tested whether patterns of shared distribution of bird taxa in the Atlantic Forest are 1) limited by the Doce and São Francisco rivers or 2) associated with the bioclimatically different southern and northeastern regions. We catalogued lists of forest birds from 45 locations, 36 in the Atlantic Forest and nine in the Amazon, and used parsimony analysis of endemicity to identify groups of shared taxa. We also compared differences between these groups by permutational multivariate analysis of variance and identified the species that best supported the resulting groups. The results showed that the distribution of forest birds is divided into two main regions in the Atlantic Forest, the first with more southern localities and the second with northeastern localities. This distributional pattern is not delimited by riverbanks, but it may be associated with bioclimatic units, for which altitude serves as a surrogate, that maintain current environmental differences between the two main regions of the Atlantic Forest and may be related to the phylogenetic histories of the taxa supporting the two groups.
Advanced Subspace Techniques for Modeling Channel and Session Variability in a Speaker Recognition System

DTIC Science & Technology

2012-03-01

with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
Probability machines: consistent probability estimation using nonparametric learning machines.

PubMed

Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

2012-01-01

Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
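The central idea of this abstract, that a nonparametric regression method applied to a 0/1 response yields an estimate of the individual probability, can be sketched with a k-nearest-neighbour estimator: the estimated probability at a query point is the mean of the 0/1 responses among its k nearest training points. This is a minimal Python sketch under that assumption, not the R code the authors provide; the function name and toy data are hypothetical.

```python
from math import dist


def knn_probability(train_x, train_y, query, k):
    """Estimate P(Y = 1 | X = query) as the mean 0/1 response of the k nearest neighbours."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical one-dimensional training data with a binary response.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
train_y = [0, 0, 1, 0, 1, 1]

p_low = knn_probability(train_x, train_y, (0.5,), k=3)   # neighbours at x = 0, 1, 2
p_high = knn_probability(train_x, train_y, (4.5,), k=3)  # neighbours at x = 3, 4, 5
```

The random forest variant follows the same logic: trees are grown in regression mode on the 0/1 response, and the averaged prediction is read as a probability rather than being thresholded into a class label.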
A random forest algorithm for nowcasting of intense precipitation events

NASA Astrophysics Data System (ADS)

Das, Saurabh; Chakraborty, Rohit; Maitra, Animesh

2017-09-01

Automatic nowcasting of convective initiation and thunderstorms has potential applications in several sectors including aviation planning and disaster management. In this paper, a random forest-based machine learning algorithm is tested for nowcasting of convective rain with a ground-based radiometer. Brightness temperatures measured at 14 frequencies (7 frequencies in the 22-31 GHz band and 7 frequencies in the 51-58 GHz band) are utilized as the inputs of the model. The lower frequency band is associated with water vapor absorption whereas the upper frequency band relates to oxygen absorption; together, they provide information on the temperature and humidity of the atmosphere. The synthetic minority over-sampling technique is used to balance the data set, and 10-fold cross validation is used to assess the performance of the model. Results indicate that the random forest algorithm with a fixed alarm generation time of 30 min and 60 min performs quite well (probability of detection of all types of weather condition ∼90%) with low false alarms. It is, however, also observed that reducing the alarm generation time improves the threat score significantly and also decreases false alarms. The proposed model is found to be very sensitive to the boundary layer instability, as indicated by the variable importance measure. The study shows the suitability of a random forest algorithm for nowcasting applications utilizing a large number of input parameters from diverse sources, and the approach can be utilized in other forecasting problems.
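The skill measures named in this abstract (probability of detection, false alarms and threat score) can all be computed from a 2x2 contingency table of alarms against observed events. The Python sketch below uses the standard forecast-verification definitions, which are assumed here to match the ones used in the paper; the counts are hypothetical and are not taken from the study.

```python
def verification_scores(hits, misses, false_alarms):
    """Return POD, false alarm ratio and threat score from contingency counts."""
    if hits + misses == 0:
        raise ValueError("no observed events: probability of detection is undefined")
    if hits + false_alarms == 0:
        raise ValueError("no alarms issued: false alarm ratio is undefined")
    return {
        # Fraction of observed events that were preceded by an alarm.
        "pod": hits / (hits + misses),
        # Fraction of issued alarms that were not followed by an event.
        "far": false_alarms / (hits + false_alarms),
        # Threat score (critical success index): hits over all alarms and events.
        "threat_score": hits / (hits + misses + false_alarms),
    }


# Hypothetical counts for illustration only.
scores = verification_scores(hits=90, misses=10, false_alarms=20)
```

Because the threat score penalizes both misses and false alarms, it can improve even when the probability of detection stays fixed, provided the number of false alarms drops.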
Learning accurate and interpretable models based on regularized random forests regression

PubMed Central

2014-01-01

Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
Multivariate analysis: greater insights into complex systems

USDA-ARS's Scientific Manuscript database

Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...
The contribution of competition to tree mortality in old-growth coniferous forests

USGS Publications Warehouse

Das, A.; Battles, J.; Stephenson, N.L.; van Mantgem, P.J.

2011-01-01

Competition is a well-documented contributor to tree mortality in temperate forests, with numerous studies documenting a relationship between tree death and the competitive environment. Models frequently rely on competition as the only non-random mechanism affecting tree mortality. However, for mature forests, competition may cease to be the primary driver of mortality. We use a large, long-term dataset to study the importance of competition in determining tree mortality in old-growth forests on the western slope of the Sierra Nevada of California, U.S.A. We make use of the comparative spatial configuration of dead and live trees, changes in tree spatial pattern through time, and field assessments of contributors to an individual tree's death to quantify competitive effects. Competition was apparently a significant contributor to tree mortality in these forests. Trees that died tended to be in more competitive environments than trees that survived, and suppression frequently appeared as a factor contributing to mortality. On the other hand, based on spatial pattern analyses, only three of 14 plots demonstrated compelling evidence that competition was dominating mortality. Most of the rest of the plots fell within the expectation for random mortality, and three fit neither the random nor the competition model. These results suggest that while competition is often playing a significant role in tree mortality processes in these forests it only infrequently governs those processes. In addition, the field assessments indicated a substantial presence of biotic mortality agents in trees that died. While competition is almost certainly important, demographics in these forests cannot accurately be characterized without a better grasp of other mortality processes. In particular, we likely need a better understanding of biotic agents and their interactions with one another and with competition. © 2011.

Testing the structure of earthquake networks from multivariate time series of successive main shocks in Greece

NASA Astrophysics Data System (ADS)

Chorozoglou, D.; Kugiumtzis, D.; Papadimitriou, E.

2018-06-01

The seismic hazard assessment in the area of Greece is attempted by studying the earthquake network structure, such as small-world and random. In this network, a node represents a seismic zone in the study area and a connection between two nodes is given by the correlation of the seismic activity of two zones. To investigate the network structure, and particularly the small-world property, the earthquake correlation network is compared with randomized ones. Simulations on multivariate time series of different length and number of variables show that for the construction of randomized networks the method randomizing the time series performs better than methods randomizing directly the original network connections. Based on the appropriate randomization method, the network approach is applied to time series of earthquakes that occurred between main shocks in the territory of Greece spanning the period 1999-2015. The characterization of networks on sliding time windows revealed that small-world structure emerges in the last time interval, shortly before the main shock.
Extensions to Multivariate Space Time Mixture Modeling of Small Area Cancer Data.

PubMed

Carroll, Rachel; Lawson, Andrew B; Faes, Christel; Kirby, Russell S; Aregay, Mehreteab; Watjou, Kevin

2017-05-09

Oral cavity and pharynx cancer, even when considered together, is a fairly rare disease. Implementation of multivariate modeling with lung and bronchus cancer, as well as melanoma cancer of the skin, could lead to better inference for oral cavity and pharynx cancer. The multivariate structure of these models is accomplished via the use of shared random effects, as well as other multivariate prior distributions. The results in this paper indicate that care should be taken when executing these types of models, and that multivariate mixture models may not always be the ideal option, depending on the data of interest.
Estimating correlation between multivariate longitudinal data in the presence of heterogeneity.

PubMed

Gao, Feng; Philip Miller, J; Xiong, Chengjie; Luo, Jingqin; Beiser, Julia A; Chen, Ling; Gordon, Mae O

2017-08-17

Estimating correlation coefficients among outcomes is one of the most important analytical tasks in epidemiological and clinical research. Availability of multivariate longitudinal data presents a unique opportunity to assess joint evolution of outcomes over time. The bivariate linear mixed model (BLMM) provides a versatile tool for assessing correlation. However, BLMMs often assume that all individuals are drawn from a single homogeneous population where the individual trajectories are distributed smoothly around the population average. Using longitudinal mean deviation (MD) and visual acuity (VA) from the Ocular Hypertension Treatment Study (OHTS), we demonstrated strategies to better understand the correlation between multivariate longitudinal data in the presence of potential heterogeneity. Conditional correlation (i.e., marginal correlation given random effects) was calculated to describe how the association between longitudinal outcomes evolved over time within specific subpopulations. The impact of heterogeneity on correlation was also assessed using simulated data. There was a significant positive correlation in both random intercepts (ρ = 0.278, 95% CI: 0.121-0.420) and random slopes (ρ = 0.579, 95% CI: 0.349-0.810) between longitudinal MD and VA, and the strength of correlation constantly increased over time. However, conditional correlation and simulation studies revealed that the correlation was induced primarily by participants with rapidly deteriorating MD, who accounted for only a small fraction of the total sample. Conditional correlation given random effects provides a robust estimate to describe the correlation between multivariate longitudinal data in the presence of unobserved heterogeneity (NCT00000125).
Analysis of Machine Learning Techniques for Heart Failure Readmissions.

PubMed

Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

2016-11-01

The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
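The percentage improvements quoted in this abstract are consistent with relative changes in the mean C statistic, computed as (new - baseline) / baseline. The short Python check below reproduces them from the reported means; the function name is an assumption for illustration.

```python
def relative_improvement(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# Mean C statistics as reported in the abstract.
rf_vs_lr = relative_improvement(0.628, 0.533)     # 30-day all-cause readmission
boost_vs_lr = relative_improvement(0.678, 0.543)  # readmission because of heart failure
```

Rounded to one decimal place, these give 17.8% and 24.9%. Both are relative gains over a baseline C statistic only slightly above 0.5; the absolute differences are 0.095 and 0.135.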
Random forest learning of ultrasonic statistical physics and object spaces for lesion detection in 2D sonomammography

NASA Astrophysics Data System (ADS)

Sheet, Debdoot; Karamalis, Athanasios; Kraft, Silvan; Noël, Peter B.; Vag, Tibor; Sadhu, Anup; Katouzian, Amin; Navab, Nassir; Chatterjee, Jyotirmoy; Ray, Ajoy K.

2013-03-01

Breast cancer is the most common form of cancer in women. Early diagnosis can significantly improve life expectancy and allow different treatment options. Clinicians favor 2D ultrasonography for breast tissue abnormality screening due to its high sensitivity and specificity compared to competing technologies. However, inter- and intra-observer variability in visual assessment and reporting of lesions often handicaps its performance. Existing Computer Assisted Diagnosis (CAD) systems, though able to detect solid lesions, are often restricted in performance. These restrictions are the inability to (1) detect lesions of multiple sizes and shapes, and (2) differentiate hypo-echoic lesions from their posterior acoustic shadowing. In this work we present a completely automatic system for detection and segmentation of breast lesions in 2D ultrasound images. We employ random forests for learning of tissue specific primal to discriminate breast lesions from surrounding normal tissues. This enables the system to detect lesions of multiple shapes and sizes, as well as to discriminate hypo-echoic lesions from associated posterior acoustic shadowing. The primal comprises (i) multiscale estimated ultrasonic statistical physics and (ii) scale-space characteristics. The random forest learns lesion vs. background primal from a database of 2D ultrasound images with labeled lesions. For segmentation, the posterior probabilities of lesion pixels estimated by the learnt random forest are hard thresholded to provide a random walks segmentation stage with starting seeds. Our method achieves detection with 99.19% accuracy and segmentation with mean contour-to-contour error < 3 pixels on a set of 40 images with 49 lesions.
The role of forest in mitigating the impact of atmospheric dust pollution in a mixed landscape.

PubMed

Santos, Artur; Pinho, Pedro; Munzi, Silvana; Botelho, Maria João; Palma-Oliveira, José Manuel; Branquinho, Cristina

2017-05-01

Atmospheric dust pollution, especially particulate matter below 2.5 μm, causes 3.3 million premature deaths per year worldwide. Although pollution sources are increasingly well known, the role of ecosystems in mitigating their impact is still poorly known. Our objective was to investigate the role of forests located in the surroundings of industrial and urban areas in reducing atmospheric dust pollution. This was tested using lichen transplants as biomonitors in a Mediterranean regional area with high levels of dry deposition. After a multivariate analysis, we modeled the maximum pollution load expected for each site, taking into consideration nearby pollutant sources. The difference between the maximum expected pollution load and the observed values was explained by deposition in nearby forests. Both the dust pollution and the ameliorating effect of forested areas were then mapped. The results showed that forests located near pollution sources play an important role in reducing atmospheric dust pollution, highlighting their importance in the provision of the ecosystem service of air purification.
Serum Proteome Analysis for Profiling Predictive Protein Markers Associated with the Severity of Skin Lesions Induced by Ionizing Radiation.

PubMed

Chaze, Thibault; Hornez, Louis; Chambon, Christophe; Haddad, Iman; Vinh, Joelle; Peyrat, Jean-Philippe; Benderitter, Marc; Guipaud, Olivier

2013-07-10

The finding of new diagnostic and prognostic markers of local radiation injury, and particularly of the cutaneous radiation syndrome, is crucial for its medical management, in the case of both accidental exposure and radiotherapy side effects. Especially, a fast high-throughput method is still needed for triage of people accidentally exposed to ionizing radiation. In this study, we investigated the impact of localized irradiation of the skin on the early alteration of the serum proteome of mice in an effort to discover markers associated with the exposure and severity of impending damage. Using two different large-scale quantitative proteomic approaches, 2D-DIGE-MS and SELDI-TOF-MS, we performed global analyses of serum proteins collected in the clinical latency phase (days 3 and 7) from non-irradiated and locally irradiated mice exposed to high doses of 20, 40 and 80 Gy which will develop respectively erythema, moist desquamation and necrosis. Unsupervised and supervised multivariate statistical analyses (principal component analysis, partial-least square discriminant analysis and Random Forest analysis) using 2D-DIGE quantitative protein data allowed us to discriminate early between non-irradiated and irradiated animals, and between uninjured/slightly injured animals and animals that will develop severe lesions. On the other hand, despite a high number of animal replicates, PLS-DA and Random Forest analyses of SELDI-TOF-MS data failed to reveal sets of MS peaks able to discriminate between the different groups of animals. Our results show that, unlike SELDI-TOF-MS, the 2D-DIGE approach remains a powerful and promising method for the discovery of sets of proteins that could be used for the development of clinical tests for triage and the prognosis of the severity of radiation-induced skin lesions. We propose a list of 15 proteins which constitutes a set of candidate proteins for triage and prognosis of skin lesion outcomes.
Gait phenotypes in paediatric hereditary spastic paraplegia revealed by dynamic time warping analysis and random forests

PubMed Central

Martín-Gonzalo, Juan Andrés; Rodríguez-Andonaegui, Irene; López-López, Javier; Pascual-Pascual, Samuel Ignacio

2018-01-01

The Hereditary Spastic Paraplegias (HSP) are a group of heterogeneous disorders with a wide spectrum of underlying neural pathology, and hence HSP patients express a variety of gait abnormalities. Classification of these phenotypes may help in monitoring disease progression and personalizing therapies. This is currently managed by measuring values of some kinematic and spatio-temporal parameters at certain moments during the gait cycle, either in the doctor´s surgery room or after very precise measurements produced by instrumental gait analysis (IGA). These methods, however, do not provide information about the whole structure of the gait cycle. Classification of the similarities among time series of IGA measured values of sagittal joint positions throughout the whole gait cycle can be achieved by hierarchical clustering analysis based on multivariate dynamic time warping (DTW). Random forests can estimate which are the most important isolated parameters to predict the classification revealed by DTW, since clinicians need to refer to them in their daily practice. We acquired time series of pelvic, hip, knee, ankle and forefoot sagittal angular positions from 26 HSP and 33 healthy children with an optokinetic IGA system. DTW revealed six gait patterns with different degrees of impairment of walking speed, cadence and gait cycle distribution and related with patient’s age, sex, GMFCS stage, concurrence of polyneuropathy and abnormal visual evoked potentials or corpus callosum. The most important parameters to differentiate patterns were mean pelvic tilt and hip flexion at initial contact. Longer time of support, decreased values of hip extension and increased knee flexion at initial contact can differentiate the mildest, near to normal HSP gait phenotype and the normal healthy one. Increased values of knee flexion at initial contact and delayed peak of knee flexion are important factors to distinguish GMFCS stages I from II-III and concurrence of polyneuropathy. 
PMID:29518090
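The multivariate DTW comparison described in the record above can be illustrated with a minimal sketch. It assumes a Euclidean local cost between joint-angle vectors and no warping-window constraint; the function name and the toy series are hypothetical and are not taken from the study.

```python
import math

def dtw_distance(a, b):
    """Multivariate DTW distance between two sequences of equal-dimension vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local cost (assumption)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical two-joint angle series (degrees); the second repeats one frame.
gait_a = [(0.0, 10.0), (5.0, 20.0), (10.0, 30.0)]
gait_b = [(0.0, 10.0), (5.0, 20.0), (5.0, 20.0), (10.0, 30.0)]
print(dtw_distance(gait_a, gait_b))
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering routine would take; the linkage rule used in the study is not stated in the abstract.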
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.

PubMed

Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi

2014-01-01

In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into several groups. Then, for each of these groups, three ensemble classifiers, namely random forest, rotation forest and AdaBoost.M1, are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method is the best one in comparison to previously applied methods in terms of classification accuracy.
Polarimetric signatures of a coniferous forest canopy based on vector radiative transfer theory

NASA Technical Reports Server (NTRS)

Karam, M. A.; Fung, A. K.; Amar, F.; Mougin, E.; Lopes, A.; Beaudoin, A.

1992-01-01

Complete polarization signatures of a coniferous forest canopy are studied by the iterative solution of the vector radiative transfer equations up to the second order. The forest canopy constituents (leaves, branches, stems, and trunk) are embedded in a multi-layered medium over a rough interface. The branches, stems and trunk scatterers are modeled as finite randomly oriented cylinders. The leaves are modeled as randomly oriented needles. For a plane wave exciting the canopy, the average Mueller matrix is formulated in terms of the iterative solution of the radiative transfer equations and used to determine the linearly polarized backscattering coefficients, the co-polarized and cross-polarized power returns, and the phase difference statistics. Numerical results are presented to investigate the effect of transmitting and receiving antenna configurations on the polarimetric signature of a pine forest. Comparison is made with measurements.
Field evaluation of a random forest activity classifier for wrist-worn accelerometer data.

PubMed

Pavey, Toby G; Gilson, Nicholas D; Gomersall, Sjaan R; Clark, Bronwyn; Trost, Stewart G

2017-01-01

Wrist-worn accelerometers are convenient to wear and associated with greater wear-time compliance. Previous work has generally relied on choreographed activity trials to train and test classification models. However, validity in free-living contexts is starting to emerge. Study aims were to: (1) train and test a random forest activity classifier for wrist accelerometer data; and (2) determine if models trained on laboratory data perform well under free-living conditions. Twenty-one participants (mean age=27.6±6.2) completed seven lab-based activity trials and a 24-h free-living trial (N=16). Participants wore a GENEActiv monitor on the non-dominant wrist. Classification models recognising four activity classes (sedentary, stationary+, walking, and running) were trained using time and frequency domain features extracted from 10-s non-overlapping windows. Model performance was evaluated using leave-one-out cross-validation. Models were implemented using the randomForest package within R. Classifier accuracy during the 24-h free-living trial was evaluated by calculating agreement with concurrently worn activPAL monitors. Overall classification accuracy for the random forest algorithm was 92.7%. Recognition accuracy for sedentary, stationary+, walking, and running was 80.1%, 95.7%, 91.7%, and 93.7%, respectively, for the laboratory protocol. Agreement with the activPAL data (stepping vs. non-stepping) during the 24-h free-living trial was excellent and, on average, exceeded 90%. The ICC for stepping time was 0.92 (95% CI=0.75-0.97). However, sensitivity and positive predictive values were modest. Mean bias was -10.3 min/d (95% LOA=-46.0 to 25.4 min/d).
The random forest classifier for wrist accelerometer data yielded accurate group-level predictions under controlled conditions, but was less accurate at identifying stepping versus non-stepping behaviour in free-living conditions. Future studies should conduct more rigorous field-based evaluations using observation as a criterion measure. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
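The feature-extraction step mentioned in the record above (features computed over 10-s non-overlapping windows) can be sketched as follows. This is only an illustration: the sampling rate, the choice of mean and standard deviation as features, and the function name are assumptions, and the frequency-domain features used in the study are omitted.

```python
import statistics

def window_features(signal, sample_rate_hz, window_s=10):
    """Return (mean, standard deviation) for each full non-overlapping window."""
    size = int(sample_rate_hz * window_s)  # samples per window
    features = []
    # Step by the window size so windows never overlap; a trailing partial window is dropped.
    for start in range(0, len(signal) - size + 1, size):
        window = signal[start:start + size]
        features.append((statistics.fmean(window), statistics.pstdev(window)))
    return features

# Hypothetical 1 Hz acceleration magnitudes: 10 s at rest, then 10 s of movement.
signal = [1.0] * 10 + [1.0, 3.0] * 5
print(window_features(signal, sample_rate_hz=1))
```

Each tuple would form one row of the feature matrix passed to a classifier such as a random forest.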
Benchmarking dairy herd health status using routinely recorded herd summary data.

PubMed

Parker Gaddis, K L; Cole, J B; Clay, J S; Maltecca, C

2016-02-01

Genetic improvement of dairy cattle health through the use of producer-recorded data has been determined to be feasible. Low estimated heritabilities indicate that genetic progress will be slow. Variation observed in lowly heritable traits can largely be attributed to nongenetic factors, such as the environment. More rapid improvement of dairy cattle health may be attainable if herd health programs incorporate environmental and managerial aspects. More than 1,100 herd characteristics are regularly recorded on farm test-days. We combined these data with producer-recorded health event data, and parametric and nonparametric models were used to benchmark herd and cow health status. Health events were grouped into 3 categories for analyses: mastitis, reproductive, and metabolic. Both herd incidence and individual incidence were used as dependent variables. Models implemented included stepwise logistic regression, support vector machines, and random forests. At both the herd and individual levels, random forest models attained the highest accuracy for predicting health status in all health event categories when evaluated with 10-fold cross-validation. Accuracy (SD) ranged from 0.61 (0.04) to 0.63 (0.04) when using random forest models at the herd level. Accuracy of prediction (SD) at the individual cow level ranged from 0.87 (0.06) to 0.93 (0.001) with random forest models. Highly significant variables and key words from logistic regression and random forest models were also investigated. All models identified several of the same key factors for each health event category, including movement out of the herd, size of the herd, and weather-related variables. We concluded that benchmarking health status using routinely collected herd data is feasible. Nonparametric models were better suited to handle this complex data with numerous variables. 
These data mining techniques were able to perform prediction of health status and could add evidence to personal experience in herd management. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.

PubMed

Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne

2018-05-01

Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present what is, to our knowledge, the first machine-learning-based approach for the Catwalk system, which comprises a step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross-validation we demonstrate that with our method we can reliably separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions and can even forecast the brain region where the intervention was applied. We provide an in-depth analysis of the features involved in our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our work shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, to improve as more data arrive, and to make predictions for single animals in future studies.
Metastability for discontinuous dynamical systems under Lévy noise: Case study on Amazonian Vegetation.

PubMed

Serdukova, Larissa; Zheng, Yayun; Duan, Jinqiao; Kurths, Jürgen

2017-08-24

For the tipping elements in the Earth's climate system, the most important issue to address is how stable the desirable state is against random perturbations. Extreme biotic and climatic events pose severe hazards to tropical rainforests. Their local effects are extremely stochastic and difficult to measure. Moreover, the direction and intensity of the response of forest trees to such perturbations are unknown, especially given the lack of efficient dynamical vegetation models to evaluate forest tree cover changes over time. In this study, we consider randomness in the mathematical modelling of forest trees by incorporating uncertainty through a stochastic differential equation. According to field-based evidence, the interactions between fires and droughts are a more direct mechanism that may describe sudden forest degradation in the south-eastern Amazon. In modeling the Amazonian vegetation system, we include symmetric α-stable Lévy perturbations. We report results of stability analysis of the metastable fertile forest state. We conclude that even a very slight threat to the forest state stability represents Lévy noise with large jumps of low intensity, which can be interpreted as a fire occurring in a non-drought year. During years of severe drought, high-intensity fires significantly accelerate the transition between a forest and savanna state.
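A stochastic differential equation driven by symmetric α-stable Lévy noise, as mentioned in the record above, can be simulated with an Euler-type scheme. The sketch below is an illustration under stated assumptions: the Chambers-Mallows-Stuck sampler, the step-size scaling dt^(1/α), and above all the linear drift are choices made here for simplicity. The drift is a hypothetical smooth stand-in and is not the discontinuous vegetation model of the study.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """Draw one standard symmetric alpha-stable variate (Chambers-Mallows-Stuck)."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))

def simulate(x0, alpha, eps, dt, n_steps, drift, seed=0):
    """Euler scheme for dX = drift(X) dt + eps dL, with L a symmetric alpha-stable motion."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        # Increments of an alpha-stable motion over dt scale as dt ** (1 / alpha).
        jump = eps * dt ** (1 / alpha) * symmetric_stable(alpha, rng)
        path.append(x + drift(x) * dt + jump)
    return path

def relax(x):
    """Hypothetical linear drift pulling the state back toward zero."""
    return -x

path = simulate(x0=1.0, alpha=1.5, eps=0.1, dt=0.01, n_steps=1000, drift=relax)
print(min(path), max(path))
```

Lower values of alpha produce heavier tails and therefore rarer but larger jumps in the simulated path.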
Stochastic assembly in a subtropical forest chronosequence: evidence from contrasting changes of species, phylogenetic and functional dissimilarity over succession.

PubMed

Mi, Xiangcheng; Swenson, Nathan G; Jia, Qi; Rao, Mide; Feng, Gang; Ren, Haibao; Bebber, Daniel P; Ma, Keping

2016-09-07

Deterministic and stochastic processes jointly determine the community dynamics of forest succession. However, it has been widely held in previous studies that deterministic processes dominate forest succession. Furthermore, inference of mechanisms for community assembly may be misleading if based on a single axis of diversity alone. In this study, we evaluated the relative roles of deterministic and stochastic processes along a disturbance gradient by integrating species, functional, and phylogenetic beta diversity in a subtropical forest chronosequence in Southeastern China. We found a general pattern of increasing species turnover, but little-to-no change in phylogenetic and functional turnover over succession at two spatial scales. Meanwhile, the phylogenetic and functional beta diversity were not significantly different from random expectation. This result suggested a dominance of stochastic assembly, contrary to the general expectation that deterministic processes dominate forest succession. On the other hand, we found significant interactions of environment and disturbance and limited evidence for significant deviations of phylogenetic or functional turnover from random expectations for different size classes. This result provided weak evidence of deterministic processes over succession. Stochastic assembly of forest succession suggests that post-disturbance restoration may be largely unpredictable and difficult to control in subtropical forests.
Correspondence between sound propagation in discrete and continuous random media with application to forest acoustics.

PubMed

Ostashev, Vladimir E; Wilson, D Keith; Muhlestein, Michael B; Attenborough, Keith

2018-02-01

Although sound propagation in a forest is important in several applications, there are currently no rigorous yet computationally tractable prediction methods. Due to the complexity of sound scattering in a forest, it is natural to formulate the problem stochastically. In this paper, it is demonstrated that the equations for the statistical moments of the sound field propagating in a forest have the same form as those for sound propagation in a turbulent atmosphere if the scattering properties of the two media are expressed in terms of the differential scattering and total cross sections. Using the existing theories for sound propagation in a turbulent atmosphere, this analogy enables the derivation of several results for predicting forest acoustics. In particular, the second-moment parabolic equation is formulated for the spatial correlation function of the sound field propagating above an impedance ground in a forest with micrometeorology. Effective numerical techniques for solving this equation have been developed in atmospheric acoustics. In another example, formulas are obtained that describe the effect of a forest on the interference between the direct and ground-reflected waves. The formulated correspondence between wave propagation in discrete and continuous random media can also be used in other fields of physics.
First direct landscape-scale measurement of tropical rain forest Leaf Area Index, a key driver of global primary productivity

Treesearch

David B. Clark; Paulo C. Olivas; Steven F. Oberbauer; Deborah A. Clark; Michael G. Ryan

2008-01-01

Leaf Area Index (leaf area per unit ground area, LAI) is a key driver of forest productivity but has never previously been measured directly at the landscape scale in tropical rain forest (TRF). We used a modular tower and stratified random sampling to harvest all foliage from forest floor to canopy top in 55 vertical transects (4.6 m²) across 500 ha of old growth in...
Airborne laser-guided imaging spectroscopy to map forest trait diversity and guide conservation.

PubMed

Asner, G P; Martin, R E; Knapp, D E; Tupayachi, R; Anderson, C B; Sinca, F; Vaughn, N R; Llactayo, W

2017-01-27

Functional biogeography may bridge a gap between field-based biodiversity information and satellite-based Earth system studies, thereby supporting conservation plans to protect more species and their contributions to ecosystem functioning. We used airborne laser-guided imaging spectroscopy with environmental modeling to derive large-scale, multivariate forest canopy functional trait maps of the Peruvian Andes-to-Amazon biodiversity hotspot. Seven mapped canopy traits revealed functional variation in a geospatial pattern explained by geology, topography, hydrology, and climate. Clustering of canopy traits yielded a map of forest beta functional diversity for land-use analysis. Up to 53% of each mapped, functionally distinct forest presents an opportunity for new conservation action. Mapping functional diversity advances our understanding of the biosphere to conserve more biodiversity in the face of land use and climate change. Copyright © 2017, American Association for the Advancement of Science.
Dynamics of Tree Species Diversity in Unlogged and Selectively Logged Malaysian Forests.

PubMed

Shima, Ken; Yamada, Toshihiro; Okuda, Toshinori; Fletcher, Christine; Kassim, Abdul Rahman

2018-01-18

Selective logging that is commonly conducted in tropical forests may change tree species diversity. In rarely disturbed tropical forests, locally rare species exhibit higher survival rates. If this non-random process occurs in a logged forest, the forest will rapidly recover its tree species diversity. Here we determined whether a forest in the Pasoh Forest Reserve, Malaysia, which was selectively logged 40 years ago, recovered its original species diversity (species richness and composition). To explore this, we compared the dynamics of species diversity between an unlogged forest plot (18.6 ha) and a logged forest plot (5.4 ha). We found that 40 years are not sufficient to recover species diversity after logging. Unlike unlogged forests, tree deaths and recruitments did not contribute to increased diversity in the selectively logged forests. Our results predict that selectively logged forests require more time than our observation period (40 years) to regain their diversity.

Assessing change in large-scale forest area by visually interpreting Landsat images

Treesearch

Jerry D. Greer; Frederick P. Weber; Raymond L. Czaplewski

2000-01-01

As part of the Forest Resources Assessment 1990, the Food and Agriculture Organization of the United Nations visually interpreted a stratified random sample of 117 Landsat scenes to estimate global status and change in tropical forest area. Images from 1980 and 1990 were interpreted by a group of widely experienced technical people in many different tropical countries...
A ground-based method of assessing urban forest structure and ecosystem services

Treesearch

David J. Nowak; Daniel E. Crane; Jack C. Stevens; Robert E. Hoehn; Jeffrey T. Walton; Jerry Bond

2008-01-01

To properly manage urban forests, it is essential to have data on this important resource. An efficient means to obtain this information is to randomly sample urban areas. To help assess the urban forest structure (e.g., number of trees, species composition, tree sizes, health) and several functions (e.g., air pollution removal, carbon storage and sequestration), the...
Spatially random mortality in old-growth red pine forests of northern Minnesota

Treesearch

Tuomas Aakala; Shawn Fraver; Brian J. Palik; Anthony W. D'Amato

2012-01-01

Characterizing the spatial distribution of tree mortality is critical to understanding forest dynamics, but empirical studies on these patterns under old-growth conditions are rare. This rarity is due in part to low mortality rates in old-growth forests, the study of which necessitates long observation periods, and the confounding influence of tree in-growth during...
Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

PubMed

Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A

2018-04-29

Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.
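The "separate regression" baseline named in the record above is commonly understood as fitting one outcome model per trial arm and taking the difference of their predictions as the estimated ITE. The sketch below illustrates that baseline only, not RFIT. The single covariate, the ordinary least squares lines, the function names and the toy data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def separate_regression_ite(x_treated, y_treated, x_control, y_control):
    """Return a function x -> estimated treatment effect at covariate value x."""
    a1, b1 = fit_line(x_treated, y_treated)  # model for the treated arm
    a0, b0 = fit_line(x_control, y_control)  # model for the control arm
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)

# Hypothetical noiseless trial data in which the benefit grows with x.
x_t, y_t = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # treated: y = 1 + 2x
x_c, y_c = [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]  # control: y = 1 + x
ite = separate_regression_ite(x_t, y_t, x_c, y_c)
print(ite(2.0))
```

Because each arm is fitted on its own, errors in the two models do not cancel, which is one reason approaches that model the treatment-by-covariate interaction directly are of interest.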
Joint 3-D vessel segmentation and centerline extraction using oblique Hough forests with steerable filters.

PubMed

Schneider, Matthias; Hirsch, Sven; Weber, Bruno; Székely, Gábor; Menze, Bjoern H

2015-01-01

We propose a novel framework for joint 3-D vessel segmentation and centerline extraction. The approach is based on multivariate Hough voting and oblique random forests (RFs) that we learn from noisy annotations. It relies on steerable filters for the efficient computation of local image features at different scales and orientations. We validate both the segmentation performance and the centerline accuracy of our approach on synthetic vascular data and on four 3-D imaging datasets of the rat visual cortex at 700 nm resolution. First, we evaluate the most important structural components of our approach: (1) Orthogonal subspace filtering in comparison to steerable filters that show, qualitatively, similarities to the eigenspace filters learned from local image patches. (2) Standard RF against oblique RF. Second, we compare the overall approach to different state-of-the-art methods for (1) vessel segmentation based on optimally oriented flux (OOF) and the eigenstructure of the Hessian, and (2) centerline extraction based on homotopic skeletonization and geodesic path tracing. Our experiments reveal the benefit of steerable over eigenspace filters as well as the advantage of oblique split directions over univariate orthogonal splits. We further show that the learning-based approach outperforms different state-of-the-art methods and proves highly accurate and robust with regard to both vessel segmentation and centerline extraction in spite of the high level of label noise in the training data. Copyright © 2014 Elsevier B.V. All rights reserved.
Air contaminants and litter fall decomposition in urban forest areas: The case of São Paulo - SP, Brazil.

PubMed

Lamano Ferreira, Maurício; Portella Ribeiro, Andreza; Rodrigues Albuquerque, Caroline; Ferreira, Ana Paula do Nascimento Lamano; Figueira, Rubens César Lopes; Lafortezza, Raffaele

2017-05-01

Urban forests are usually affected by several types of atmospheric contaminants and by abnormal variations in weather conditions, thus facilitating the biotic homogenization and modification of ecosystem processes, such as nutrient cycling. Peri-urban forests and even natural forests that surround metropolitan areas are also subject to anthropogenic effects generated by cities, which may compromise the dynamics of these ecosystems. Hence, this study advances the hypothesis that the forests located at the margins of the Metropolitan Region of São Paulo (MRSP), Brazil, have high concentrations of atmospheric contaminants leading to adverse effects on litter fall stock. The production, stock and decomposition of litter fall in two forests were quantified. The first, known as Guarapiranga forest, lies closer to the urban area and is located within the MRSP, approximately 20km from the city center. The second, Curucutu forest, is located 70km from the urban center. This forest is situated exactly on the border of the largest continuum of vegetation of the Atlantic Forest. To verify the reach of atmospheric pollutants from the urban area, levels of heavy metals (Cd, Pb, Ni, Cu) adsorbed on the litter fall deposited on the soil surface of the forests were also quantified. The stock of litter fall and the levels of heavy metals were generally higher in the Guarapiranga forest in the samples collected during the lower rainfall season (dry season). Non-metric multidimensional scaling multivariate analysis showed a clear distinction of the sample units related to the concentrations of heavy metals in each forest. A subtle difference between the units related to the dry and rainy seasons in the Curucutu forest was also noted. 
Multivariate Analysis of Variance revealed that both site and season of the year (dry or rainy) were important to differentiate the quantity of heavy metals in litter fall stock, although the analysis did not show an interaction between these two factors. Precipitation appeared to be an important factor in dispersing air pollutants; one method to better regulate this process is the development and integration of green infrastructure at city level, which might contribute to nature-based solutions. Results suggest that although the Curucutu forest is not very far from the MRSP, which could result in heavy metal levels similar to those observed in the Guarapiranga forest, the weather conditions, geographic location and rainfall rates might act as efficient physical barriers against the dispersion of pollutants in the urban area. However, it is important to highlight that in the period studied (2012-2013), the MRSP presented unusual features during the winter period, marked by the highest levels of precipitation, which were due to several frontal systems and to their persistence for a couple of days in the region. Thus, it is recommended to continue this study in order to obtain a database for characterizing the seasonal variation of air pollution levels in the litter fall and their adverse effects on ecosystem processes in these remnants of the Atlantic Forest. Copyright © 2017 Elsevier Inc. All rights reserved.
Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study

PubMed Central

Neupane, Binod; Beyene, Joseph

2015-01-01

In a meta-analysis with multiple end points of interest that are correlated between or within studies, a multivariate approach to meta-analysis has the potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under the random-effects assumption, multivariate estimation is more complex (as it involves estimation of more parameters simultaneously) than univariate estimation, and sometimes can produce unrealistic parameter estimates. The usefulness of the multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulation to explore the comparative performance of the multivariate approach and the most commonly used univariate inverse-variance weighted approach under the random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated the performance with respect to relative mean bias percentage and root mean square error (RMSE) of the estimate, and coverage probability of the corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that the multivariate approach performs similarly to or better than the univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar to or larger than average within-study variation for meta-analyses of 10 or more genetic studies.
The multivariate approach produces estimates with smaller bias and RMSE, especially for an end point that has randomly or informatively missing summary data in some individual studies, when the missing data in the end point are imputed with null effects and quite large variance. PMID:26196398
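The univariate inverse-variance weighted random-effects approach used as the comparator in the record above can be sketched for a single end point. The abstract does not say which between-study variance estimator was used; the DerSimonian-Laird estimator below is an assumption, as are the function name and the example numbers.

```python
def random_effects_pool(effects, variances):
    """Pool study effects with inverse-variance weights and a DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2

# Hypothetical per-study effect estimates and their within-study variances.
pooled, pooled_var, tau2 = random_effects_pool([0.1, 0.3], [0.01, 0.01])
print(pooled, pooled_var, tau2)
```

A multivariate analysis would replace the scalar variances with per-study covariance matrices across end points, which is where the extra parameters mentioned in the abstract come from.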
Analyzing Multiple Outcomes in Clinical Research Using Multivariate Multilevel Models

PubMed Central

Baldwin, Scott A.; Imel, Zac E.; Braithwaite, Scott R.; Atkins, David C.

2014-01-01

Objective: Multilevel models have become a standard data analysis approach in intervention research. Although the vast majority of intervention studies involve multiple outcome measures, few studies use multivariate analysis methods. The authors discuss multivariate extensions to the multilevel model that can be used by psychotherapy researchers. Method and Results: Using simulated longitudinal treatment data, the authors show how multivariate models extend common univariate growth models and how the multivariate model can be used to examine multivariate hypotheses involving fixed effects (e.g., does the size of the treatment effect differ across outcomes?) and random effects (e.g., is change in one outcome related to change in the other?). An online supplemental appendix provides annotated computer code and simulated example data for implementing a multivariate model. Conclusions: Multivariate multilevel models are flexible, powerful models that can enhance clinical research. PMID:24491071
CRF: detection of CRISPR arrays using random forest.

PubMed

Wang, Kai; Liang, Chun

2017-01-01

CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, which enhances detection accuracy. In particular, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
Do bioclimate variables improve performance of climate envelope models?

USGS Publications Warehouse

Watling, James I.; Romañach, Stephanie S.; Bucklin, David N.; Speroterra, Carolina; Brandt, Laura A.; Pearlstine, Leonard G.; Mazzotti, Frank J.

2012-01-01

Climate envelope models are widely used to forecast potential effects of climate change on species distributions. A key issue in climate envelope modeling is the selection of predictor variables that most directly influence species. To determine whether model performance and spatial predictions were related to the selection of predictor variables, we compared models using bioclimate variables with models constructed from monthly climate data for twelve terrestrial vertebrate species in the southeastern USA using two different algorithms (random forests or generalized linear models), and two model selection techniques (using uncorrelated predictors or a subset of user-defined biologically relevant predictor variables). There were no differences in performance between models created with bioclimate or monthly variables, but one metric of model performance was significantly greater using the random forest algorithm compared with generalized linear models. Spatial predictions between maps using bioclimate and monthly variables were very consistent using the random forest algorithm with uncorrelated predictors, whereas we observed greater variability in predictions using generalized linear models.
Clustering Single-Cell Expression Data Using Random Forest Graphs.

PubMed

Pouyan, Maziyar Baran; Nourani, Mehrdad

2017-07-01

Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue of which those cells are a part. In this paper, we propose the use of random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique utilizes random forests to capture cell marker dependences and model the cellular populations using the cell network concept. This cellular network helps us discover what cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.
Comparative analysis of used car price evaluation models

NASA Astrophysics Data System (ADS)

Chen, Chuancan; Hao, Lulu; Xu, Cong

2017-05-01

An accurate used car price evaluation is a catalyst for the healthy development of the used car market. Data mining has been applied to predict used car prices in several articles. However, little has been studied on the comparison of different algorithms in used car price estimation. This paper collects more than 100,000 used car dealing records throughout China to conduct an empirical analysis and a thorough comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car prices in three different models: a model for a certain car make, a model for a certain car series, and a universal model. Results show that random forest has a stable but not ideal effect in the price evaluation model for a certain car make, but it shows great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with fewer variables.
Random forest models to predict aqueous solubility.

PubMed

Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O

2007-01-01

Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
An Entropy-Based Measure of Dependence between Two Groups of Random Variables. Research Report. ETS RR-07-20

ERIC Educational Resources Information Center

Kong, Nan

2007-01-01

In multivariate statistics, the linear relationship among random variables has been fully explored in the past. This paper looks into the dependence of one group of random variables on another group of random variables using (conditional) entropy. A new measure, called the K-dependence coefficient or dependence coefficient, is defined using…
Predicting Coastal Flood Severity using Random Forest Algorithm

NASA Astrophysics Data System (ADS)

Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

2017-12-01

Coastal floods have become more common recently and are predicted to further increase in frequency and severity due to sea level rise. Predicting floods in coastal cities can be difficult due to the number of environmental and geographic factors which can influence flooding events. Built stormwater infrastructure and irregular urban landscapes add further complexity. This paper demonstrates the use of machine learning algorithms in predicting street flood occurrence in an urban coastal setting. The model is trained and evaluated using data from Norfolk, Virginia USA from September 2010 - October 2016. Rainfall, tide levels, water table levels, and wind conditions are used as input variables. Street flooding reports made by city workers after named and unnamed storm events, ranging from 1-159 reports per event, are the model output. Results show that Random Forest provides predictive power in estimating the number of flood occurrences given a set of environmental conditions with an out-of-bag root mean squared error of 4.3 flood reports and a mean absolute error of 0.82 flood reports. The Random Forest algorithm performed much better than Poisson regression. From the Random Forest model, total daily rainfall was by far the most important factor in flood occurrence prediction, followed by daily low tide and daily higher high tide. The model demonstrated here could be used to predict flood severity based on forecast rainfall and tide conditions and could be further enhanced using more complete street flooding data for model training.
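The abstract above quotes two error measures, a root mean squared error and a mean absolute error, that differ considerably in size. The following minimal Python sketch uses made-up counts (not the Norfolk data) to show how the two are computed and why a few large misses can push RMSE well above MAE. The lists `observed` and `predicted` and both function names are assumptions introduced for illustration only.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: the average size of the misses."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

# Hypothetical counts of street-flood reports per storm event (illustrative only).
observed = [0, 1, 0, 2, 0, 40]
predicted = [0, 1, 1, 2, 0, 28]

print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
```

In this toy case a single badly under-predicted event dominates the RMSE, while the MAE stays comparatively small because most events are predicted almost exactly.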
Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification

NASA Astrophysics Data System (ADS)

Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua

2016-03-01

There are many diseases that affect the distribution of muscles, including Duchenne and facioscapulohumeral dystrophy among other myopathies. In these disease cases, it is important to quantify both the muscle and fat volumes to track the disease progression. There has also been evidence that abnormal signal intensity on the MR images, which often is an indication of edema or inflammation, can be a good predictor for muscle deterioration. We present a fully-automated method that examines magnetic resonance (MR) images of the thigh and identifies the fat, muscle, and edema using a random forest classifier. First, the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts are corrected using the N3 technique. The T1 and STIR (short tau inverse recovery) images are then aligned using landmark based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. This method was evaluated on MR images of 9 patients. The precision values are 0.91+/-0.06, 0.98+/-0.01 and 0.50+/-0.29 for muscle, fat, and edema, respectively. The recall values are 0.95+/-0.02, 0.96+/-0.03 and 0.43+/-0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of utilizing information from multiple MR sequences for the accurate quantification of fat, muscle and edema.
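The per-class precision and recall figures reported above can be reproduced in principle from paired true and predicted labels. The sketch below is a minimal Python illustration with invented labels, not the study's data or code. The function name `precision_recall` and the lists `truth` and `guess` are hypothetical.

```python
from collections import Counter

def precision_recall(true_labels, predicted_labels, positive):
    """Return (precision, recall) for one class treated as 'positive'."""
    pairs = Counter(zip(true_labels, predicted_labels))
    tp = pairs[(positive, positive)]
    fp = sum(n for (t, p), n in pairs.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical voxel labels (illustrative only, not data from the study).
truth = ["fat", "fat", "muscle", "muscle", "muscle", "edema", "edema", "muscle"]
guess = ["fat", "fat", "muscle", "muscle", "edema", "edema", "muscle", "muscle"]

for tissue in ("muscle", "fat", "edema"):
    p, r = precision_recall(truth, guess, tissue)
    print(f"{tissue}: precision={p:.2f} recall={r:.2f}")
```

Precision asks how many voxels labeled as a tissue really are that tissue, while recall asks how many voxels of that tissue were found. A rare class such as edema can score poorly on both even when the common classes score well.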
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au

In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
Analysis of multivariate longitudinal kidney function outcomes using generalized linear mixed models.

PubMed

Jaffa, Miran A; Gebregziabher, Mulugeta; Jaffa, Ayad A

2015-06-14

Renal transplant patients are mandated to have continuous assessment of their kidney function over time to monitor disease progression determined by changes in blood urea nitrogen (BUN), serum creatinine (Cr), and estimated glomerular filtration rate (eGFR). Multivariate analysis of these outcomes that aims at identifying the differential factors that affect disease progression is of great clinical significance. Thus our study aims at demonstrating the application of different joint modeling approaches with random coefficients on a cohort of renal transplant patients and presenting a comparison of their performance through a pseudo-simulation study. The objective of this comparison is to identify the model with best performance and to determine whether accuracy compensates for complexity in the different multivariate joint models. We propose a novel application of multivariate Generalized Linear Mixed Models (mGLMM) to analyze multiple longitudinal kidney function outcomes collected over 3 years on a cohort of 110 renal transplantation patients. The correlated outcomes BUN, Cr, and eGFR and the effects of various covariates, such as the patient's gender, age and race, on these markers were determined holistically using different mGLMMs. The performance of the various mGLMMs that encompass shared random intercept (SHRI), shared random intercept and slope (SHRIS), separate random intercept (SPRI) and separate random intercept and slope (SPRIS) was assessed to identify the one that has the best fit and most accurate estimates. A bootstrap pseudo-simulation study was conducted to gauge the tradeoff between the complexity and accuracy of the models. Accuracy was determined using two measures: the mean of the differences between the estimates of the bootstrapped datasets and the true beta obtained from the application of each model on the renal dataset, and the mean of the square of these differences. The results showed that SPRI provided the most accurate estimates and did not exhibit any computational or convergence problem. Higher accuracy was demonstrated when the level of complexity increased from shared random coefficient models to the separate random coefficient alternatives, with SPRI showing the best fit and most accurate estimates.
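The two accuracy measures described above (the mean difference and the mean squared difference between bootstrap estimates and a reference estimate) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sample mean stands in for a fitted mGLMM coefficient, the data are invented, and the names `bootstrap_accuracy` and `values` are hypothetical.

```python
import random
import statistics

def bootstrap_accuracy(data, estimator, n_boot=500, seed=0):
    """Return (mean difference, mean squared difference) between bootstrap
    estimates and the estimate obtained from the full dataset."""
    rng = random.Random(seed)
    reference = estimator(data)  # plays the role of the "true beta"
    diffs = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        diffs.append(estimator(resample) - reference)
    mean_diff = statistics.fmean(diffs)
    mean_sq_diff = statistics.fmean(d * d for d in diffs)
    return mean_diff, mean_sq_diff

# Hypothetical measurements; the sample mean stands in for a model coefficient.
values = [1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.0, 1.4]
bias, mse = bootstrap_accuracy(values, statistics.fmean)
print(f"mean difference = {bias:.4f}, mean squared difference = {mse:.4f}")
```

The first number behaves like a bias estimate and the second like a mean squared error, so comparing them across candidate models gives one way to weigh accuracy against complexity.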
Diagnostic tools for nearest neighbors techniques when used with satellite imagery

Treesearch

Ronald E. McRoberts

2009-01-01

Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....

Influences of environment and disturbance on forest patterns in coastal Oregon watersheds.

Treesearch

Michael C. Wimberly; Thomas A. Spies

2001-01-01

Modern ecology often emphasizes the distinction between traditional theories of stable, environmentally structured communities and a new paradigm of disturbance driven, nonequilibrium dynamics. However, multiple hypotheses for observed vegetation patterns have seldom been explicitly tested. We used multivariate statistics and variation partitioning methods to assess...
Multivariate non-normally distributed random variables in climate research - introduction to the copula approach

NASA Astrophysics Data System (ADS)

Schölzel, C.; Friederichs, P.

2008-10-01

Probability distributions of multivariate random variables are generally more complex than their univariate counterparts because of a possible nonlinear dependence between the random variables. One approach to this problem is the use of copulas, which have become popular over recent years, especially in fields like econometrics, finance, risk management, or insurance. Since this newly emerging field includes various practices, a controversial discussion, and a vast body of literature, it is difficult to get an overview. The aim of this paper is therefore to provide a brief overview of copulas for application in meteorology and climate research. We examine the advantages and disadvantages compared to alternative approaches such as mixture models, summarize the current problem of goodness-of-fit (GOF) tests for copulas, and discuss the connection with multivariate extremes. An application to station data shows the simplicity and the capabilities as well as the limitations of this approach. Observations of daily precipitation and temperature are fitted to a bivariate model and demonstrate that copulas are a valuable complement to the commonly used methods.
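As a rough illustration of the copula idea summarized above, the following Python sketch draws pairs with a Gaussian copula and then applies two different, non-normal-compatible margins. The choice of a Gaussian copula, the exponential and normal margins, the parameter values, and the name `sample_gaussian_copula` are all assumptions made for this example and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u, v) with uniform margins and Gaussian-copula dependence."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mix in z1 so that (z1, z2) has correlation rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs

# Hypothetical margins: exponential precipitation (mm) and normal temperature (deg C).
mean_precip = 4.0
temperature = NormalDist(mu=12.0, sigma=5.0)

samples = []
for u, v in sample_gaussian_copula(rho=-0.4, n=1000):
    precip = -mean_precip * math.log(1.0 - u)  # inverse CDF of the exponential
    temp = temperature.inv_cdf(v)              # inverse CDF of the normal
    samples.append((precip, temp))

print(samples[:3])
```

The point of the construction is that the dependence structure (the copula) and the margins are chosen separately, which is what makes the approach attractive for variables such as precipitation that are far from normally distributed.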
Understanding global climate change scenarios through bioclimate stratification

NASA Astrophysics Data System (ADS)

Soteriades, A. D.; Murray-Rust, D.; Trabucco, A.; Metzger, M. J.

2017-08-01

Despite progress in impact modelling, communicating and understanding the implications of climatic change projections is challenging due to inherent complexity and a cascade of uncertainty. In this letter, we present an alternative representation of global climate change projections based on shifts in 125 multivariate strata characterized by relatively homogeneous climate. These strata form climate analogues that help in the interpretation of climate change impacts. A Random Forests classifier was calculated and applied to 63 Coupled Model Intercomparison Project Phase 5 climate scenarios at 5 arcmin resolution. Results demonstrate how shifting bioclimate strata can summarize future environmental changes and form a middle ground, conveniently integrating current knowledge of climate change impact with the interpretation advantages of categorical data but with a level of detail that resembles a continuous surface at global and regional scales. Both the agreement in major change and differences between climate change projections are visually combined, facilitating the interpretation of complex uncertainty. By making the data and the classifier available we provide a climate service that helps facilitate communication and provide new insight into the consequences of climate change.
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance locating transcription factor binding sites and start site initiation in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which are much-used methods for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
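The position specific scoring matrix mentioned above can be illustrated with a short Python sketch. It builds a log-odds PSSM from a handful of invented, equal-length DNA fragments and scores new sequences against it. It does not implement the PLS step of the paper, and the uniform background, the pseudocount, and the names `build_pssm` and `score` are assumptions for illustration.

```python
import math

ALPHABET = "ACGT"

def build_pssm(sequences, background=0.25, pseudocount=1.0):
    """Log-odds position specific scoring matrix from equal-length DNA sequences."""
    length = len(sequences[0])
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in sequences]
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for base in ALPHABET:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, sequence):
    """Sum of per-position log-odds scores for one sequence."""
    return sum(column[base] for column, base in zip(pssm, sequence))

# Hypothetical aligned upstream fragments (illustrative only).
training = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pssm = build_pssm(training)
print(score(pssm, "TATAAT"), score(pssm, "GCGCGC"))
```

A sequence resembling the training fragments receives a positive total score, while an unrelated one scores negative. Per-position scores of this kind are the sort of features a downstream classifier could use.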
Discrimination and characterization of strawberry juice based on electronic nose and tongue: comparison of different juice processing approaches by LDA, PLSR, RF, and SVM.

PubMed

Qiu, Shanshan; Wang, Jun; Gao, Liping

2014-07-09

An electronic nose (E-nose) and an electronic tongue (E-tongue) have been used to characterize five types of strawberry juices based on processing approaches (i.e., microwave pasteurization, steam blanching, high temperature short time pasteurization, frozen-thawed, and freshly squeezed). Juice quality parameters (vitamin C, pH, total soluble solid, total acid, and sugar/acid ratio) were detected by traditional measuring methods. Multivariate statistical methods (linear discriminant analysis (LDA) and partial least squares regression (PLSR)) and neural networks (Random Forest (RF) and Support Vector Machines) were employed for qualitative classification and quantitative regression. The E-tongue system reached higher accuracy rates than the E-nose did, and the simultaneous utilization did have an advantage in LDA classification and PLSR regression. According to cross-validation, RF has shown outstanding and indisputable performance in the qualitative and quantitative analysis. This work indicates that the simultaneous utilization of E-nose and E-tongue can discriminate processed fruit juices and predict quality parameters successfully for the beverage industry.
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

DOE PAGES

Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

2014-01-01

Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
GIS based Cadastral level Forest Information System using World View-II data in Bir Hisar (Haryana)

NASA Astrophysics Data System (ADS)

Mothi Kumar, K. E.; Singh, S.; Attri, P.; Kumar, R.; Kumar, A.; Sarika; Hooda, R. S.; Sapra, R. K.; Garg, V.; Kumar, V.; Nivedita

2014-11-01

Identification and demarcation of Forest lands on the ground remains a major challenge in Forest administration and management. Cadastral forest mapping deals with forestlands boundary delineation and their associated characterization (forest/non forest). The present study is an application of high resolution World View-II data for digitization of Protected Forest boundary at cadastral level with integration of Records of Right (ROR) data. Cadastral vector data was generated by digitization of spatial data using scanned mussavies in ArcGIS environment. Ortho-images were created from World View-II digital stereo data with Universal Transverse Mercator coordinate system with WGS 84 datum. Cadastral vector data of Bir Hisar (Hisar district, Haryana) and adjacent villages was spatially adjusted over the ortho-image using ArcGIS software. Edge matching of village boundaries was done with respect to khasra boundaries of each individual village. The notified forest grids were identified on the ortho-image and grid vector data was extracted from georeferenced cadastral data. Cadastral forest boundary vectors were digitized from ortho-images. Accuracy of cadastral data was checked by comparing randomly selected geo-coordinate points, tie lines and boundary measurements of randomly selected parcels generated from the image data set with actual field measurements. Area comparison was done between the cadastral map area, the image map area and the ROR area. The area covered under Protected Forest was compared with ROR data, and a difference of less than 1% from the ROR area was accepted. The methodology presented in this paper is useful to update the cadastral forest maps. The produced GIS databases and large-scale Forest Maps may serve as a data foundation towards a land register of forests. The study introduces the use of very high resolution satellite data to develop a method for cadastral surveying through on-screen digitization in less time than the old-fashioned cadastral parcel boundary surveying method.
Impact of livestock on a mosquito community (Diptera: Culicidae) in a Brazilian tropical dry forest.

PubMed

Santos, Cleandson Ferreira; Borges, Magno

2015-01-01

This study evaluated the effects of cattle removal on the Culicidae mosquito community structure in a tropical dry forest in Brazil. Culicidae were collected during dry and wet seasons in cattle presence and absence between August 2008 and October 2010 and assessed using multivariate statistical models. Cattle removal did not significantly alter Culicidae species richness and abundance. However, alterations were noted in Culicidae community composition. This is the first study to evaluate the impact of cattle removal on Culicidae community structure in Brazil and demonstrates the importance of assessing ecological parameters such as community species composition.
A Multivariate Randomization Test of Association Applied to Cognitive Test Results

NASA Technical Reports Server (NTRS)

Ahumada, Albert; Beard, Bettina

2009-01-01

Randomization tests provide a conceptually simple, distribution-free way to implement significance testing. We have applied this method to the problem of evaluating the significance of the association among a number (k) of variables. The randomization method was the random re-ordering of k-1 of the variables. The criterion variable was the value of the largest eigenvalue of the correlation matrix.
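The procedure described above (randomly re-ordering k-1 of the k variables and using the largest eigenvalue of the correlation matrix as the criterion) can be sketched with NumPy as follows. This is a minimal illustration on simulated data, not the authors' code. The function names, the number of re-orderings, and the simulated `scores` array are assumptions.

```python
import numpy as np

def largest_eigenvalue(data):
    """Largest eigenvalue of the correlation matrix of the columns of `data`."""
    corr = np.corrcoef(data, rowvar=False)
    return float(np.linalg.eigvalsh(corr)[-1])  # eigvalsh sorts ascending

def randomization_test(data, n_perm=499, seed=0):
    """Return (observed statistic, p-value) for association among the columns."""
    rng = np.random.default_rng(seed)
    observed = largest_eigenvalue(data)
    count = 0
    for _ in range(n_perm):
        shuffled = data.copy()
        # Re-order k-1 of the k variables independently; the first stays fixed.
        for j in range(1, data.shape[1]):
            shuffled[:, j] = rng.permutation(data[:, j])
        if largest_eigenvalue(shuffled) >= observed:
            count += 1
    # Count the observed ordering as one of the possible re-orderings.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical scores: three tests sharing a common factor (illustrative only).
rng = np.random.default_rng(1)
factor = rng.normal(size=(40, 1))
scores = factor + 0.8 * rng.normal(size=(40, 3))
stat, p_value = randomization_test(scores)
print(f"largest eigenvalue = {stat:.2f}, p = {p_value:.3f}")
```

Shuffling breaks any association between columns while leaving each column's own distribution intact, so the shuffled eigenvalues form a distribution-free reference for the observed one.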
Predicting healthcare associated infections using patients' experiences

NASA Astrophysics Data System (ADS)

Pratt, Michael A.; Chu, Henry

2016-05-01

Healthcare associated infections (HAI) are a major threat to patient safety and are costly to health systems. Our goal is to predict the HAI performance of a hospital using the patients' experience responses as input. We use four classifiers, viz. random forest, naive Bayes, artificial feedforward neural networks, and the support vector machine, to perform the prediction of six types of HAI. The six types include blood stream, urinary tract, surgical site, and intestinal infections. Experiments show that the random forest and support vector machine perform well across the six types of HAI.
Longer-term effects of selective thinning on microarthropod communities in a late-successional coniferous forest

USGS Publications Warehouse

Peck, R.W.; Niwa, C.G.

2005-01-01

Microarthropod densities within late-successional coniferous forests thinned 16-41 yr before sampling were compared with adjacent unthinned stands to identify longer term effects of thinning on this community. Soil and forest floor layers were sampled separately on eight paired sites. Within the forest floor oribatid, mesostigmatid, and to a marginal extent, prostigmatid mites, were reduced in thinned stands compared with unthinned stands. No differences were found for Collembola in the forest floor or for any mite suborder within the soil. Family level examination of mesostigmatid and prostigmatid mites revealed significant differences between stand types for both horizons. At the species level, thinning influenced numerous oribatid mites and Collembola. For oribatid mites, significant or marginally significant differences were found for seven of 15 common species in the forest floor and five of 16 common species in soil. Collembola were affected less, with differences found for one of 11 common species in the forest floor and three of 13 common species in soil. Multivariate analysis of variance and ordination indicated that forest thinning had little influence on the composition of oribatid mite and collembolan communities within either the forest floor or soil. Differences in microclimate or in the accumulation of organic matter on the forest floor were likely most responsible for the observed patterns of abundance. Considering the role that microarthropods play in nutrient cycling, determining the functional response of a wide range of taxa to thinning may be important to effective ecosystem management.
Aggregating pixel-level basal area predictions derived from LiDAR data to industrial forest stands in North-Central Idaho

Treesearch

Andrew T. Hudak; Jeffrey S. Evans; Nicholas L. Crookston; Michael J. Falkowski; Brant K. Steigers; Rob Taylor; Halli Hemingway

2008-01-01

Stand exams are the principal means by which timber companies monitor and manage their forested lands. Airborne LiDAR surveys sample forest stands at much finer spatial resolution and broader spatial extent than is practical on the ground. In this paper, we developed models that leverage spatially intensive and extensive LiDAR data and a stratified random sample of...
Assessing the accuracy of respondents reports of the location of their home relative to a national forest boundary and forest cover

Treesearch

John D. Baldridge; James T. Sylvester; William T. Borrie

2005-01-01

Local, state, and national agencies charged with managing wildlands in the United States are now seeking to learn more about the public's preferences for managing forests. For this reason agency wildland managers are making use of survey research to supplement their public input processes. Agency managers often choose random-digit dial telephone surveys because of...
Effect of the federal estate tax on nonindustrial private forest holdings

Treesearch

John L. Greene; Steven H. Bullard; Tamara L. Cushing; Theodore Beauvais

2006-01-01

Data for this study were collected using a questionnaire mailed to randomly selected members of two forest owner organizations. Among the key findings is that 38% of forest estates owed federal estate tax, a rate many times higher than US estates in general. In 28% of the cases where estate tax was due, timber or land was sold because other assets were not adequate. In...
Acorn Production on the Missouri Ozark Forest Ecosystem Project Study Sites: Pre-treatment Data

Treesearch

Larry D. Vangilder

1997-01-01

In the pre-treatment phase of a study to determine if even- and uneven-aged forest management affects the production of acorns on the Missouri Forest Ecosystem Project (MOFEP) study sites, acorn production was measured on the nine study sites by randomly placing from 2 to 6 plots in each of four ecological land type (ELT) groupings (N=130 plots). A split-plot...
A non-iterative extension of the multivariate random effects meta-analysis.

PubMed

Makambi, Kepher H; Seung, Hyunuk

2015-01-01

Multivariate methods in meta-analysis are becoming popular and more accepted in biomedical research despite computational issues in some of the techniques. A number of approaches, both iterative and non-iterative, have been proposed including the multivariate DerSimonian and Laird method by Jackson et al. (2010), which is non-iterative. In this study, we propose an extension of the method by Hartung and Makambi (2002) and Makambi (2001) to multivariate situations. A comparison of the bias and mean square error from a simulation study indicates that, in some circumstances, the proposed approach performs better than the multivariate DerSimonian-Laird approach. An example is presented to demonstrate the application of the proposed approach.
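To show what "non-iterative" means in this setting, here is a minimal Python sketch of the classical univariate DerSimonian-Laird estimator, whose between-study variance comes from a closed-form moment formula. It is not the multivariate method of Jackson et al. nor the extension proposed in the abstract. The study effects and variances are invented and the function name is hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Univariate DerSimonian-Laird random-effects pooling.

    Returns (pooled effect, between-study variance tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimator, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# Hypothetical study effects and within-study variances (illustrative only).
effects = [0.10, 0.30, 0.35, 0.65, 0.45]
variances = [0.03, 0.03, 0.05, 0.01, 0.05]
pooled, tau2 = dersimonian_laird(effects, variances)
print(f"pooled effect = {pooled:.3f}, tau^2 = {tau2:.3f}")
```

Because every quantity is computed in a single pass, there is no optimization loop that could fail to converge, which is the practical appeal of non-iterative estimators.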
On set-valued functionals: Multivariate risk measures and Aumann integrals

NASA Astrophysics Data System (ADS)

Ararat, Cagin

In this dissertation, multivariate risk measures for random vectors and Aumann integrals of set-valued functions are studied. Both are set-valued functionals with values in a complete lattice of subsets of R^m. Multivariate risk measures are considered in a general d-asset financial market with trading opportunities in discrete time. Specifically, the following features of the market are incorporated in the evaluation of multivariate risk: convex transaction costs modeled by solvency regions, intermediate trading constraints modeled by convex random sets, and the requirement of liquidation into the first m ≤ d of the assets. It is assumed that the investor has a "pure" multivariate risk measure R on the space of m-dimensional random vectors which represents her risk attitude towards the assets but does not take into account the frictions of the market. Then, the investor with a d-dimensional position minimizes the set-valued functional R over all m-dimensional positions that she can reach by trading in the market subject to the frictions described above. The resulting functional R^mar on the space of d-dimensional random vectors is another multivariate risk measure, called the market-extension of R. A dual representation for R^mar that decomposes the effects of R and the frictions of the market is proved. Next, multivariate risk measures are studied in a utility-based framework. It is assumed that the investor has a complete risk preference towards each individual asset, which can be represented by a von Neumann-Morgenstern utility function. Then, an incomplete preference is considered for multivariate positions which is represented by the vector of the individual utility functions. Under this structure, multivariate shortfall and divergence risk measures are defined as the optimal values of set minimization problems. The dual relationship between the two classes of multivariate risk measures is constructed via a recent Lagrange duality for set optimization. In particular, it is shown that a shortfall risk measure can be written as an intersection over a family of divergence risk measures indexed by a scalarization parameter. Examples include the multivariate versions of the entropic risk measure and the average value at risk. In the second part, Aumann integrals of set-valued functions on a measurable space are viewed as set-valued functionals and a Daniell-Stone type characterization theorem is proved for such functionals. More precisely, it is shown that a functional that maps measurable set-valued functions into a certain complete lattice of subsets of R^m can be written as the Aumann integral with respect to a measure if and only if the functional is (1) additive and (2) positively homogeneous, (3) it preserves decreasing limits, (4) it maps halfspace-valued functions to halfspaces, and (5) it maps shifted cone-valued functions to shifted cones. While the first three properties already exist in the classical Daniell-Stone theorem for the Lebesgue integral, the last two properties are peculiar to the set-valued framework and they suffice to complement the first three properties to identify a set-valued functional as the Aumann integral with respect to a measure.
Multivariate spatial models of excess crash frequency at area level: case of Costa Rica.

PubMed

Aguero-Valverde, Jonathan

2013-10-01

Recently, areal models of crash frequency have been used in the analysis of various area-wide factors affecting road crashes. On the other hand, disease mapping methods are commonly used in epidemiology to assess the relative risk of the population at different spatial units. A natural next step is to combine these two approaches to estimate the excess crash frequency at area level as a measure of absolute crash risk. Furthermore, multivariate spatial models of crash severity are explored in order to account for both frequency and severity of crashes and control for the spatial correlation frequently found in crash data. This paper aims to extend the concept of safety performance functions to be used in areal models of crash frequency. A multivariate spatial model is used for that purpose and compared to its univariate counterpart. A full Bayes hierarchical approach is used to estimate the models of crash frequency at canton level for Costa Rica. An intrinsic multivariate conditional autoregressive model is used for modeling spatial random effects. The results show that the multivariate spatial model performs better than its univariate counterpart in terms of the penalized goodness-of-fit measure Deviance Information Criterion. Additionally, the effects of the spatial smoothing due to the multivariate spatial random effects are evident in the estimation of excess equivalent property damage only crashes. Copyright © 2013 Elsevier Ltd. All rights reserved.
Community Turnover of Wood-Inhabiting Fungi across Hierarchical Spatial Scales

PubMed Central

Abrego, Nerea; García-Baquero, Gonzalo; Halme, Panu; Ovaskainen, Otso; Salcedo, Isabel

2014-01-01

For efficient use of conservation resources it is important to determine how species diversity changes across spatial scales. In many poorly known species groups little is known about the spatial scales at which conservation efforts should be focused. Here we examined how the community turnover of wood-inhabiting fungi is realised at three hierarchical levels, and how much of community variation is explained by variation in resource composition and spatial proximity. The hierarchical study design consisted of management type (fixed factor), forest site (random factor, nested within management type) and study plots (randomly placed plots within each study site). To examine how species richness varied across the three hierarchical scales, randomized species accumulation curves and additive partitioning of species richness were applied. To analyse variation in wood-inhabiting species and dead wood composition at each scale, linear and Permanova modelling approaches were used. Wood-inhabiting fungal communities were dominated by rare and infrequent species. The similarity of fungal communities was higher within sites and within management categories than among sites or between the two management categories, and it decreased with increasing distance among the sampling plots and with decreasing similarity of dead wood resources. However, only a small part of community variation could be explained by these factors. The species present in managed forests were to a large extent a subset of those species present in natural forests. Our results suggest that in particular the protection of rare species requires a large total area. As managed forests have only little additional value in complementing the diversity of natural forests, the conservation of natural forests is the key to ecologically effective conservation.
As the dissimilarity of fungal communities increases with distance, the conserved natural forest sites should be broadly distributed in space, yet the individual conserved areas should be large enough to ensure local persistence. PMID:25058128

Decision tree modeling using R.

PubMed

Zhang, Zhongheng

2016-08-01

In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example, I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
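The recursive binary partitioning idea in the abstract above can be illustrated with a minimal Python sketch. It is not the article's method: the article works in R with conditional inference trees, which choose the split variable by statistical tests, whereas this sketch assumes a simpler CART-style rule that picks the split with the smallest within-child sum of squared errors. The function names (sse, best_split, grow_tree, predict), the stopping parameters and the toy data are all hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold) giving the lowest child SSE, or None."""
    best, best_score = None, sse(targets)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue  # a split must send rows to both children
            score = sse(left) + sse(right)
            if score < best_score - 1e-12:
                best, best_score = (j, threshold), score
    return best


def grow_tree(rows, targets, max_depth=3, min_node_size=2, depth=0):
    """Split recursively until a stopping criterion is met."""
    split = None
    if len(rows) >= min_node_size and depth < max_depth:
        split = best_split(rows, targets)
    if split is None:
        return {"leaf": mean(targets)}  # terminal node predicts the mean
    j, threshold = split
    left = [i for i, r in enumerate(rows) if r[j] <= threshold]
    right = [i for i, r in enumerate(rows) if r[j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": grow_tree([rows[i] for i in left], [targets[i] for i in left],
                          max_depth, min_node_size, depth + 1),
        "right": grow_tree([rows[i] for i in right], [targets[i] for i in right],
                           max_depth, min_node_size, depth + 1),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up data: the response depends on feature 0 only.
rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (10.0, 4.0), (11.0, 7.0), (12.0, 1.0)]
targets = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = grow_tree(rows, targets, max_depth=1)
print(tree["feature"], tree["threshold"], predict(tree, (2.5, 0.0)))
```

A random forest would repeat grow_tree on bootstrap resamples of the rows, restrict each split search to a random subset of features, and average the resulting predictions; those two sources of randomness are the "sources of diversity" the abstract refers to.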
Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques.

PubMed

Kebede, Mihiretu; Zegeye, Desalegn Tigabu; Zeleke, Berihun Megabiaw

2017-12-01

To monitor the progress of therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement has increased as ART programs have expanded over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART. A cross-sectional study was conducted at the University of Gondar Hospital among 3,104 adult patients on ART with CD4 counts measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients' charts. Descriptive data were analyzed with SPSS version 20. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed to undertake the study. WEKA version 3.8 was used to conduct predictive data mining. Before building the predictive data mining models, information gain values and correlation-based feature selection methods were used for attribute selection. Variables were ranked according to their relevance based on their information gain values. The J48, Neural Network, and Random Forest algorithms were tested to assess model accuracies. The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2427 (78.2%) patients had their CD4 counts increased by at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin levels predicted CD4 count changes with predictive accuracies of J48, Neural Network, and Random Forest being 87.1%, 83.5%, and 99.8%, respectively. The Random Forest algorithm had higher accuracy than both J48 and the Artificial Neural Network. The precision, sensitivity and recall values of Random Forest were also more than 99%. Nearly accurate prediction results were obtained using the Random Forest algorithm.
This algorithm could be used in a low-resource setting to build a web-based prediction model for CD4 count changes. Copyright © 2017 Elsevier B.V. All rights reserved.
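The attribute-ranking step mentioned in the abstract above (ordering variables by information gain) can be sketched in a few lines of Python. This is an illustration only: the study used WEKA, and the attribute names, values and labels below are invented, not taken from the study data. Information gain is computed here as the entropy of the class labels minus the weighted entropy remaining after grouping by a categorical attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(table, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "regimen" separates the classes perfectly, "sex" not at all.
labels = ["up", "up", "down", "down"]
table = {
    "sex": ["F", "M", "F", "M"],
    "regimen": ["A", "A", "B", "B"],
}
for name, gain in rank_attributes(table, labels):
    print(name, round(gain, 3))
```

Continuous attributes such as age or hemoglobin would first need to be discretized before this calculation applies; how WEKA handled that in the study is not stated in the abstract.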
Estimating the impact of mineral aerosols on crop yields in food insecure regions using statistical crop models

NASA Astrophysics Data System (ADS)

Hoffman, A.; Forest, C. E.; Kemanian, A.

2016-12-01

A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields.
This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
Characterizing channel change along a multithread gravel-bed river using random forest image classification

NASA Astrophysics Data System (ADS)

Overstreet, B. T.; Legleiter, C. J.

2012-12-01

The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impact river dynamics; and 3) how the Snake River will respond to future human and climate driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest derived land cover maps with LiDAR derived topography, bathymetry, and canopy height to explore physical drivers contributing to observed channel changes on the Snake River. In conclusion we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change.
Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and a 2011 WorldView2 satellite image. This pair of maps thus captures changes that occurred during the 2011 runoff
Understory vegetation as an indicator for floodplain forest restoration in the Mississippi River Alluvial Valley, U.S.A.

USGS Publications Warehouse

De Steven, Diane; Faulkner, Stephen; Keeland, Bobby D.; Baldwin, Michael; McCoy, John W.; Hughes, Steven C.

2015-01-01

In the Mississippi River Alluvial Valley (MAV), complete alteration of river-floodplain hydrology allowed for widespread conversion of forested bottomlands to intensive agriculture, resulting in nearly 80% forest loss. Governmental programs have attempted to restore forest habitat and functions within this altered landscape by the methods of tree planting (afforestation) and local hydrologic enhancement on reclaimed croplands. Early assessments identified factors that influenced whether planting plus tree colonization could establish an overstory community similar to natural bottomland forests. The extent to which afforested sites develop typical understory vegetation has not been evaluated, yet understory composition may be indicative of restored site conditions. As part of a broad study quantifying the ecosystem services gained from restoration efforts, understory vegetation was compared between 37 afforested sites and 26 mature forest sites. Differences in vegetation attributes for species growth forms, wetland indicator classes, and native status were tested with univariate analyses; floristic composition data were analyzed by multivariate techniques. Understory vegetation of restoration sites was generally hydrophytic, but species composition differed from that of mature bottomland forest because of young successional age and differing responses of plant growth forms. Attribute and floristic variation among restoration sites was related to variation in canopy development and local wetness conditions, which in turn reflected both intrinsic site features and outcomes of restoration practices. Thus, understory vegetation is a useful indicator of functional progress in floodplain forest restoration.
Influences of watershed geomorphology on extent and composition of riparian vegetation

Treesearch

Blake M. Engelhardt; Peter J. Weisberg; Jeanne C. Chambers

2011-01-01

Watershed (drainage basin) morphometry and geology were derived from digital data sets (DEMs and geologic maps). Riparian corridors were classified into five vegetation types (riparian forest, riparian shrub, wet/mesic meadow, dry meadow and shrub dry meadow) using high-resolution aerial photography. Regression and multivariate analyses were used to relate geomorphic...
Pigmented skin lesion detection using random forest and wavelet-based texture

NASA Astrophysics Data System (ADS)

Hu, Ping; Yang, Tie-jun

2016-10-01

The incidence of cutaneous malignant melanoma, a disease of worldwide distribution and the deadliest form of skin cancer, has been rapidly increasing over the last few decades. Because advanced cutaneous melanoma is still incurable, early detection is an important step toward a reduction in mortality. Dermoscopy photographs are commonly used in melanoma diagnosis and can capture detailed features of a lesion. A great variability exists in the visual appearance of pigmented skin lesions. Therefore, in order to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, an automatic detection approach is required. The objectives of this paper were to propose a hybrid method using random forest and Gabor wavelet transformation to accurately separate the lesion area from the non-lesion area in dermoscopy photographs, and to analyze segmentation accuracy. A random forest classifier consisting of a set of decision trees was used for classification. The Gabor wavelet transformation is a mathematical model of the visual cortical cells of the mammalian brain, and it can decompose an image into multiple scales and multiple orientations. The Gabor function has been recognized as a very useful tool in texture analysis, due to its optimal localization properties in both the spatial and frequency domains. Texture features based on the Gabor wavelet transformation are computed from the Gabor-filtered image. Experimental results indicate the following: (1) the proposed algorithm based on random forest outperformed the state of the art in pigmented skin lesion detection, and (2) the inclusion of Gabor wavelet transformation based texture features improved segmentation accuracy significantly.
Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and Random Forest

NASA Astrophysics Data System (ADS)

Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

2018-04-01

Sea level rise has already caused more frequent and severe coastal flooding and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems, however, do not always lend themselves easily to physically-based flood prediction approaches. This paper presents a data-driven method for estimating flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia, USA from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. From the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.
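To make the Poisson regression baseline in the abstract above concrete, here is a minimal Python sketch of a count model with a log link, fitted by Newton-Raphson. It rests on several assumptions: a single predictor is used (the study used many environmental inputs), the variable names and numbers are invented, and fit_poisson is a hypothetical helper rather than anything from the paper. The model is log(mu_i) = b0 + b1 * x_i, where mu_i is the expected number of reports for event i.

```python
from math import exp, log


def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (Fisher information), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * d = g explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def predict_count(b0, b1, xi):
    """Expected count for predictor value xi under the fitted model."""
    return exp(b0 + b1 * xi)


# Invented example: rainfall index per storm event vs. number of flood reports.
rainfall = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
reports = [1, 2, 2, 4, 7, 9]
b0, b1 = fit_poisson(rainfall, reports)
print(round(b0, 3), round(b1, 3), round(predict_count(b0, b1, 6.0), 1))
```

Because the fitted mean is an exponential function of the inputs, this kind of model cannot represent threshold effects or interactions unless they are added by hand, which is one plausible reason a random forest could do better on such data; the abstract itself does not give the reason.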
Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

PubMed

Hu, Chen; Steingrimsson, Jon Arni

2018-01-01

A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
Applications of High Resolution Laser Induced Breakdown Spectroscopy for Environmental and Biological Samples

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martin, Madhavi Z; Labbe, Nicole; Wagner, Rebekah J.

2013-01-01

This chapter details the application of LIBS in a number of environmental areas of research such as carbon sequestration and climate change. LIBS has also been shown to be useful in other high resolution environmental applications, for example, elemental mapping and detection of metals in plant materials. LIBS has also been used in phytoremediation applications. Other biological research involves a detailed understanding of wood chemistry response to precipitation variations and also to forest fires. A cross-section of Mountain pine (pinceae Pinus pungen Lamb.) was scanned using a translational stage to determine the differences in the chemical features both before and after a fire event. Consequently, by monitoring the elemental composition pattern of a tree and by looking for abrupt changes, one can reconstruct the disturbance history of a tree and a forest. Lastly we have shown that multivariate analysis of the LIBS data is necessary to standardize the analysis and correlate to other standard laboratory techniques. LIBS along with multivariate statistical analysis makes it a very powerful technology that can be transferred from laboratory to field applications with ease.
Bayesian Estimation of Random Coefficient Dynamic Factor Models

ERIC Educational Resources Information Center

Song, Hairong; Ferrer, Emilio

2012-01-01

Dynamic factor models (DFMs) have typically been applied to multivariate time series data collected from a single unit of study, such as a single individual or dyad. The goal of DFMs application is to capture dynamics of multivariate systems. When multiple units are available, however, DFMs are not suited to capture variations in dynamics across…
Landscape genetics of leaf-toed geckos in the tropical dry forest of northern Mexico.

PubMed

Blair, Christopher; Jiménez Arcos, Victor H; Mendez de la Cruz, Fausto R; Murphy, Robert W

2013-01-01

Habitat fragmentation due to both natural and anthropogenic forces continues to threaten the evolution and maintenance of biological diversity. This is of particular concern in tropical regions that are experiencing elevated rates of habitat loss. Although less well-studied than tropical rain forests, tropical dry forests (TDF) contain an enormous diversity of species and continue to be threatened by anthropogenic activities including grazing and agriculture. However, little is known about the processes that shape genetic connectivity in species inhabiting TDF ecosystems. We adopt a landscape genetic approach to understanding functional connectivity for leaf-toed geckos (Phyllodactylus tuberculosus) at multiple sites near the northernmost limit of this ecosystem at Alamos, Sonora, Mexico. Traditional analyses of population genetics are combined with multivariate GIS-based landscape analyses to test hypotheses on the potential drivers of spatial genetic variation. Moderate levels of within-population diversity and substantial levels of population differentiation are revealed by FST and Dest. Analyses using STRUCTURE suggest the occurrence of 2 to 9 genetic clusters depending on the model used. Landscape genetic analysis suggests that forest cover, stream connectivity, undisturbed habitat, slope, and minimum temperature of the coldest period explain more genetic variation than do simple Euclidean distances. Additional landscape genetic studies throughout TDF habitat are required to understand species-specific responses to landscape and climate change and to identify common drivers. We urge researchers interested in using multivariate distance methods to test for, and report, significant correlations among predictor matrices that can impact results, particularly when adopting least-cost path approaches. Further investigation into the use of information theoretic approaches for model selection is also warranted.
Multivariate statistical analysis of wildfires in Portugal

NASA Astrophysics Data System (ADS)

Costa, Ricardo; Caramelo, Liliana; Pereira, Mário

2013-04-01

Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980-2009 period. The data used in the analysis is an extended version of the Portuguese Rural Fire Database (PRFD) (Pereira et al., 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology, 129, 11-25. Pereira, M.G., Malamud, B.D., Trigo, R.M., Alves, P.I., 2011: "The history and characteristics of the 1980-2005 Portuguese rural fire database". Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011. This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).
Epidemiology of forest malaria in Central Vietnam: the hidden parasite reservoir.

PubMed

Thanh, Pham Vinh; Van Hong, Nguyen; Van Van, Nguyen; Van Malderen, Carine; Obsomer, Valérie; Rosanas-Urgell, Anna; Grietens, Koen Peeters; Xa, Nguyen Xuan; Bancone, Germana; Chowwiwat, Nongnud; Duong, Tran Thanh; D'Alessandro, Umberto; Speybroeck, Niko; Erhart, Annette

2015-02-19

After successfully reducing the malaria burden to pre-elimination levels over the past two decades, the national malaria programme in Vietnam has recently switched from control to elimination. However, in forested areas of Central Vietnam malaria elimination is likely to be jeopardized by the high occurrence of asymptomatic and submicroscopic infections as shown by previous reports. This paper presents the results of a malaria survey carried out in a remote forested area of Central Vietnam where we evaluated malaria prevalence and risk factors for infection. After a full census (four study villages = 1,810 inhabitants), the study population was screened for malaria infections by standard microscopy and, if needed, treated according to national guidelines. An additional blood sample on filter paper was also taken in a random sample of the population for later polymerase chain reaction (PCR) and more accurate estimation of the actual burden of malaria infections. The risk factor analysis for malaria infections was done using survey multivariate logistic regression as well as the classification and regression tree method (CART). A total of 1,450 individuals were screened. Malaria prevalence by microscopy was 7.8% (ranging from 3.9 to 10.9% across villages) mostly Plasmodium falciparum (81.4%) or Plasmodium vivax (17.7%) mono-infections; a large majority (69.9%) was asymptomatic. By PCR, the prevalence was estimated at 22.6% (ranging from 16.4 to 42.5%) with a higher proportion of P. vivax mono-infections (43.2%). The proportion of sub-patent infections increased with increasing age and with decreasing prevalence across villages. The main risk factors were young age, village, house structure, and absence of bed net. This study confirmed that in Central Vietnam a substantial part of the human malaria reservoir is hidden. Additional studies are urgently needed to assess the contribution of this hidden reservoir to the maintenance of malaria transmission. 
Such evidence will be crucial for guiding elimination strategies.
Patch forest: a hybrid framework of random forest and patch-based segmentation

NASA Astrophysics Data System (ADS)

Xie, Zhongliu; Gillies, Duncan

2016-03-01

The development of an accurate, robust and fast segmentation algorithm has long been a research focus in medical computer vision. State-of-the-art practices often involve non-rigidly registering a target image with a set of training atlases for label propagation over the target space to perform segmentation, a.k.a. multi-atlas label propagation (MALP). In recent years, the patch-based segmentation (PBS) framework has gained wide attention due to its advantage of relaxing the strict voxel-to-voxel correspondence to a series of pair-wise patch comparisons for contextual pattern matching. Despite a high accuracy reported in many scenarios, computational efficiency has consistently been a major obstacle for both approaches. Inspired by recent work on random forest, in this paper we propose a patch forest approach, which by equipping the conventional PBS with a fast patch search engine, is able to boost segmentation speed significantly while retaining an equal level of accuracy. In addition, a fast forest training mechanism is also proposed, with the use of a dynamic grid framework to efficiently approximate data compactness computation and a 3D integral image technique for fast box feature retrieval.
Quantifying the effect of forests on frequency and intensity of rockfalls

NASA Astrophysics Data System (ADS)

Moos, Christine; Dorren, Luuk; Stoffel, Markus

2017-02-01

Forests serve as a natural means of protection against small rockfalls. Due to their barrier effect, they reduce the intensity and the propagation probability of falling rocks and thus reduce the occurrence frequency of a rockfall event for a given element at risk. However, despite established knowledge on the protective effect of forests, they are generally neglected in quantitative rockfall risk analyses. Their inclusion in quantitative rockfall risk assessment would, however, be necessary to express their efficiency in monetary terms and to allow comparison of forests with other protective measures, such as nets and dams. The goal of this study is to quantify the effect of forests on the occurrence frequency and intensity of rockfalls. We therefore defined an onset frequency of blocks based on a power-law magnitude-frequency distribution and determined their propagation probabilities on a virtual slope based on rockfall simulations. Simulations were run for different forest and non-forest scenarios under varying forest stand and terrain conditions. We analysed rockfall frequencies and intensities at five different distances from the release area. Based on two multivariate statistical prediction models, we investigated which of the terrain and forest characteristics predominantly drive the role of forest in reducing rockfall occurrence frequency and intensity and whether they are able to predict the effect of forest on rockfall risk. The rockfall occurrence frequency below forested slopes is reduced between approximately 10 and 90 % compared to non-forested slope conditions; whereas rockfall intensity is reduced by 10 to 70 %. This reduction increases with increasing slope length and decreases with decreasing tree density, tree diameter and increasing rock volume, as well as in cases of clustered or gappy forest structures. 
The statistical prediction models reveal that the cumulative basal area of trees, block volume and horizontal forest structure represent key variables for the prediction of the protective effect of forests. In order to validate these results, models have to be tested on real slopes with a wide variation of terrain and forest conditions.
Multi-label spacecraft electrical signal classification method based on DBN and random forest

PubMed Central

Li, Ke; Yu, Nan; Li, Pengfei; Song, Shimin; Wu, Yalei; Li, Yang; Liu, Meng

2017-01-01

Spacecraft electrical signal characteristic data contain a large amount of data with high-dimensional features, a high degree of computational complexity, and a low identification rate, which causes great difficulty in fault diagnosis of spacecraft electronic load systems. This paper proposes a feature extraction method that is based on deep belief networks (DBN) and a classification method that is based on the random forest (RF) algorithm; the proposed algorithm mainly employs a multi-layer neural network to reduce the dimension of the original data, and then classification is applied. First, wavelet denoising is used to pre-process the data. Second, the deep belief network is used to reduce the feature dimension and improve the rate of classification for the electrical characteristics data. Finally, the random forest algorithm is used to classify the data, and its results are compared with those of other algorithms. The experimental results show that, compared with other algorithms, the proposed method shows excellent performance in terms of accuracy, computational efficiency, and stability in addressing spacecraft electrical signal data. PMID:28486479
Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest

PubMed Central

Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

2018-01-01

Mechanical faults of high-voltage circuit breakers (HVCBs) inevitably occur over long-term operation, so extracting the fault features and identifying the fault type have become key issues for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and the random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variable and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method reached 93.33% with the original feature space and up to 95.56% with the optimized input feature vector of the classifier. This indicates that the feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548
Spectral Analysis of Ultrasound Radiofrequency Backscatter for the Detection of Intercostal Blood Vessels.

PubMed

Klingensmith, Jon D; Haggard, Asher; Fedewa, Russell J; Qiang, Beidi; Cummings, Kenneth; DeGrande, Sean; Vince, D Geoffrey; Elsharkawy, Hesham

2018-04-19

Spectral analysis of ultrasound radiofrequency backscatter has the potential to identify intercostal blood vessels during ultrasound-guided placement of paravertebral nerve blocks and intercostal nerve blocks. Autoregressive models were used for spectral estimation, and bandwidth, autoregressive order and region-of-interest size were evaluated. Eight spectral parameters were calculated and used to create random forests. An autoregressive order of 10, bandwidth of 6 dB and region-of-interest size of 1.0 mm resulted in the minimum out-of-bag error. An additional random forest, using these chosen values, was created from 70% of the data and evaluated independently from the remaining 30% of data. The random forest achieved a predictive accuracy of 92% and Youden's index of 0.85. These results suggest that spectral analysis of ultrasound radiofrequency backscatter has the potential to identify intercostal blood vessels. Copyright © 2018 World Federation for Ultrasound in Medicine and Biology. Published by Elsevier Inc. All rights reserved.
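For readers unfamiliar with the two figures of merit reported in this abstract, the following minimal sketch shows how predictive accuracy and Youden's index (sensitivity + specificity - 1) are computed from confusion-matrix counts. The function names and the counts in the usage lines are assumptions made up for illustration; they are not data from the study.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


def accuracy(tp, fn, tn, fp):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Made-up counts for illustration only (not taken from the paper).
j = youden_index(tp=40, fn=10, tn=45, fp=5)
acc = accuracy(tp=40, fn=10, tn=45, fp=5)
print(round(j, 2), round(acc, 2))
```

Youden's index ranges from 0 for a classifier no better than chance to 1 for a perfect one, so it complements accuracy when the two classes are unbalanced.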

RandomForest4Life: a Random Forest for predicting ALS disease progression.

PubMed

Hothorn, Torsten; Jung, Hans H

2014-09-01

We describe a method for predicting disease progression in amyotrophic lateral sclerosis (ALS) patients. The method was developed as a submission to the DREAM Phil Bowen ALS Prediction Prize4Life Challenge of summer 2012. Based on repeated patient examinations over a three-month period, we used a random forest algorithm to predict future disease progression. The procedure was set up and internally evaluated using data from 1197 ALS patients. External validation by an expert jury was based on undisclosed information of an additional 625 patients; all patient data were obtained from the PRO-ACT database. In terms of prediction accuracy, the approach described here ranked third best. Our interpretation of the prediction model confirmed previous reports suggesting that past disease progression is a strong predictor of future disease progression measured on the ALS functional rating scale (ALSFRS). We also found that larger variability in initial ALSFRS scores is linked to faster future disease progression. The results reported here furthermore suggested that approaches taking the multidimensionality of the ALSFRS into account promise some potential for improved ALS disease prediction.
RAQ–A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems

PubMed Central

Yu, Ruiyun; Yang, Yu; Yang, Leyou; Han, Guangjie; Move, Oguti Ann

2016-01-01

Air quality information such as the concentration of PM2.5 is of great significance for human health and city management. It affects the way of traveling, urban planning, government policies and so on. However, in major cities there is typically only a limited number of air quality monitoring stations. In the meantime, air quality varies in the urban areas and there can be large differences, even between closely neighboring regions. In this paper, a random forest approach for predicting air quality (RAQ) is proposed for urban sensing systems. The data generated by urban sensing includes meteorology data, road information, real-time traffic status and point of interest (POI) distribution. The random forest algorithm is exploited for data training and prediction. The performance of RAQ is evaluated with real city data. Compared with three other algorithms, this approach achieves better prediction precision. The experiments show that air quality can be inferred with remarkably high accuracy from data obtained through urban sensing. PMID:26761008
PET-CT image fusion using random forest and à-trous wavelet transform.

PubMed

Seal, Ayan; Bhattacharjee, Debotosh; Nasipuri, Mita; Rodríguez-Esparragón, Dionisio; Menasalvas, Ernestina; Gonzalo-Martin, Consuelo

2018-03-01

New image fusion rules for multimodal medical images are proposed in this work. Image fusion rules are defined by random forest learning algorithm and a translation-invariant à-trous wavelet transform (AWT). The proposed method is threefold. First, source images are decomposed into approximation and detail coefficients using AWT. Second, random forest is used to choose pixels from the approximation and detail coefficients for forming the approximation and detail coefficients of the fused image. Lastly, inverse AWT is applied to reconstruct fused image. All experiments have been performed on 198 slices of both computed tomography and positron emission tomography images of a patient. A traditional fusion method based on Mallat wavelet transform has also been implemented on these slices. A new image fusion performance measure along with 4 existing measures has been presented, which helps to compare the performance of 2 pixel level fusion methods. The experimental results clearly indicate that the proposed method outperforms the traditional method in terms of visual and quantitative qualities and the new measure is meaningful. Copyright © 2017 John Wiley & Sons, Ltd.
GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.

PubMed

Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A

2016-01-01

In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large scale computations, making it a highly time consuming task. This process can be sped up by implementing parallelized algorithms on a Graphical Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in virtual screening. A ligand based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems has been proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of results produced by our tool on GPU is the same as that on a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in training and prediction phases.
Monograph on the use of the multivariate Gram Charlier series Type A

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hatayodom, T.; Heydt, G.

1978-01-01

The Gram-Charlier series is an infinite series expansion for a probability density function (pdf) in which terms of the series are Hermite polynomials. There are several Gram-Charlier series - the best known is Type A. The Gram-Charlier series, Type A (GCA) exists for both univariate and multivariate random variables. This monograph introduces the multivariate GCA and illustrates its use through several examples. A brief bibliography and discussion of Hermite polynomials is also included. 9 figures, 2 tables.
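As a reminder of what such an expansion looks like, the standard univariate Type A series is written out below. This is the textbook form, given only for orientation; the monograph's multivariate version and its notation are not reproduced here. The symbols (mean \mu, standard deviation \sigma, cumulants \kappa_3 and \kappa_4, probabilists' Hermite polynomials He_n) are the usual conventions, not taken from the abstract.

```latex
% Univariate Gram-Charlier Type A expansion around the normal density,
% with z = (x - \mu)/\sigma and \varphi the standard normal pdf.
f(x) = \frac{1}{\sigma}\,\varphi(z) \sum_{n=0}^{\infty} c_n\, He_n(z),
\qquad
c_n = \frac{1}{n!}\,\mathbb{E}\!\left[ He_n\!\left(\frac{X-\mu}{\sigma}\right) \right].

% Because c_0 = 1 and c_1 = c_2 = 0, truncating after the fourth term gives
f(x) \approx \frac{1}{\sigma}\,\varphi(z)
\left[ 1 + \frac{\kappa_3}{3!\,\sigma^3}\, He_3(z)
         + \frac{\kappa_4}{4!\,\sigma^4}\, He_4(z) \right],
\qquad
He_3(z) = z^3 - 3z, \quad He_4(z) = z^4 - 6z^2 + 3.
```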
A mixed-effects regression model for longitudinal multivariate ordinal data.

PubMed

Liu, Li C; Hedeker, Donald

2006-03-01

A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
Multiscale habitat use and selection in cooperatively breeding Micronesian kingfishers

USGS Publications Warehouse

Kesler, D.C.; Haig, S.M.

2007-01-01

Information about the interaction between behavior and landscape resources is key to directing conservation management for endangered species. We studied multi-scale occurrence, habitat use, and selection in a cooperatively breeding population of Micronesian kingfishers (Todiramphus cinnamominus) on the island of Pohnpei, Federated States of Micronesia. At the landscape level, point-transect surveys resulted in kingfisher detection frequencies that were higher than those reported in 1994, although they remained 15-40% lower than 1983 indices. Integration of spatially explicit vegetation information with survey results indicated that kingfisher detections were positively associated with the amount of wet forest and grass-urban vegetative cover, and they were negatively associated with agricultural forest, secondary vegetation, and upland forest cover types. We used radiotelemetry and remote sensing to evaluate habitat use by individual kingfishers at the home-range scale. A comparison of habitats in Micronesian kingfisher home ranges with those in randomly placed polygons illustrated that birds used more forested areas than were randomly available in the immediate surrounding area. Further, members of cooperatively breeding groups included more forest in their home ranges than birds in pair-breeding territories, and forested portions of study areas appeared to be saturated with territories. Together, these results suggested that forest habitats were limited for Micronesian kingfishers. Thus, protecting and managing forests is important for the restoration of Micronesian kingfishers to the island of Guam (United States Territory), where they are currently extirpated, as well as to maintaining kingfisher populations on the islands of Pohnpei and Palau. Results further indicated that limited forest resources may restrict dispersal opportunities and, therefore, play a role in delayed dispersal and cooperative behaviors in Micronesian kingfishers.
Impact of Resident Rotations on Critically Ill Patient Outcomes: Results of a French Multicenter Observational Study.

PubMed

Chousterman, Benjamin G; Pirracchio, Romain; Guidet, Bertrand; Aegerter, Philippe; Mentec, Hervé

2016-01-01

The impact of resident rotation on patient outcomes in the intensive care unit (ICU) has been poorly studied. The aim of this study was to address this question using a large ICU database. We retrospectively analyzed the French CUB-REA database. French residents rotate every six months. Two periods were compared: the first (POST) and fifth (PRE) months of the rotation. The primary endpoint was ICU mortality. The secondary endpoints were the length of ICU stay (LOS), the number of organ supports, and the duration of mechanical ventilation (DMV). The impact of resident rotation was explored using multivariate regression, classification tree and random forest models. 262,772 patients were included between 1996 and 2010 in the database. The patient characteristics were similar between the PRE (n = 44,431) and POST (n = 49,979) periods. Multivariate analysis did not reveal any impact of resident rotation on ICU mortality (OR = 1.01, 95% CI = 0.94; 1.07, p = 0.91). Based on the classification trees, the SAPS II and the number of organ failures were the strongest predictors of ICU mortality. In the less severe patients (SAPS II<24), the POST period was associated with increased mortality (OR = 1.65, 95%CI = 1.17-2.33, p = 0.004). After adjustment, no significant association was observed between the rotation period and the LOS, the number of organ supports, or the DMV. Resident rotation exerts no impact on overall ICU mortality at French teaching hospitals but might affect the prognosis of less severe ICU patients. Surveillance should be reinforced when treating those patients.
A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

PubMed Central

Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

2017-01-01

Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313
Modelling above Ground Biomass of Mangrove Forest Using SENTINEL-1 Imagery

NASA Astrophysics Data System (ADS)

Labadisos Argamosa, Reginald Jay; Conferido Blanco, Ariel; Balidoy Baloloy, Alvin; Gumbao Candido, Christian; Lovern Caboboy Dumalag, John Bart; Carandang Dimapilis, Lee Lady; Camero Paringit, Enrico

2018-04-01

Many studies have been conducted on the estimation of forest above ground biomass (AGB) using features from synthetic aperture radar (SAR). Specifically, L-band ALOS/PALSAR (wavelength 23 cm) data is often used. However, few studies have been made on the use of shorter wavelengths (e.g., C-band, 3.75 cm to 7.5 cm) for forest mapping, especially in tropical forests, since higher attenuation is observed for volumetric objects where the propagated energy is absorbed. This study aims to model AGB estimates of mangrove forest using information derived from Sentinel-1 C-band SAR data. Combinations of polarisations (VV, VH), their derivatives, grey level co-occurrence matrix (GLCM) textures, and their principal components were used as features for modelling AGB. Five models were tested with varying combinations of features: a) sigma nought polarisations and their derivatives; b) GLCM textures; c) the first five principal components; d) combination of models a-c; and e) the important features identified by the Random Forest variable importance algorithm. Random Forest was used as the regressor to compute the AGB estimates, to avoid overfitting caused by the introduction of too many features in the model. Model e obtained the highest r² of 0.79 and an RMSE of 0.44 Mg using only four features, namely, σ°VH GLCM variance, σ°VH GLCM contrast, PC1, and PC2. This study shows that Sentinel-1 C-band SAR data could be used to produce acceptable AGB estimates in mangrove forest to compensate for the unavailability of longer wavelength SAR.
Use of DNA markers in forest tree improvement research

Treesearch

D.B. Neale; M.E. Devey; K.D. Jermstad; M.R. Ahuja; M.C. Alosi; K.A. Marshall

1992-01-01

DNA markers are rapidly being developed for forest trees. The most important markers are restriction fragment length polymorphisms (RFLPs), polymerase chain reaction- (PCR) based markers such as random amplified polymorphic DNA (RAPD), and fingerprinting markers. DNA markers can supplement isozyme markers for monitoring tree improvement activities such as; estimating...
Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

PubMed Central

Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

2016-01-01

The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for binary response, the case-cohort design for failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260
Forest structure and downed woody debris in boreal, temperate, and tropical forest fragments.

PubMed

Gould, William A; González, Grizelle; Hudak, Andrew T; Hollingsworth, Teresa Nettleton; Hollingsworth, Jamie

2008-12-01

Forest fragmentation affects the heterogeneity of accumulated fuels by increasing the diversity of forest types and by increasing forest edges. This heterogeneity has implications in how we manage fuels, fire, and forests. Understanding the relative importance of fragmentation on woody biomass within a single climatic regime, and along climatic gradients, will improve our ability to manage forest fuels and predict fire behavior. In this study we assessed forest fuel characteristics in stands of differing moisture, i.e., dry and moist forests, structure, i.e., open canopy (typically younger) vs. closed canopy (typically older) stands, and size, i.e., small (10-14 ha), medium (33 to 60 ha), and large (100-240 ha) along a climatic gradient of boreal, temperate, and tropical forests. We measured duff, litter, fine and coarse woody debris, standing dead, and live biomass in a series of plots along a transect from outside the forest edge to the fragment interior. The goal was to determine how forest structure and fuel characteristics varied along this transect and whether this variation differed with temperature, moisture, structure, and fragment size. We found nonlinear relationships of coarse woody debris, fine woody debris, standing dead and live tree biomass with mean annual median temperature. Biomass for these variables was greatest in temperate sites. Forest floor fuels (duff and litter) had a linear relationship with temperature and biomass was greatest in boreal sites. In a five-way multivariate analysis of variance we found that temperature, moisture, and age/structure had significant effects on forest floor fuels, downed woody debris, and live tree biomass. Fragment size had an effect on forest floor fuels and live tree biomass. Distance from forest edge had significant effects for only a few subgroups sampled. With some exceptions edges were not distinguishable from interiors in terms of fuels.
The study of combining Latin Hypercube Sampling method and LU decomposition method (LULHS method) for constructing spatial random field

NASA Astrophysics Data System (ADS)

WANG, P. T.

2015-12-01

Groundwater modeling requires assigning hydrogeological properties to every numerical grid. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. The hydrogeological property is assumed to follow a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid, a random field for modeling can be completed. Therefore, statistical sampling plays an important role in the efficiency of the modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the stratified random procedure from LHS and simulation using LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were developed. The simulation efficiency and spatial correlation of LULHS are compared to three other simulation methods. The results show that for both conditional and unconditional simulation, the LULHS method is more efficient in terms of computational effort. Fewer realizations are required to achieve the required statistical accuracy and spatial correlation.
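The combination described in this abstract can be sketched for the unconditional case as follows. This is a minimal illustration under stated assumptions, not the author's LULHS algorithm: the exponential covariance model, the 1D grid, and the names `lhs_standard_normal`, `cholesky`, and `simulate_fields` are all hypothetical, and the LU step is taken to be the Cholesky factorization C = L Lᵀ of the covariance matrix.

```python
import math
import random
from statistics import NormalDist


def lhs_standard_normal(n_real, n_nodes, rng):
    """Latin Hypercube sample: n_real stratified N(0,1) draws per grid node.

    Returns n_real rows (realizations), each with n_nodes values.
    """
    inv_cdf = NormalDist().inv_cdf
    eps = 1e-12  # keep probabilities strictly inside (0, 1)
    columns = []
    for _ in range(n_nodes):
        # One uniform draw inside each of the n_real equal-probability strata.
        u = [(k + rng.random()) / n_real for k in range(n_real)]
        rng.shuffle(u)  # random pairing of strata across nodes
        columns.append([inv_cdf(min(max(p, eps), 1.0 - eps)) for p in u])
    return [[columns[j][i] for j in range(n_nodes)] for i in range(n_real)]


def cholesky(C):
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def simulate_fields(coords, sigma2, corr_len, n_real, seed=0):
    """Unconditional correlated Gaussian fields on 1D coordinates."""
    rng = random.Random(seed)
    n = len(coords)
    # Assumed exponential covariance model (an illustration, not from the paper).
    C = [[sigma2 * math.exp(-abs(a - b) / corr_len) for b in coords] for a in coords]
    L = cholesky(C)
    Z = lhs_standard_normal(n_real, n, rng)
    # y = L z has covariance L L^T = C in expectation.
    return [[sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)] for z in Z]


fields = simulate_fields([0.0, 1.0, 2.0, 3.0], sigma2=1.0, corr_len=2.0, n_real=20)
print(len(fields), len(fields[0]))
```

Stratifying the standard normal draws spreads the realizations evenly over the distribution at each node, which is the property that would plausibly let fewer realizations reach a given statistical accuracy.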
Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest.

PubMed

Holliday, Jason A; Wang, Tongli; Aitken, Sally

2012-09-01

Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits--autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.
Assessing Reliability of Student Ratings of Advisor: A Comparison of Univariate and Multivariate Generalizability Approaches.

ERIC Educational Resources Information Center

Sun, Anji; Valiga, Michael J.

In this study, the reliability of the American College Testing (ACT) Program's "Survey of Academic Advising" (SAA) was examined using both univariate and multivariate generalizability theory approaches. The primary purpose of the study was to compare the results of three generalizability theory models (a random univariate model, a mixed…
Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol lowering drugs

PubMed Central

Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin

2013-01-01

In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG) (LDL-C, HDL-C, TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately: however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436
Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs.

PubMed

Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin

2013-10-15

In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs where the goal is to jointly model the three-dimensional response consisting of low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (LDL-C, HDL-C, TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.
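The per-response Box-Cox transformation mentioned in this abstract can be illustrated with a minimal sketch. This is only the textbook one-parameter transform, not the authors' Bayesian model; the function name `box_cox`, the lipid values, and the lambda values are hypothetical and chosen purely for illustration.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        # The lambda -> 0 limit of (y**lam - 1) / lam is log(y).
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical values (mg/dL) and hypothetical lambdas: each response
# gets its own transformation parameter, as the abstract describes.
responses = {"LDL-C": (130.0, 0.5), "HDL-C": (50.0, 0.0), "TG": (150.0, -0.5)}
transformed = {name: box_cox(y, lam) for name, (y, lam) in responses.items()}
```

In the paper the transformation parameters are given priors and estimated; here they are fixed constants only to show the shape of the transform.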
Idaho forest carbon projections from 2017 to 2117 under forest disturbance and climate change scenarios

NASA Astrophysics Data System (ADS)

Hudak, A. T.; Crookston, N.; Kennedy, R. E.; Domke, G. M.; Fekety, P.; Falkowski, M. J.

2017-12-01

Commercial off-the-shelf lidar collections associated with tree measures in field plots allow aboveground biomass (AGB) estimation with high confidence. Predictive models developed from such datasets are used operationally to map AGB across lidar project areas. We use a random selection of these pixel-level AGB predictions as training for predicting AGB annually across Idaho and western Montana, primarily from Landsat time series imagery processed through LandTrendr. At both the landscape and regional scales, Random Forests is used for predictive AGB modeling. To project future carbon dynamics, we use Climate-FVS (Forest Vegetation Simulator), the tree growth engine used by foresters to inform forest planning decisions, under either constant or changing climate scenarios. Disturbance data compiled from LandTrendr (Kennedy et al. 2010) using TimeSync (Cohen et al. 2010) in forested lands of Idaho (n=509) and western Montana (n=288) are used to generate probabilities of disturbance (harvest, fire, or insect) by land ownership class (public, private) as well as the magnitude of disturbance. Our verification approach is to aggregate the regional, annual AGB predictions at the county level and compare them to annual county-level AGB summarized independently from systematic, field-based, annual inventories conducted by the US Forest Inventory and Analysis (FIA) Program nationally. This analysis shows that when federal lands are disturbed the magnitude is generally high and when other lands are disturbed the magnitudes are more moderate. The probability of disturbance in corporate lands is higher than in other lands but the magnitudes are generally lower. This is consistent with the much higher prevalence of fire and insects occurring on federal lands, and greater harvest activity on private lands. We found large forest carbon losses in drier southern Idaho, only partially offset by carbon gains in wetter northern Idaho, due to anticipated climate change. 
Public and private forest managers can use these forest carbon projections to 2117 to inform 2017 decisions on which tree species and seed sources to select for planting, and implement forest management strategies now that may seek to maximize forest carbon sequestration for greenhouse gas abatement a century from now.
Influence of Forest Management Regimes on Forest Dynamics in the Upstream Region of the Hun River in Northeastern China

PubMed Central

Yao, Jing; He, Xingyuan; Wang, Anzhi; Chen, Wei; Li, Xiaoyu; Lewis, Bernard J.; Lv, Xiaotao

2012-01-01

Balancing forest harvesting and restoration is critical for forest ecosystem management. In this study, we used LANDIS, a spatially explicit forest landscape model, to evaluate the effects of 21 alternative forest management initiatives which were drafted for forests in the upstream region of the Hun River in northeastern China. These management initiatives included a wide range of planting and harvest intensities for Pinus koraiensis, the historically dominant tree species in the region. Multivariate analysis of variance, Shannon's Diversity Index, and planting efficiency (which indicates how many cells of the target species at the final year benefit from per-cell of the planting trees) estimates were used as indicators to analyze the effects of planting and harvesting regimes on forests in the region. The results showed that the following: (1) Increased planting intensity, although augmenting the coverage of P. koraiensis, was accompanied by decreases in planting efficiency and forest diversity. (2) While selective harvesting could increase forest diversity, the abrupt increase of early succession species accompanying this method merits attention. (3) Stimulating rapid forest succession may not be a good management strategy, since the climax species would crowd out other species which are likely more adapted to future climatic conditions in the long run. In light of the above, we suggest a combination of 30% planting intensity with selective harvesting of 50% and 70% of primary and secondary timber species, respectively, as the most effective management regime in this area. In the long run this would accelerate the ultimate dominance of P. koraiensis in the forest via a more effective rate of planting, while maintaining a higher degree of forest diversity. These results are particularly useful for forest managers constrained by limited financial and labor resources who must deal with conflicts between forest harvesting and restoration. PMID:22723930
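As a rough illustration of one indicator named in this abstract, Shannon's Diversity Index is commonly computed as H = -sum(p_i * ln p_i) over the species proportions p_i. The sketch below assumes the input is a list of per-species cell counts from a landscape raster; that input format and the function name `shannon_index` are assumptions, not details taken from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero proportions."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:  # species with zero cells contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h
```

A landscape dominated by a single species gives H = 0, while an even mix of k species gives the maximum value ln(k).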

Functional trait strategies of trees in dry and wet tropical forests are similar but differ in their consequences for succession.

PubMed

Lohbeck, Madelon; Lebrija-Trejos, Edwin; Martínez-Ramos, Miguel; Meave, Jorge A; Poorter, Lourens; Bongers, Frans

2014-01-01

Global plant trait studies have revealed fundamental trade-offs in plant resource economics. We evaluated such trait trade-offs during secondary succession in two species-rich tropical ecosystems that contrast in precipitation: dry deciduous and wet evergreen forests of Mexico. Species turnover with succession in dry forest largely relates to increasing water availability and in wet forest to decreasing light availability. We hypothesized that while functional trait trade-offs are similar in the two forest systems, the successful plant strategies in these communities will be different, as contrasting filters affect species turnover. Research was carried out in 15 dry secondary forest sites (5-63 years after abandonment) and in 17 wet secondary forest sites (<1-25 years after abandonment). We used 11 functional traits measured on 132 species to make species-trait PCA biplots for dry and wet forest and compare trait trade-offs. We evaluated whether multivariate plant strategies changed during succession, by calculating a 'Community-Weighted Mean' plant strategy, based on species scores on the first two PCA-axes. Trait spectra reflected two main trade-off axes that were similar for dry and wet forest species: acquisitive versus conservative species, and drought avoiding species versus evergreen species with large animal-dispersed seeds. These trait associations were consistent when accounting for evolutionary history. Successional changes in the most successful plant strategies reflected different functional trait spectra depending on the forest type. In dry forest the community changed from having drought avoiding strategies early in succession to increased abundance of evergreen strategies with larger seeds late in succession. In wet forest the community changed from species having mainly acquisitive strategies to those with more conservative strategies during succession. 
These strategy changes were explained by increasing water availability during dry forest succession and increasing light scarcity during wet forest succession. Although similar trait spectra were observed among dry and wet secondary forest species, the consequences for succession were different resulting from contrasting environmental filters.
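The 'Community-Weighted Mean' plant strategy described in this abstract can be sketched as an abundance-weighted average of species scores on the PCA axes. The weighting by relative abundance, the dictionary layout, and the name `community_weighted_mean` are assumptions made for illustration; the abstract does not state which abundance measure was used.

```python
def community_weighted_mean(abundances, scores):
    """Abundance-weighted mean of species scores, one value per PCA axis.

    abundances: dict mapping species -> abundance in the community
    scores:     dict mapping species -> tuple of PCA axis scores
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    n_axes = len(next(iter(scores.values())))
    cwm = [0.0] * n_axes
    for species, abundance in abundances.items():
        weight = abundance / total  # relative abundance of this species
        for k, score in enumerate(scores[species]):
            cwm[k] += weight * score
    return cwm
```

Computing this value for each site and plotting it against years since abandonment would show how the community's position in trait space shifts during succession.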
A Century Trend of Precipitation in Forest Watersheds from the Lower Mississippi River Basin

NASA Astrophysics Data System (ADS)

Feng, G.; Ouyang, Y.; Leininger, T.; Han, Y.

2017-12-01

Estimates of hydrological processes in forest watersheds are essential to water supply planning, water quality protection, water resources management, and ecological restoration, whereas century-scale precipitation variation due to climate change could exacerbate forest watershed hydrological processes and add uncertainties to the processes. In this study, a multivariate statistical analysis technique was employed to identify a century temporal trend of precipitation in forest watersheds from the Lower Mississippi River Basin (LMRB). Several surface water monitoring stations in the LMRB, located in forest watersheds with very little land use disturbance and a century record, were selected to obtain precipitation data. Using frequency distribution analysis with the HYDSTRA model, we found that the mean annual precipitation at a decadal scale increased as time elapsed over a 100-year period. Our study further revealed that the precipitation intensity for one-hour duration increased significantly in every 10 years for a 100-year period. During this period, the annual mean dry day frequency decreased at a decadal scale, whereas the annual mean wet day frequency increased for the same scale. Results indicated that the precipitation pattern has been altered in the LMRB and that the selected forest watersheds in this basin seem to have become wetter during the past 100 years as a result of climate change.
Aspen, climate, and sudden decline in western USA

Treesearch

Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

2009-01-01

A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...
Variation in Local-Scale Edge Effects: Mechanisms and Landscape Context

Treesearch

Therese M. Donovan; Peter W. Jones; Elizabeth M. Annand; Frank R. Thompson III

1997-01-01

Ecological processes near habitat edges often differ from processes away from edges. Yet, the generality of "edge effects" has been hotly debated because results vary tremendously. To understand the factors responsible for this variation, we described nest predation and cowbird distribution patterns in forest edge and forest core habitats on 36 randomly...
Mitigating budget constraints on visitation volume surveys: the case of U.S. National forests

Treesearch

Ashley E. Askew; Donald B.K. English; Stanley J. Zarnoch; Neelam C. Poudyal; J.M. Bowker

2014-01-01

Stratified random sampling (SRS) provides a scientifically based estimate of a population comprising mutually exclusive, homogenous subgroups. In the National Visitor Use Monitoring (NVUM) program, SRS is used to estimate recreation visitation and visitor characteristics across activities on National forests. However, with rising costs and declining budgets, carrying...
Demographic influences on environmental value orientations and normative beliefs about national forest management

Treesearch

Jerry J. Vaske; Maureen P. Donnelly; Daniel R. Williams; Sandra Jonker

2001-01-01

Using the cognitive hierarchy as the theoretical foundation, this article examines the predictive influence of individuals' demographic characteristics on environmental value orientations and normative beliefs about national forest management. Data for this investigation were obtained from a random sample of Colorado residents (n = 960). As predicted by theory, a...
Subtyping cognitive profiles in Autism Spectrum Disorder using a Functional Random Forest algorithm.

PubMed

Feczko, E; Balba, N M; Miranda-Dominguez, O; Cordova, M; Karalunas, S L; Irwin, L; Demeter, D V; Hill, A P; Langhorst, B H; Grieser Painter, J; Van Santen, J; Fombonne, E J; Nigg, J T; Fair, D A

2018-05-15

DSM-5 Autism Spectrum Disorder (ASD) comprises a set of neurodevelopmental disorders characterized by deficits in social communication and interaction and repetitive behaviors or restricted interests, and may both affect and be affected by multiple cognitive mechanisms. This study attempts to identify and characterize cognitive subtypes within the ASD population using our Functional Random Forest (FRF) machine learning classification model. This model trained a traditional random forest model on measures from seven tasks that reflect multiple levels of information processing. 47 ASD diagnosed and 58 typically developing (TD) children between the ages of 9 and 13 participated in this study. Our RF model was 72.7% accurate, with 80.7% specificity and 63.1% sensitivity. Using the random forest model, the FRF then measures the proximity of each subject to every other subject, generating a distance matrix between participants. This matrix is then used in a community detection algorithm to identify subgroups within the ASD and TD groups, and revealed 3 ASD and 4 TD putative subgroups with unique behavioral profiles. We then examined differences in functional brain systems between diagnostic groups and putative subgroups using resting-state functional connectivity magnetic resonance imaging (rsfcMRI). Chi-square tests revealed a significantly greater number of between group differences (p < .05) within the cingulo-opercular, visual, and default systems as well as differences in inter-system connections in the somato-motor, dorsal attention, and subcortical systems. Many of these differences were primarily driven by specific subgroups suggesting that our method could potentially parse the variation in brain mechanisms affected by ASD. Copyright © 2017. Published by Elsevier Inc.
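The proximity step described in this abstract can be sketched with the usual random forest definition: two subjects are close when many trees place them in the same terminal leaf. The sketch assumes leaf assignments are already available as a list of lists (one list per tree), and it converts proximity to distance as 1 - proximity; both the input format and that conversion are assumptions, since the abstract does not give the exact formula used by the FRF.

```python
def proximity_matrix(leaf_ids):
    """Fraction of trees in which each pair of subjects shares a leaf.

    leaf_ids[t][i] is the leaf index that tree t assigns to subject i.
    """
    n_trees = len(leaf_ids)
    n_subjects = len(leaf_ids[0])
    shared = [[0] * n_subjects for _ in range(n_subjects)]
    for tree in leaf_ids:
        for i in range(n_subjects):
            for j in range(n_subjects):
                if tree[i] == tree[j]:
                    shared[i][j] += 1
    return [[count / n_trees for count in row] for row in shared]


def distance_matrix(prox):
    """Assumed conversion: distance = 1 - proximity."""
    return [[1.0 - p for p in row] for row in prox]
```

A matrix of this kind is what a community detection algorithm would then take as input to look for subgroups.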
Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes: a random forest regression approach

PubMed Central

van der Meer, D; Hoekstra, P J; van Donkelaar, M; Bralten, J; Oosterlaan, J; Heslenfeld, D; Faraone, S V; Franke, B; Buitelaar, J K; Hartman, C A

2017-01-01

Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression is well suited to explore this complexity, as it allows for the analysis of many predictors simultaneously, taking into account any higher-order interactions among them. Using random forest regression, we predicted ADHD severity, measured by Conners’ Parent Rating Scales, from 686 adolescents and young adults (of which 281 were diagnosed with ADHD). The analysis included 17 374 single-nucleotide polymorphisms (SNPs) across 29 genes previously linked to hypothalamic–pituitary–adrenal (HPA) axis activity, together with information on exposure to 24 individual long-term difficulties or stressful life events. The model explained 12.5% of variance in ADHD severity. The most important SNP, which also showed the strongest interaction with stress exposure, was located in a region regulating the expression of telomerase reverse transcriptase (TERT). Other high-ranking SNPs were found in or near NPSR1, ESR1, GABRA6, PER3, NR3C2 and DRD4. Chronic stressors were more influential than single, severe, life events. Top hits were partly shared with conduct problems. We conclude that random forest regression may be used to investigate how multiple genetic and environmental factors jointly contribute to ADHD. It is able to implicate novel SNPs of interest, interacting with stress exposure, and may explain inconsistent findings in ADHD genetics. This exploratory approach may be best combined with more hypothesis-driven research; top predictors and their interactions with one another should be replicated in independent samples. PMID:28585928
Simple to complex modeling of breathing volume using a motion sensor.

PubMed

John, Dinesh; Staudenmayer, John; Freedson, Patty

2013-06-01

To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical axis ActiGraph™ GT1M activity counts, oxygen consumption and VE were measured during treadmill walking and running, sports, household chores and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min) and high (>35.4 l/min) VEs were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs and vigorous >6.0 METs). We examined the accuracy of two simple techniques (multiple regression and activity count cut-point analyses) and one complex technique (random forest) in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted the high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). ActiGraph™ cut-points for low, medium and high VEs were <1381, 1381 to 3660 and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.
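The cut-point technique reported in this abstract amounts to a simple threshold rule on activity counts per minute (cpm). The sketch below uses the thresholds stated in the abstract (<1381, 1381 to 3660, >3660 cpm); the function name `ve_category` and the category labels are illustrative choices, not part of the study.

```python
def ve_category(counts_per_minute):
    """Map ActiGraph counts per minute to a ventilation (VE) category."""
    if counts_per_minute < 1381:
        return "low"
    if counts_per_minute <= 3660:
        return "medium"
    return "high"
```

For example, `ve_category(2500)` falls in the 1381 to 3660 cpm band and is labeled "medium".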
Managing salinity in Upper Colorado River Basin streams: Selecting catchments for sediment control efforts using watershed characteristics and random forests models

USGS Publications Warehouse

Tillman, Fred; Anning, David W.; Heilman, Julian A.; Buto, Susan G.; Miller, Matthew P.

2018-01-01

Elevated concentrations of dissolved-solids (salinity) including calcium, sodium, sulfate, and chloride, among others, in the Colorado River cause substantial problems for its water users. Previous efforts to reduce dissolved solids in upper Colorado River basin (UCRB) streams often focused on reducing suspended-sediment transport to streams, but few studies have investigated the relationship between suspended sediment and salinity, or evaluated which watershed characteristics might be associated with this relationship. Are there catchment properties that may help in identifying areas where control of suspended sediment will also reduce salinity transport to streams? A random forests classification analysis was performed on topographic, climate, land cover, geology, rock chemistry, soil, and hydrologic information in 163 UCRB catchments. Two random forests models were developed in this study: one for exploring stream and catchment characteristics associated with stream sites where dissolved solids increase with increasing suspended-sediment concentration, and the other for predicting where these sites are located in unmonitored reaches. Results of variable importance from the exploratory random forests models indicate that no simple source, geochemical process, or transport mechanism can easily explain the relationship between dissolved solids and suspended sediment concentrations at UCRB monitoring sites. Among the most important watershed characteristics in both models were measures of soil hydraulic conductivity, soil erodibility, minimum catchment elevation, catchment area, and the silt component of soil in the catchment. Predictions at key locations in the basin were combined with observations from selected monitoring sites, and presented in map-form to give a complete understanding of where catchment sediment control practices would also benefit control of dissolved solids in streams.
Remote sensing leaf water stress in coffee (Coffea arabica) using secondary effects of water absorption and random forests

NASA Astrophysics Data System (ADS)

Chemura, Abel; Mutanga, Onisimo; Dube, Timothy

2017-08-01

Water management is an important component in agriculture, particularly for perennial tree crops such as coffee. Proper detection and monitoring of water stress therefore plays an important role not only in mitigating the associated adverse impacts on crop growth and productivity but also in reducing expensive and environmentally unsustainable irrigation practices. Current methods for water stress detection in coffee production mainly involve monitoring plant physiological characteristics and soil conditions. In this study, we tested the ability of selected wavebands in the VIS/NIR range to predict plant water content (PWC) in coffee using the random forest algorithm. An experiment was set up such that coffee plants were exposed to different levels of water stress and reflectance and plant water content measured. In selecting appropriate parameters, cross-correlation identified 11 wavebands, reflectance difference identified 16 and reflectance sensitivity identified 22 variables related to PWC. Only three wavebands (485 nm, 670 nm and 885 nm) were identified by at least two methods as significant. The selected wavebands were trained (n = 36) and tested on independent data (n = 24) after being integrated into the random forest algorithm to predict coffee PWC. The results showed that the reflectance sensitivity selected bands performed the best in water stress detection (r = 0.87, RMSE = 4.91% and pBias = 0.9%), when compared to reflectance difference (r = 0.79, RMSE = 6.19 and pBias = 2.5%) and cross-correlation selected wavebands (r = 0.75, RMSE = 6.52 and pBias = 1.6). These results indicate that it is possible to reliably predict PWC using wavebands in the VIS/NIR range that correspond with many of the available multispectral scanners using random forests and further research at field and landscape scale is required to operationalize these findings.
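The three accuracy measures quoted in this abstract (r, RMSE and pBias) can be computed from paired observed and predicted values as sketched below. The percent-bias sign convention used here, 100 * sum(predicted - observed) / sum(observed), is an assumption; conventions differ between papers and the abstract does not say which one was applied. All function names are illustrative.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_bias(observed, predicted):
    """Assumed convention: positive values indicate over-prediction."""
    return 100.0 * sum(p - o for o, p in zip(observed, predicted)) / sum(observed)
```

Applied to an independent test set, these functions would return the kind of summary reported for each band-selection method.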
Properties of Protein Drug Target Classes

PubMed Central

Bull, Simon C.; Doig, Andrew J.

2015-01-01

Accurate identification of drug targets is a crucial part of any drug development program. We mined the human proteome to discover properties of proteins that may be important in determining their suitability for pharmaceutical modulation. Data was gathered concerning each protein’s sequence, post-translational modifications, secondary structure, germline variants, expression profile and drug target status. The data was then analysed to determine features for which the target and non-target proteins had significantly different values. This analysis was repeated for subsets of the proteome consisting of all G-protein coupled receptors, ion channels, kinases and proteases, as well as proteins that are implicated in cancer. Machine learning was used to quantify the proteins in each dataset in terms of their potential to serve as a drug target. This was accomplished by first inducing a random forest that could distinguish between its targets and non-targets, and then using the random forest to quantify the drug target likeness of the non-targets. The properties that can best differentiate targets from non-targets were primarily those that are directly related to a protein’s sequence (e.g. secondary structure). Germline variants, expression levels and interactions between proteins had minimal discriminative power. Overall, the best indicators of drug target likeness were found to be the proteins’ hydrophobicities, in vivo half-lives, propensity for being membrane bound and the fraction of non-polar amino acids in their sequences. In terms of predicting potential targets, datasets of proteases, ion channels and cancer proteins were able to induce random forests that were highly capable of distinguishing between targets and non-targets. The non-target proteins predicted to be targets by these random forests comprise the set of the most suitable potential future drug targets, and should therefore be prioritised when building a drug development programme. PMID:25822509
Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer's Disease: A Systematic Review.

PubMed

Sarica, Alessia; Cerasa, Antonio; Quattrone, Aldo

2017-01-01

Objective: Machine learning classification has been the most important computational development in recent years to satisfy the primary need of clinicians for automatic early diagnosis and prognosis. The Random Forest (RF) algorithm has been successfully applied for reducing high dimensional and multi-source data in many scientific realms. Our aim was to explore the state of the art of the application of RF to single and multi-modal neuroimaging data for the prediction of Alzheimer's disease. Methods: A systematic review following PRISMA guidelines was conducted on this field of study. In particular, we constructed an advanced query using boolean operators as follows: ("random forest" OR "random forests") AND neuroimaging AND ("alzheimer's disease" OR alzheimer's OR alzheimer) AND (prediction OR classification). The query was then searched in four well-known scientific databases: Pubmed, Scopus, Google Scholar and Web of Science. Results: Twelve articles, published between 2007 and 2017, were included in this systematic review after a quantitative and qualitative selection. The lessons learnt from these works suggest that when RF was applied to multi-modal data for prediction of Alzheimer's disease (AD) conversion from Mild Cognitive Impairment (MCI), it produced one of the best accuracies to date. Moreover, RF has important advantages in terms of robustness to overfitting, ability to handle highly non-linear data, stability in the presence of outliers and opportunity for efficient parallel processing, mainly when applied to multi-modality neuroimaging data such as MRI morphometric, diffusion tensor imaging, and PET images. Conclusions: We discussed the strengths of RF, also considering possible limitations, and encourage further studies comparing this algorithm with other commonly used classification approaches, particularly in the early prediction of the progression from MCI to AD.
Assessing the Status of Wild Felids in a Highly-Disturbed Commercial Forest Reserve in Borneo and the Implications for Camera Trap Survey Design

PubMed Central

Wearn, Oliver R.; Rowcliffe, J. Marcus; Carbone, Chris; Bernard, Henry; Ewers, Robert M.

2013-01-01

The proliferation of camera-trapping studies has led to a spate of extensions in the known distributions of many wild cat species, not least in Borneo. However, we still do not have a clear picture of the spatial patterns of felid abundance in Southeast Asia, particularly with respect to the large areas of highly-disturbed habitat. An important obstacle to increasing the usefulness of camera trap data is the widespread practice of setting cameras at non-random locations. Non-random deployment interacts with non-random space-use by animals, causing biases in our inferences about relative abundance from detection frequencies alone. This may be a particular problem if surveys do not adequately sample the full range of habitat features present in a study region. Using camera-trapping records and incidental sightings from the Kalabakan Forest Reserve, Sabah, Malaysian Borneo, we aimed to assess the relative abundance of felid species in highly-disturbed forest, as well as investigate felid space-use and the potential for biases resulting from non-random sampling. Although the area has been intensively logged over three decades, it was found to still retain the full complement of Bornean felids, including the bay cat Pardofelis badia, a poorly known Bornean endemic. Camera-trapping using strictly random locations detected four of the five Bornean felid species and revealed inter- and intra-specific differences in space-use. We compare our results with an extensive dataset of >1,200 felid records from previous camera-trapping studies and show that the relative abundance of the bay cat, in particular, may have previously been underestimated due to the use of non-random survey locations. Further surveys for this species using random locations will be crucial in determining its conservation status. 
We advocate the more wide-spread use of random survey locations in future camera-trapping surveys in order to increase the robustness and generality of inferences that can be made. PMID:24223717
Social determinants of long lasting insecticidal hammock use among the Ra-glai ethnic minority in Vietnam: implications for forest malaria control.

PubMed

Grietens, Koen Peeters; Xuan, Xa Nguyen; Ribera, Joan; Duc, Thang Ngo; Bortel, Wim van; Ba, Nhat Truong; Van, Ky Pham; Xuan, Hung Le; D'Alessandro, Umberto; Erhart, Annette

2012-01-01

Long-lasting insecticidal hammocks (LLIHs) are being evaluated as an additional malaria prevention tool in settings where standard control strategies have a limited impact. This is the case among the Ra-glai ethnic minority communities of Ninh Thuan, one of the forested and mountainous provinces of Central Vietnam where malaria morbidity persists due to the sylvatic nature of the main malaria vector An. dirus and the dependence of the population on the forest for subsistence, as is the case for many impoverished ethnic minorities in Southeast Asia. A social science study was carried out ancillary to a community-based cluster randomized trial on the effectiveness of LLIHs to control forest malaria. The social science research strategy consisted of a mixed methods study triangulating qualitative data from focused ethnography and quantitative data collected during a malariometric cross-sectional survey on a random sample of 2,045 study participants. To meet work requirements during the labor intensive malaria transmission and rainy season, Ra-glai slash and burn farmers combine living in government supported villages along the road with a second home at their fields located in the forest. LLIH use was evaluated in both locations. During daytime, LLIH use at village level was reported by 69.3% of all respondents, and in forest fields this was 73.2%. In the evening, 54.1% used the LLIHs in the villages, while at the fields this was 20.7%. At night, LLIH use was minimal, regardless of the location (village 4.4%; forest 6.4%). Despite the free distribution of insecticide-treated nets (ITNs) and LLIHs, around half the local population remains largely unprotected when sleeping in their forest plot huts. In order to tackle forest malaria more effectively, control policies should explicitly target forest fields where ethnic minority farmers are more vulnerable to malaria.
Social Determinants of Long Lasting Insecticidal Hammock-Use Among the Ra-Glai Ethnic Minority in Vietnam: Implications for Forest Malaria Control

PubMed Central

Muela Ribera, Joan; Ngo Duc, Thang; van Bortel, Wim; Truong Ba, Nhat; Van, Ky Pham; Le Xuan, Hung; D'Alessandro, Umberto; Erhart, Annette

2012-01-01

Background. Long-lasting insecticidal hammocks (LLIHs) are being evaluated as an additional malaria prevention tool in settings where standard control strategies have a limited impact. This is the case among the Ra-glai ethnic minority communities of Ninh Thuan, one of the forested and mountainous provinces of Central Vietnam where malaria morbidity persists due to the sylvatic nature of the main malaria vector An. dirus and the dependence of the population on the forest for subsistence, as is the case for many impoverished ethnic minorities in Southeast Asia. Methods. A social science study was carried out ancillary to a community-based cluster randomized trial on the effectiveness of LLIHs to control forest malaria. The social science research strategy consisted of a mixed methods study triangulating qualitative data from focused ethnography and quantitative data collected during a malariometric cross-sectional survey on a random sample of 2,045 study participants. Results. To meet work requirements during the labor-intensive malaria transmission and rainy season, Ra-glai slash-and-burn farmers combine living in government-supported villages along the road with a second home at their fields located in the forest. LLIH use was evaluated in both locations. During daytime, LLIH use at village level was reported by 69.3% of all respondents, and in forest fields this was 73.2%. In the evening, 54.1% used the LLIHs in the villages, while at the fields this was 20.7%. At night, LLIH use was minimal, regardless of the location (village 4.4%; forest 6.4%). Discussion. Despite the free distribution of insecticide-treated nets (ITNs) and LLIHs, around half the local population remains largely unprotected when sleeping in their forest plot huts. In order to tackle forest malaria more effectively, control policies should explicitly target forest fields where ethnic minority farmers are more vulnerable to malaria. PMID:22253852
Advanced analysis of forest fire clustering

NASA Astrophysics Data System (ADS)

Kanevski, Mikhail; Pereira, Mario; Golay, Jean

2017-04-01

Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index. Pattern Recognition, 48, 4070-4081.
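The grid-based estimate described in this abstract can be illustrated with a minimal Python sketch. It assumes points already scaled to the unit square and a square grid of Q cells; the function name `morisita_index` and its parameters are hypothetical, and the formula used is the commonly stated m-point Morisita index, I_m = Q^(m-1) * sum_i n_i(n_i-1)...(n_i-m+1) / [N(N-1)...(N-m+1)], not the authors' own code.

```python
from collections import Counter


def morisita_index(points, cells_per_side, m=2):
    """m-point Morisita index for 2-D points in the unit square (illustrative sketch)."""
    n_total = len(points)
    n_cells = cells_per_side ** 2  # Q, the number of grid cells

    # Count how many points fall in each grid cell.
    counts = Counter(
        (
            min(int(x * cells_per_side), cells_per_side - 1),
            min(int(y * cells_per_side), cells_per_side - 1),
        )
        for x, y in points
    )

    def falling_factorial(n):
        # n * (n - 1) * ... * (n - m + 1)
        product = 1
        for k in range(m):
            product *= n - k
        return product

    numerator = sum(falling_factorial(n) for n in counts.values())
    return n_cells ** (m - 1) * numerator / falling_factorial(n_total)
```

Values near 1 are what a completely random pattern would give on average, and larger values indicate clustering; repeating the calculation for several values of `cells_per_side` gives the scaling behaviour the abstract refers to.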
EDITORIAL: Special section on foliage penetration

NASA Astrophysics Data System (ADS)

Fiddy, M. A.; Lang, R.; McGahan, R. V.

2004-04-01

Waves in Random Media was founded in 1991 to provide a forum for papers dealing with electromagnetic and acoustic waves as they propagate and scatter through media or objects having some degree of randomness. This is a broad charter since, in practice, all scattering obstacles and structures have roughness or randomness, often on the scale of the wavelength being used to probe them. Including this random component leads to some quite different methods for describing propagation effects, for example, when propagating through the atmosphere or the ground. This special section on foliage penetration (FOPEN) focuses on the problems arising from microwave propagation through foliage and vegetation. Applications of such studies include the estimation for forest biomass and the moisture of the underlying soil, as well as detecting objects hidden therein. In addition to the so-called `direct problem' of trying to describe energy propagating through such media, the complementary inverse problem is of great interest and much harder to solve. The development of theoretical models and associated numerical algorithms for identifying objects concealed by foliage has applications in surveillance, ranging from monitoring drug trafficking to targeting military vehicles. FOPEN can be employed to map the earth's surface in cases when it is under a forest canopy, permitting the identification of objects or targets on that surface, but the process for doing so is not straightforward. There has been an increasing interest in foliage penetration synthetic aperture radar (FOPEN or FOPENSAR) over the last 10 years and this special section provides a broad overview of many of the issues involved. The detection, identification, and geographical location of targets under foliage or otherwise obscured by poor visibility conditions remains a challenge. 
In particular, a trade-off often needs to be appreciated, namely that diminishing the deleterious effects of multiple scattering from leaves is typically associated with a significant loss in target resolution. Foliage is more or less transparent to some radar frequencies, but longer wavelengths found in the VHF (30 to 300 MHz) and UHF (300 MHz to 3 GHz) portions of the microwave spectrum have more chance of penetrating foliage than do wavelengths at the X band (8 to 12 GHz). Reflection and multiple scattering occur for some other frequencies and models of the processes involved are crucial. Two topical reviews can be found in this issue, one on the microwave radiometry of forests (page S275) and another describing ionospheric effects on space-based radar (page S189). Subsequent papers present new results on modelling coherent backscatter from forests (page S299), modelling forests as discrete random media over a random interface (page S359) and interpreting ranging scatterometer data from forests (page S317). Cloude et al present research on identifying targets beneath foliage using polarimetric SAR interferometry (page S393) while Treuhaft and Siqueira use interferometric radar to describe forest structure and biomass (page S345). Vechhia et al model scattering from leaves (page S333) and Semichaevsky et al address the problem of the trade-off between increasing wavelength, reduction in multiple scattering, and target resolution (page S415).
Spatio-temporal Change Patterns of Tropical Forests from 2000 to 2014 Using MOD09A1 Dataset

NASA Astrophysics Data System (ADS)

Qin, Y.; Xiao, X.; Dong, J.

2016-12-01

Large-scale deforestation and forest degradation in the tropical region have resulted in extensive carbon emissions and biodiversity loss. However, restricted by the availability of good-quality observations, large uncertainty exists in mapping the spatial distribution of forests and their spatio-temporal changes. In this study, we proposed a pixel- and phenology-based algorithm to identify and map annual tropical forests from 2000 to 2014, using the 8-day, 500-m MOD09A1 (v005) product, under the support of Google cloud computing (Google Earth Engine). A temporal filter was applied to reduce the random noises and to identify the spatio-temporal changes of forests. We then built up a confusion matrix and assessed the accuracy of the annual forest maps based on the ground reference interpreted from high spatial resolution images in Google Earth. The resultant forest maps showed the consistent forest/non-forest, forest loss, and forest gain in the pan-tropical zone during 2000 - 2014. The proposed algorithm showed the potential for tropical forest mapping and the resultant forest maps are important for the estimation of carbon emission and biodiversity loss.
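The accuracy assessment step mentioned in this abstract can be sketched in Python as follows. This is only an illustration under assumed inputs: two parallel lists of class labels (ground reference and map), and a hypothetical function name `accuracy_metrics`. The abstract does not say which accuracy measures were reported, so the overall, producer's and user's accuracies below are a conventional choice, not the authors' procedure.

```python
def accuracy_metrics(reference, predicted, labels=("forest", "non-forest")):
    """Confusion matrix plus overall, producer's and user's accuracy (illustrative sketch)."""
    # matrix[reference_label][predicted_label] = number of samples
    matrix = {r: {p: 0 for p in labels} for r in labels}
    for ref, pred in zip(reference, predicted):
        matrix[ref][pred] += 1

    total = len(reference)
    overall = sum(matrix[label][label] for label in labels) / total

    producers, users = {}, {}
    for label in labels:
        reference_total = sum(matrix[label].values())           # row total
        predicted_total = sum(matrix[r][label] for r in labels)  # column total
        # Guard against classes that never occur in the sample.
        producers[label] = matrix[label][label] / reference_total if reference_total else None
        users[label] = matrix[label][label] / predicted_total if predicted_total else None
    return matrix, overall, producers, users
```

Producer's accuracy relates to omission errors (reference pixels missed by the map), while user's accuracy relates to commission errors (map pixels that are wrong on the ground).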
Statistical analysis of multivariate atmospheric variables. [cloud cover]

NASA Technical Reports Server (NTRS)

Tubbs, J. D.

1979-01-01

Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.

Exploring prediction uncertainty of spatial data in geostatistical and machine learning approaches

NASA Astrophysics Data System (ADS)

Klump, J. F.; Fouedjio, F.

2017-12-01

Geostatistical methods such as kriging with external drift as well as machine learning techniques such as quantile regression forest have been intensively used for modelling spatial data. In addition to providing predictions for target variables, both approaches are able to deliver a quantification of the uncertainty associated with the prediction at a target location. Geostatistical approaches are, by essence, adequate for providing such prediction uncertainties and their behaviour is well understood. However, they often require significant data pre-processing and rely on assumptions that are rarely met in practice. Machine learning algorithms such as random forest regression, on the other hand, require less data pre-processing and are non-parametric. This makes the application of machine learning algorithms to geostatistical problems an attractive proposition. The objective of this study is to compare kriging with external drift and quantile regression forest with respect to their ability to deliver reliable prediction uncertainties of spatial data. In our comparison we use both simulated and real world datasets. Apart from classical performance indicators, comparisons make use of accuracy plots, probability interval width plots, and the visual examinations of the uncertainty maps provided by the two approaches. By comparing random forest regression to kriging we found that both methods produced comparable maps of estimated values for our variables of interest. However, the measure of uncertainty provided by random forest seems to be quite different to the measure of uncertainty provided by kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. These preliminary results raise questions about assessing the risks associated with decisions based on the predictions from geostatistical and machine learning algorithms in a spatial context, e.g. mineral exploration.
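The accuracy plots and probability interval widths mentioned in this abstract can be illustrated with a short Python sketch. It assumes that each method supplies, for every validation location, a set of draws from its predictive distribution (for example per-tree leaf values from a quantile regression forest, or simulated values from a kriging distribution); that input format and the names `quantile` and `interval_coverage` are assumptions for illustration, not taken from the study.

```python
def quantile(sorted_values, prob):
    """Linear-interpolation quantile of an already sorted list."""
    position = prob * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    return sorted_values[lower_index] * (1 - fraction) + sorted_values[upper_index] * fraction


def interval_coverage(predictive_samples, observed, prob):
    """Empirical coverage and mean width of the central `prob` prediction interval."""
    lower_prob, upper_prob = (1 - prob) / 2, (1 + prob) / 2
    hits, widths = 0, []
    for samples, truth in zip(predictive_samples, observed):
        ordered = sorted(samples)
        lower = quantile(ordered, lower_prob)
        upper = quantile(ordered, upper_prob)
        hits += lower <= truth <= upper
        widths.append(upper - lower)
    return hits / len(observed), sum(widths) / len(widths)
```

Plotting the empirical coverage against `prob` for a range of probabilities gives an accuracy plot: a well-calibrated method stays close to the diagonal, and among calibrated methods the one with narrower intervals is more informative.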
Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

PubMed

Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

2017-03-15

The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known examples of such designs are the case-control design for binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
[Functional diversity characteristics of canopy tree species of Jianfengling tropical montane rainforest on Hainan Island, China].

PubMed

Xu, Ge Xi; Shi, Zuo Min; Tang, Jing Chao; Liu, Shun; Ma, Fan Qiang; Xu, Han; Liu, Shi Rong; Li, Yi de

2016-11-18

Based on three 1-hm² plots of Jianfengling tropical montane rainforest on Hainan Island, 11 commonly used functional traits of canopy trees were measured. After combining these with topographical factors and tree census data of the three plots, we compared the impacts of weighted species abundance on two functional dispersion indices, mean pairwise distance (MPD) and mean nearest taxon distance (MNTD), by using single- and multi-dimensional traits, respectively. The relationship between functional richness of the forest canopies and species abundance was analyzed. We used a null model approach to explore the variations in standardized effect sizes of MPD and MNTD, which were weighted by species abundance and eliminated the influences of species richness differences among communities, and assessed functional diversity patterns of the forest canopies and their responses to local habitat heterogeneity at the community level. The results showed that variation in MPD was greatly dependent on the dimensionalities of functional traits as well as species abundance. The correlations between weighted and non-weighted MPD based on different dimensional traits were relatively weak (R=0.359-0.628). On the contrary, functional traits and species abundance had relatively weak effects on MNTD, resulting in stronger correlations between weighted and non-weighted MNTD based on different dimensional traits (R=0.746-0.820). Functional dispersion of the forest canopies was generally overestimated when using non-weighted MPD and MNTD. Functional richness of the forest canopies showed an exponential relationship with species abundance (F=128.20; R²=0.632; AIC=97.72; P<0.001), which suggests that a species abundance threshold value might exist. Patterns of functional diversity of the forest canopies based on different dimensional functional traits and their habitat responses varied to some degree.
Forest canopies in the valley usually had relatively stronger biological competition, and functional diversity was higher than expected functional diversity randomized by null model, which indicated dispersed distribution of functional traits among canopy tree species in this habitat. However, the functional diversity of the forest canopies tended to be close or lower than randomization in the other habitat types, which demonstrated random or clustered distribution of the functional traits among canopy tree species.
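The two functional dispersion indices compared in this abstract can be sketched in Python as follows. The sketch assumes Euclidean distances between species in trait space and uses one common abundance weighting (pairwise products of abundances for MPD, abundance-weighted averaging for MNTD); the abstract does not state the exact weighting the authors used, and the function names `mpd` and `mntd` and the dictionary inputs are hypothetical.

```python
import math
from itertools import combinations


def mpd(traits, abundance=None):
    """Mean pairwise trait distance, optionally weighted by species abundance."""
    species = list(traits)
    if abundance is None:
        abundance = {name: 1 for name in species}  # unweighted case
    weighted_sum = weight_total = 0.0
    for a, b in combinations(species, 2):
        weight = abundance[a] * abundance[b]
        weighted_sum += weight * math.dist(traits[a], traits[b])
        weight_total += weight
    return weighted_sum / weight_total


def mntd(traits, abundance=None):
    """Mean distance to the nearest other species, optionally abundance-weighted."""
    species = list(traits)
    if abundance is None:
        abundance = {name: 1 for name in species}
    weighted_sum = weight_total = 0.0
    for a in species:
        nearest = min(math.dist(traits[a], traits[b]) for b in species if b != a)
        weighted_sum += abundance[a] * nearest
        weight_total += abundance[a]
    return weighted_sum / weight_total
```

A standardized effect size of the kind used in the null model analysis would then be (observed index - mean of null values) / standard deviation of null values, with the null values obtained by recomputing the index on randomized communities.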
Parameter estimation of multivariate multiple regression model using Bayesian with non-informative Jeffreys’ prior distribution

NASA Astrophysics Data System (ADS)

Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

2018-05-01

The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions, the prior and the posterior. The posterior distribution is influenced by the choice of prior distribution. Jeffreys’ prior is a kind of non-informative prior distribution, used when information about the parameters is not available. The non-informative Jeffreys’ prior is combined with the sample information to yield the posterior distribution, which is used to estimate the parameters. The purpose of this research is to estimate the parameters of the multivariate regression model using the Bayesian method with a non-informative Jeffreys’ prior distribution. Based on the results and discussion, the parameter estimates of β and Σ are obtained from the expected values of the random variables under their marginal posterior distributions. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart, respectively. However, calculating the expected values involves integrals of functions whose values are difficult to determine. Therefore, an approximation is needed, obtained by generating random samples according to the posterior distribution characteristics of each parameter using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
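A Gibbs sampler of the kind described in this abstract might look like the following Python sketch. It is not the authors' implementation: it assumes the model Y = XB + E with rows of E distributed N(0, Σ) and the prior p(B, Σ) ∝ |Σ|^(-(q+1)/2), under which the full conditional of B is matrix normal around the least-squares estimate and the full conditional of Σ is inverse Wishart with n degrees of freedom. The function name and arguments are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def gibbs_multivariate_regression(X, Y, n_iter=2000, burn_in=500, seed=0):
    """Posterior means of B and Sigma via Gibbs sampling (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]

    xtx_inv = np.linalg.inv(X.T @ X)
    b_hat = xtx_inv @ X.T @ Y                 # least-squares estimate of B
    chol_rows = np.linalg.cholesky(xtx_inv)   # factor of the row covariance (X'X)^-1
    resid = Y - X @ b_hat
    sigma = resid.T @ resid / n               # starting value for Sigma

    b_sum, sigma_sum, kept = np.zeros((p, q)), np.zeros((q, q)), 0
    for iteration in range(n_iter):
        # B | Sigma, Y: matrix normal with mean b_hat, rows (X'X)^-1, columns Sigma.
        z = rng.standard_normal((p, q))
        b = b_hat + chol_rows @ z @ np.linalg.cholesky(sigma).T

        # Sigma | B, Y: inverse Wishart(df = n, scale = residual cross-product).
        resid = Y - X @ b
        scale_inv = np.linalg.inv(resid.T @ resid)
        scale_inv = (scale_inv + scale_inv.T) / 2          # enforce symmetry
        g = rng.standard_normal((n, q)) @ np.linalg.cholesky(scale_inv).T
        sigma = np.linalg.inv(g.T @ g)                     # inverse of a Wishart draw
        sigma = (sigma + sigma.T) / 2

        if iteration >= burn_in:
            b_sum += b
            sigma_sum += sigma
            kept += 1
    return b_sum / kept, sigma_sum / kept
```

The averages of the retained draws approximate the posterior expected values that the abstract describes as difficult to obtain by direct integration.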
How random is the random forest? Random forest algorithm on the service of structural imaging biomarkers for Alzheimer's disease: from Alzheimer's disease neuroimaging initiative (ADNI) database.

PubMed

Dimitriadis, Stavros I; Liparas, Dimitris

2018-06-01

Neuroinformatics is a fascinating research field that applies computational models and analytical tools to high dimensional experimental neuroscience data for a better understanding of how the brain functions or dysfunctions in brain diseases. Neuroinformaticians work in the intersection of neuroscience and informatics supporting the integration of various sub-disciplines (behavioural neuroscience, genetics, cognitive psychology, etc.) working on brain research. Neuroinformaticians are the pathway of information exchange between informaticians and clinicians for a better understanding of the outcome of computational models and the clinical interpretation of the analysis. Machine learning is one of the most significant computational developments in the last decade giving tools to neuroinformaticians and finally to radiologists and clinicians for an automatic and early diagnosis-prognosis of a brain disease. The random forest (RF) algorithm has been successfully applied to high-dimensional neuroimaging data for feature reduction and also has been applied to classify the clinical label of a subject using single or multi-modal neuroimaging datasets. Our aim was to review the studies where RF was applied to correctly predict Alzheimer's disease (AD) and the conversion from mild cognitive impairment (MCI), and to assess its robustness to overfitting and outliers and its handling of non-linear data. Finally, we described our RF-based model that gave us the 1st position in an international challenge for automated prediction of MCI from MRI data.
The use of single-date MODIS imagery for estimating large-scale urban impervious surface fraction with spectral mixture analysis and machine learning techniques

NASA Astrophysics Data System (ADS)

Deng, Chengbin; Wu, Changshan

2013-12-01

Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
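The least squares step and the unconstrained unmixing described in this abstract can be sketched in Python with NumPy. The sketch assumes the linear mixing model, in which a pixel's reflectance in each band is the abundance-weighted sum of the endmember spectra; the array shapes and the function names `derive_endmembers` and `unmix_unconstrained` are hypothetical, and the fully constrained variant (non-negative fractions summing to one) is not shown.

```python
import numpy as np


def derive_endmembers(abundances, reflectance):
    """Least-squares endmember spectra from sample pixels with known abundances.

    abundances:  (n_pixels, n_endmembers) known fractions
    reflectance: (n_pixels, n_bands) observed spectra
    returns:     (n_endmembers, n_bands) estimated endmember spectra
    """
    endmembers, *_ = np.linalg.lstsq(abundances, reflectance, rcond=None)
    return endmembers


def unmix_unconstrained(endmembers, pixel):
    """Unconstrained SMA: least-squares fractions for one pixel spectrum."""
    fractions, *_ = np.linalg.lstsq(endmembers.T, pixel, rcond=None)
    return fractions
```

Because no constraints are applied, the estimated fractions can fall outside [0, 1] for noisy pixels, which is the practical difference from the fully constrained approach named in the abstract.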
Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.

PubMed

Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G

2013-09-23

The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to a relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine based learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected since they have been shown to be able to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high confidence assignments, international consensus classifications, ASTER (ASsessment Tools for the Evaluation of Risk) predictions, and weight of evidence professional judgment based on an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and were able to produce overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine the acute toxicity MOAs across a large range of structures and mechanisms.
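The random 75%/25% division and the validation accuracy reported in this abstract correspond to a simple procedure that can be sketched in Python. The helper names `split_train_validation` and `prediction_accuracy` and the fixed seed are assumptions for illustration; the abstract does not say whether the split was stratified by MOA, so a plain random shuffle is used here.

```python
import random


def split_train_validation(items, train_fraction=0.75, seed=0):
    """Randomly divide items into a training set and a validation set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def prediction_accuracy(true_labels, predicted_labels):
    """Fraction of validation chemicals whose MOA is predicted correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

With 924 chemicals, a 75% training fraction gives 693 training and 231 validation chemicals.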
Water chemistry in 179 randomly selected Swedish headwater streams related to forest production, clear-felling and climate.

PubMed

Löfgren, Stefan; Fröberg, Mats; Yu, Jun; Nisell, Jakob; Ranneby, Bo

2014-12-01

From a policy perspective, it is important to understand forestry effects on surface waters from a landscape perspective. The EU Water Framework Directive demands remedial actions if good ecological status is not achieved. In Sweden, 44 % of the surface water bodies have moderate ecological status or worse. Many of these drain catchments with a mosaic of managed forests. It is important for the forestry sector and water authorities to be able to identify where, in the forested landscape, special precautions are necessary. The aim of this study was to quantify the relations between forestry parameters and headwater stream concentrations of nutrients, organic matter and acid-base chemistry. The results are put into the context of regional climate, sulphur and nitrogen deposition, as well as marine influences. Water chemistry was measured in 179 randomly selected headwater streams from two regions in southwest and central Sweden, corresponding to 10 % of the Swedish land area. Forest status was determined from satellite images and Swedish National Forest Inventory data using the probabilistic classifier method, which was used to model stream water chemistry with Bayesian model averaging. The results indicate that concentrations of e.g. nitrogen, phosphorus and organic matter are related to factors associated with forest production but that it is not forestry per se that causes the excess losses. Instead, factors simultaneously affecting forest production and stream water chemistry, such as climate, extensive soil pools and nitrogen deposition, are the most likely candidates. The relationships with clear-felled and wetland areas are likely to be direct effects.
Late Quaternary vegetation, biodiversity and fire dynamics on the southern Brazilian highland and their implication for conservation and management of modern Araucaria forest and grassland ecosystems.

PubMed

Behling, Hermann; Pillar, Valério DePatta

2007-02-28

Palaeoecological background information is needed for management and conservation of the highly diverse mosaic of Araucaria forest and Campos (grassland) in southern Brazil. Questions on the origin of Araucaria forest and grasslands; its development, dynamic and stability; its response to environmental change such as climate; and the role of human impact are essential. Further questions on its natural stage of vegetation or its alteration by pre- and post-Columbian anthropogenic activity are also important. To answer these questions, palaeoecological and palaeoenvironmental data based on pollen, charcoal and multivariate data analysis of radiocarbon dated sedimentary archives from southern Brazil are used to provide an insight into past vegetation changes, which allows us to improve our understanding of the modern vegetation and to develop conservation and management strategies for the strongly affected ecosystems in southern Brazil.
ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

PubMed Central

McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

2013-01-01

Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. 
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
High Quality Facade Segmentation Based on Structured Random Forest, Region Proposal Network and Rectangular Fitting

NASA Astrophysics Data System (ADS)

Rahmani, K.; Mayer, H.

2018-05-01

In this paper we present a pipeline for high quality semantic segmentation of building facades using Structured Random Forest (SRF), Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN) as well as rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We empirically show that this is very effective especially for doors and windows. Our pipeline is evaluated on two datasets where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization on the accuracy of the result.
Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy.

PubMed

Katić, Darko; Schuck, Jürgen; Wekerle, Anna-Laura; Kenngott, Hannes; Müller-Stich, Beat Peter; Dillmann, Rüdiger; Speidel, Stefanie

2016-06-01

Computer assistance is increasingly common in surgery. However, the amount of information is bound to overload processing abilities of surgeons. We propose methods to recognize the current phase of a surgery for context-aware information filtering. The purpose is to select the most suitable subset of information for surgical situations which require special assistance. We combine formal knowledge, represented by an ontology, and experience-based knowledge, represented by training samples, to recognize phases. For this purpose, we have developed two different methods. Firstly, we use formal knowledge about possible phase transitions to create a composition of random forests. Secondly, we propose a method based on cultural optimization to infer formal rules from experience to recognize phases. The proposed methods are compared with a purely formal knowledge-based approach using rules and a purely experience-based one using regular random forests. The comparative evaluation on laparoscopic pancreas resections and adrenalectomies employs a consistent set of quality criteria on clean and noisy input. The rule-based approaches proved best with noise-free data. The random forest-based ones were more robust in the presence of noise. Formal and experience-based knowledge can be successfully combined for robust phase recognition.
Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease

PubMed Central

Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

2015-01-01

Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation. PMID:26180536
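The sample entropy feature mentioned in this abstract can be illustrated with a short Python sketch. It implements the standard definition, SampEn(m, r) = -ln(A/B), where B and A count pairs of templates of length m and m+1 that match within tolerance r under the Chebyshev distance. The defaults m = 2 and r = 0.2 times the standard deviation are common conventions and are assumptions here, since the abstract does not give the settings used; the Hilbert-Huang decomposition that precedes feature extraction in the study is not reproduced.

```python
import math


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series (illustrative sketch, O(n^2) time)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    tolerance = r_factor * std

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= tolerance:
                    matches += 1
        return matches

    b_count = count_matches(m)
    a_count = count_matches(m + 1)
    if a_count == 0 or b_count == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a_count / b_count)
```

Regular signals give values near zero and irregular signals give larger values, which is why the measure can serve as a feature for separating groups of pulse recordings.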
Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease.

PubMed

Guo, Rui; Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

2015-01-01

Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with coronary heart disease (CHD) and normal people by using the proposed classification approach based on the Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse signals, which were treated as time series. Using the random forest classifier, the two types of extracted features and their combination were each used as input data to establish a classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, when the energy features, sample entropy features, and their combination were input as pulse feature vectors, the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation.
Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data.

PubMed

Stevens, Forrest R; Gaughan, Andrea E; Linard, Catherine; Tatem, Andrew J

2015-01-01

High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, "Random Forest" estimation technique. We outline the combination of widely available, remotely-sensed and geospatial data that contribute to the modeled dasymetric weights and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and its gains in accuracy and flexibility over those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America.
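The redistribution step described above can be sketched for a single census unit as follows. This is a minimal illustration, assuming the weights are the model-predicted densities for the grid cells inside the unit; the function name and the uniform fallback for all-zero weights are choices made here, not details taken from the abstract.

```python
def dasymetric_redistribute(census_count, weights):
    """Split one census unit's count across its grid cells.

    Each cell receives a share proportional to its weight, so the cell
    values always sum back to the original census count.
    """
    total = sum(weights)
    if total <= 0:
        # Assumed fallback: with no usable weights, spread the count evenly.
        return [census_count / len(weights)] * len(weights)
    return [census_count * w / total for w in weights]


# Hypothetical unit with 1000 people and three cells of predicted density.
cells = dasymetric_redistribute(1000, [0.5, 1.5, 3.0])
print(cells)  # [100.0, 300.0, 600.0]
```

Because the shares are normalized within each unit, the prediction layer only has to get the relative densities right; the census count fixes the total.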
Studies of the DIII-D disruption database using Machine Learning algorithms

NASA Astrophysics Data System (ADS)

Rea, Cristina; Granetz, Robert; Meneghini, Orso

2017-10-01

A Random Forests Machine Learning algorithm, trained on a large database of both disruptive and non-disruptive DIII-D discharges, predicts disruptive behavior in DIII-D with about 90% accuracy. Several algorithms have been tested and Random Forests was found to perform best for this particular task. Over 40 plasma parameters are included in the database, with data for each of the parameters taken from 500k time slices. We focused on a subset of non-dimensional plasma parameters, deemed to be good predictors based on physics considerations. Both binary (disruptive/non-disruptive) and multi-label (label based on the elapsed time before disruption) classification problems are investigated. The Random Forests algorithm provides insight into the available dataset by ranking the relative importance of the input features. It is found that q95 and Greenwald density fraction (n/nG) are the most relevant parameters for discriminating between DIII-D disruptive and non-disruptive discharges. A comparison with the Gradient Boosted Trees algorithm is shown and the first results coming from the application of regression algorithms are presented. Work supported by the US Department of Energy under DE-FC02-04ER54698, DE-SC0014264 and DE-FG02-95ER54309.
Analysis of landslide hazard area in Ludian earthquake based on Random Forests

NASA Astrophysics Data System (ADS)

Xie, J.-C.; Liu, R.; Li, H.-W.; Lai, Z.-L.

2015-04-01

With the development of machine learning theory, more and more algorithms are being evaluated for seismic landslide assessment. After the Ludian earthquake, the research team drew on the special geological structure of the Ludian area and the seismic field exploration results to select the following evaluation factors: slope (PODU), river distance (HL), fault distance (DC), seismic intensity (LD), the digital elevation model (DEM), and the normalized difference vegetation index (NDVI) derived from remote sensing images. Because the relationships among these factors are fuzzy and the data are noisy and high-dimensional, we introduce the random forest algorithm to tolerate these difficulties and obtain the evaluation result for the Ludian landslide areas. To verify the accuracy of the result, we use ROC graphs as the evaluation standard; the area under the curve (AUC) is 0.918. Meanwhile, the random forest's generalization error rate, estimated by Out-Of-Bag (OOB) estimation, decreases as the number of classification trees increases, reaching an ideal value of 0.08. From the final landslide inversion results, the paper reaches the statistical conclusion that nearly 80% of all landslides and dilapidations lie in areas of high or moderate susceptibility, showing that the forecast results are reasonable and can be adopted.
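For readers unfamiliar with the AUC figure quoted above, the following sketch shows one standard way to compute it: as the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. The function name and the toy labels and scores are illustrative assumptions and are unrelated to the Ludian data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form).

    labels: 1 for the positive class (e.g. landslide), 0 for the negative.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # pair ranked correctly
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value such as 0.918 indicates that most susceptible cells are scored above most non-susceptible ones.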
Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

PubMed Central

2011-01-01

Background. Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results. Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (median (Me) = 0.76) and area under the ROC (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even lower than a median value of 0.5. Conclusions. When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing. PMID:21849043
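The comparison above turns on the difference between accuracy, sensitivity and specificity. The sketch below computes all three from binary labels, assuming 1 marks the positive class (progression to dementia) and 0 the negative; the function name and the toy data are illustrative and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
    }


# Toy example: a classifier that almost always predicts "no dementia".
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.7, 'sensitivity': 0.25, 'specificity': 1.0}
```

The toy classifier reproduces the pattern reported for Support Vector Machines: respectable accuracy and perfect specificity can coexist with poor sensitivity when the negative class dominates, which is why the abstract weighs all three measures together.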
On the information content of hydrological signatures and their relationship to catchment attributes

NASA Astrophysics Data System (ADS)

Addor, Nans; Clark, Martyn P.; Prieto, Cristina; Newman, Andrew J.; Mizukami, Naoki; Nearing, Grey; Le Vine, Nataliya

2017-04-01

Hydrological signatures, which are indices characterizing hydrologic behavior, are increasingly used for the evaluation, calibration and selection of hydrological models. Their key advantage is to provide more direct insights into specific hydrological processes than aggregated metrics (e.g., the Nash-Sutcliffe efficiency). A plethora of signatures now exists, which enables the characterization of a variety of hydrograph features but also makes the selection of signatures for new studies challenging. Here we propose that the selection of signatures should be based on their information content, which we estimated using several approaches, all leading to similar conclusions. To explore the relationship between hydrological signatures and the landscape, we extended a previously published data set of hydrometeorological time series for 671 catchments in the contiguous United States, by characterizing the climatic conditions, topography, soil, vegetation and stream network of each catchment. This new catchment attributes data set will soon be in open access, and we are looking forward to introducing it to the community. We used this data set in a data-learning algorithm (random forests) to explore whether hydrological signatures could be inferred from catchment attributes alone. We find that some signatures can be predicted remarkably well by random forests and, interestingly, the same signatures are well captured when simulating discharge using a conceptual hydrological model. We discuss what this result reveals about our understanding of hydrological processes shaping hydrological signatures. We also identify which catchment attributes exert the strongest control on catchment behavior, in particular during extreme hydrological events. Overall, climatic attributes have the most significant influence, and strongly condition how well hydrological signatures can be predicted by random forests and simulated by the hydrological model. In contrast, soil characteristics at the catchment scale are not found to be significant predictors by random forests, which raises questions on how to best use soil data for hydrological modeling, for instance for parameter estimation. We finally demonstrate that signatures with high spatial variability are poorly captured by random forests and model simulations, which makes their regionalization delicate. We conclude with a ranking of signatures based on their information content, and propose that the signatures with high information content are best suited for model calibration, model selection and understanding hydrologic similarity.
A new multivariate zero-adjusted Poisson model with applications to biomedicine.

PubMed

Liu, Yin; Tian, Guo-Liang; Tang, Man-Lai; Yuen, Kam Chuen

2018-05-25

Although advances have recently been made in modeling multivariate count data, existing models still have several limitations: (i) The multivariate Poisson log-normal model (Aitchison and Ho, ) cannot be used to fit multivariate count data with excess zero-vectors; (ii) The multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and it is difficult to apply to high-dimensional cases; (iii) The Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) could only model multivariate count data with a special correlation structure for random components that are all positive or negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure among components; that is, some of the correlation coefficients could be positive while others could be negative. We then develop its important distributional properties, and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
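To make the term "zero-adjusted" concrete, the sketch below gives the probability mass function of a univariate zero-adjusted Poisson in its common hurdle form: the probability of zero is a free parameter, and positive counts follow a zero-truncated Poisson. This is only the univariate building block under that assumed parameterization; it is not the multivariate construction proposed in the paper, and the names `zap_pmf`, `phi` and `lam` are chosen here for illustration.

```python
import math


def zap_pmf(y, phi, lam):
    """P(Y = y) for a univariate zero-adjusted Poisson (hurdle form).

    phi: probability of observing zero, 0 <= phi < 1 (assumed parameterization).
    lam: rate of the underlying Poisson, lam > 0.
    """
    if y < 0:
        return 0.0
    if y == 0:
        return phi
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Rescale the positive counts so they carry total mass 1 - phi.
    return (1.0 - phi) * poisson / (1.0 - math.exp(-lam))


# phi above exp(-lam) inflates zeros, phi below it deflates them,
# and phi = 0 gives the zero-truncated Poisson.
total = sum(zap_pmf(y, phi=0.4, lam=2.0) for y in range(60))
print(round(total, 12))
```

Because `phi` is not tied to `lam`, the same family covers zero-inflated, zero-deflated and zero-truncated counts, which is the flexibility the abstract contrasts with the ZIP distribution.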

Effect of urbanization on the structure and functional traits of remnant subtropical evergreen broad-leaved forests in South China.

PubMed

Huang, Liujing; Chen, Hongfeng; Ren, Hai; Wang, Jun; Guo, Qinfeng

2013-06-01

We investigated the effects of major environmental drivers associated with urbanization on species diversity and plant functional traits (PFTs) in the remnant subtropical evergreen broad-leaved forests in Metropolitan Guangzhou (Guangdong, China). Twenty environmental factors including topography, light, and soil properties were used to quantify the effects of urbanization. Vegetation data and soil properties were collected from 30 plots of 400 m² each at 6 study sites in urban and rural areas. Differences in plant species diversity and PFTs of remnant forests between urban and rural areas were analyzed. To discern the complex relationships, multivariate statistical analyses (e.g., canonical correspondence analysis and regression analysis) were employed. Pioneer species and stress-tolerant species can survive and vigorously establish their population dominance in the urban environment. The native herb diversity was lower in urban forests than in rural forests. Urban forests tend to favor species with the Mesophanerophyte life form. In contrast, species in rural forests possessed Chamaephyte and Nanophanerophyte life forms and gravity/clonal growth dispersal modes. Soil pH and soil nutrients (K, Na, and TN) were positively related to herb diversity, while soil heavy metal concentrations (Cu) were negatively correlated with herb diversity. In response to urbanization, herb species diversity declines and the species in the remnant forests usually have stress-tolerant functional traits. The factors related to urbanization such as soil acidification, nutrient leaching, and heavy metal pollution were important in controlling the plant diversity in the forests along the urban-rural gradients. Urbanization affects the structure and functional traits of remnant subtropical evergreen broad-leaved forests.
Discriminant forest classification method and system

DOEpatents

Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

2012-11-06

A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or the Anderson-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
Effect of inventory method on niche models: random versus systematic error

Treesearch

Heather E. Lintz; Andrew N. Gray; Bruce McCune

2013-01-01

Data from large-scale biological inventories are essential for understanding and managing Earth's ecosystems. The Forest Inventory and Analysis Program (FIA) of the U.S. Forest Service is the largest biological inventory in North America; however, the FIA inventory recently changed from an amalgam of different approaches to a nationally-standardized approach in...
Modeling species’ realized climatic niche space and predicting their response to global warming for several western forest species with small geographic distributions

Treesearch

Marcus V. Warwell; Gerald E. Rehfeldt; Nicholas L. Crookston

2010-01-01

The Random Forests multiple regression tree was used to develop an empirically based bioclimatic model of the presence-absence of species occupying small geographic distributions in western North America. The species assessed were subalpine larch (Larix lyallii), smooth Arizona cypress (Cupressus arizonica ssp. glabra...
Determining soil erosion from roads in coastal plain of Alabama

Treesearch

McFero Grace; W.J. Elliot

2008-01-01

This paper reports soil losses and observed sediment deposition for 16 randomly selected forest road sections in the National Forests of Alabama. Visible sediment deposition zones were tracked along the stormwater flow path to the most remote location as a means of quantifying soil loss from road sections. Volumes of sediment in deposition zones were determined by...
Quantifying the abundance of co-occurring conifers along Inland Northwest (USA) climate gradients

Treesearch

Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

2008-01-01

The occurrence and abundance of conifers along climate gradients in the Inland Northwest (USA) was assessed using data from 5082 field plots, 81% of which were forested. Analyses using the Random Forests classification tree revealed that the sequential distribution of species along an altitudinal gradient could be predicted with reasonable accuracy from a single...
Patterns among the ashes: Exploring the relationship between landscape pattern and the emerald ash borer

Treesearch

Susan J. Crocker; Dacia M. Meneguzzo; Greg C. Liknes

2010-01-01

Landscape metrics, including host abundance and population density, were calculated using forest inventory and land cover data to assess the relationship between landscape pattern and the presence or absence of the emerald ash borer (EAB) (Agrilus planipennis Fairmaire). The Random Forests classification algorithm in the R statistical environment was...
Quantitative Trait Inheritance in a Forty-Year-Old Longleaf Pine Partial Diallel Test

Treesearch

Michael Stine; Jim Roberds; C. Dana Nelson; David P. Gwaze; Todd Shupe; Les Groom

2002-01-01

A longleaf pine (Pinus palustris Mill.) 13 parent partial diallel field experiment was established at two locations on the Harrison Experimental Forest in 1960. Parent trees were randomly selected from a natural population growing on the Harrison Experimental Forest, near Gulfport, Miss. Distance between trees chosen as parents ranged from 13 to 357...
LiDAR based prediction of forest biomass using hierarchical models with spatially varying coefficients

USGS Publications Warehouse

Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.

2015-01-01

Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
Change in phylogenetic community structure during succession of traditionally managed tropical rainforest in southwest China.

PubMed

Mo, Xiao-Xue; Shi, Ling-Ling; Zhang, Yong-Jiang; Zhu, Hua; Slik, J W Ferry

2013-01-01

Tropical rainforests in Southeast Asia are facing increasing and ever more intense human disturbance that often negatively affects biodiversity. The aim of this study was to determine how tree species phylogenetic diversity is affected by traditional forest management types and to understand the change in community phylogenetic structure during succession. Four types of forests with different management histories were selected for this purpose: old growth forests, understorey planted old growth forests, old secondary forests (∼200-years after slash and burn), and young secondary forests (15-50-years after slash and burn). We found that tree phylogenetic community structure changed from clustering to over-dispersion from early to late successional forests and finally became random in old-growth forest. We also found that the phylogenetic structure of the tree overstorey and understorey responded differentially to change in environmental conditions during succession. In addition, we show that slash and burn agriculture (swidden cultivation) can increase landscape level plant community evolutionary information content.
Change in Phylogenetic Community Structure during Succession of Traditionally Managed Tropical Rainforest in Southwest China

PubMed Central

Mo, Xiao-Xue; Shi, Ling-Ling; Zhang, Yong-Jiang; Zhu, Hua; Slik, J. W. Ferry

2013-01-01

Tropical rainforests in Southeast Asia are facing increasing and ever more intense human disturbance that often negatively affects biodiversity. The aim of this study was to determine how tree species phylogenetic diversity is affected by traditional forest management types and to understand the change in community phylogenetic structure during succession. Four types of forests with different management histories were selected for this purpose: old growth forests, understorey planted old growth forests, old secondary forests (∼200-years after slash and burn), and young secondary forests (15–50-years after slash and burn). We found that tree phylogenetic community structure changed from clustering to over-dispersion from early to late successional forests and finally became random in old-growth forest. We also found that the phylogenetic structure of the tree overstorey and understorey responded differentially to change in environmental conditions during succession. In addition, we show that slash and burn agriculture (swidden cultivation) can increase landscape level plant community evolutionary information content. PMID:23936268
The structure of tropical forests and sphere packings

PubMed Central

Jahn, Markus Wilhelm; Dobner, Hans-Jürgen; Wiegand, Thorsten; Huth, Andreas

2015-01-01

The search for simple principles underlying the complex architecture of ecological communities such as forests still challenges ecological theorists. We use tree diameter distributions—fundamental for deriving other forest attributes—to describe the structure of tropical forests. Here we argue that tree diameter distributions of natural tropical forests can be explained by stochastic packing of tree crowns representing a forest crown packing system: a method usually used in physics or chemistry. We demonstrate that tree diameter distributions emerge accurately from a surprisingly simple set of principles that include site-specific tree allometries, random placement of trees, competition for space, and mortality. The simple static model also successfully predicted the canopy structure, revealing that most trees in our two studied forests grow up to 30–50 m in height and that the highest packing density of about 60% is reached between the 25- and 40-m height layer. Our approach is an important step toward identifying a minimal set of processes responsible for generating the spatial structure of tropical forests. PMID:26598678
Using genetic algorithms to optimize k-Nearest Neighbors configurations for use with airborne laser scanning data

Treesearch

Ronald E. McRoberts; Grant M. Domke; Qi Chen; Erik Næsset; Terje Gobakken

2016-01-01

The relatively small sampling intensities used by national forest inventories are often insufficient to produce the desired precision for estimates of population parameters unless the estimation process is augmented with auxiliary information, usually in the form of remotely sensed data. The k-Nearest Neighbors (k-NN) technique is a non-parametric, multivariate approach...
Ecological effects of alternative fuel-reduction treatments: highlights of the National Fire and Fire Surrogate study (FFS)

Treesearch

James D. McIver; Scott L. Stephens; James K. Agee; Jamie Barbour; Ralph E. J. Boerner; Carl B. Edminster; Karen L. Erickson; Kerry L. Farris; Christopher J. Fettig; Carl E. Fiedler; Sally Haase; Stephen C. Hart; Jon E. Keeley; Eric E. Knapp; John F. Lehmkuhl; Jason J. Moghaddas; William Otrosina; Kenneth W. Outcalt; Dylan W. Schwilk; Carl N. Skinner; Thomas A. Waldrop; C. Phillip Weatherspoon; Daniel A. Yaussy; Andrew Youngblood; Steve Zack

2012-01-01

The 12-site National Fire and Fire Surrogate study (FFS) was a multivariate experiment that evaluated ecological consequences of alternative fuel-reduction treatments in seasonally dry forests of the US. Each site was a replicated experiment with a common design that compared an un-manipulated control, prescribed fire, mechanical and mechanical + fire treatments....
Per capita community-level effects of an invasive grass, Microstegium vimineum, on vegetation in mesic forests in northern Mississippi (USA)

Treesearch

J. Stephen Brewer

2010-01-01

Quantifying per capita impacts of invasive species on resident communities requires integrating regression analyses with experiments under natural conditions. Using multivariate and univariate approaches, I regressed the abundance of 105 resident species of groundcover plants and tree seedlings against the abundance and height of an invasive grass, Microstegium...
Application of multivariable search techniques to the optimization of airfoils in a low speed nonlinear inviscid flow field

NASA Technical Reports Server (NTRS)

Hague, D. S.; Merz, A. W.

1975-01-01

Multivariable search techniques are applied to a particular class of airfoil optimization problems. These are the maximization of lift and the minimization of disturbance pressure magnitude in an inviscid nonlinear flow field. A variety of multivariable search techniques contained in an existing nonlinear optimization code, AESOP, are applied to this design problem. These techniques include elementary single parameter perturbation methods, organized search such as steepest-descent, quadratic, and Davidon methods, randomized procedures, and a generalized search acceleration technique. Airfoil design variables are seven in number and define perturbations to the profile of an existing NACA airfoil. The relative efficiencies of the techniques are compared. It is shown that elementary one-parameter-at-a-time and random techniques compare favorably with organized searches in the class of problems considered. It is also shown that significant reductions in disturbance pressure magnitude can be made while retaining reasonable lift coefficient values at low free stream Mach numbers.
Analysis of petroleum contaminated soils by spectral modeling and pure response profile recovery of n-hexane.

PubMed

Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali, Md Nasim; Majumdar, K; Ray, D P

2014-07-01

This pilot study compared penalized spline regression (PSR) and random forest (RF) regression using visible and near-infrared diffuse reflectance spectroscopy (VisNIR DRS) derived spectra of 164 petroleum contaminated soils after two different spectral pretreatments [first derivative (FD) and standard normal variate (SNV) followed by detrending] for rapid quantification of soil petroleum contamination. Additionally, a new analytical approach was proposed for the recovery of the pure spectral and concentration profiles of n-hexane present in the unresolved mixture of petroleum contaminated soils using multivariate curve resolution alternating least squares (MCR-ALS). The PSR model using FD spectra (r² = 0.87, RMSE = 0.580 log10 mg kg⁻¹, and residual prediction deviation = 2.78) outperformed all other models tested. Quantitative results obtained by MCR-ALS for n-hexane in the presence of interferences (r² = 0.65 and RMSE = 0.261 log10 mg kg⁻¹) were comparable to those obtained using the FD (PSR) model. Furthermore, MCR-ALS was able to recover pure spectra of n-hexane. Copyright © 2014 Elsevier Ltd. All rights reserved.
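The three fit statistics quoted above can be sketched as follows. The definitions used here are common conventions and are assumptions about what the study computed: r² is taken as the coefficient of determination (some spectroscopy papers report the squared correlation instead), and residual prediction deviation (RPD) as the sample standard deviation of the observed values divided by the RMSE. The function names and toy values are illustrative.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Residual prediction deviation: SD of observations over RMSE (assumed form)."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Toy values only, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(obs, pred), r_squared(obs, pred), rpd(obs, pred))
```

RPD expresses the error relative to the spread of the reference values, so it allows models to be compared across data sets measured on different scales, whereas RMSE stays in the units of the response (here log10 mg kg⁻¹).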
What's in a title? An assessment of whether randomized controlled trial in a title means that it is one.

PubMed

Koletsi, Despina; Pandis, Nikolaos; Polychronopoulou, Argy; Eliades, Theodore

2012-06-01

In this study, we aimed to investigate whether studies published in orthodontic journals and titled as randomized clinical trials are truly randomized clinical trials. A second objective was to explore the association of journal type and other publication characteristics on correct classification. American Journal of Orthodontics and Dentofacial Orthopedics, European Journal of Orthodontics, Angle Orthodontist, Journal of Orthodontics, Orthodontics and Craniofacial Research, World Journal of Orthodontics, Australian Orthodontic Journal, and Journal of Orofacial Orthopedics were hand searched for clinical trials labeled in the title as randomized from 1979 to July 2011. The data were analyzed by using descriptive statistics, and univariable and multivariable examinations of statistical associations via ordinal logistic regression modeling (proportional odds model). One hundred twelve trials were identified. Of the included trials, 33 (29.5%) were randomized clinical trials, 52 (46.4%) had an unclear status, and 27 (24.1%) were not randomized clinical trials. In the multivariable analysis among the included journal types, year of publication, number of authors, multicenter trial, and involvement of statistician were significant predictors of correctly classifying a study as a randomized clinical trial vs unclear and not a randomized clinical trial. From 112 clinical trials in the orthodontic literature labeled as randomized clinical trials, only 29.5% were identified as randomized clinical trials based on clear descriptions of appropriate random number generation and allocation concealment. The type of journal, involvement of a statistician, multicenter trials, greater numbers of authors, and publication year were associated with correct clinical trial classification. This study indicates the need of clear and accurate reporting of clinical trials and the need for educating investigators on randomized clinical trial methodology. 
Copyright © 2012 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
Treatment effect of methylphenidate on intrinsic functional brain network in medication-naïve ADHD children: A multivariate analysis.

PubMed

Yoo, Jae Hyun; Kim, Dohyun; Choi, Jeewook; Jeong, Bumseok

2018-04-01

Methylphenidate is a first-line therapeutic option for treating attention-deficit/hyperactivity disorder (ADHD); however, the changes it elicits in resting-state functional networks (RSFNs) are not well understood. This study investigated the treatment effect of methylphenidate using a variety of RSFN analyses and explored the collaborative influences of treatment-relevant RSFN changes in children with ADHD. Resting-state functional magnetic resonance imaging was acquired from 20 medication-naïve ADHD children before methylphenidate treatment and twelve weeks later. Changes in large-scale functional connectivity were defined using independent component analysis with dual regression and graph theoretical analysis. The amplitude of low frequency fluctuation (ALFF) was measured to investigate local spontaneous activity alteration. Finally, significant findings were entered into a random forest regression to identify the feature subset that best explains symptom improvement. After twelve weeks of methylphenidate administration, large-scale connectivity was increased between the left fronto-parietal RSFN and the left insula cortex and the right fronto-parietal and the brainstem, while the clustering coefficient (CC) of the global network and nodes, the left fronto-parietal, cerebellum, and occipital pole-visual network, were decreased. ALFF was increased in the bilateral superior parietal cortex and decreased in the right inferior fronto-temporal area. The subset of the local and large-scale RSFN changes, including widespread ALFF changes, the CC of the global network and the cerebellum, explained 27.1% of the variance in the ADHD Rating Scale and 13.72% of the variance in the Conners' Parent Rating Scale. Our multivariate approach suggests that the neural mechanism of methylphenidate treatment could be associated with alteration of spontaneous activity in the superior parietal cortex or widespread brain regions as well as functional segregation of the large-scale intrinsic functional network.
Circularly-symmetric complex normal ratio distribution for scalar transmissibility functions. Part I: Fundamentals

NASA Astrophysics Data System (ADS)

Yan, Wang-Ji; Ren, Wei-Xin

2016-12-01

Recent advances in signal processing and structural dynamics have spurred the adoption of transmissibility functions in academia and industry alike. Due to the inherent randomness of measurement and variability of environmental conditions, uncertainty impacts their applications. This study is focused on statistical inference for raw scalar transmissibility functions modeled as complex ratio random variables. The goal is achieved through companion papers. This paper (Part I) is dedicated to the formal mathematical proof. New theorems on the multivariate circularly-symmetric complex normal ratio distribution are proved on the basis of the principle of probabilistic transformation of continuous random vectors. The closed-form distributional formulas for multivariate ratios of correlated circularly-symmetric complex normal random variables are analytically derived. Afterwards, several properties are deduced as corollaries and lemmas to the new theorems. Monte Carlo simulation (MCS) is utilized to verify the accuracy of some representative cases. This work lays the mathematical groundwork to find probabilistic models for raw scalar transmissibility functions, which are to be expounded in detail in Part II of this study.

Allocasuarina tree hosts determine the spatial distribution of hypogeous fungal sporocarps in three tropical Australian sclerophyll forests.

PubMed

Abell-Davis, Sandra E; Gadek, Paul A; Pearce, Ceridwen A; Congdon, Bradley C

2012-01-01

Across three tropical Australian sclerophyll forest types, site-specific environmental variables could explain the distribution of both quantity (abundance and biomass) and richness (genus and species) of hypogeous fungi sporocarps. Quantity was significantly higher in the Allocasuarina forest sites that had high soil nitrogen but low phosphorus. Three genera of hypogeous fungi were found exclusively in Allocasuarina forest sites including Gummiglobus, Labyrinthomyces and Octaviania, as were some species of Castoreum, Chondrogaster, Endogone, Hysterangium and Russula. However, the forest types did not all group according to site-scale variables and subsequently the taxonomic assemblages were not significantly different between the three forest types. At site scale, significant negative relationships were found between phosphorus concentration and the quantity of hypogeous fungi sporocarps. Using a multivariate information theoretic approach, there were other more plausible models to explain the patterns of sporocarp richness. Both the mean number of fungal genera and species increased with the number of Allocasuarina stems, at the same time decreasing with the number of Eucalyptus stems. The optimal conditions for promoting hypogeous fungi sporocarp quantity and sporocarp richness appear to be related to the presence and abundance of Allocasuarina (Casuarinaceae) host trees. Allocasuarina tree species may have a higher host receptivity for ectomycorrhizal hypogeous fungi species that provide an important food resource for Australian mycophagous animals.
Forest Structure Characterization Using JPL's UAVSAR Multi-Baseline Polarimetric SAR Interferometry and Tomography

NASA Technical Reports Server (NTRS)

Neumann, Maxim; Hensley, Scott; Lavalle, Marco; Ahmed, Razi

2013-01-01

This paper concerns forest remote sensing using JPL's multi-baseline polarimetric interferometric UAVSAR data. It presents exemplary results and analyzes the possibilities and limitations of using SAR Tomography and Polarimetric SAR Interferometry (PolInSAR) techniques for the estimation of forest structure. Performance and error indicators for the applicability and reliability of the used multi-baseline (MB) multi-temporal (MT) PolInSAR random volume over ground (RVoG) model are discussed. Experimental results are presented based on JPL's L-band repeat-pass polarimetric interferometric UAVSAR data over temperate and tropical forest biomes in the Harvard Forest, Massachusetts, and in the La Amistad Park, Panama and Costa Rica. The results are partially compared with ground field measurements and with air-borne LVIS lidar data.
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

PubMed Central

Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip

2015-01-01

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
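The mixture-distribution view mentioned in the abstract can be illustrated with a minimal sketch. It assumes each tree reports a predictive mean and variance for a sample (the function name and the numbers below are hypothetical, not taken from the paper) and computes the mean and variance of the weighted mixture, showing how non-uniform tree weights can narrow the spread without moving the mean.

```python
def mixture_mean_var(means, variances, weights):
    """Mean and variance of a weighted mixture of per-tree predictive distributions."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Second moment of a mixture: sum_i w_i * (var_i + mean_i^2)
    second_moment = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean * mean


# Hypothetical per-tree predictions for one sample
tree_means = [1.0, 2.0, 3.0]
tree_vars = [0.5, 0.5, 0.5]

equal = mixture_mean_var(tree_means, tree_vars, [1, 1, 1])         # equal-weight forest
skewed = mixture_mean_var(tree_means, tree_vars, [0.1, 0.8, 0.1])  # re-weighted trees
print(equal, skewed)
```

In this toy case both weightings give the same mean, while the re-weighted mixture has the smaller variance, which is the kind of trade-off the tree weight optimization targets. How the paper itself optimizes the weights is not shown here.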
Computer-Aided Screening of Conjugated Polymers for Organic Solar Cell: Classification by Random Forest.

PubMed

Nagasawa, Shinji; Al-Naamani, Eman; Saeki, Akinori

2018-05-17

Owing to the diverse chemical structures, organic photovoltaic (OPV) applications with a bulk heterojunction framework have greatly evolved over the last two decades, which has produced numerous organic semiconductors exhibiting improved power conversion efficiencies (PCEs). Despite the recent fast progress in materials informatics and data science, data-driven molecular design of OPV materials remains challenging. We report a screening of conjugated molecules for polymer-fullerene OPV applications by supervised learning methods (artificial neural network (ANN) and random forest (RF)). Approximately 1000 experimental parameters including PCE, molecular weight, and electronic properties are manually collected from the literature and subjected to machine learning with digitized chemical structures. Contrary to the low correlation coefficient in ANN, RF yields an acceptable accuracy, which is twice that of random classification. We demonstrate the application of RF screening for the design, synthesis, and characterization of a conjugated polymer, which facilitates a rapid development of optoelectronic materials.
Mountain Pine Beetles and Invasive Plant Species Findings from a Survey of Colorado Community Residents

Treesearch

Courtney Flint; Hua Qin; Michael Daab

2008-01-01

The US Forest Service, Pacific Northwest Research Station funded research to assess community responses to forest disturbance by mountain pine beetles (Dendroctonus ponderosae) and public reaction to invasive plants in north central Colorado. In the spring of 2007, 4,027 16-page questionnaires were mailed to randomly selected households with addresses in Breckenridge,...
Effects of soil compaction, forest leaf litter and nitrogen fertilizer on two oak species and microbial activity

Treesearch

D. Jordan; F., Jr. Ponder; V. C. Hubbard

2003-01-01

A greenhouse study examined the effects of soil compaction and forest leaf litter on the growth and nitrogen (N) uptake and recovery of red oak (Quercus rubra L.) and scarlet oak (Quercus coccinea Muenchh.) seedlings and selected microbial activity over a 6-month period. The experiment had a randomized complete block design with...
Stemflow estimation in a redwood forest using model-based stratified random sampling

Treesearch

Jack Lewis

2003-01-01

Model-based stratified sampling is illustrated by a case study of stemflow volume in a redwood forest. The approach is actually a model-assisted sampling design in which auxiliary information (tree diameter) is utilized in the design of stratum boundaries to optimize the efficiency of a regression or ratio estimator. The auxiliary information is utilized in both the...
Estimating erosion risk on forest lands using improved methods of discriminant analysis

Treesearch

J. Lewis; R. M. Rice

1990-01-01

A population of 638 timber harvest areas in northwestern California was sampled for data related to the occurrence of critical amounts of erosion (>153 m3 within 0.81 ha). Separate analyses were done for forest roads and logged areas. Linear discriminant functions were computed in each analysis to contrast site conditions at critical plots with randomly selected...
Sample-based estimation of tree species richness in a wet tropical forest compartment

Treesearch

Steen Magnussen; Raphael Pelissier

2007-01-01

Petersen's capture-recapture ratio estimator and the well-known bootstrap estimator are compared across a range of simulated low-intensity simple random sampling with fixed-area plots of 100 m² in a rich wet tropical forest compartment with 93 tree species in the Western Ghats of India. Petersen's ratio estimator was uniformly superior to the bootstrap...
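As a rough illustration of the Petersen ratio idea applied to species richness, the sketch below treats two independent sets of plots as the "capture" and "recapture" occasions and estimates total richness as the product of the two species counts divided by the number of shared species. The function name and the species labels are hypothetical, and how the study actually split its plots is not stated in the truncated abstract.

```python
def petersen_richness(sample_a, sample_b):
    """Petersen ratio estimate of total species richness from two samples."""
    a, b = set(sample_a), set(sample_b)
    shared = len(a & b)  # species seen in both samples ("recaptures")
    if shared == 0:
        raise ValueError("no shared species: the Petersen ratio is undefined")
    return len(a) * len(b) / shared


# Hypothetical species lists from two sets of plots
first_plots = {"A", "B", "C", "D"}
second_plots = {"C", "D", "E"}
print(petersen_richness(first_plots, second_plots))  # 4 * 3 / 2 = 6.0
```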
Rates and Implications of Rainfall Interception in a Coastal Redwood Forest

Treesearch

Leslie M. Reid; Jack Lewis

2007-01-01

Throughfall was measured for a year at five-min intervals in 11 collectors randomly located on two plots in a second-growth redwood forest at the Caspar Creek Experimental Watersheds. Monitoring at one plot continued two more years, during which stemflow from 24 trees was also measured. Comparison of throughfall and stemflow to rainfall measured in adjacent clearings...
Variation in soil and forest floor characteristics along gradients of ericaceous, evergreen shrub cover in the southern Appalachians

Treesearch

Jonatha L. Horton; Barton D. Clinton; John F. Walker; Colin M. Beir; Erik T. Nilsen

2009-01-01

Ericaceous shrubs can influence soil properties in many ecosystems. In this study, we examined how soil and forest floor properties vary among sites with different ericaceous evergreen shrub basal area in the southern Appalachian mountains. We randomly located plots along transects that included open understories and understories with varying amounts of Rhododendron...
Predicting relative species composition within mixed conifer forest pixels using zero‐inflated models and Landsat imagery

Treesearch

Shannon L. Savage; Rick L. Lawrence; John R. Squires

2015-01-01

Ecological and land management applications would often benefit from maps of relative canopy cover of each species present within a pixel, instead of traditional remote-sensing based maps of either dominant species or percent canopy cover without regard to species composition. Widely used statistical models for remote sensing, such as randomForest (RF),...
'Pygmy' old-growth redwood characteristics on an edaphic ecotone in Mendocino County, California

Treesearch

Will Russell; Suzie Woolhouse

2012-01-01

The 'pygmy forest' is a specialized community that is adapted to highly acidic, hydrophobic, nutrient deprived soils, and exists in pockets within the coast redwood forest in Mendocino County. While coast redwood is known as an exceptionally tall tree, stunted trees exhibit unusual growth-forms on pygmy soils. We used a stratified random sampling procedure to...
Ecological impacts and management strategies for western larch in the face of climate-change

Treesearch

Gerald E. Rehfeldt; Barry C. Jaquish

2010-01-01

Approximately 185,000 forest inventory and ecological plots from both USA and Canada were used to predict the contemporary distribution of western larch (Larix occidentalis Nutt.) from climate variables. The random forests algorithm, using an 8-variable model, produced an overall error rate of about 2.9 %, nearly all of which consisted of predicting presence at...
Simulation of long-term landscape-level fuel treatment effects on large wildfires

Treesearch

Mark A. Finney; Rob C. Seli; Charles W. McHugh; Alan A. Ager; Bernhard Bahro; James K. Agee

2008-01-01

A simulation system was developed to explore how fuel treatments placed in topologically random and optimal spatial patterns affect the growth and behaviour of large fires when implemented at different rates over the course of five decades. The system consisted of a forest and fuel dynamics simulation module (Forest Vegetation Simulator, FVS), logic for deriving fuel...
Above ground biomass and tree species richness estimation with airborne lidar in tropical Ghana forests

NASA Astrophysics Data System (ADS)

Vaglio Laurin, Gaia; Puletti, Nicola; Chen, Qi; Corona, Piermaria; Papale, Dario; Valentini, Riccardo

2016-10-01

Estimates of forest aboveground biomass are fundamental for carbon monitoring and accounting; delivering information at very high spatial resolution is especially valuable for local management, conservation and selective logging purposes. In tropical areas, hosting large biomass and biodiversity resources which are often threatened by unsustainable anthropogenic pressures, frequent forest resources monitoring is needed. Lidar is a powerful tool to estimate aboveground biomass at fine resolution; however its application in tropical forests has been limited, with high variability in the accuracy of results. Lidar pulses scan the forest vertical profile, and can provide structure information which is also linked to biodiversity. In the last decade the remote sensing of biodiversity has received great attention, but few studies focused on the use of lidar for assessing tree species richness in tropical forests. This research aims at estimating aboveground biomass and tree species richness using discrete return airborne lidar in Ghana forests. We tested an advanced statistical technique, Multivariate Adaptive Regression Splines (MARS), which does not require assumptions on data distribution or on the relationships between variables, being suitable for studying ecological variables. We compared the MARS regression results with those obtained by multilinear regression and found that both algorithms were effective, but MARS provided higher accuracy for both biomass (R2 = 0.72) and species richness (R2 = 0.64). We also noted strong correlation between biodiversity and biomass field values. Even if the forest areas under analysis are limited in extent and represent peculiar ecosystems, the preliminary indications produced by our study suggest that instruments such as lidar, specifically useful for pinpointing forest structure, can also be exploited as a support for tree species richness assessment.
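MARS models are built from hinge (piecewise-linear) basis functions, which is why they need no assumed functional form between predictor and response. The sketch below evaluates a one-predictor MARS-style model; the knot, coefficients, and the idea of using a lidar height metric as the predictor are hypothetical values chosen for illustration, not fitted results from the study.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) if direction=+1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of coef * hinge(x, knot, direction) over all terms."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


# Hypothetical model: slope changes at a knot of 10.0 (e.g. a lidar height metric in m)
terms = [(2.0, 10.0, +1), (-0.5, 10.0, -1)]
print(mars_predict(14.0, 5.0, terms))  # 5 + 2.0 * 4 = 13.0
print(mars_predict(6.0, 5.0, terms))   # 5 - 0.5 * 4 = 3.0
```

A fitting procedure would choose the knots and coefficients from data; only the evaluation step is shown here.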
Quantifying streamflow change caused by forest disturbance at a large spatial scale: A single watershed study

NASA Astrophysics Data System (ADS)

Wei, Xiaohua; Zhang, Mingfang

2010-12-01

Climatic variability and forest disturbance are commonly recognized as two major drivers influencing streamflow change in large-scale forested watersheds. The greatest challenge in evaluating quantitative hydrological effects of forest disturbance is the removal of climatic effect on hydrology. In this paper, a method was designed to quantify respective contributions of large-scale forest disturbance and climatic variability on streamflow using the Willow River watershed (2860 km2) located in the central part of British Columbia, Canada. Long-term (>50 years) data on hydrology, climate, and timber harvesting history represented by equivalent clear-cutting area (ECA) were available to discern climatic and forestry influences on streamflow by three steps. First, effective precipitation, an integrated climatic index, was generated by subtracting evapotranspiration from precipitation. Second, modified double mass curves were developed by plotting accumulated annual streamflow against annual effective precipitation, which presented a much clearer picture of the cumulative effects of forest disturbance on streamflow following removal of climatic influence. The average annual streamflow changes that were attributed to forest disturbances and climatic variability were then estimated to be +58.7 and -72.4 mm, respectively. The positive (increasing) and negative (decreasing) values in streamflow change indicated opposite change directions, which suggest an offsetting effect between forest disturbance and climatic variability in the study watershed. Finally, a multivariate Autoregressive Integrated Moving Average (ARIMA) model was generated to establish quantitative relationships between accumulated annual streamflow deviation attributed to forest disturbances and annual ECA. The model was then used to project streamflow change under various timber harvesting scenarios. 
The methodology can be effectively applied to any large-scale single watershed where long-term data (>50 years) are available.
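The first two steps of the method can be sketched as follows, assuming annual series of precipitation, evapotranspiration, and streamflow in mm. The function names and the numbers are hypothetical, and accumulating both axes is an assumption about how the modified double mass curve is built.

```python
from itertools import accumulate


def effective_precipitation(precip, evapotranspiration):
    """Step 1: annual effective precipitation = precipitation - evapotranspiration."""
    return [p - e for p, e in zip(precip, evapotranspiration)]


def double_mass_curve(streamflow, effective_precip):
    """Step 2: (cumulative effective precipitation, cumulative streamflow) pairs."""
    return list(zip(accumulate(effective_precip), accumulate(streamflow)))


# Hypothetical annual values in mm
precip = [800, 900, 700]
et = [500, 550, 450]
flow = [200, 260, 150]

pe = effective_precipitation(precip, et)
curve = double_mass_curve(flow, pe)
print(pe)     # [300, 350, 250]
print(curve)  # [(300, 200), (650, 460), (900, 610)]
```

A break in the slope of such a curve after a period of heavy harvesting would point to a disturbance effect once the climatic signal has been removed.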
Looking for age-related growth decline in natural forests: unexpected biomass patterns from tree rings and simulated mortality

USGS Publications Warehouse

Foster, Jane R.; D'Amato, Anthony W.; Bradford, John B.

2014-01-01

Forest biomass growth is almost universally assumed to peak early in stand development, near canopy closure, after which it will plateau or decline. The chronosequence and plot remeasurement approaches used to establish the decline pattern suffer from limitations and coarse temporal detail. We combined annual tree ring measurements and mortality models to address two questions: first, how do assumptions about tree growth and mortality influence reconstructions of biomass growth? Second, under what circumstances does biomass production follow the model that peaks early, then declines? We integrated three stochastic mortality models with a census tree-ring data set from eight temperate forest types to reconstruct stand-level biomass increments (in Minnesota, USA). We compared growth patterns among mortality models, forest types and stands. Timing of peak biomass growth varied significantly among mortality models, peaking 20–30 years earlier when mortality was random with respect to tree growth and size, than when mortality favored slow-growing individuals. Random or u-shaped mortality (highest in small or large trees) produced peak growth 25–30 % higher than the surviving tree sample alone. Growth trends for even-aged, monospecific Pinus banksiana or Acer saccharum forests were similar to the early peak and decline expectation. However, we observed continually increasing biomass growth in older, low-productivity forests of Quercus rubra, Fraxinus nigra, and Thuja occidentalis. Tree-ring reconstructions estimated annual changes in live biomass growth and identified more diverse development patterns than previous methods. These detailed, long-term patterns of biomass development are crucial for detecting recent growth responses to global change and modeling future forest dynamics.
Effect of Methodological and Ecological Approaches on Heterogeneity of Nest-Site Selection of a Long-Lived Vulture

PubMed Central

Moreno-Opo, Rubén; Fernández-Olalla, Mariana; Margalida, Antoni; Arredondo, Ángel; Guil, Francisco

2012-01-01

The application of science-based conservation measures requires that sampling methodologies in studies modelling similar ecological aspects produce comparable results, making their interpretation easier. We aimed to show how the choice of different methodological and ecological approaches can affect conclusions in nest-site selection studies along different Palearctic meta-populations of an indicator species. First, a multivariate analysis of the variables affecting nest-site selection in a breeding colony of cinereous vulture (Aegypius monachus) in central Spain was performed. Then, a meta-analysis was applied to establish how methodological and habitat-type factors determine differences and similarities in the results obtained by previous studies that have modelled the forest breeding habitat of the species. Our results revealed patterns in nesting-habitat modelling by the cinereous vulture throughout its whole range: steep and south-facing slopes, great cover of large trees and distance to human activities were generally selected. The ratio and situation of the studied plots (nests/random), the use of plots vs. polygons as sampling units and the number of years of data set determined the variability explained by the model. Moreover, a greater size of the breeding colony implied that ecological and geomorphological variables at landscape level were more influential. Additionally, human activities had a greater effect on colonies situated in Mediterranean forests. For the first time, a meta-analysis regarding the factors determining nest-site selection heterogeneity for a single species at broad scale was achieved. It is essential to homogenize and coordinate experimental design in modelling the selection of species' ecological requirements so that differences in results among studies are not due to methodological heterogeneity. This would optimize best conservation and management practices for habitats and species in a global context.
PMID:22413023

The development of soil organic matter in restored biodiverse Jarrah forests of South-Western Australia as determined by ASE and GCMS.

PubMed

Lin, Deborah S; Greenwood, Paul F; George, Suman; Somerfield, Paul J; Tibbett, Mark

2011-08-01

Soil organic matter (SOM) is known to increase with time as landscapes recover after a major disturbance; however, little is known about the evolution of the chemistry of SOM in reconstructed ecosystems. In this study, we assessed the development of SOM chemistry in a chronosequence (space for time substitution) of restored Jarrah forest sites in Western Australia. Replicated samples were taken at the surface of the mineral soil as well as deeper in the profile at sites of 1, 3, 6, 9, 12, and 17 years of age. A molecular approach was developed to distinguish and quantify numerous individual compounds in SOM. This used accelerated solvent extraction in conjunction with gas chromatography mass spectrometry. A novel multivariate statistical approach was used to assess changes in accelerated solvent extraction (ASE)-gas chromatography-mass spectrometry (GCMS) spectra. This enabled us to track SOM developmental trajectories with restoration time. Results showed total carbon concentrations approached that of native forests soils by 17 years of restoration. Using the relate protocol in PRIMER, we demonstrated an overall linear relationship with site age at both depths, indicating that changes in SOM chemistry were occurring. The surface soils were seen to approach native molecular compositions while the deeper soil retained a more stable chemical signature, suggesting litter from the developing diverse plant community has altered SOM near the surface. Our new approach for assessing SOM development, combining ASE-GCMS with illuminating multivariate statistical analysis, holds great promise to more fully develop ASE for the characterisation of SOM.
Functional Trait Strategies of Trees in Dry and Wet Tropical Forests Are Similar but Differ in Their Consequences for Succession

PubMed Central

Lohbeck, Madelon; Lebrija-Trejos, Edwin; Martínez-Ramos, Miguel; Meave, Jorge A.; Poorter, Lourens; Bongers, Frans

2015-01-01

Global plant trait studies have revealed fundamental trade-offs in plant resource economics. We evaluated such trait trade-offs during secondary succession in two species-rich tropical ecosystems that contrast in precipitation: dry deciduous and wet evergreen forests of Mexico. Species turnover with succession in dry forest largely relates to increasing water availability and in wet forest to decreasing light availability. We hypothesized that while functional trait trade-offs are similar in the two forest systems, the successful plant strategies in these communities will be different, as contrasting filters affect species turnover. Research was carried out in 15 dry secondary forest sites (5-63 years after abandonment) and in 17 wet secondary forest sites (<1-25 years after abandonment). We used 11 functional traits measured on 132 species to make species-trait PCA biplots for dry and wet forest and compare trait trade-offs. We evaluated whether multivariate plant strategies changed during succession, by calculating a ‘Community-Weighted Mean’ plant strategy, based on species scores on the first two PCA-axes. Trait spectra reflected two main trade-off axes that were similar for dry and wet forest species: acquisitive versus conservative species, and drought avoiding species versus evergreen species with large animal-dispersed seeds. These trait associations were consistent when accounting for evolutionary history. Successional changes in the most successful plant strategies reflected different functional trait spectra depending on the forest type. In dry forest the community changed from having drought avoiding strategies early in succession to increased abundance of evergreen strategies with larger seeds late in succession. In wet forest the community changed from species having mainly acquisitive strategies to those with more conservative strategies during succession. 
These strategy changes were explained by increasing water availability during dry forest succession and increasing light scarcity during wet forest succession. Although similar trait spectra were observed among dry and wet secondary forest species, the consequences for succession were different resulting from contrasting environmental filters. PMID:25919023
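The 'Community-Weighted Mean' strategy described above can be sketched as an abundance-weighted average of species scores on a PCA axis. The species names, abundances, and scores below are hypothetical, and weighting by stem counts rather than, for example, basal area is an assumption.

```python
def community_weighted_mean(abundances, scores):
    """Abundance-weighted mean of a species-level score (e.g. a PCA axis score)."""
    total = sum(abundances.values())
    return sum(n / total * scores[species] for species, n in abundances.items())


# Hypothetical plot: stem counts per species and their scores on PCA axis 1
abundances = {"sp1": 30, "sp2": 10}
pca1_scores = {"sp1": 1.0, "sp2": -1.0}
print(community_weighted_mean(abundances, pca1_scores))  # 0.75 * 1.0 + 0.25 * -1.0 = 0.5
```

Computing this value for each site and plotting it against time since abandonment gives the kind of successional trajectory the abstract describes.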
Mature and old-growth riparian forests: structure, dynamics, and effects on Adirondack stream habitats.

PubMed

Keeton, William S; Kraft, Clifford E; Warren, Dana R

2007-04-01

Riparian forests regulate linkages between terrestrial and aquatic ecosystems, yet relationships among riparian forest development, stand structure, and stream habitats are poorly understood in many temperate deciduous forest systems. Our research has (1) described structural attributes associated with old-growth riparian forests and (2) assessed linkages between these characteristics and in-stream habitat structure. The 19 study sites were located along predominantly first- and second-order streams in northern hardwood-conifer forests in the Adirondack Mountains of New York (U.S.A.). Sites were classified as mature forest (6 sites), mature with remnant old-growth trees (3 sites), and old-growth (10 sites). Forest-structure attributes were measured over stream channels and at varying distances from each bank. In-stream habitat features such as large woody debris (LWD), pools, and boulders were measured in each stream reach. Forest structure was examined in relation to stand age using multivariate techniques, ANOVA, and linear regression. We investigated linkages between forest structure and stream characteristics using similar methods, preceded by information-theoretic modeling (AIC). Old-growth riparian forest structure is more complex than that found in mature forests and exhibits significantly greater accumulations of aboveground tree biomass, both living and dead. In-stream LWD volumes were significantly (alpha = 0.05) greater at old-growth sites (200 m3/ha) compared to mature sites (34 m3/ha) and were strongly related to the basal area of adjacent forests. In-stream large-log densities correlated strongly with debris-dam densities. AIC models that included large-log density, debris-dam density, boulder density, and bankfull width had the most support for predicting pool density. There were higher proportions of LWD-formed pools relative to boulder-formed pools at old-growth sites as compared to mature sites. 
Old-growth riparian forests provide in-stream habitat features that have not been widely recognized in eastern North America, representing a potential benefit from late-successional riparian forest management and conservation. Riparian management practices (including buffer delineation and restorative silvicultural approaches) that emphasize development and maintenance of late-successional characteristics are recommended where the associated in-stream effects are desired.
Selection of forest canopy gaps by male Cerulean Warblers in West Virginia

USGS Publications Warehouse

Perkins, Kelly A.; Wood, Petra Bohall

2014-01-01

Forest openings, or canopy gaps, are an important resource for many forest songbirds, such as Cerulean Warblers (Setophaga cerulea). We examined canopy gap selection by this declining species to determine if male Cerulean Warblers selected particular sizes, vegetative heights, or types of gaps. We tested whether these parameters differed among territories, territory core areas, and randomly-placed sample plots. We used enhanced territory mapping techniques (burst sampling) to define habitat use within the territory. Canopy gap densities were higher within core areas of territories than within territories or random plots, indicating that Cerulean Warblers selected habitat within their territories with the highest gap densities. Selection of regenerating gaps with woody vegetation >12 m within the gap, and canopy heights >24 m surrounding the gap, occurred within territory core areas. These findings differed between two sites, indicating that gap selection may vary based on forest structure. Differences were also found regarding the placement of territories with respect to gaps. Larger gaps, such as wildlife food plots, were located on the periphery of territories more often than other types and sizes of gaps, while smaller gaps, such as treefalls, were located within territory boundaries more often than expected. The creation of smaller canopy gaps (<100 m2) within dense stands is likely compatible with forest management for this species.
Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

PubMed

Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

2010-03-19

This letter presents a computer aided diagnosis (CAD) technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on a partial least squares (PLS) regression model and a random forest (RF) predictor. The curse of dimensionality is addressed by downscaling the SPECT images and extracting score features using PLS, which reduces the large dimensionality of the input data. An RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers, with its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective than PCA for extracting discriminative information from the data, yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. In addition, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
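The majority-vote step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote` and the example labels are hypothetical, and each tree is assumed to have already produced one class label for the scan.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees.

    Ties go to the label that appears first in the input, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


# Hypothetical votes from five trees for a single scan.
votes = ["AD", "normal", "AD", "AD", "normal"]
print(majority_vote(votes))
```

Adding more trees changes only the length of the vote list; the aggregation rule stays the same.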
Integrating support vector machines and random forests to classify crops in time series of Worldview-2 images

NASA Astrophysics Data System (ADS)

Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

2017-10-01

Crop maps are essential inputs for the agricultural planning done at various governmental and agribusiness agencies. Remote sensing offers timely and cost-efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) was collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, an RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are 83, 82 and 83%, respectively. Adding vegetation indices to the analysis results in classification accuracies of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF, respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
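The abstract does not spell out how the random forest kernel is built. One common definition, assumed here, takes the similarity of two samples to be the fraction of trees in which both fall into the same leaf. The sketch below uses hypothetical leaf indices and a hypothetical function name; it illustrates that definition and is not the authors' code.

```python
def random_forest_kernel(leaves_a, leaves_b):
    """Build a kernel matrix from per-tree leaf indices.

    leaves_x[i][t] is the leaf reached by sample i in tree t.
    Entry [i][j] is the share of trees in which sample i of leaves_a
    and sample j of leaves_b end up in the same leaf.
    """
    kernel = []
    for row_a in leaves_a:
        kernel_row = []
        for row_b in leaves_b:
            shared = sum(1 for la, lb in zip(row_a, row_b) if la == lb)
            kernel_row.append(shared / len(row_a))
        kernel.append(kernel_row)
    return kernel


# Hypothetical leaf indices: three samples, three trees.
leaves = [[0, 2, 1], [0, 2, 3], [4, 5, 3]]
K = random_forest_kernel(leaves, leaves)
for row in K:
    print(row)
```

With scikit-learn, leaf indices of this shape can be obtained from `RandomForestClassifier.apply`, and the resulting matrix can be passed to `SVC(kernel="precomputed")`.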
Photos for estimating fuel loadings before and after prescribed burning in the upper coastal plain of the southeast

Treesearch

Eric R. Scholl; Thomas A. Waldrop

1999-01-01

Although prescribed burning is common in the Southeastern United States, most fuel models apply to only western forests. This paper documents a fuel classification system that was developed for plantations of loblolly and longleaf pines for the Upper Coastal Plain region. Multivariate analysis of variance and discriminant function analysis were used to confirm eight...
Vulnerability of carbon storage in North American boreal forests to wildfires during the 21st century

Treesearch

M.S. Balshi; A.D. McGuire; P. Duffy; M. Flannigan; D.W. Kicklighter; J. Melillo

2009-01-01

We use a gridded data set developed with a multivariate adaptive regression spline approach to determine how area burned varies each year with changing climatic and fuel moisture conditions. We apply the process-based Terrestrial Ecosystem Model to evaluate the role of future fire on the carbon dynamics of boreal North America in the context of changing atmospheric...
Diversity of Medicinal Plants among Different Forest-use Types of the Pakistani Himalaya.

PubMed

Adnan, Muhammad; Hölscher, Dirk

2012-12-01

Medicinal plants collected in Himalayan forests play a vital role in the livelihoods of regional rural societies and are also increasingly recognized at the international level. However, these forests are being heavily transformed by logging. Here we ask how forest transformation influences the diversity and composition of medicinal plants in northwestern Pakistan, where we studied old-growth forests, forests degraded by logging, and regrowth forests. First, an approximate map indicating these forest types was established and then 15 study plots per forest type were randomly selected. We found a total of 59 medicinal plant species consisting of herbs and ferns, most of which occurred in the old-growth forest. Species number was lowest in forest degraded by logging and intermediate in regrowth forest. The most valuable economic species, including six Himalayan endemics, occurred almost exclusively in old-growth forest. Species composition and abundance of forest degraded by logging differed markedly from that of old-growth forest, while regrowth forest was more similar to old-growth forest. The density of medicinal plants positively correlated with tree canopy cover in old-growth forest and negatively in degraded forest, which indicates that species adapted to open conditions dominate in logged forest. Thus, old-growth forests are important as refuge for vulnerable endemics. Forest degraded by logging has the lowest diversity of relatively common medicinal plants. Forest regrowth may foster the reappearance of certain medicinal species valuable to local livelihoods and as such promote acceptance of forest expansion and medicinal plants conservation in the region. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12231-012-9213-4) contains supplementary material, which is available to authorized users.
Informing models through empirical relationships between foliar phosphorus, nitrogen and photosynthesis across diverse woody species in tropical forests of Panama

DOE PAGES

Norby, Richard J.; Gu, Lianhong; Haworth, Ivan C.; ...

2016-11-21

Here, our objective was to analyze and summarize data describing photosynthetic parameters and foliar nutrient concentrations from tropical forests in Panama to inform model representation of phosphorus (P) limitation of tropical forest productivity. Gas exchange and nutrient content data were collected from 144 observations of upper canopy leaves from at least 65 species at two forest sites in Panama, differing in species composition, rainfall and soil fertility. Photosynthetic parameters were derived from analysis of assimilation rate vs internal CO₂ concentration curves (A/Ci), and relationships with foliar nitrogen (N) and P content were developed. The relationships between area-based photosynthetic parameters and nutrients were of similar strength for N and P and robust across diverse species and site conditions. The strongest relationship expressed maximum electron transport rate (Jmax) as a multivariate function of both N and P, and this relationship was improved with the inclusion of independent data on wood density. Models that estimate photosynthesis from foliar N would be improved only modestly by including additional data on foliar P, but doing so may increase the capability of models to predict future conditions in P-limited tropical forests, especially when combined with data on edaphic conditions and other environmental drivers.
Post-wildfire summer greening depends on winter snowpack

NASA Astrophysics Data System (ADS)

Wilson, A.; Nolin, A. W.

2017-12-01

Forested, mountain landscapes in the Pacific Northwest (PNW) are changing at an unprecedented rate, largely due to shifts in the regional climate regime. Documented climatic trends include increasing wildfire frequency and intensity and an increasingly ephemeral snowpack, especially at moderate elevations. One relationship that has yet to be studied thoroughly is the dependence of post-wildfire forest recovery on winter snowpack. This study will correlate winter snowpack with summer greenness in the context of 15 recent severe wildfires across the PNW. Winter snow water equivalent will be estimated using a new Snow Cover Frequency (SCF) metric derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) daily snow cover product. Summer forest greenness will be assessed using the Enhanced Vegetation Index (EVI), also derived from daily MODIS reflectance data. Regression tree analysis will be employed to characterize the relative importance of snowpack, elevation, slope, aspect, soil texture, and summer precipitation to summer greenness. Using findings from the regression tree analysis, the most critical physiographic factors will frame a multivariate time series spanning the 5 years pre-wildfire and 5 years post-wildfire in an effort to illustrate how the snowpack-revegetation relationship persists over time. As northwestern mountainous forests become more vulnerable to wildfire activity, it will be vital to continue deepening our understanding of how snowpack matters to post-wildfire forest recovery.
A multivariate spatial mixture model for areal data: examining regional differences in standardized test scores

PubMed Central

Neelon, Brian; Gelfand, Alan E.; Miranda, Marie Lynn

2013-01-01

Summary Researchers in the health and social sciences often wish to examine joint spatial patterns for two or more related outcomes. Examples include infant birth weight and gestational length, psychosocial and behavioral indices, and educational test scores from different cognitive domains. We propose a multivariate spatial mixture model for the joint analysis of continuous individual-level outcomes that are referenced to areal units. The responses are modeled as a finite mixture of multivariate normals, which accommodates a wide range of marginal response distributions and allows investigators to examine covariate effects within subpopulations of interest. The model has a hierarchical structure built at the individual level (i.e., individuals are nested within areal units), and thus incorporates both individual- and areal-level predictors as well as spatial random effects for each mixture component. Conditional autoregressive (CAR) priors on the random effects provide spatial smoothing and allow the shape of the multivariate distribution to vary flexibly across geographic regions. We adopt a Bayesian modeling approach and develop an efficient Markov chain Monte Carlo model fitting algorithm that relies primarily on closed-form full conditionals. We use the model to explore geographic patterns in end-of-grade math and reading test scores among school-age children in North Carolina. PMID:26401059
Multivariate Methods for Meta-Analysis of Genetic Association Studies.

PubMed

Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

2018-01-01

Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention because it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we briefly present univariate methods for meta-analysis and then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
Automated retrieval of forest structure variables based on multi-scale texture analysis of VHR satellite imagery

NASA Astrophysics Data System (ADS)

Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine

2014-10-01

The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution (VHR) satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on automating the identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is its underlying parameters, which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters, hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, which we call Random PRiF, is proposed. It relies on random sampling in feature space and carefully addresses the multicollinearity issue in multiple-linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ~1.1 m is expected on crown diameter, ~0.9 m on tree spacing, ~3 m on height and ~0.06 m on diameter at breast height.
Remote sensing based detection of forested wetlands: An evaluation of LiDAR, aerial imagery, and their data fusion

NASA Astrophysics Data System (ADS)

Suiter, Ashley Elizabeth

Multi-spectral imagery provides a robust and low-cost dataset for assessing wetland extent and quality over broad regions and is frequently used for wetland inventories. However, in forested wetlands, hydrology is obscured by tree canopy, making it difficult to detect with multi-spectral imagery alone. Because of this, classification of forested wetlands often includes greater errors than that of other wetland types. Elevation and terrain derivatives have been shown to be useful for modelling wetland hydrology. However, few studies have addressed the use of LiDAR intensity data for detecting hydrology in forested wetlands. Due to the tendency of the LiDAR signal to be attenuated by water, this research proposed the fusion of LiDAR intensity data with LiDAR elevation, terrain data, and aerial imagery for the detection of forested wetland hydrology. We examined the utility of LiDAR intensity data and determined whether the fusion of LiDAR-derived data with multi-spectral imagery increased the accuracy of forested wetland classification compared with a classification performed with only multi-spectral imagery. Four classifications were performed: Classification A -- All Imagery, Classification B -- All LiDAR, Classification C -- LiDAR without Intensity, and Classification D -- Fusion of All Data. These classifications were performed using random forest and each resulted in a 3-foot resolution thematic raster of forested upland and forested wetland locations in Vermilion County, Illinois. The accuracies of these classifications were compared using the Kappa Coefficient of Agreement. Importance statistics produced within the random forest classifier were evaluated in order to understand the contribution of individual datasets. Classification D, which used the fusion of LiDAR and multi-spectral imagery as input variables, had moderate to strong agreement between reference data and classification results. 
It was found that Classification B, performed using all the LiDAR data and its derivatives (intensity, elevation, slope, aspect, curvatures, and Topographic Wetness Index), was the most accurate classification with Kappa: 78.04%, indicating moderate to strong agreement. However, Classification C, performed with LiDAR derivatives without intensity data, had less agreement than would be expected by chance, indicating that LiDAR intensity contributed significantly to the accuracy of Classification B.
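The Kappa Coefficient of Agreement used to compare these classifications is computed from a confusion matrix of reference versus predicted classes. The sketch below shows the standard Cohen's kappa calculation on made-up counts; the function name and the numbers are hypothetical and are not taken from the thesis.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples whose reference class is i and
    whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows are reference (upland, wetland),
# columns are predicted (upland, wetland).
confusion = [[45, 5], [10, 40]]
print(round(cohens_kappa(confusion), 2))  # 0.7
```

A kappa of 0 means agreement no better than chance, and negative values mean agreement worse than chance, which is how the result for Classification C is described above.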
Design and evaluation of an aerial spray trial with true replicates to test the efficacy of Bacillus thuringiensis insecticide in a boreal forest.

PubMed

Cadogan, Beresford L; Scharbach, Roger D

2003-04-01

A field trial using true replicates was conducted successfully in a boreal forest in 1996 to evaluate the efficacy of two aerially applied Bacillus thuringiensis formulations, ABG 6429 and ABG 6430. A complete randomized design with four replicates per treatment was chosen. Twelve to 15 balsam fir (Abies balsamea [L.] Mill.) per plot were randomly selected as sample trees. Interplot buffer zones, ≥200 m wide, adequately prevented cross contamination from sprays that were atomized with four rotary atomizers (volume median diameters ranging from 64.6 to 139.4 µm) and released approximately 30 m above the ground. The B. thuringiensis formulations were not significantly different (P > 0.05) from each other in reducing spruce budworm (Choristoneura fumiferana [Clem.]) populations and protecting balsam trees from defoliation, but both formulations were significantly more efficacious than the controls. The results suggest that true replicates are a feasible alternative to pseudoreplication in experimental forest aerial applications.
Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing

PubMed Central

Cai, Tianxi; Karlson, Elizabeth W.

2013-01-01

Objectives: To test whether data extracted from full text patient visit notes from an electronic medical record (EMR) would improve the classification of PsA compared to an algorithm based on codified data. Methods: From the > 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and three random forest algorithms trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm and a cut point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results: The PPV of a single PsA code was 57% (95%CI 55%–58%). Using a combination of coded data and NLP the random forest algorithm reached a PPV of 90% (95%CI 86%–93%) at sensitivity of 87% (95% CI 83% – 91%) in the training data. The PPV was 93% (95%CI 89%–96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC (p < 0.001). Conclusions: Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. PMID:20701955
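The cut-point rule in the Methods (maximum sensitivity subject to a PPV of at least 90%) can be sketched as a scan over candidate thresholds. This is an illustration under assumptions, not the study's code: `pick_cut_point`, the scores, and the labels are hypothetical, and a case is called positive when its score is at or above the threshold.

```python
def pick_cut_point(scores, labels, min_ppv=0.90):
    """Return (threshold, sensitivity, ppv) with the highest sensitivity
    among thresholds whose PPV is at least min_ppv, or None if none qualify.

    scores are classifier outputs; labels are 1 (case) or 0 (non-case).
    """
    positives = sum(labels)
    best = None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        if tp + fp == 0:
            continue
        ppv = tp / (tp + fp)
        sensitivity = tp / positives
        if ppv >= min_ppv and (best is None or sensitivity > best[1]):
            best = (threshold, sensitivity, ppv)
    return best


# Hypothetical random forest scores and chart-review labels.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 1, 0]
print(pick_cut_point(scores, labels))
```

Lowering `min_ppv` generally admits lower thresholds, which trades precision for sensitivity.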
Ensemble Feature Learning of Genomic Data Using Support Vector Machine

PubMed Central

Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

2016-01-01

The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
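The aggregation step described here, in which rankings from several bootstrap-trained SVMs are combined before features are eliminated, can be sketched as follows. The abstract does not say how rankings are aggregated, so summing rank positions is an assumption, as are the function names and gene labels; every ranking is assumed to contain the same features.

```python
def aggregate_rankings(rankings):
    """Combine several best-to-worst feature rankings into one.

    Features are ordered by the sum of their positions across rankings
    (lower is better); ties are broken alphabetically.
    """
    totals = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            totals[feature] = totals.get(feature, 0) + position
    return sorted(totals, key=lambda f: (totals[f], f))


def eliminate_worst(rankings, n_drop=1):
    """Drop the n_drop lowest-ranked features after aggregation."""
    ordered = aggregate_rankings(rankings)
    return ordered[: len(ordered) - n_drop]


# Hypothetical rankings from three bootstrap models.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
print(aggregate_rankings(rankings))
print(eliminate_worst(rankings))
```

In a full backward-elimination loop, the models would be retrained on the surviving features and the aggregation repeated until the desired number of features remains.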
Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

PubMed

Ozçift, Akin

2011-05-01

Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree in terms of classification performance. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset: it has multiple classes with small sample sizes and is therefore suitable for testing our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, in order to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
A spatially explicit decision support model for restoration of forest bird habitat

USGS Publications Warehouse

Twedt, D.J.; Uihlein, W.B.; Elliott, A.B.

2006-01-01

The historical area of bottomland hardwood forest in the Mississippi Alluvial Valley has been reduced by >75%. Agricultural production was the primary motivator for deforestation; hence, clearing deliberately targeted higher and drier sites. Remaining forests are highly fragmented and hydrologically altered, with larger forest fragments subject to greater inundation, which has negatively affected many forest bird populations. We developed a spatially explicit decision support model, based on a Partners in Flight plan for forest bird conservation, that prioritizes forest restoration to reduce forest fragmentation and increase the area of forest core (interior forest >1 km from 'hostile' edge). Our primary objective was to increase the number of forest patches that harbor >2000 ha of forest core, but we also sought to increase the number and area of forest cores >5000 ha. Concurrently, we targeted restoration within local (320 km2) landscapes to achieve >60% forest cover. Finally, we emphasized restoration of higher-elevation bottomland hardwood forests in areas where restoration would not increase forest fragmentation. Reforestation of 10% of restorable land in the Mississippi Alluvial Valley (approximately 880,000 ha) targeted at priorities established by this decision support model resulted in approximately 824,000 ha of new forest core. This is more than 32 times the amount of core forest added through reforestation of randomly located fields (approximately 25,000 ha). The total area of forest core (1.6 million ha) that resulted from targeted restoration exceeded habitat objectives identified in the Partners in Flight Bird Conservation Plan and approached the area of forest core present in the 1950s.

Not accounting for interindividual variability can mask habitat selection patterns: a case study on black bears.

PubMed

Lesmerises, Rémi; St-Laurent, Martin-Hugues

2017-11-01

Habitat selection studies conducted at the population scale commonly aim to describe general patterns that could improve our understanding of the limiting factors in species-habitat relationships. Researchers often consider interindividual variation in selection patterns to control for its effects and avoid pseudoreplication by using mixed-effect models that include individuals as random factors. Here, we highlight common pitfalls and possible misinterpretations of this strategy by describing habitat selection of 21 black bears Ursus americanus. We used Bayesian mixed-effect models and compared results obtained when using random intercept (i.e., population level) versus calculating individual coefficients for each independent variable (i.e., individual level). We then related interindividual variability to individual characteristics (i.e., age, sex, reproductive status, body condition) in a multivariate analysis. The assumption of comparable behavior among individuals was verified only in 40% of the cases in our seasonal best models. Indeed, we found strong and opposite responses among sampled bears and individual coefficients were linked to individual characteristics. For some covariates, contrasted responses canceled each other out at the population level. In other cases, interindividual variability was concealed by the composition of our sample, with the majority of the bears (e.g., old individuals and bears in good physical condition) driving the population response (e.g., selection of young forest cuts). Our results stress the need to consider interindividual variability to avoid misinterpretation and uninformative results, especially for a flexible and opportunistic species. This study helps to identify some ecological drivers of interindividual variability in bear habitat selection patterns.
Assessing the Potential of Land Use Modification to Mitigate Ambient NO₂ and Its Consequences for Respiratory Health.

PubMed

Rao, Meenakshi; George, Linda A; Shandas, Vivek; Rosenstiel, Todd N

2017-07-10

Understanding how local land use and land cover (LULC) shapes intra-urban concentrations of atmospheric pollutants, and thus human health, is a key component in designing healthier cities. Here, NO₂ is modeled based on spatially dense summer and winter NO₂ observations in Portland-Hillsboro-Vancouver (USA), and the spatial variation of NO₂ with LULC is investigated using random forest, an ensemble data learning technique. The NO₂ random forest model, together with BenMAP, is further used to develop a better understanding of the relationship among LULC, ambient NO₂ and respiratory health. The impact of land use modifications on ambient NO₂, and consequently on respiratory health, is also investigated using a sensitivity analysis. We find that NO₂ associated with roadways and tree-canopied areas may be affecting annual incidence rates of asthma exacerbation in 4-12 year olds by +3000 per 100,000 and -1400 per 100,000, respectively. Our model shows that increasing local tree canopy by 5% may reduce local incidence rates of asthma exacerbation by 6%, indicating that targeted local tree-planting efforts may have a substantial impact on reducing city-wide incidence of respiratory distress. Our findings demonstrate the utility of random forest modeling in evaluating LULC modifications for enhanced respiratory health.
A Robust Random Forest-Based Approach for Heart Rate Monitoring Using Photoplethysmography Signal Contaminated by Intense Motion Artifacts.

PubMed

Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin

2017-02-16

The estimation of heart rate (HR) based on wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to estimate HR due to low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust approach based on random forest is proposed for accurately estimating HR from the photoplethysmography signal contaminated by intense motion artifacts, consisting of two stages. Stage 1 proposes a hybrid method to effectively remove MA with a low computation complexity, where two MA removal algorithms are combined by an accurate binary decision algorithm whose aim is to decide whether or not to adopt the second MA removal algorithm. Stage 2 proposes a random forest-based spectral peak-tracking algorithm, whose aim is to locate the spectral peak corresponding to HR, formulating the problem of spectral peak tracking into a pattern classification problem. Experiments on the PPG datasets including 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved the average absolute error of 1.65 beats per minute (BPM) on the 22 PPG datasets. Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking.
Comparing spatial regression to random forests for large ...

EPA Pesticide Factsheets

Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and po
Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

PubMed

Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

2015-01-01

Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (2.7% versus 75.2%, p<0.001). This method also showed statistically significantly improved recall rate compared to a 2-cue baseline method using fewer vessel cues (30.7% versus 67.7%, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse. Published by Elsevier B.V.
A scattering model for forested area

NASA Technical Reports Server (NTRS)

Karam, M. A.; Fung, A. K.

1988-01-01

A forested area is modeled as a volume of randomly oriented and distributed disc-shaped or needle-shaped leaves shading a distribution of branches modeled as randomly oriented finite-length, dielectric cylinders above an irregular soil surface. Since the radii of branches have a wide range of sizes, the model only requires the length of a branch to be large compared with its radius, which may be any size relative to the incident wavelength. In addition, the model also assumes the thickness of a disc-shaped leaf or the radius of a needle-shaped leaf is much smaller than the electromagnetic wavelength. The scattering phase matrices for disc, needle, and cylinder are developed in terms of the scattering amplitudes of the corresponding fields which are computed by the forward scattering theorem. These quantities along with the Kirchhoff scattering model for a randomly rough surface are used in the standard radiative transfer formulation to compute the backscattering coefficient. Numerical illustrations for the backscattering coefficient are given as a function of the shading factor, incidence angle, leaf orientation distribution, branch orientation distribution, and the number density of leaves. Also illustrated are the properties of the extinction coefficient as a function of leaf and branch orientation distributions. Comparisons are made with measured backscattering coefficients from forested areas reported in the literature.
The Trail Making test: a study of its ability to predict falls in the acute neurological in-patient population.

PubMed

Mateen, Bilal Akhter; Bussas, Matthias; Doogan, Catherine; Waller, Denise; Saverino, Alessia; Király, Franz J; Playford, E Diane

2018-05-01

To determine whether tests of cognitive function and patient-reported outcome measures of motor function can be used to create a machine learning-based predictive tool for falls. Prospective cohort study. Tertiary neurological and neurosurgical center. In all, 337 in-patients receiving neurosurgical, neurological, or neurorehabilitation-based care. Binary (Y/N) for falling during the in-patient episode, the Trail Making Test (a measure of attention and executive function) and the Walk-12 (a patient-reported measure of physical function). The principal outcome was a fall during the in-patient stay (n = 54). The Trail Making Test was identified as the best predictor of falls. Moreover, the addition of other variables did not improve the prediction (Wilcoxon signed-rank P < 0.001). Classical linear statistical modeling methods were then compared with more recent machine learning-based strategies, for example, random forests, neural networks, and support vector machines. The random forest was the best modeling strategy when utilizing just the Trail Making Test data (Wilcoxon signed-rank P < 0.001) with 68% (± 7.7) sensitivity and 90% (± 2.3) specificity. This study identifies a simple yet powerful machine learning (Random Forest) based predictive model for an in-patient neurological population, utilizing a single neuropsychological test of cognitive function, the Trail Making Test.
Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

ERIC Educational Resources Information Center

van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

2007-01-01

The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmarks, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
Global patterns of tropical forest fragmentation

NASA Astrophysics Data System (ADS)

Taubert, Franziska; Fischer, Rico; Groeneveld, Jürgen; Lehmann, Sebastian; Müller, Michael S.; Rödig, Edna; Wiegand, Thorsten; Huth, Andreas

2018-02-01

Remote sensing enables the quantification of tropical deforestation with high spatial resolution. This in-depth mapping has led to substantial advances in the analysis of continent-wide fragmentation of tropical forests. Here we identified approximately 130 million forest fragments in three continents that show surprisingly similar power-law size and perimeter distributions as well as fractal dimensions. Power-law distributions have been observed in many natural phenomena such as wildfires, landslides and earthquakes. The principles of percolation theory provide one explanation for the observed patterns, and suggest that forest fragmentation is close to the critical point of percolation; simulation modelling also supports this hypothesis. The observed patterns emerge not only from random deforestation, which can be described by percolation theory, but also from a wide range of deforestation and forest-recovery regimes. Our models predict that additional forest loss will result in a large increase in the total number of forest fragments—at maximum by a factor of 33 over 50 years—as well as a decrease in their size, and that these consequences could be partly mitigated by reforestation and forest protection.
Recognizing pedestrian's unsafe behaviors in far-infrared imagery at night

NASA Astrophysics Data System (ADS)

Lee, Eun Ju; Ko, Byoung Chul; Nam, Jae-Yeal

2016-05-01

Pedestrian behavior recognition is important for early accident prevention in advanced driver assistance systems (ADAS). In particular, because most pedestrian-vehicle crashes occur between late night and early dawn, our study focuses on recognizing unsafe pedestrian behavior using thermal images captured from a moving vehicle at night. For recognizing unsafe behavior, this study uses a convolutional neural network (CNN), which shows high recognition performance. However, because a traditional CNN requires very expensive training time and memory, we design a light CNN consisting of two convolutional layers and two subsampling layers for real-time processing in vehicle applications. In addition, we combine the light CNN with a boosted random forest (Boosted RF) classifier so that the output of the CNN is not fully connected with the classifier but randomly connected with the Boosted RF. We named this CNN the randomly connected CNN (RC-CNN). The proposed method was successfully applied to the pedestrian unsafe behavior (PUB) dataset captured from a far-infrared camera at night, and its behavior recognition accuracy is confirmed to be higher than that of some algorithms related to CNNs, with a shorter processing time.
Field strategies for the calibration and validation of high-resolution forest carbon maps: Scaling from plots to a three state region MD, DE, & PA, USA.

NASA Astrophysics Data System (ADS)

Dolan, K. A.; Huang, W.; Johnson, K. D.; Birdsey, R.; Finley, A. O.; Dubayah, R.; Hurtt, G. C.

2016-12-01

In 2010 Congress directed NASA to initiate research towards the development of Carbon Monitoring Systems (CMS). In response, our team has worked to develop a robust, replicable framework to quantify and map aboveground forest biomass at high spatial resolutions. Crucial to this framework has been the collection of field-based estimates of aboveground tree biomass, combined with remotely detected canopy and structural attributes, for calibration and validation. Here we evaluate the field-based calibration and validation strategies within this carbon monitoring framework and discuss the implications for local to national monitoring systems. Through project development, the domain of this research has expanded from two counties in MD (2,181 km²), to the entire state of MD (32,133 km²), and most recently the tri-state region of MD, PA, and DE (157,868 km²), and covers forests in four major USDA ecological provinces. While there are approximately 1000 Forest Inventory and Analysis (FIA) plots distributed across the state of MD, 60% fell in areas considered non-forest or had conditions that precluded them from being measured in the last forest inventory. Across the two pilot counties, where population and land-use competition is high, that proportion rose to 70%. Thus, during the initial phases of this project 850 independent field plots were established for model calibration following a random stratified design to ensure the adequate representation of height and vegetation classes found across the state, while FIA data were used as an independent data source for validation. As the project expanded to cover the larger spatial tri-state domain, the strategy was flipped to base calibration on more than 3,300 measured FIA plots, as they provide a standardized, consistent and available data source across the nation. An additional 350 stratified random plots were deployed in the Northern Mixed forests of PA and the Coastal Plains forests of DE for validation.
Comparative genetic responses to climate for the varieties of Pinus ponderosa and Pseudotsuga menziesii: realized climate niches

Treesearch

Gerald E. Rehfeldt; Barry C. Jaquish; Javier Lopez-Upton; Cuauhtemoc Saenz-Romero; J. Bradley St Clair; Laura P. Leites; Dennis G. Joyce

2014-01-01

The Random Forests classification algorithm was used to predict the occurrence of the realized climate niche for two sub-specific varieties of Pinus ponderosa and three varieties of Pseudotsuga menziesii from presence-absence data in forest inventory ground plots. Analyses were based on ca. 271,000 observations for P. ponderosa and ca. 426,000 observations for P....
Sensitivity of a Riparian Large Woody Debris Recruitment Model to the Number of Contributing Banks and Tree Fall Pattern

Treesearch

Don C. Bragg; Jeffrey L. Kershner

2004-01-01

Riparian large woody debris (LWD) recruitment simulations have traditionally applied a random angle of tree fall from two well-forested stream banks. We used a riparian LWD recruitment model (CWD, version 1.4) to test the validity of these assumptions. Both the number of contributing forest banks and predominant tree fall direction significantly influenced simulated...
Ten-year response of a forest bird community to an operational herbicide-shelterwood treatment in Allegheny hardwoods

Treesearch

Scott H. Stoleson; Todd E. Ristau; David S. deCalesta; Stephen B. Horsley

2011-01-01

Use of herbicides in forestry to direct successional trajectories has raised concerns over possible direct or indirect effects on non-target organisms. We studied the response of forest birds to an operational application of glyphosate and sulfometuron methyl herbicides, using a randomized block design in which half of each 8 ha block received herbicide and the other...
Habitat use of two songbird species in pine-hardwood forests treated with prescribed burning and thinning: first year results

Treesearch

Jill M. Wick; Yong Wang

2010-01-01

We evaluated habitat use and home range size of hooded warblers (Wilsonia citrina) and worm-eating warblers (Helmitheros vermivorus) in six treated mixed oak-pine stands on the Bankhead National Forest in north-central AL. The study design is a randomized complete block with a factorial arrangement of three thinning levels (no thin, 11...
A comparison of three erosion control mulches on decommissioned forest road corridors in the northern Rocky Mountains, United States

Treesearch

R. B. Foltz

2012-01-01

This study tested the erosion mitigation effectiveness of agricultural straw and two wood-based mulches for four years on decommissioned forest roads. Plots were installed on the loosely consolidated, bare soil to measure sediment production, mulch cover, and plant regrowth. The experimental design was a repeated measures, randomized block on two soil types common in...
Effects of low intensity prescribed fires on ponderosa pine forests in wilderness areas of Zion National Park, Utah

Treesearch

Henry V. Bastian

2001-01-01

Vegetation and fuel loading plots were monitored and sampled in wilderness areas treated with prescribed fire. Changes in ponderosa pine (Pinus ponderosa) forest structure, tree species, and fuel loading are presented. Plots were randomly stratified and established in burn units in 1995. Preliminary analysis of nine plots 2 years after burning shows litter was reduced 54....
Bayesian spatial prediction of the site index in the study of the Missouri Ozark Forest Ecosystem Project

Treesearch

Xiaoqian Sun; Zhuoqiong He; John Kabrick

2008-01-01

This paper presents a Bayesian spatial method for analysing the site index data from the Missouri Ozark Forest Ecosystem Project (MOFEP). Based on ecological background and availability, we select three variables, the aspect class, the soil depth and the land type association as covariates for analysis. To allow great flexibility of the smoothness of the random field,...
Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions

PubMed Central

Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; Leenaars, Johan G. B.; Walsh, Markus G.; Shepherd, Keith D.; Sila, Andrew; MacMillan, Robert A.; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E.

2015-01-01

80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data. PMID:26110833
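The comparison described above (cross-validated RMSE of one model against another, reported as a percentage decrease) can be illustrated with a minimal sketch. The function names, the interleaved fold assignment, and the toy values below are assumptions for illustration only; they are not the study's code or data.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

def kfold_indices(n, k):
    """Yield (train, test) index lists for k folds.

    Fold i simply takes every k-th item starting at i; real analyses
    usually shuffle the data first.
    """
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test

def percent_decrease(baseline_error, candidate_error):
    """Percentage by which candidate_error is lower than baseline_error."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error

# Hypothetical toy values standing in for held-out observations and predictions.
observed = [1.0, 2.0, 3.0, 4.0]
pred_linear = [3.0, 4.0, 5.0, 6.0]
pred_forest = [2.0, 3.0, 4.0, 5.0]

rmse_linear = rmse(observed, pred_linear)
rmse_forest = rmse(observed, pred_forest)
decrease = percent_decrease(rmse_linear, rmse_forest)
folds = list(kfold_indices(10, 5))
```

In a full workflow each model would be refitted on the `train` indices of every fold and scored on the matching `test` indices before the errors are compared.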
Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

PubMed

Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

2015-09-01

According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. 
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
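The pair-wise Kolmogorov distance mentioned above is the largest gap between two empirical cumulative distribution functions. The sketch below computes it for hypothetical measurement lists and builds the distance matrix that a k-medoids step would consume; the unit names and values are invented for illustration, and the clustering step itself is omitted.

```python
from bisect import bisect_right

def kolmogorov_distance(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(
            bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted)
        )
        for x in points
    )

def distance_matrix(groups):
    """Pair-wise Kolmogorov distances between named groups of measurements."""
    return {
        a: {b: kolmogorov_distance(groups[a], groups[b]) for b in groups}
        for a in groups
    }

# Hypothetical indoor radon measurements per lithological unit (arbitrary units).
units = {
    "unit_a": [1, 2, 3, 4],
    "unit_b": [3, 4, 5, 6],
    "unit_c": [10, 11, 12, 13],
}
matrix = distance_matrix(units)
```

Units whose distributions barely overlap, such as `unit_a` and `unit_c` here, end up at distance 1, the maximum possible value.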

Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.

PubMed

Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold

2016-12-07

Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. 
The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
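One way to picture ensemble pruning is a greedy forward selection that repeatedly adds the tree which most improves the sub-ensemble. The sketch below uses the Brier score of the averaged predictions as the selection criterion; this is a generic illustration under that assumption, not the specific strategies evaluated in the study, and the per-tree probabilities are made up.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def average_probs(members):
    """Average the per-sample probabilities of several ensemble members."""
    return [sum(column) / len(column) for column in zip(*members)]

def greedy_prune(tree_probs, y_true, size):
    """Select `size` trees, each time adding the one giving the lowest Brier score."""
    selected = []
    remaining = list(range(len(tree_probs)))
    while remaining and len(selected) < size:
        best = min(
            remaining,
            key=lambda i: brier_score(
                y_true, average_probs([tree_probs[j] for j in selected + [i]])
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical validation labels (1 = glaucoma) and per-tree predicted probabilities.
y_true = [1, 0, 1, 0]
tree_probs = [
    [0.9, 0.1, 0.8, 0.2],
    [0.5, 0.5, 0.5, 0.5],
    [0.2, 0.9, 0.1, 0.8],
]
kept = greedy_prune(tree_probs, y_true, size=2)
```

With unbalanced classes the criterion could be computed separately for the minority and majority class, which is the kind of breakdown the abstract reports.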
Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healthy elderly, MCI, cMCI and Alzheimer's disease patients: From the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

PubMed

Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N

2018-05-15

In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 AD were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we achieved a four-class classification accuracy of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
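The final step of the scheme, ensemble classification via majority voting, can be sketched as follows. The three label lists are hypothetical outputs of separate classifiers (for example, ones trained on different feature subsets); how ties are resolved here is an assumption, since the abstract does not say.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per subject.

    `predictions` holds one list of labels per classifier, all of equal length.
    On a tie, the label that was seen first among the tied ones is returned.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Hypothetical predictions from three classifiers for three subjects.
predictions = [
    ["HC", "MCI", "AD"],
    ["HC", "cMCI", "AD"],
    ["MCI", "cMCI", "AD"],
]
final_labels = majority_vote(predictions)
```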
Automated segmentation of dental CBCT image with prior-guided sequential random forests

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Li; Gao, Yaozong; Shi, Feng

Purpose: Cone-beam computed tomography (CBCT) is an increasingly utilized imaging modality for the diagnosis and treatment planning of the patients with craniomaxillofacial (CMF) deformities. Accurate segmentation of CBCT image is an essential step to generate 3D models for the diagnosis and treatment planning of the patients with CMF deformities. However, due to the image artifacts caused by beam hardening, imaging noise, inhomogeneity, truncation, and maximal intercuspation, it is difficult to segment the CBCT. Methods: In this paper, the authors present a new automatic segmentation method to address these problems. Specifically, the authors first employ a majority voting method to estimate the initial segmentation probability maps of both mandible and maxilla based on multiple aligned expert-segmented CBCT images. These probability maps provide an important prior guidance for CBCT segmentation. The authors then extract both the appearance features from CBCTs and the context features from the initial probability maps to train the first-layer of random forest classifier that can select discriminative features for segmentation. Based on the first-layer of trained classifier, the probability maps are updated, which will be employed to further train the next layer of random forest classifier. By iteratively training the subsequent random forest classifier using both the original CBCT features and the updated segmentation probability maps, a sequence of classifiers can be derived for accurate segmentation of CBCT images. Results: Segmentation results on CBCTs of 30 subjects were both quantitatively and qualitatively validated based on manually labeled ground truth. The average Dice ratios of mandible and maxilla by the authors’ method were 0.94 and 0.91, respectively, which are significantly better than the state-of-the-art method based on sparse representation (p-value < 0.001). Conclusions: The authors have developed and validated a novel fully automated method for CBCT segmentation.
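The Dice ratio used to score the segmentations is twice the overlap of two masks divided by the sum of their sizes. A minimal sketch on flattened binary masks follows; the masks are toy data, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def dice_ratio(mask_a, mask_b):
    """Dice ratio 2|A ∩ B| / (|A| + |B|) for two equal-length 0/1 masks."""
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)

# Hypothetical flattened voxel masks: automatic result versus manual ground truth.
automatic = [1, 1, 1, 0, 0]
manual = [0, 1, 1, 1, 0]
score = dice_ratio(automatic, manual)
```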
Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

PubMed

Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia

2017-01-01

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the above four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. 
This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
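The ensemble forecast (averaging the predicted probabilities of several models) and the true skill statistic, TSS = sensitivity + specificity - 1, can be sketched as below. The probabilities, labels, and the 0.5 presence threshold are hypothetical choices for illustration, not values from the study.

```python
def ensemble_average(model_probs):
    """Average per-site probabilities across models (one list per model)."""
    return [sum(column) / len(column) for column in zip(*model_probs)]

def true_skill_statistic(y_true, probs, threshold=0.5):
    """TSS = sensitivity + specificity - 1 after thresholding probabilities."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both presences and absences are required")
    return tp / (tp + fn) + tn / (tn + fp) - 1

# Hypothetical presence (1) / absence (0) labels at four test sites and
# predicted probabilities from four models.
y_true = [1, 1, 0, 0]
model_probs = [
    [0.9, 0.4, 0.2, 0.1],
    [0.8, 0.6, 0.3, 0.6],
    [0.7, 0.7, 0.1, 0.2],
    [0.6, 0.5, 0.4, 0.3],
]
ensemble = ensemble_average(model_probs)
tss_ensemble = true_skill_statistic(y_true, ensemble)
tss_first_model = true_skill_statistic(y_true, model_probs[0])
```

TSS ranges from -1 to 1, with 0 meaning no better than chance; unlike raw accuracy it does not depend on how common the species is in the test data.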
Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

PubMed Central

Mi, Chunrong; Huettmann, Falk; Han, Xuesong; Wen, Lijia

2017-01-01

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the above four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while increasingly on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. 
This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation. PMID:28097060
Monitoring grass nutrients and biomass as indicators of rangeland quality and quantity using random forest modelling and WorldView-2 data

NASA Astrophysics Data System (ADS)

Ramoelo, Abel; Cho, M. A.; Mathieu, R.; Madonsela, S.; van de Kerchove, R.; Kaszta, Z.; Wolff, E.

2015-12-01

Land use and climate change could have huge impacts on food security and the health of various ecosystems. Leaf nitrogen (N) and above-ground biomass are some of the key factors limiting agricultural production and ecosystem functioning. Leaf N and biomass can be used as indicators of rangeland quality and quantity. Conventional methods for assessing these vegetation parameters at landscape scale are time consuming and tedious. Remote sensing provides a bird's-eye view of the landscape, which creates an opportunity to assess these vegetation parameters over wider rangeland areas. Estimation of leaf N has been successful during peak productivity or high biomass, but few studies have estimated leaf N in the dry season. The estimation of above-ground biomass has been hindered by the signal saturation problems of conventional vegetation indices. The objective of this study is to monitor leaf N and above-ground biomass as indicators of rangeland quality and quantity using WorldView-2 satellite images and the random forest technique in the north-eastern part of South Africa. A series of field campaigns to collect samples for leaf N and biomass was undertaken in March 2013, April or May 2012 (end of wet season) and July 2012 (dry season). Several conventional and red edge based vegetation indices were computed. Overall results indicate that random forest and vegetation indices explained over 89% of leaf N concentrations for grass and trees, and less than 89% for all the years of assessment. The red edge based vegetation indices were among the important variables for predicting leaf N. For the biomass, the random forest model explained over 84% of biomass variation in all years, and visible bands including red edge based vegetation indices were found to be important. The study demonstrated that leaf N could be monitored using high spatial resolution imagery with red edge band capability, which is important for rangeland assessment and monitoring.
Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.

PubMed

Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S

2018-04-10

Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that applying machine learning-based classifiers to such data sets for risk stratification leads to lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers, if replaced by a median configuration, will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from the six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers when replaced by computed medians will improve the risk stratification accuracy. The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing the missing values and outliers by group median and median values, respectively, and then combining random forest feature selection with random forest classification yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in literature. The system was validated for its stability and reliability. 
RF-based model showed the best performance when outliers are replaced by median values.
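The preprocessing idea in the abstract above (missing values replaced by the group median, outliers replaced by the median) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: missing values are encoded as None, "group" is taken to mean the class label, and outliers are flagged with the common 1.5 × IQR rule. The abstract does not state how outliers were detected, so that rule and all function names are hypothetical.

```python
from statistics import median, quantiles


def impute_group_median(values, labels):
    """Replace None entries with the median of observed values in the same class."""
    medians = {}
    for label in set(labels):
        observed = [v for v, l in zip(values, labels) if l == label and v is not None]
        medians[label] = median(observed)
    return [medians[l] if v is None else v for v, l in zip(values, labels)]


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the overall median."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    overall_median = median(values)
    return [overall_median if (v < lower or v > upper) else v for v in values]


# Hypothetical feature column with two missing entries and class labels.
glucose = [90, None, 110, 150, None, 170]
diabetic = [0, 0, 0, 1, 1, 1]
imputed = impute_group_median(glucose, diabetic)

# Hypothetical feature column with one extreme value.
raw = [10, 12, 11, 13, 12, 11, 12, 13, 100]
cleaned = replace_outliers_with_median(raw)
```

In a real workflow the medians and outlier bounds would normally be estimated on the training split only, so that no information leaks from the test data.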
Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions.

PubMed

Hengl, Tomislav; Heuvelink, Gerard B M; Kempen, Bas; Leenaars, Johan G B; Walsh, Markus G; Shepherd, Keith D; Sila, Andrew; MacMillan, Robert A; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E

2015-01-01

80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management--organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data.
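The comparison reported in the abstract above rests on two quantities: a cross-validated RMSE per model and the relative decrease in RMSE between models. The Python sketch below shows both under explicit assumptions: the two models are deliberately simple stand-ins (a mean predictor and a least-squares line, not random forests or the study's regression), the folds are assigned by index modulo k, and the data are synthetic.

```python
import math


def fit_mean(train_x, train_y):
    """Stand-in baseline: always predict the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def fit_line(train_x, train_y):
    """Stand-in model: ordinary least-squares line through the training data."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def kfold_rmse(xs, ys, fit, k=5):
    """Pool squared errors over k held-out folds and return the RMSE."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((predict(xs[i]) - ys[i]) ** 2 for i in test)
    return math.sqrt(squared_error / n)


def percent_decrease(baseline_rmse, new_rmse):
    """Relative RMSE reduction of the new model over the baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Synthetic data: y = 2x + 1 without noise.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
mean_rmse = kfold_rmse(xs, ys, fit_mean)
line_rmse = kfold_rmse(xs, ys, fit_line)
improvement = percent_decrease(mean_rmse, line_rmse)
```

Because every observation is predicted exactly once while held out, the pooled RMSE is an out-of-sample error estimate, which is what makes the percentage comparison between models meaningful.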
Evaluation of the path integral for flow through random porous media

NASA Astrophysics Data System (ADS)

Westbroek, Marise J. E.; Coche, Gil-Arnaud; King, Peter R.; Vvedensky, Dimitri D.

2018-04-01

We present a path integral formulation of Darcy's equation in one dimension with random permeability described by a correlated multivariate lognormal distribution. This path integral is evaluated with the Markov chain Monte Carlo method to obtain pressure distributions, which are shown to agree with the solutions of the corresponding stochastic differential equation for Dirichlet and Neumann boundary conditions. The extension of our approach to flow through random media in two and three dimensions is discussed.
Introducing two Random Forest based methods for cloud detection in remote sensing images

NASA Astrophysics Data System (ADS)

Ghasemian, Nafiseh; Akhoondzadeh, Mehdi

2018-07-01

Cloud detection is a necessary phase in satellite image processing to retrieve atmospheric and lithospheric parameters. Currently, some cloud detection methods based on the Random Forest (RF) model have been proposed, but they do not consider both spectral and textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF), to incorporate visible, infrared (IR) and thermal spectral and textural features (FLFRF), including Gray Level Co-occurrence Matrix (GLCM) and Robust Extended Local Binary Pattern (RELBP_CI), or visible, IR and thermal classifiers (DLFRF) for highly accurate cloud detection on remote sensing images. FLFRF first fuses visible, IR and thermal features. Thereafter, it uses the RF model to classify pixels as cloud, snow/ice and background or thick cloud, thin cloud and background. DLFRF considers visible, IR and thermal features (both spectral and textural) separately and feeds each set of features to an RF model. Then, it stores the vote matrix of each run of the model. Finally, it fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used in this paper. Quantitative analyses are based on manually selected ground truth data. Results show that adding RELBP_CI to the input feature set improves cloud detection accuracy. Also, the average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are higher than those of other machine learning methods, Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than those of other traditional methods. 
The quantitative values on Landsat 8 images show similar trend. Consequently, while SVM and K-nearest neighbor show overestimation in predicting cloud and snow/ice pixels, our Random Forest (RF) based models can achieve higher cloud, snow/ice kappa values on MODIS and thin cloud, thick cloud and snow/ice kappa values on Landsat 8 images. Our algorithms predict both thin and thick cloud on Landsat 8 images while the existing cloud detection algorithm, Fmask cannot discriminate them. Compared to the state-of-the-art methods, our algorithms have acquired higher average cloud and snow/ice kappa values for different spatial resolutions.
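The decision-level fusion step described in the abstract above (separate classifiers for visible, IR and thermal features, combined by majority vote) can be illustrated with a short Python sketch. It assumes that each classifier has already produced one label per pixel; the labels below are invented, and the tie-breaking rule (the first classifier's label wins) is an assumption of this example, not something the abstract specifies.

```python
from collections import Counter


def majority_vote(*label_lists):
    """Fuse per-pixel labels from several classifiers by majority vote.

    Ties are resolved in favour of the label seen first, i.e. the label
    from the earliest classifier in the argument list (an assumption).
    """
    fused = []
    for votes in zip(*label_lists):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical per-pixel predictions from three feature-specific classifiers.
visible = ["cloud", "snow", "background", "cloud"]
infrared = ["cloud", "cloud", "background", "snow"]
thermal = ["snow", "snow", "background", "background"]

fused = majority_vote(visible, infrared, thermal)
```

Feature-level fusion differs only in where the combination happens: the three feature sets would be concatenated into one vector per pixel before a single classifier is trained.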
On Models for Binomial Data with Random Numbers of Trials

PubMed Central

Comulada, W. Scott; Weiss, Robert E.

2010-01-01

Summary A binomial outcome is a count s of the number of successes out of the total number of independent trials n = s + f, where f is a count of the failures. The n are random variables not fixed by design in many studies. Joint modeling of (s, f) can provide additional insight into the science and into the probability π of success that cannot be directly incorporated by the logistic regression model. Observations where n = 0 are excluded from the binomial analysis yet may be important to understanding how π is influenced by covariates. Correlation between s and f may exist and be of direct interest. We propose Bayesian multivariate Poisson models for the bivariate response (s, f), correlated through random effects. We extend our models to the analysis of longitudinal and multivariate longitudinal binomial outcomes. Our methodology was motivated by two disparate examples, one from teratology and one from an HIV tertiary intervention study. PMID:17688514
Modelling Biophysical Parameters of Maize Using Landsat 8 Time Series

NASA Astrophysics Data System (ADS)

Dahms, Thorsten; Seissiger, Sylvia; Conrad, Christopher; Borg, Erik

2016-06-01

Open and free access to multi-frequent high-resolution data (e.g. Sentinel-2) will fortify agricultural applications based on satellite data. The temporal and spatial resolution of these remote sensing datasets directly affects the applicability of remote sensing methods, for instance the robust retrieval of biophysical parameters over the entire growing season with very high geometric resolution. In this study we use machine learning methods to predict biophysical parameters, namely the fraction of absorbed photosynthetic radiation (FPAR), the leaf area index (LAI) and the chlorophyll content, from high resolution remote sensing. 30 Landsat 8 OLI scenes were available in our study region in Mecklenburg-Western Pomerania, Germany. In-situ data were collected weekly to bi-weekly on 18 maize plots throughout the summer season 2015. The study aims at an optimized prediction of biophysical parameters and the identification of the best explaining spectral bands and vegetation indices. For this purpose, we used the entire in-situ dataset from 24.03.2015 to 15.10.2015. Random forest and conditional inference forests were used because of their explicit strong exploratory and predictive character. Variable importance measures allowed for analysing the relation between the biophysical parameters with respect to the spectral response, and the performance of the two approaches over the development of the plant stand. Classical random forest regression outperformed conditional inference forests, in particular when modelling the biophysical parameters over the entire growing period. For example, modelling biophysical parameters of maize for the entire vegetation period using random forests yielded: FPAR: R² = 0.85; RMSE = 0.11; LAI: R² = 0.64; RMSE = 0.9 and chlorophyll content (SPAD): R² = 0.80; RMSE = 4.9. 
Our results demonstrate the great potential in using machine-learning methods for the interpretation of long-term multi-frequent remote sensing datasets to model biophysical parameters.
Climate Controls on Tree Growth Across Species and Sites in Northeastern Arizona

NASA Astrophysics Data System (ADS)

Schwan, M. R.; Guiterman, C. H.; Anchukaitis, K. J.

2016-12-01

Understanding how forests will respond to ongoing climate change is important for conservation and resource management. Conifer forests in the US Southwest are predicted to be particularly at risk from increased drought and higher temperatures projected to occur in the region. Tree-ring studies shed light on how trees respond to climate, but there remains considerable uncertainty as to which climate factors are most important, and which species are most at risk. Confounding climate and environmental factors, biological differences among species, and biogeography often complicate cross-species analysis. Here we present a multi-species, multivariate analysis of tree growth response to climate variability. We analyze data from three coexisting conifer tree species at two sites near Canyon de Chelly, Arizona. We use a high-resolution PRISM gridded climate dataset to determine the growth responses across species and sites to temperature and precipitation. We identify both common and differential responses in our data and use these to infer possible risks these forest communities may face under a changing climate.
Quantitative analysis of American woodcock nest and brood habitat

USGS Publications Warehouse

Bourgeois, A.; Keppie, Daniel M.; Owen, Ray B.

1977-01-01

Sixteen nest and 19 brood sites of American woodcock (Philohela minor) were examined in northern lower Michigan between 15 April and 15 June 1974 to determine habitat structure associated with these sites. Woodcock hens utilized young, second-growth forest stands which were similar in species composition for both nesting and brood rearing. A multivariate discriminant function analysis revealed a significant (P < 0.05) difference, however, in habitat structure. Nest habitat was characterized by lower tree density (2176 trees/ha) and basal area (8.6 m²/ha), by being close to forest openings (7 m) and by being situated on dry, relatively well drained sites. In contrast, woodcock broods were located in sites that had nearly twice the tree density (3934 trees/ha) and basal area (16.5 m²/ha), were located over twice as far from forest openings (18 m) and generally occurred on damp sites, near (8 m) standing water. Importance of the habitat features to the species and possible management implications are discussed.
FOCIS: A forest classification and inventory system using LANDSAT and digital terrain data

NASA Technical Reports Server (NTRS)

Strahler, A. H.; Franklin, J.; Woodcock, C. E.; Logan, T. L.

1981-01-01

Accurate, cost-effective stratification of forest vegetation and timber inventory is the primary goal of a Forest Classification and Inventory System (FOCIS). Conventional timber stratification using photointerpretation can be time-consuming, costly, and inconsistent from analyst to analyst. FOCIS was designed to overcome these problems by using machine processing techniques to extract and process tonal, textural, and terrain information from registered LANDSAT multispectral and digital terrain data. Comparison of samples from timber strata identified by conventional procedures showed that both have about the same potential to reduce the variance of timber volume estimates over simple random sampling.
Faster Trees: Strategies for Accelerated Training and Prediction of Random Forests for Classification of Polsar Images

NASA Astrophysics Data System (ADS)

Hänsch, Ronny; Hellwich, Olaf

2018-04-01

Random Forests have continuously proven to be one of the most accurate, robust, as well as efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focuses on improving classification accuracy, we aim to accelerate the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches, we mainly consider algorithmic changes in order to stay as independent as possible of platform and programming language. The final model achieves approximately 60 times faster training and 500 times faster prediction, while the accuracy is only marginally decreased by roughly 1%.
Underwater image enhancement through depth estimation based on random forest

NASA Astrophysics Data System (ADS)

Tai, Shen-Chuan; Tsai, Ting-Chou; Huang, Jyun-Han

2017-11-01

Light absorption and scattering in underwater environments can result in low-contrast images with a distinct color cast. This paper proposes a systematic framework for the enhancement of underwater images. Light transmission is estimated using the random forest algorithm. RGB values, luminance, color difference, blurriness, and the dark channel are treated as features in training and estimation. Transmission is calculated using an ensemble machine learning algorithm to deal with a variety of conditions encountered in underwater environments. A color compensation and contrast enhancement algorithm based on depth information was also developed with the aim of improving the visual quality of underwater images. Experimental results demonstrate that the proposed scheme outperforms existing methods with regard to subjective visual quality as well as objective measurements.
An evaluation of the use of near infrared (NIR) spectroscopy to identify water and oil-borne preservatives

Treesearch

Chi-Leung So; Stan T. Lebow; Leslie H. Groom; Todd F. Shupe

2003-01-01

In this research we experimented with a new and rapid way of analyzing wood. Near Infrared (NIR) spectroscopy together with multivariate analysis is becoming a widely used technique in the field of forest products, especially for property determination, and is already firmly established in the pulp and paper industry. This method is ideal for the chemical analysis of wood...
QUANTIFYING FOREST ABOVEGROUND CARBON POOLS AND FLUXES USING MULTI-TEMPORAL LIDAR A report on field monitoring, remote sensing MMV, GIS integration, and modeling results for forestry field validation test to quantify aboveground tree biomass and carbon

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee Spangler; Lee A. Vierling; Eva K. Stand

2012-04-01

Sound policy recommendations relating to the role of forest management in mitigating atmospheric carbon dioxide (CO₂) depend upon establishing accurate methodologies for quantifying forest carbon pools for large tracts of land that can be dynamically updated over time. Light Detection and Ranging (LiDAR) remote sensing is a promising technology for achieving accurate estimates of aboveground biomass and thereby carbon pools; however, not much is known about the accuracy of estimating biomass change and carbon flux from repeat LiDAR acquisitions containing different data sampling characteristics. In this study, discrete return airborne LiDAR data were collected in 2003 and 2009 across approximately 20,000 hectares (ha) of an actively managed, mixed conifer forest landscape in northern Idaho, USA. Forest inventory plots were established via a random stratified sampling design and sampled in 2003 and 2009. The Random Forest machine learning algorithm was used to establish statistical relationships between inventory data and forest structural metrics derived from the LiDAR acquisitions. Aboveground biomass maps were created for the study area based on statistical relationships developed at the plot level. Over this 6-year period, we found that the mean increase in biomass due to forest growth across the non-harvested portions of the study area was 4.8 metric tons per hectare (Mg/ha). In these non-harvested areas, we found a significant difference in biomass increase among forest successional stages, with a higher biomass increase in mature and old forest compared to stand initiation and young forest. Approximately 20% of the landscape had been disturbed by harvest activities during the six-year time period, representing a biomass loss of >70 Mg/ha in these areas. During the study period, these harvest activities outweighed growth at the landscape scale, resulting in an overall loss in aboveground carbon at this site. 
The 30-fold increase in sampling density between the 2003 and 2009 acquisitions did not affect the biomass estimates. Overall, LiDAR data coupled with field reference data offer a powerful method for calculating pools and changes in aboveground carbon in forested systems. The results of our study suggest that multitemporal LiDAR-based approaches are likely to be useful for high quality estimates of aboveground carbon change in conifer forest systems.
Incidence and effects of endemic populations of forest pests in young mixed-conifer forests of the Sierra Nevada

Treesearch

Carroll B. Williams; David L. Azuma; George T. Ferrell

1992-01-01

Approximately 3,200 trees in young mixed-conifer stands were examined for pest activity and human-caused or mechanical injuries, and approximately 25 percent of these trees were randomly selected for stem analyses. The examination of trees felled for stem analyses showed that 409 (47 percent) were free of pests and 466 (53 percent) had one or more pest categories....

Short-term effects of silviculture on breeding birds in William B. Bankhead National Forest

Treesearch

Jill M. Wick; Yong Wang; Callie Jo Schweitzer

2013-01-01

We evaluated the changes in the bird community in relation to six disturbance treatments in the William B. Bankhead National Forest, AL. The study design is randomized complete block with a factorial arrangement of three thinning levels [no thin, 11 m²/ha residual basal area (BA), and 17 m²/ha residual BA] and two burn treatments (burn and no burn),...
Modeling change in potential landscape vulnerability to forest insect and pathogen disturbances: methods for forested subwatersheds sampled in the midscale interior Columbia River basin assessment.

Treesearch

Paul F. Hessburg; Bradley G. Smith; Craig A. Miller; Scott D. Kreiter; R. Brion Salter

1999-01-01

In the interior Columbia River basin midscale ecological assessment, including portions of the Klamath and Great Basins, we mapped and characterized historical and current vegetation composition and structure of 337 randomly sampled subwatersheds (9500 ha average size) in 43 subbasins (404 000 ha average size). We compared landscape patterns, vegetation structure and...
Responses of cavity-nesting birds to stand-replacement fire and salvage logging in ponderosa pine/Douglas-fir forests of southwestern Idaho

Treesearch

Victoria A. Saab; Jonathan G. Dudley

1998-01-01

From 1994 to 1996, researchers monitored 695 nests of nine cavity-nesting bird species and measured vegetation at nest sites and at 90 randomly located sites in burned ponderosa pine forests of southwestern Idaho. Site treatments included two types of salvage logging, and unlogged controls. All bird species selected nest sites with higher tree densities, larger...
Northwest Forest Plan—the first 10 years (1994–2003): preliminary assessment of the condition of watersheds.

Treesearch

Kirsten Gallo; Steven H. Lanigan; Peter Eldred; Sean N. Gordon; Chris Moyer

2005-01-01

We aggregated road, vegetation, and inchannel data to assess the condition of sixth-field watersheds and describe the distribution of the condition of watersheds in the Northwest Forest Plan (the Plan) area. The assessment is based on 250 watersheds selected at random within the Plan area. The distributions of conditions are presented for watersheds and for many of the...
Role of decaying logs and other organic seedbeds in natural regeneration of Hawaiian forest species on abandoned montane pasture

Treesearch

Paul G. Scowcroft

1992-01-01

Natural regeneration is one mechanism by which native mixed-species forests become reestablished on abandoned pasture. This study was done to determine patterns of and requirement for natural regeneration of native species in an open woodland after removal of cattle. Ten 50- by 50-m quadrats were randomly selected within a 16-ha exclosure located at 1,700-m elevation...
Point-Sampling and Line-Sampling Probability Theory, Geometric Implications, Synthesis

Treesearch

L.R. Grosenbaugh

1958-01-01

Foresters concerned with measuring tree populations on definite areas have long employed two well-known methods of representative sampling. In list or enumerative sampling the entire tree population is tallied with a known proportion being randomly selected and measured for volume or other variables. In area sampling all trees on randomly located plots or strips...
Mapping ecological systems with a random forest model: tradeoffs between errors and bias

Treesearch

Emilie Grossmann; Janet Ohmann; James Kagan; Heather May; Matthew Gregory

2010-01-01

New methods for predictive vegetation mapping allow improved estimations of plant community composition across large regions. Random Forest (RF) models limit over-fitting problems of other methods, and are known for making accurate classification predictions from noisy, nonnormal data, but can be biased when plot samples are unbalanced. We developed two contrasting...
Simultaneous comparison and assessment of eight remotely sensed maps of Philippine forests

NASA Astrophysics Data System (ADS)

Estoque, Ronald C.; Pontius, Robert G.; Murayama, Yuji; Hou, Hao; Thapa, Rajesh B.; Lasco, Rodel D.; Villar, Merlito A.

2018-05-01

This article compares and assesses eight remotely sensed maps of Philippine forest cover in the year 2010. We examined eight Forest versus Non-Forest maps reclassified from eight land cover products: the Philippine Land Cover, the Climate Change Initiative (CCI) Land Cover, the Landsat Vegetation Continuous Fields (VCF), the MODIS VCF, the MODIS Land Cover Type product (MCD12Q1), the Global Tree Canopy Cover, the ALOS-PALSAR Forest/Non-Forest Map, and the GlobeLand30. The reference data consisted of 9852 randomly distributed sample points interpreted from Google Earth. We created methods to assess the maps and their combinations. Results show that the percentage of the Philippines covered by forest ranges among the maps from a low of 23% for the Philippine Land Cover to a high of 67% for GlobeLand30. Landsat VCF estimates 36% forest cover, which is closest to the 37% estimate based on the reference data. The eight maps plus the reference data agree unanimously on 30% of the sample points, of which 11% are attributable to forest and 19% to non-forest. The overall disagreement between the reference data and Philippine Land Cover is 21%, which is the least among the eight Forest versus Non-Forest maps. About half of the 9852 points have a nested structure such that the forest in a given dataset is a subset of the forest in the datasets that have more forest than the given dataset. The variation among the maps regarding forest quantity and allocation relates to the combined effects of the various definitions of forest and classification errors. Scientists and policy makers must consider these insights when producing future forest cover maps and when establishing benchmarks for forest cover monitoring.
Multivariate test power approximations for balanced linear mixed models in studies with missing data.

PubMed

Ringham, Brandy M; Kreidler, Sarah M; Muller, Keith E; Glueck, Deborah H

2016-07-30

Multilevel and longitudinal studies are frequently subject to missing data. For example, biomarker studies for oral cancer may involve multiple assays for each participant. Assays may fail, resulting in missing data values that can be assumed to be missing completely at random. Catellier and Muller proposed a data analytic technique to account for data missing at random in multilevel and longitudinal studies. They suggested modifying the degrees of freedom for both the Hotelling-Lawley trace F statistic and its null case reference distribution. We propose parallel adjustments to approximate power for this multivariate test in studies with missing data. The power approximations use a modified non-central F statistic, which is a function of (i) the expected number of complete cases, (ii) the expected number of non-missing pairs of responses, or (iii) the trimmed sample size, which is the planned sample size reduced by the anticipated proportion of missing data. The accuracy of the method is assessed by comparing the theoretical results to the Monte Carlo simulated power for the Catellier and Muller multivariate test. Over all experimental conditions, the closest approximation to the empirical power of the Catellier and Muller multivariate test is obtained by adjusting power calculations with the expected number of complete cases. The utility of the method is demonstrated with a multivariate power analysis for a hypothetical oral cancer biomarkers study. We describe how to implement the method using standard, commercially available software products and give example code. Copyright © 2015 John Wiley & Sons, Ltd.
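Two of the sample-size adjustments named in the abstract above reduce to simple arithmetic, sketched here in Python. The trimmed sample size follows the abstract's own definition (planned sample size reduced by the anticipated proportion of missing data). The complete-case formula additionally assumes that each of a participant's responses is missing independently with the same probability, which is an assumption of this sketch and not necessarily the paper's setup; the function names and numbers are hypothetical.

```python
def trimmed_sample_size(n_planned, p_missing):
    """Planned sample size reduced by the anticipated proportion of missing data."""
    return n_planned * (1.0 - p_missing)


def expected_complete_cases(n_planned, p_missing, n_responses):
    """Expected number of participants with no missing responses.

    Assumes each response is missing independently with probability
    p_missing (missing completely at random); this is a sketch assumption.
    """
    return n_planned * (1.0 - p_missing) ** n_responses


# Hypothetical design: 100 participants, 2 assays each, 10% assay failure rate.
n_trimmed = trimmed_sample_size(100, 0.1)
n_complete = expected_complete_cases(100, 0.1, 2)
```

Either adjusted count would then replace the planned sample size in the power calculation; the abstract reports that the complete-case adjustment tracked simulated power most closely.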
Nonlocal atlas-guided multi-channel forest learning for human brain labeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Guangkai; Gao, Yaozong; Wu, Guorong

Purpose: It is important for many quantitative brain studies to label meaningful anatomical regions in MR brain images. However, due to high complexity of brain structures and ambiguous boundaries between different anatomical regions, the anatomical labeling of MR brain images is still quite a challenging task. In many existing label fusion methods, appearance information is widely used. However, since local anatomy in the human brain is often complex, the appearance information alone is limited in characterizing each image point, especially for identifying the same anatomical structure across different subjects. Recent progress in computer vision suggests that the context features can be very useful in identifying an object from a complex scene. In light of this, the authors propose a novel learning-based label fusion method by using both low-level appearance features (computed from the target image) and high-level context features (computed from warped atlases or tentative labeling maps of the target image). Methods: In particular, the authors employ a multi-channel random forest to learn the nonlinear relationship between these hybrid features and target labels (i.e., corresponding to certain anatomical structures). Specifically, at each of the iterations, the random forest will output tentative labeling maps of the target image, from which the authors compute spatial label context features and then use in combination with original appearance features of the target image to refine the labeling. Moreover, to accommodate the high inter-subject variations, the authors further extend their learning-based label fusion to a multi-atlas scenario, i.e., they train a random forest for each atlas and then obtain the final labeling result according to the consensus of results from all atlases. Results: The authors have comprehensively evaluated their method on both public LONI-LBPA40 and IXI datasets. 
To quantitatively evaluate the labeling accuracy, the authors use the dice similarity coefficient to measure the overlap degree. Their method achieves average overlaps of 82.56% on 54 regions of interest (ROIs) and 79.78% on 80 ROIs, respectively, which significantly outperform the baseline method (random forests), with the average overlaps of 72.48% on 54 ROIs and 72.09% on 80 ROIs, respectively. Conclusions: The proposed methods have achieved the highest labeling accuracy, compared to several state-of-the-art methods in the literature.
Nonlocal atlas-guided multi-channel forest learning for human brain labeling

PubMed Central

Ma, Guangkai; Gao, Yaozong; Wu, Guorong; Wu, Ligang; Shen, Dinggang

2016-01-01

Purpose: It is important for many quantitative brain studies to label meaningful anatomical regions in MR brain images. However, due to high complexity of brain structures and ambiguous boundaries between different anatomical regions, the anatomical labeling of MR brain images is still quite a challenging task. In many existing label fusion methods, appearance information is widely used. However, since local anatomy in the human brain is often complex, the appearance information alone is limited in characterizing each image point, especially for identifying the same anatomical structure across different subjects. Recent progress in computer vision suggests that the context features can be very useful in identifying an object from a complex scene. In light of this, the authors propose a novel learning-based label fusion method by using both low-level appearance features (computed from the target image) and high-level context features (computed from warped atlases or tentative labeling maps of the target image). Methods: In particular, the authors employ a multi-channel random forest to learn the nonlinear relationship between these hybrid features and target labels (i.e., corresponding to certain anatomical structures). Specifically, at each of the iterations, the random forest will output tentative labeling maps of the target image, from which the authors compute spatial label context features and then use in combination with original appearance features of the target image to refine the labeling. Moreover, to accommodate the high inter-subject variations, the authors further extend their learning-based label fusion to a multi-atlas scenario, i.e., they train a random forest for each atlas and then obtain the final labeling result according to the consensus of results from all atlases. Results: The authors have comprehensively evaluated their method on both public LONI_LBPA40 and IXI datasets. To quantitatively evaluate the labeling accuracy, the authors use the dice similarity coefficient to measure the overlap degree. Their method achieves average overlaps of 82.56% on 54 regions of interest (ROIs) and 79.78% on 80 ROIs, respectively, which significantly outperform the baseline method (random forests), with the average overlaps of 72.48% on 54 ROIs and 72.09% on 80 ROIs, respectively. Conclusions: The proposed methods have achieved the highest labeling accuracy, compared to several state-of-the-art methods in the literature. PMID:26843260
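The overlap measure named in this abstract, the Dice similarity coefficient, can be illustrated with a minimal sketch. For a given region label it is 2|A∩B| / (|A| + |B|), where A and B are the sets of points carrying that label in the two labelings. The function name, the flattened label lists, and the convention for an empty region are assumptions made for illustration; they are not taken from the paper.

```python
def dice_coefficient(labels_a, labels_b, region):
    """Dice overlap 2|A∩B| / (|A| + |B|) for one region label.

    labels_a and labels_b are equal-length sequences of per-voxel labels
    (hypothetical flattened images).
    """
    in_a = [label == region for label in labels_a]
    in_b = [label == region for label in labels_b]
    size_a = sum(in_a)
    size_b = sum(in_b)
    if size_a + size_b == 0:
        # Assumed convention: a region absent from both labelings overlaps perfectly.
        return 1.0
    intersection = sum(a and b for a, b in zip(in_a, in_b))
    return 2.0 * intersection / (size_a + size_b)


# Toy data: a manual labeling and an automatic labeling of eight voxels.
manual = [0, 1, 1, 1, 2, 2, 0, 0]
automatic = [0, 1, 1, 2, 2, 2, 0, 1]

print(round(dice_coefficient(manual, automatic, 1), 3))  # 0.667
print(round(dice_coefficient(manual, automatic, 2), 3))  # 0.8
```

Averaging this value over all regions of interest would give a figure comparable in kind to the average overlaps reported above.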
Habitat Preferences of Boros schneideri (Coleoptera: Boridae) in the Natural Tree Stands of the Białowieża Forest

PubMed Central

Gutowski, Jerzy M.; Sućko, Krzysztof; Zub, Karol; Bohdan, Adam

2014-01-01

Abstract We analyzed habitat requirements of Boros schneideri (Panzer, 1796) (Coleoptera: Boridae) in the natural forests of the continental biogeographical region, using data collected in the Białowieża Forest. This species has been found on the six host trees, but it preferred dead, standing pine trees, characterized by large diameter, moderately moist and moist phloem but avoided trees in sunny locations. It occurred mostly in mesic and wet coniferous forests. This species demonstrated preferences for old tree stands (over 140-yr old), and its occurrence in younger tree-stand age classes (minimum 31–40-yr old) was not significantly different from random distribution. B. schneideri occupied more frequently locations distant from the forest edge, which were less affected by logging. Considering habitat requirements, character of occurrence, and decreasing number of occupied locations in the whole range of distribution, this species can be treated as relict of primeval forests. PMID:25527586
BitterSweetForest: A random forest based binary classifier to predict bitterness and sweetness of chemical compounds

NASA Astrophysics Data System (ADS)

Banerjee, Priyanka; Preissner, Robert

2018-04-01

Taste of a chemical compound present in food stimulates us to take in nutrients and avoid poisons. However, the perception of taste greatly depends on the genetic as well as evolutionary perspectives. The aim of this work was the development and validation of a machine learning model based on molecular fingerprints to discriminate between sweet and bitter taste of molecules. BitterSweetForest is the first open access model based on KNIME workflow that provides platform for prediction of bitter and sweet taste of chemical compounds using molecular fingerprints and Random Forest based classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. In independent test set, BitterSweetForest achieved an accuracy of 96% and an AUC of 0.98 for bitter and sweet taste prediction. The constructed model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs as well as on an acute toxicity compound data set. BitterSweetForest suggests 70% of the natural product space, as bitter and 10% of the natural product space as sweet with confidence score of 0.60 and above. 77% of the approved drug set was predicted as bitter and 2% as sweet with a confidence score of 0.75 and above. Similarly, 75% of the total compounds from acute oral toxicity class were predicted only as bitter with a minimum confidence score of 0.75, revealing toxic compounds are mostly bitter. Furthermore, we applied a Bayesian based feature analysis method to discriminate the most occurring chemical features between sweet and bitter compounds from the feature space of a circular fingerprint.
BitterSweetForest: A Random Forest Based Binary Classifier to Predict Bitterness and Sweetness of Chemical Compounds

PubMed Central

Banerjee, Priyanka; Preissner, Robert

2018-01-01

Taste of a chemical compound present in food stimulates us to take in nutrients and avoid poisons. However, the perception of taste greatly depends on the genetic as well as evolutionary perspectives. The aim of this work was the development and validation of a machine learning model based on molecular fingerprints to discriminate between sweet and bitter taste of molecules. BitterSweetForest is the first open access model based on KNIME workflow that provides platform for prediction of bitter and sweet taste of chemical compounds using molecular fingerprints and Random Forest based classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. In independent test set, BitterSweetForest achieved an accuracy of 96% and an AUC of 0.98 for bitter and sweet taste prediction. The constructed model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs as well as on an acute toxicity compound data set. BitterSweetForest suggests 70% of the natural product space, as bitter and 10% of the natural product space as sweet with confidence score of 0.60 and above. 77% of the approved drug set was predicted as bitter and 2% as sweet with a confidence score of 0.75 and above. Similarly, 75% of the total compounds from acute oral toxicity class were predicted only as bitter with a minimum confidence score of 0.75, revealing toxic compounds are mostly bitter. Furthermore, we applied a Bayesian based feature analysis method to discriminate the most occurring chemical features between sweet and bitter compounds using the feature space of a circular fingerprint. PMID:29696137
Inventory of forest resources (including water) by multi-level sampling. [nine northern Virginia coastal plain counties]

NASA Technical Reports Server (NTRS)

Aldrich, R. C.; Dana, R. W.; Roberts, E. H. (Principal Investigator)

1977-01-01

The author has identified the following significant results. A stratified random sample using LANDSAT band 5 and 7 panchromatic prints resulted in estimates of water in counties with sampling errors less than ±9% (67% probability level). A forest inventory using a four band LANDSAT color composite resulted in estimates of forest area by counties that were within ±6.7% and ±3.7% respectively (67% probability level). Estimates of forest area for counties by computer assisted techniques were within ±21% of operational forest survey figures and for all counties the difference was only one percent. Correlations of airborne terrain reflectance measurements with LANDSAT radiance verified a linear atmospheric model with an additive (path radiance) term and multiplicative (transmittance) term. Coefficients of determination for 28 of the 32 modeling attempts, not adversely affected by rain shower occurring between the times of LANDSAT passage and aircraft overflights, exceeded 0.83.
Radar modeling of a boreal forest

NASA Technical Reports Server (NTRS)

Chauhan, Narinder S.; Lang, Roger H.; Ranson, K. J.

1991-01-01

Microwave modeling, ground truth, and SAR data are used to investigate the characteristics of forest stands. A mixed coniferous forest stand has been modeled at P, L, and C bands. Extensive measurements of ground truth and canopy geometry parameters were performed in a 200-m-square hemlock-dominated forest plot. About 10 percent of the trees were sampled to determine a distribution of diameter at breast height (DBH). Hemlock trees in the forest are modeled by characterizing tree trunks, branches, and needles as randomly oriented lossy dielectric cylinders whose area and orientation distributions are prescribed. The distorted Born approximation is used to compute the backscatter at P, L, and C bands. The theoretical results are found to be lower than the calibrated ground-truth data. The experiment and model results agree quite closely, however, when the ratios of VV to HH and HV to HH are compared.
Redistribution of soil nitrogen, carbon and organic matter by mechanical disturbance during whole-tree harvesting in northern hardwoods

USGS Publications Warehouse

Ryan, D.F.; Huntington, T.G.; Wayne, Martin C.

1992-01-01

To investigate whether mechanical mixing during harvesting could account for losses observed from forest floor, we measured surface disturbance on a 22 ha watershed that was whole-tree harvested. Surface soil on each 10 cm interval along 81 randomly placed transects was classified immediately after harvesting as mineral or organic, and as undisturbed, depressed, rutted, mounded, scarified, or scalped (forest floor scraped away). We quantitatively sampled these surface categories to collect soil in which preharvest forest floor might reside after harvest. Mechanically mixed mineral and organic soil horizons were readily identified. Buried forest floor under mixed mineral soil occurred in 57% of mounds with mineral surface soil. Harvesting disturbed 65% of the watershed surface and removed forest floor from 25% of the area. Mechanically mixed soil under ruts with organic or mineral surface soil, and mounds with mineral surface soil contained organic carbon and nitrogen pools significantly greater than undisturbed forest floor. Mechanical mixing into underlying mineral soil could account for the loss of forest floor observed between the preharvest condition and the second growing season after whole-tree harvesting. © 1992.
Exploring diversity in ensemble classification: Applications in large area land cover mapping

NASA Astrophysics Data System (ADS)

Mellor, Andrew; Boukir, Samia

2017-07-01

Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria, Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
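The ensemble margin that this abstract relies on can be sketched for a single training instance. The version below is one common supervised definition: the share of trees voting for the true class minus the largest share voting for any other class, giving a value in [-1, 1]. The abstract does not say which margin variant the authors used, so this choice, the function name, and the class names are assumptions for illustration only.

```python
from collections import Counter


def ensemble_margin(tree_votes, true_label):
    """Supervised margin of one instance, in [-1, 1].

    tree_votes is the list of class labels predicted by the individual
    trees of a (hypothetical) forest for this instance.
    """
    counts = Counter(tree_votes)
    votes_true = counts.get(true_label, 0)
    # Largest vote count among the classes other than the true one.
    votes_other = max(
        (n for label, n in counts.items() if label != true_label),
        default=0,
    )
    return (votes_true - votes_other) / len(tree_votes)


# Five trees vote on one pixel's canopy cover class (toy class names).
votes = ["dense", "dense", "sparse", "dense", "open"]

print(ensemble_margin(votes, "dense"))   # 0.4
print(ensemble_margin(votes, "sparse"))  # -0.4
```

Instances with a margin near zero or below are the "lower margin" samples: the ensemble is split or wrong on them, which is why they are natural candidates for emphasis during training.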
Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment

NASA Astrophysics Data System (ADS)

Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.

2012-04-01

The accurate classification and mapping of individual trees at species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush encroaching species, which can threaten the integrity of the environment and livelihoods of the local communities. Species level mapping is particularly challenging in African savannas which are complex, heterogeneous, and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation is also highly irregular in canopy and crown shape, height and other structural dimensions with a combination of open grassland patches and dense woody thicket - a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were species tree height; NDVI; the chlorophyll b wavelength (466 nm) and a selection of raw, continuum removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species with an overall classification accuracy of 87.68% and KHAT value of 0.843.
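The two accuracy figures quoted in this abstract, overall accuracy and the KHAT (kappa) statistic, are both derived from a confusion matrix. A minimal sketch follows, using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the row and column totals. The function name and the two-class matrix are invented for illustration and are not data from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i that were
    assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement implied by the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Toy two-class matrix with 100 samples.
matrix = [
    [45, 5],
    [10, 40],
]

accuracy, kappa = overall_accuracy_and_kappa(matrix)
print(round(accuracy, 2), round(kappa, 2))  # 0.85 0.7
```

Kappa is lower than overall accuracy because it discounts the agreement a classifier would reach by chance alone, which matters when some classes are much more common than others.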
Automated segmentation of thyroid gland on CT images with multi-atlas label fusion and random classification forest

NASA Astrophysics Data System (ADS)

Liu, Jiamin; Chang, Kevin; Kim, Lauren; Turkbey, Evrim; Lu, Le; Yao, Jianhua; Summers, Ronald

2015-03-01

The thyroid gland plays an important role in clinical practice, especially for radiation therapy treatment planning. For patients with head and neck cancer, radiation therapy requires a precise delineation of the thyroid gland to be spared on the pre-treatment planning CT images to avoid thyroid dysfunction. In the current clinical workflow, the thyroid gland is normally manually delineated by radiologists or radiation oncologists, which is time consuming and error prone. Therefore, a system for automated segmentation of the thyroid is desirable. However, automated segmentation of the thyroid is challenging because the thyroid is inhomogeneous and surrounded by structures that have similar intensities. In this work, the thyroid gland segmentation is initially estimated by multi-atlas label fusion algorithm. The segmentation is refined by supervised statistical learning based voxel labeling with a random forest algorithm. Multiatlas label fusion (MALF) transfers expert-labeled thyroids from atlases to a target image using deformable registration. Errors produced by label transfer are reduced by label fusion that combines the results produced by all atlases into a consensus solution. Then, random forest (RF) employs an ensemble of decision trees that are trained on labeled thyroids to recognize features. The trained forest classifier is then applied to the thyroid estimated from the MALF by voxel scanning to assign the class-conditional probability. Voxels from the expert-labeled thyroids in CT volumes are treated as positive classes; background non-thyroid voxels as negatives. We applied this automated thyroid segmentation system to CT scans of 20 patients. The results showed that the MALF achieved an overall 0.75 Dice Similarity Coefficient (DSC) and the RF classification further improved the DSC to 0.81.
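The label fusion step described in this abstract combines the labels transferred from several atlases into one consensus labeling. The abstract does not state which fusion rule was used; the sketch below shows the simplest option, per-voxel majority voting, purely as an illustration. The function name and the flattened binary label lists are assumptions.

```python
from collections import Counter


def majority_vote_fusion(atlas_labelings):
    """Fuse per-voxel labels from several atlases by majority vote.

    atlas_labelings is a list of equal-length label sequences, one per
    atlas, already warped to the target image (hypothetical flattened
    volumes). Ties go to the label seen first among the atlases.
    """
    fused = []
    for voxel_labels in zip(*atlas_labelings):
        most_common_label, _count = Counter(voxel_labels).most_common(1)[0]
        fused.append(most_common_label)
    return fused


# Three atlases label four voxels as thyroid (1) or background (0).
atlases = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]

print(majority_vote_fusion(atlases))  # [0, 1, 1, 0]
```

In a pipeline like the one described, a consensus map of this kind would serve as the initial estimate that a voxel-wise classifier then refines.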

Application and partial validation of a habitat model for moose in the Lake Superior region

USGS Publications Warehouse

Allen, A.W.; Terrell, J.W.; Mangus, W.L.; Lindquist, E.L.

1991-01-01

A modified version of the dormant-season portion of a Habitat Suitability Index (HSI) model developed for assessing moose (Alces alces) habitat in the Lake Superior Region was incorporated in a Geographic Information System (GIS) for 490 km² of Minnesota's Superior National Forest. Moose locations (n=235) were plotted during aerial surveys conducted in December 1988 and January 1990-1991. Dormant-season forage and cover quality for 1,000-m, 500-m, and 200-m radii plots around random points and moose locations were compared using U.S. Forest Service stand examination data. Cover quality indices were lower than forage quality indices within all plots. The median value for the average cover quality index was greater (P=0.003) within 200-m plots around cow moose locations than for plots around random points for the most severe winter of the study. The proportion of highest-quality winter cover, such as mixed stands dominated by mid-age class white spruce (Picea glauca) and balsam fir (Abies balsamea), was greater within 500-m and 200-m plots around cow moose than within similar plots around random points during the two most severe winters. These results indicate that suboptimum ratings of winter habitat quality used in the GIS for dormant-season forage >100 m from cover, as suggested in the original HSI model, are reasonable. Integrating the habitat model with forest stand data using a GIS permitted analysis of moose habitat within a relatively large geographic area. Simulation of habitat quality indicated a potential shortage of late-winter cover in the study area. The effects of forest management actions on moose habitat quality can be simulated without collecting additional data.
Radiative transfer theory for active remote sensing of a forested canopy

NASA Technical Reports Server (NTRS)

Karam, M. A.; Fung, A. K.

1989-01-01

A canopy is modeled as a two-layer medium above a rough interface. The upper layer stands for the forest crown, with the leaves modeled as randomly oriented and distributed disks and needles and the branches modeled as randomly oriented finite dielectric cylinders. The lower layer contains the tree trunks, modeled as randomly positioned vertical cylinders above the rough soil. Radiative-transfer theory is applied to calculate EM scattering from such a canopy, which is expressed in terms of the scattering-amplitude tensors (SATs). For leaves, the generalized Rayleigh-Gans approximation is applied, whereas the branch and trunk SATs are obtained by estimating the inner field by fields inside a similar cylinder of infinite length. The Kirchhoff method is used to calculate the soil SAT. For a plane wave exciting the canopy, the radiative-transfer equations are solved by iteration to the first order in albedo of the leaves and the branches. Numerical results are illustrated as a function of the incidence angle.
Multivariate random regression analysis for body weight and main morphological traits in genetically improved farmed tilapia (Oreochromis niloticus).

PubMed

He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Han, Dandan; Xu, Pao; Yang, Runqing

2017-11-02

Because of their high economic importance, growth traits in fish are under continuous improvement. For growth traits that are recorded at multiple time-points in life, the use of univariate and multivariate animal models is limited because of the variable and irregular timing of these measures. Thus, the univariate random regression model (RRM) was introduced for the genetic analysis of dynamic growth traits in fish breeding. We used a multivariate random regression model (MRRM) to analyze genetic changes in growth traits recorded at multiple time-points in genetically-improved farmed tilapia. Legendre polynomials of different orders were applied to characterize the influences of fixed and random effects on growth trajectories. The final MRRM was determined by optimizing the univariate RRM for the analyzed traits separately via penalizing adaptively the likelihood statistical criterion, which is superior to both the Akaike information criterion and the Bayesian information criterion. In the selected MRRM, the additive genetic effects were modeled by Legendre polynomials of three orders for body weight (BWE) and body length (BL) and of two orders for body depth (BD). By using the covariance functions of the MRRM, estimated heritabilities were between 0.086 and 0.628 for BWE, 0.155 and 0.556 for BL, and 0.056 and 0.607 for BD. Only heritabilities for BD measured from 60 to 140 days of age were consistently higher than those estimated by the univariate RRM. All genetic correlations between growth time-points exceeded 0.5 for either single or pairwise time-points. Moreover, correlations between early and late growth time-points were lower. Thus, for phenotypes that are measured repeatedly in aquaculture, an MRRM can enhance the efficiency of the comprehensive selection for BWE and the main morphological traits.
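Random regression models of this kind describe each growth trajectory through Legendre polynomial covariates evaluated at a standardized age. The sketch below shows how such covariates could be built: age is rescaled to [-1, 1] and the polynomials are generated with Bonnet's recurrence, (n+1)P_{n+1}(x) = (2n+1)xP_n(x) - nP_{n-1}(x). The function names and the 60 to 140 day range are illustrative assumptions, and the polynomials here are unnormalized; animal-breeding software often uses a normalized form, and the abstract does not say which form the authors used.

```python
def standardize_age(age, min_age, max_age):
    """Map an age in [min_age, max_age] onto the interval [-1, 1]."""
    return -1.0 + 2.0 * (age - min_age) / (max_age - min_age)


def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)], unnormalized Legendre polynomials."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        # Bonnet's recurrence builds P_{n+1} from P_n and P_{n-1}.
        next_value = ((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1)
        values.append(next_value)
    return values


# Covariates for a fish measured at 120 days, with records assumed to
# span 60 to 140 days (an illustrative range).
x = standardize_age(120, 60, 140)
print(x)                     # 0.5
print(legendre_basis(x, 3))  # [1.0, 0.5, -0.125, -0.4375]
```

Each animal's additive genetic effect is then modeled as a set of random coefficients on these covariates, which is what allows heritability to be estimated at any age within the recorded range.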
Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010

PubMed Central

Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M.; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore III, Berrien

2016-01-01

Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. PALSARMOD50 m F/NF map showed good spatial and areal agreements with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agricultural Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and evergreen and deciduous forests. PMID:26864143
Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010.

PubMed

Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore, Berrien

2016-02-11

Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. PALSARMOD50 m F/NF map showed good spatial and areal agreements with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agricultural Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and evergreen and deciduous forests.
Additive Benefits of Twice Forest Bathing Trips in Elderly Patients with Chronic Heart Failure.

PubMed

Mao, Gen Xiang; Cao, Yong Bao; Yang, Yan; Chen, Zhuo Mei; Dong, Jian Hua; Chen, Sha Sha; Wu, Qing; Lyu, Xiao Ling; Jia, Bing Bing; Yan, Jing; Wang, Guo Fu

2018-02-01

Chronic heart failure (CHF), a clinical syndrome resulting from the consequences of various cardiovascular diseases (CVDs), is increasingly becoming a global cause of morbidity and mortality. We had earlier demonstrated that a 4-day forest bathing trip can provide an adjunctive therapeutic influence on patients with CHF. To further investigate the duration of the impact and the optimal frequency of forest bathing trips in patients with CHF, we recruited those subjects who had experienced the first forest bathing trip again after 4 weeks and randomly categorized them into two groups, namely, the urban control group (city) and the forest bathing group (forest). After a second 4-day forest bathing trip, we observed a steady decline in the brain natriuretic peptide levels, a biomarker of heart failure, and an attenuated inflammatory response as well as oxidative stress. Thus, this exploratory study demonstrated the additive benefits of twice forest bathing trips in elderly patients with CHF, which could further pave the way for analyzing the effects of such interventions in CVDs. Copyright © 2018 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
Edge-related loss of tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.

PubMed

Santos, Bráulio A; Arroyo-Rodríguez, Víctor; Moreno, Claudia E; Tabarelli, Marcelo

2010-09-08

Deforestation and forest fragmentation are known major causes of nonrandom extinction, but there is no information about their impact on the phylogenetic diversity of the remaining species assemblages. Using a large vegetation dataset from an old hyper-fragmented landscape in the Brazilian Atlantic rainforest we assess whether the local extirpation of tree species and functional impoverishment of tree assemblages reduce the phylogenetic diversity of the remaining tree assemblages. We detected a significant loss of tree phylogenetic diversity in forest edges, but not in core areas of small (<80 ha) forest fragments. This was attributed to a reduction of 11% in the average phylogenetic distance between any two randomly chosen individuals from forest edges; an increase of 17% in the average phylogenetic distance to closest non-conspecific relative for each individual in forest edges; and to the potential manifestation of late edge effects in the core areas of small forest remnants. We found no evidence supporting fragmentation-induced phylogenetic clustering or evenness. This could be explained by the low phylogenetic conservatism of key life-history traits corresponding to vulnerable species. Edge effects must be reduced to effectively protect tree phylogenetic diversity in the severely fragmented Brazilian Atlantic forest.
Functional decay in tree community within tropical fragmented landscapes: Effects of landscape-scale forest cover.

PubMed

Rocha-Santos, Larissa; Benchimol, Maíra; Mayfield, Margaret M; Faria, Deborah; Pessoa, Michaele S; Talora, Daniela C; Mariano-Neto, Eduardo; Cazetta, Eliana

2017-01-01

As tropical rainforests are cleared, forest remnants are increasingly isolated within agricultural landscapes. Understanding how forest loss impacts on species diversity can, therefore, contribute to identifying the minimum amount of habitat required for biodiversity maintenance in human-modified landscapes. Here, we evaluate how the amount of forest cover, at the landscape scale, affects patterns of species richness, abundance, key functional traits and common taxonomic families of adult trees in twenty Brazilian Atlantic rainforest landscapes. We found that as forest cover decreases, both tree community richness and abundance decline, without exhibiting a threshold. At the family-level, species richness and abundance of the Myrtaceae and Sapotaceae were also negatively impacted by the percent forest remaining at the landscape scale. For functional traits, we found a reduction in shade-tolerant, animal-dispersed and small-seeded species following a decrease in the amount of forest retained in landscapes. These results suggest that the amount of forest in a landscape is driving non-random losses in phylogenetic and functional tree diversity in Brazil's remaining Atlantic rainforests. Our study highlights potential restraints on the conservation value of Atlantic rainforest remnants in deforested landscapes in the future.
Functional decay in tree community within tropical fragmented landscapes: Effects of landscape-scale forest cover

PubMed Central

Rocha-Santos, Larissa; Benchimol, Maíra; Mayfield, Margaret M.; Faria, Deborah; Pessoa, Michaele S.; Talora, Daniela C.; Mariano-Neto, Eduardo; Cazetta, Eliana

2017-01-01

As tropical rainforests are cleared, forest remnants are increasingly isolated within agricultural landscapes. Understanding how forest loss impacts on species diversity can, therefore, contribute to identifying the minimum amount of habitat required for biodiversity maintenance in human-modified landscapes. Here, we evaluate how the amount of forest cover, at the landscape scale, affects patterns of species richness, abundance, key functional traits and common taxonomic families of adult trees in twenty Brazilian Atlantic rainforest landscapes. We found that as forest cover decreases, both tree community richness and abundance decline, without exhibiting a threshold. At the family-level, species richness and abundance of the Myrtaceae and Sapotaceae were also negatively impacted by the percent forest remaining at the landscape scale. For functional traits, we found a reduction in shade-tolerant, animal-dispersed and small-seeded species following a decrease in the amount of forest retained in landscapes. These results suggest that the amount of forest in a landscape is driving non-random losses in phylogenetic and functional tree diversity in Brazil’s remaining Atlantic rainforests. Our study highlights potential restraints on the conservation value of Atlantic rainforest remnants in deforested landscapes in the future. PMID:28403166
Assessing the Potential of Land Use Modification to Mitigate Ambient NO2 and Its Consequences for Respiratory Health

PubMed Central

Rao, Meenakshi; George, Linda A.; Shandas, Vivek; Rosenstiel, Todd N.

2017-01-01

Understanding how local land use and land cover (LULC) shapes intra-urban concentrations of atmospheric pollutants, and thus human health, is a key component in designing healthier cities. Here, NO2 is modeled based on spatially dense summer and winter NO2 observations in Portland-Hillsboro-Vancouver (USA), and the spatial variation of NO2 with LULC is investigated using random forest, an ensemble data learning technique. The NO2 random forest model, together with BenMAP, is further used to develop a better understanding of the relationship among LULC, ambient NO2 and respiratory health. The impact of land use modifications on ambient NO2, and consequently on respiratory health, is also investigated using a sensitivity analysis. We find that NO2 associated with roadways and tree-canopied areas may be affecting annual incidence rates of asthma exacerbation in 4–12 year olds by +3000 per 100,000 and −1400 per 100,000, respectively. Our model shows that increasing local tree canopy by 5% may reduce local incidence rates of asthma exacerbation by 6%, indicating that targeted local tree-planting efforts may have a substantial impact on reducing city-wide incidence of respiratory distress. Our findings demonstrate the utility of random forest modeling in evaluating LULC modifications for enhanced respiratory health. PMID:28698523
Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

NASA Astrophysics Data System (ADS)

Tesoriero, Anthony J.; Gronberg, Jo Ann; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

2017-08-01

Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.
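The partial dependence plots mentioned in the abstract above rest on a simple averaging step, sketched below. The predictor function, the feature names (`well_depth_m`, `bedrock_depth_m`), and the well records are hypothetical stand-ins, not the fitted random forest or data from the study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for value in grid:
        # Copy every row, overriding only the feature of interest.
        modified = [dict(row, **{feature: value}) for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

# Hypothetical stand-in for a fitted classifier's predicted probability
# of an elevated concentration.
def toy_probability(row):
    near_interface = abs(row["well_depth_m"] - row["bedrock_depth_m"]) < 5
    return 0.7 if near_interface else 0.2

wells = [
    {"well_depth_m": 10, "bedrock_depth_m": 30},
    {"well_depth_m": 50, "bedrock_depth_m": 32},
]
curve = partial_dependence(toy_probability, wells, "well_depth_m", [10, 30, 60])
```

In this toy setup the curve peaks at the grid value closest to the assumed bedrock depths, which is the kind of pattern the abstract describes for iron and arsenic near the glacial/bedrock interface.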
Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

USGS Publications Warehouse

Tesoriero, Anthony J.; Gronberg, Jo Ann M.; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

2017-01-01

Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.
Developing reservoir monthly inflow forecasts using artificial intelligence and climate phenomenon information

NASA Astrophysics Data System (ADS)

Yang, Tiantian; Asanjan, Ata Akbari; Welles, Edwin; Gao, Xiaogang; Sorooshian, Soroosh; Liu, Xiaomang

2017-04-01

Reservoirs are fundamental human-built infrastructures that collect, store, and deliver fresh surface water in a timely manner for many purposes. Efficient reservoir operation requires policy makers and operators to understand how reservoir inflows are changing under different hydrological and climatic conditions to enable forecast-informed operations. Over the last decade, the use of Artificial Intelligence and Data Mining (AI & DM) techniques in assisting reservoir streamflow subseasonal to seasonal forecasts has been increasing. In this study, Random Forest (RF), Artificial Neural Network (ANN), and Support Vector Regression (SVR) are employed and compared with respect to their capabilities for predicting 1 month-ahead reservoir inflows for two headwater reservoirs in the USA and China. Both current and lagged hydrological information and 17 known climate phenomenon indices, e.g., PDO and ENSO, are selected as predictors for simulating reservoir inflows. Results show that (1) all three methods are capable of providing monthly reservoir inflows with satisfactory statistics; (2) the results obtained by Random Forest have the best statistical performance compared with the other two methods; (3) another advantage of the Random Forest algorithm is its capability of interpreting raw model inputs; (4) climate phenomenon indices are useful in assisting monthly or seasonal forecasts of reservoir inflow; and (5) different climate conditions are autocorrelated for up to several months, and the climatic information and its lags are cross-correlated with local hydrological conditions in our case studies.
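The use of current and lagged predictors for a 1 month-ahead forecast can be sketched as a feature-building step. The function name, the two-lag window, and the inflow and index values below are illustrative assumptions; the abstract does not state how the authors arranged their inputs.

```python
def make_lagged_samples(inflow, index, n_lags):
    """Pair current and lagged inflow/index values with next-month inflow."""
    samples = []
    for t in range(n_lags - 1, len(inflow) - 1):
        features = []
        for lag in range(n_lags):
            features.append(inflow[t - lag])  # hydrological predictor at lag
            features.append(index[t - lag])   # climate index at the same lag
        samples.append((features, inflow[t + 1]))  # target: inflow one month ahead
    return samples

# Hypothetical monthly series (arbitrary units).
inflow = [10.0, 12.0, 15.0, 11.0, 9.0]
pdo = [0.1, 0.3, -0.2, -0.5, 0.0]
samples = make_lagged_samples(inflow, pdo, n_lags=2)
```

Each `(features, target)` pair could then be passed to any regression method, such as the three compared in the study.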
Combining MODIS and Landsat imagery to estimate and map boreal forest cover loss

USGS Publications Warehouse

Potapov, P.; Hansen, Matthew C.; Stehman, S.V.; Loveland, Thomas R.; Pittman, K.

2008-01-01

Estimation of forest cover change is important for boreal forests, one of the most extensive forested biomes, due to its unique role in global timber stock, carbon sequestration and deposition, and high vulnerability to the effects of global climate change. We used time-series data from the MODerate Resolution Imaging Spectroradiometer (MODIS) to produce annual forest cover loss hotspot maps. These maps were used to assign all blocks (18.5 by 18.5 km) partitioning the boreal biome into strata of high, medium and low likelihood of forest cover loss. A stratified random sample of 118 blocks was interpreted for forest cover and forest cover loss using high spatial resolution Landsat imagery from 2000 and 2005. Area of forest cover gross loss from 2000 to 2005 within the boreal biome is estimated to be 1.63% (standard error 0.10%) of the total biome area, and represents a 4.02% reduction in year 2000 forest cover. The proportion of identified forest cover loss relative to regional forest area is much higher in North America than in Eurasia (5.63% to 3.00%). Of the total forest cover loss identified, 58.9% is attributable to wildfires. The MODIS pan-boreal change hotspot estimates reveal significant increases in forest cover loss due to wildfires in 2002 and 2003, with 2003 being the peak year of loss within the 5-year study period. Overall, the precision of the aggregate forest cover loss estimates derived from the Landsat data and the value of the MODIS-derived map displaying the spatial and temporal patterns of forest loss demonstrate the efficacy of this protocol for operational, cost-effective, and timely biome-wide monitoring of gross forest cover loss.
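A stratified random sample yields an area estimate and standard error of the kind quoted in the abstract above. The sketch below uses the textbook stratified-mean estimator with a finite population correction; this is an assumption, since the abstract does not specify the estimator, and the stratum sizes and block values are invented.

```python
import math

def stratified_estimate(strata):
    """strata: list of (blocks_in_stratum, sampled_values).

    Returns the stratified mean and its standard error.
    """
    total_blocks = sum(n_pop for n_pop, _ in strata)
    mean = 0.0
    variance = 0.0
    for n_pop, sample in strata:
        n = len(sample)
        weight = n_pop / total_blocks
        sample_mean = sum(sample) / n
        sample_var = sum((y - sample_mean) ** 2 for y in sample) / (n - 1)
        mean += weight * sample_mean
        # Finite population correction (1 - n / n_pop) applied per stratum.
        variance += weight ** 2 * (1 - n / n_pop) * sample_var / n
    return mean, math.sqrt(variance)

# Hypothetical strata: (number of blocks, percent forest loss in sampled blocks).
strata = [(100, [4.0, 6.0]), (300, [1.0, 1.0, 4.0])]
loss_mean, loss_se = stratified_estimate(strata)
```

Sampling the high-likelihood stratum more intensively, as a hotspot map allows, reduces the standard error for a fixed number of interpreted blocks.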
Semi-empirical modelling for forest above ground biomass estimation using hybrid and fully PolSAR data

NASA Astrophysics Data System (ADS)

Tomar, Kiledar S.; Kumar, Shashi; Tolpekin, Valentyn A.; Joshi, Sushil K.

2016-05-01

Forests act as a carbon sink and thereby help maintain the carbon cycle in the atmosphere. Deforestation leads to an imbalance in the global carbon cycle and to changes in climate. Hence, estimation of forest biophysical parameters such as biomass becomes a necessity. PolSAR has the ability to discriminate the share of scattering elements such as surface, double-bounce and volume scattering in a single SAR resolution cell. Studies have shown that volume scattering, which arises mainly from vegetation because of its randomly oriented structures, is a significant parameter for forest biophysical characterization. This random orientation of forest structure causes a shift in the orientation angle of the polarization ellipse, which ultimately disturbs the radar signature and leads to overestimation of volume scattering and underestimation of double-bounce scattering after decomposition of fully PolSAR data. Hybrid polarimetry has the advantage of zero polarization orientation angle (POA) shift due to rotational symmetry followed by the circular transmission of electromagnetic waves. The prime objective of this study was to explore the potential of Hybrid PolSAR and fully PolSAR data for AGB estimation using the Extended Water Cloud model. Validation was performed using field biomass. The study site chosen was Barkot Forest, Uttarakhand, India. To obtain the decomposition components, m-alpha and Yamaguchi decomposition modelling were applied to Hybrid and fully PolSAR data, respectively. An RGB composite image was generated for each decomposition technique. The contributions of all scattering types from each plot were extracted for m-alpha and Yamaguchi decomposition modelling. The R² values between modelled AGB and field biomass from Hybrid PolSAR and fully PolSAR data were 0.5127 and 0.4625, respectively. The RMSE values between modelled AGB and field biomass for Hybrid and fully PolSAR were 63.156 t ha⁻¹ and 73.424 t ha⁻¹, respectively. On the basis of the RMSE and R² values, this study suggests Hybrid PolSAR decomposition modelling to retrieve scattering elements for AGB estimation from forest.
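The two validation statistics used to compare modelled and field biomass can be computed as in the sketch below. It assumes the coefficient-of-determination form of R² (one minus residual over total sum of squares); the abstract does not say which definition was used, and the biomass values are invented.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical plot-level biomass in t/ha.
field = [100.0, 200.0, 300.0]
modelled = [110.0, 190.0, 330.0]
error = rmse(field, modelled)
fit = r_squared(field, modelled)
```

A model with a lower RMSE and a higher R² against the same field plots is the better of the two, which is the basis of the comparison reported above.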
Electromagnetic wave scattering from a forest or vegetation canopy - Ongoing research at the University of Texas at Arlington

NASA Technical Reports Server (NTRS)

Karam, Mostafa A.; Amar, Faouzi; Fung, Adrian K.

1993-01-01

The Wave Scattering Research Center at the University of Texas at Arlington has developed a scattering model for forest or vegetation, based on the theory of electromagnetic-wave scattering in random media. The model generalizes the assumptions imposed by earlier models, and compares well with measurements from several forest canopies. This paper gives a description of the model. It also indicates how the model elements are integrated to obtain the scattering characteristics of different forest canopies. The scattering characteristics may be displayed in the form of polarimetric signatures, represented by like- and cross-polarized scattering coefficients, for an elliptically-polarized wave, or in the form of signal-distribution curves. Results illustrating both types of scattering characteristics are given.
Natural cavities used by wood ducks in north-central Minnesota

USGS Publications Warehouse

Gilmer, D.S.; Ball, I.J.; Cowardin, L.M.; Mathisen, J.

1978-01-01

Radio telemetry was used to locate 31 wood duck (Aix sponsa) nest cavity sites in 16 forest stands. Stands were of 2 types: (1) mature (mean = 107 years) northern hardwoods (10 nest sites), and (2) mature (mean = 68 years) quaking aspen (Populus tremuloides) (21 nest sites). Aspen was the most important cavity-producing tree used by wood ducks and accounted for 57 percent of 28 cavities inspected. In stands used by wood ducks, the average density of suitable cavities was about 4 per hectare. Trees containing nests were closer to water areas (P < 0.05) and the nearest forest canopy openings (P < 0.01) than was a random sample of trees from the same stands. A significant (P < 0.005) relationship existed between the orientation of the cavity entrance and the nearest canopy opening. Potential wood duck cavities usually were clustered within a stand rather than randomly distributed. Selection of trees by woodpeckers for nest hole construction probably influenced the availability of cavities used by wood ducks. A plan for managing forests to benefit wood ducks and other wildlife dependent on old-growth timber is discussed.
Mathematical models application for mapping soils spatial distribution on the example of the farm from the North of Udmurt Republic of Russia

NASA Astrophysics Data System (ADS)

Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.

2018-01-01

A comparative analysis of geospatial soil modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machine algorithms was conducted. Visual interpretation of the resulting digital maps and their comparison with the existing map, together with a quantitative assessment of the overall accuracy of detecting individual soil groups and of the models' kappa, showed that multiple logistic regression, support vector machine, and random forest models with spatial prediction of the distribution of conditional soil groups can be reliably used for mapping the study area. Detection was most accurate for lightly eroded and moderately eroded sod-podzolic soils (Phaeozems Albic). Next, by mean overall prediction accuracy, came non-eroded and warp sod-podzolic soils, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavily eroded sod-podzolic soils and gray forest soils (Phaeozems Albic) were detected worst of all by the automatic classification methods.
Genetic parameters for growth characteristics of free-range chickens under univariate random regression models.

PubMed

Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B

2016-09-01

Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h² = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at all ages can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that genetic gain for body weight can be achieved by selection. Also, selection for body weight at 42 days of age can be maintained as a selection criterion. © 2016 Poultry Science Association Inc.
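The Legendre polynomial covariates behind a random regression curve can be sketched as follows. This uses unnormalized polynomials from Bonnet's recurrence and treats "second order" as degree 2; both are assumptions, since animal breeding software often uses a normalized form and the abstract does not say which convention applies. The 0 to 84 day age range follows the abstract.

```python
def standardize_age(age, min_age, max_age):
    """Map age onto [-1, 1], the interval on which Legendre polynomials are defined."""
    return -1 + 2 * (age - min_age) / (max_age - min_age)

def legendre(order, x):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        # (n + 1) * P_{n+1}(x) = (2n + 1) * x * P_n(x) - n * P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

# Covariates for a bird weighed at 42 days, with ages spanning hatching to 84 days.
covariates_day_42 = legendre(2, standardize_age(42, 0, 84))
covariates_day_84 = legendre(2, standardize_age(84, 0, 84))
```

In a random regression model each animal gets its own coefficients on these covariates, so a second-order fit describes every growth curve with three numbers per random effect.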
Multivariate statistical analysis of hemlock (Tsuga) volatiles by SPME/GC/MS: insights into the phytochemistry of the hemlock woolly adelgid (Adelges tsugae Annand)

Treesearch

Anthony Lagalante; Frank Calvosa; Michael Mirzabeigi; Vikram Iyengar; Michael Montgomery; Kathleen Shields

2007-01-01

A previously developed single-needle, SPME/GC/MS technique was used to measure the terpenoid content of T. canadensis growing in a hemlock forest at Lake Scranton, PA (Lagalante and Montgomery 2003). The volatile terpenoid composition was measured over a 1-year period from June 2003 to May 2004 to follow the annual cycle of foliage development from...
