Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
NASA Astrophysics Data System (ADS)
Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.
2017-08-01
Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with regression forest models having the best overall performance measured by the area under the receiver-operating-curve (AUC). The models also performed well on the test data for presence and absence with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models), and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance ( 50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. We conclude that where data conforms well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated distributions and non-normal distributions such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types, performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis
2016-01-01
Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Charles E. Rose; Thomas B. Lynch
2001-01-01
A method was developed for estimating parameters in an individual tree basal area growth model using a system of equations based on dbh rank classes. The estimation method developed is a compromise between an individual tree and a stand level basal area growth model that accounts for the correlation between trees within a plot by using seemingly unrelated regression (...
Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit
2015-01-01
The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.
Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.
2014-01-01
This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…
Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok
2013-02-01
The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized from using the regression trees, referred as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 microm sized particle numbers, 0.4-0.5 microm sized particle numbers, particulate matter (PM) concentrations less than 1.0 microm (PM10), and PM concentrations less than 2.5 microm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complimentary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture majority of the variance in the monitored in-bus contaminants. The genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of the back-propagation and the radial basis function networks. The novelty of this research is the development of a novel approach to modeling vehicular indoor air quality by integration of the advanced methods of genetic algorithms, regression trees, and the analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and comparing the results obtained from using the developed approach with conventional artificial intelligence techniques of back propagation networks and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation methods. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment. This methodology can easily be extended to other fields of study also.
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical AbstractDecision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Modeling Caribbean tree stem diameters from tree height and crown width measurements
Thomas Brandeis; KaDonna Randolph; Mike Strub
2009-01-01
Regression models to predict diameter at breast height (DBH) as a function of tree height and maximum crown radius were developed for Caribbean forests based on data collected by the U.S. Forest Service in the Commonwealth of Puerto Rico and Territory of the U.S. Virgin Islands. The model predicting DBH from tree height fit reasonably well (R2 = 0.7110), with...
Kathleen L. Kavanaugh; Matthew B. Dickinson; Anthony S. Bova
2010-01-01
Current operational methods for predicting tree mortality from fire injury are regression-based models that only indirectly consider underlying causes and, thus, have limited generality. A better understanding of the physiological consequences of tree heating and injury are needed to develop biophysical process models that can make predictions under changing or novel...
Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards
2006-01-01
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...
Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn
2017-11-15
A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits predicted ammonia concentration relative to the discharge limit, increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.
Fire frequency in the Interior Columbia River Basin: Building regional models from fire history data
McKenzie, D.; Peterson, D.L.; Agee, James K.
2000-01-01
Fire frequency affects vegetation composition and successional pathways; thus it is essential to understand fire regimes in order to manage natural resources at broad spatial scales. Fire history data are lacking for many regions for which fire management decisions are being made, so models are needed to estimate past fire frequency where local data are not yet available. We developed multiple regression models and tree-based (classification and regression tree, or CART) models to predict fire return intervals across the interior Columbia River basin at 1-km resolution, using georeferenced fire history, potential vegetation, cover type, and precipitation databases. The models combined semiqualitative methods and rigorous statistics. The fire history data are of uneven quality; some estimates are based on only one tree, and many are not cross-dated. Therefore, we weighted the models based on data quality and performed a sensitivity analysis of the effects on the models of estimation errors that are due to lack of cross-dating. The regression models predict fire return intervals from 1 to 375 yr for forested areas, whereas the tree-based models predict a range of 8 to 150 yr. Both types of models predict latitudinal and elevational gradients of increasing fire return intervals. Examination of regional-scale output suggests that, although the tree-based models explain more of the variation in the original data, the regression models are less likely to produce extrapolation errors. Thus, the models serve complementary purposes in elucidating the relationships among fire frequency, the predictor variables, and spatial scale. The models can provide local managers with quantitative information and provide data to initialize coarse-scale fire-effects models, although predictions for individual sites should be treated with caution because of the varying quality and uneven spatial coverage of the fire history database. The models also demonstrate the integration of qualitative and quantitative methods when requisite data for fully quantitative models are unavailable. They can be tested by comparing new, independent fire history reconstructions against their predictions and can be continually updated, as better fire history data become available.
An Intelligent Decision Support System for Workforce Forecast
2011-01-01
ARIMA ) model to forecast the demand for construction skills in Hong Kong. This model was based...Decision Trees ARIMA Rule Based Forecasting Segmentation Forecasting Regression Analysis Simulation Modeling Input-Output Models LP and NLP Markovian...data • When results are needed as a set of easily interpretable rules 4.1.4 ARIMA Auto-regressive, integrated, moving-average ( ARIMA ) models
Multivariate regression model for partitioning tree volume of white oak into round-product classes
Daniel A. Yaussy; David L. Sonderman
1984-01-01
Describes the development of multivariate equations that predict the expected cubic volume of four round-product classes from independent variables composed of individual tree-quality characteristics. Although the model has limited application at this time, it does demonstrate the feasibility of partitioning total tree cubic volume into round-product classes based on...
NASA Astrophysics Data System (ADS)
Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.
2012-06-01
In this paper, a method is presented to estimate excess nitrogen on large scales considering single field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used to predict excess nitrogen by the regression tree model. Hence they had to be calculated and regionalized as well for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia by low computing time without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen correspond with 94%. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).
Simulation of land use change in the three gorges reservoir area based on CART-CA
NASA Astrophysics Data System (ADS)
Yuan, Min
2018-05-01
This study proposes a new method to simulate spatiotemporal complex multiple land uses by using classification and regression tree algorithm (CART) based CA model. In this model, we use classification and regression tree algorithm to calculate land class conversion probability, and combine neighborhood factor, random factor to extract cellular transformation rules. The overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821 in the land dynamic simulation results of the three gorges reservoir area from 2000 to 2010, and the simulation results are satisfactory.
Lin, Lei; Wang, Qian; Sadek, Adel W
2016-06-01
The duration of freeway traffic accidents duration is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of a "time-to-event" modeling scenario, and given this, HBDMs have been previously applied to analyze and predict traffic accidents duration. Previous research, however, has not yet applied HBDMs for accident duration prediction, in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of a M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) a HBDM; and (3) the proposed M5P-HBDM, and the remainder of data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs. Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
Zhang, Ling Yu; Liu, Zhao Gang
2017-12-01
Based on the data collected from 108 permanent plots of the forest resources survey in Maoershan Experimental Forest Farm during 2004-2016, this study investigated the spatial distribution of recruitment trees in natural secondary forest by global Poisson regression and geographically weighted Poisson regression (GWPR) with four bandwidths of 2.5, 5, 10 and 15 km. The simulation effects of the 5 regressions and the factors influencing the recruitment trees in stands were analyzed, a description was given to the spatial autocorrelation of the regression residuals on global and local levels using Moran's I. The results showed that the spatial distribution of the number of natural secondary forest recruitment was significantly influenced by stands and topographic factors, especially average DBH. The GWPR model with small scale (2.5 km) had high accuracy of model fitting, a large range of model parameter estimates was generated, and the localized spatial distribution effect of the model parameters was obtained. The GWPR model at small scale (2.5 and 5 km) had produced a small range of model residuals, and the stability of the model was improved. The global spatial auto-correlation of the GWPR model residual at the small scale (2.5 km) was the lowe-st, and the local spatial auto-correlation was significantly reduced, in which an ideal spatial distribution pattern of small clusters with different observations was formed. The local model at small scale (2.5 km) was much better than the global model in the simulation effect on the spatial distribution of recruitment tree number.
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have similar performances reaching AUC values 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectfully. However, information loss of Lasso models is 0.35 bits higher compared to Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on Electronic Health records. Integration of domain knowledge (in the form of ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso Logistic Regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in general pediatric population in California, as well as several important subpopulations, and the interpretations of models comply with existing medical understanding of pediatric readmission. Finally, quantitative assessment of the interpretability of the models is given, that is beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
A self-trained classification technique for producing 30 m percent-water maps from Landsat data
Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei
2010-01-01
Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.
Scalable Regression Tree Learning on Hadoop using OpenPlanet
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor
As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. Regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. Map Reduce is well suited for addressing such data intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implement of this algorithm, OpenPlanet, on the Hadoop framework usingmore » a hybrid approach. Further, we evaluate the performance of OpenPlanet using realworld datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.« less
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie
2017-10-01
By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale ima ge segmentation in OBIA technique and CART, which connected the image objects at various scales, with a pretty good producer's accuracy of 89.1%. The investigation of indexes estimated by regression tree model that was constructed based on the features extracted from the image objects reached normal or better accuracy, in which the crown closure model archived the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast and tree height was relatively low, which was consistent with conclusion that estimating diameter at breast and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy of the region of high value achieved above 80%.
NASA Astrophysics Data System (ADS)
Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei
2017-02-01
Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the Area Under the Receiver Operating Characteristic (AUROC), which showed the highest value, belonged to boosted regression trees (0.975) and the lowest value was recorded for generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, nevertheless, variability among the prediction of individual models was considerable. Therefore, to reduce uncertainty, creating more generalizable, more stable, and less sensitive models, ensemble forecasting approaches and in particular the EMmedian is recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for heath interview is 88.9%. The classification tree results find that if body mass index is higher than 25.72 kg m(-2) and the age is above 51 years, the predicted probability for number of cardiovascular risk factors > or =3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg m(-2) and geographical latitude of the village is lower than 24 degrees 22.8', the predicted probability for number of cardiovascular risk factors > or =4 is 60.8% and the population is 74. As the logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate areas of the search space favoring to infer localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is the dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively. It can be provide a reference for monitoring the tree growing and yield estimation. The Red Fuji apple trees of full bearing fruit are the researching objects. Ninety apple trees canopies spectral reflectance and LAI values were measured by the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards in constant two years in Qixia research area of Shandong Province. The optimal vegetation indices were selected by the method of correlation analysis of the original spectral reflectance and vegetation indices. The models of predicting the LAI were built with the multivariate regression analysis method of support vector machine (SVM) and random forest (RF). The new vegetation indices, GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517 and the previous two main vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set decision coefficient C-R2 of 0.920 and validation set decision coefficient V-R2 of 0.889 are higher than the SVM regression model by 0.045 and 0.033 respectively. The root mean square error of calibration set C-RMSE of 0.249, the root mean square error validation set V-RMSE of 0.236 are lower than that of the SVM regression model by 0.054 and 0.058 respectively. Relative analysis of calibrating error C-RPD and relative analysis of validation set V-RPD reached 3.363 and 2.520, 0.598 and 0.262, respectively, which were higher than the SVM regression model. The measured and predicted the scatterplot trend line slope of the calibration set and validation set C-S and V-S are close to 1. The estimation result of RF regression model is better than that of the SVM. RF regression model can be used to estimate the LAI of red Fuji apple trees in full fruit period.
Automatic energy expenditure measurement for health science.
Catal, Cagatay; Akbulut, Akhan
2018-04-01
It is crucial to predict the human energy expenditure in any sports activity and health science application accurately to investigate the impact of the activity. However, measurement of the real energy expenditure is not a trivial task and involves complex steps. The objective of this work is to improve the performance of existing estimation models of energy expenditure by using machine learning algorithms and several data from different sensors and provide this estimation service in a cloud-based platform. In this study, we used input data such as breathe rate, and hearth rate from three sensors. Inputs are received from a web form and sent to the web service which applies a regression model on Azure cloud platform. During the experiments, we assessed several machine learning models based on regression methods. Our experimental results showed that our novel model which applies Boosted Decision Tree Regression in conjunction with the median aggregation technique provides the best result among other five regression algorithms. This cloud-based energy expenditure system which uses a web service showed that cloud computing technology is a great opportunity to develop estimation systems and the new model which applies Boosted Decision Tree Regression with the median aggregation provides remarkable results. Copyright © 2018 Elsevier B.V. All rights reserved.
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
Boosted regression tree, table, and figure data
Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication:Golden , H., C. Lane , A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy
2016-01-01
This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of up scaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.
Spectral analysis of white ash response to emerald ash borer infestations
NASA Astrophysics Data System (ADS)
Calandra, Laura
The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf-level and tree crown level. First, a field-based study at the leaf-level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) and healthy leaves. Binary logistic regression models were developed using individual and combinations of wavelengths; the most successful model included 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.
Du, Ning; Fan, Jintu; Chen, Shuo; Liu, Yang
2008-07-21
Although recent investigations [Ryan, M.G., Yoder, B.J., 1997. Hydraulic limits to tree height and tree growth. Bioscience 47, 235-242; Koch, G.W., Sillett, S.C.,Jennings, G.M.,Davis, S.D., 2004. The limits to tree height. Nature 428, 851-854; Niklas, K.J., Spatz, H., 2004. Growth and hydraulic (not mechanical) constraints govern the scaling of tree height and mass. Proc. Natl Acad. Sci. 101, 15661-15663; Ryan, M.G., Phillips, N., Bond, B.J., 2006. Hydraulic limitation hypothesis revisited. Plant Cell Environ. 29, 367-381; Niklas, K.J., 2007. Maximum plant height and the biophysical factors that limit it. Tree Physiol. 27, 433-440; Burgess, S.S.O., Dawson, T.E., 2007. Predicting the limits to tree height using statistical regressions of leaf traits. New Phytol. 174, 626-636] suggested that the hydraulic limitation hypothesis (HLH) is the most plausible theory to explain the biophysical limits to maximum tree height and the decline in tree growth rate with age, the analysis is largely qualitative or based on statistical regression. Here we present an integrated biophysical model based on the principle that trees develop physiological compensations (e.g. the declined leaf water potential and the tapering of conduits with heights [West, G.B., Brown, J.H., Enquist, B.J., 1999. A general model for the structure and allometry of plant vascular systems. Nature 400, 664-667]) to resist the increasing water stress with height, the classical HLH and the biochemical limitations on photosynthesis [von Caemmerer, S., 2000. Biochemical Models of Leaf Photosynthesis. CSIRO Publishing, Australia]. The model has been applied to the tallest trees in the world (viz. Coast redwood (Sequoia sempervirens)). Xylem water potential, leaf carbon isotope composition, leaf mass to area ratio at different heights derived from the model show good agreements with the experimental measurements of Koch et al. [2004. The limits to tree height. Nature 428, 851-854]. The model also well explains the universal trend of declining growth rate with age.
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
NASA Astrophysics Data System (ADS)
Rao, M.; Vuong, H.
2013-12-01
The overall objective of this study is to develop a method for estimating total aboveground biomass of redwood stands in Jackson Demonstration State Forest, Mendocino, California using airborne LiDAR data. LiDAR data owing to its vertical and horizontal accuracy are increasingly being used to characterize landscape features including ground surface elevation and canopy height. These LiDAR-derived metrics involving structural signatures at higher precision and accuracy can help better understand ecological processes at various spatial scales. Our study is focused on two major species of the forest: redwood (Sequoia semperirens [D.Don] Engl.) and Douglas-fir (Pseudotsuga mensiezii [Mirb.] Franco). Specifically, the objectives included linear regression models fitting tree diameter at breast height (dbh) to LiDAR derived height for each species. From 23 random points on the study area, field measurement (dbh and tree coordinate) were collected for more than 500 trees of Redwood and Douglas-fir over 0.2 ha- plots. The USFS-FUSION application software along with its LiDAR Data Viewer (LDV) were used to to extract Canopy Height Model (CHM) from which tree heights would be derived. Based on the LiDAR derived height and ground based dbh, a linear regression model was developed to predict dbh. The predicted dbh was used to estimate the biomass at the single tree level using Jenkin's formula (Jenkin et al 2003). The linear regression models were able to explain 65% of the variability associated with Redwood's dbh and 80% of that associated with Douglas-fir's dbh.
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
Weather Impact on Airport Arrival Meter Fix Throughput
NASA Technical Reports Server (NTRS)
Wang, Yao
2017-01-01
Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. In order to meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix prior to entering the airport terminal airspace, air traffic controllers make regulations on air traffic. Severe weather may create an airport arrival bottleneck if one or more of airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently being put in holding patterns until they can be rerouted. A model that predicts the weather impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at the Newark Liberty International Airport (EWR). Several scenarios of weather impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for the period of April-September in 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the results from the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than that produced from multiple linear regression method.
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng
2018-02-09
Precise renal histopathological diagnosis will guide therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been applicable noninvasive technique in renal disease. This current study was performed to explore whether BOLD MRI could contribute to diagnose renal pathological pattern. Adult patients with lupus nephritis renal pathological diagnosis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. The Blood oxygen level dependent magnetic resonance imaging (BOLD-MRI) was used to obtain functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both Histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five classes III (including 3 as class III + V) and seven classes IV (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological pattern of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the Logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of decision tree model was equivalent to that of the line discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The Area under the ROC curve (AUROCC) of the decision tree model was greater than that of the line discriminant model (0.765 vs 0.629, P < 0.001) and logistic regression model (0.765 vs 0.662, P < 0.001). BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R
2014-04-01
The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tact level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P =0.001), for tree density (Global Moran's I =0.452, P =0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P <0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (r S =-0.19; conventional P -value=0.016; spatially adjusted P -value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (r S =-0.18; conventional P -value=0.019; spatially adjusted P -value=0.180). While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
NASA Astrophysics Data System (ADS)
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
2018-01-01
Subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in multiline region of wavelengths. The laser is an alternative to the atom strontium lasers and electron free lasers, especially at the line 6.45 μm which line is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract the more important relationships and dependences from the available data which influence the increase of the overall laser efficiency. There are constructed and analyzed a set of cluster models. It is shown by using different cluster methods that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. By the built regression tree models using Classification and Regression Trees (CART) technique there are obtained dependences to predict the values of efficiency, and especially the maximum efficiency with over 95% accuracy.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-02-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-07-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
NASA Astrophysics Data System (ADS)
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.
Jose F. Negron
1998-01-01
Infested and uninfested areas within Douglas fir, Pseudotsuga menziesii Mirb.. Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk. were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...
Duncan, Dustin T.; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A.; Arbia, Giuseppe; Castro, Marcia C.; White, Kellee; Williams, David R.
2017-01-01
The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tact level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran’s I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran’s I range from 0.24 to 0.86, all P=0.001), for tree density (Global Moran’s I=0.452, P=0.001), and in the OLS regression residuals (Global Moran’s I range from 0.32 to 0.38, all P<0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS=−0.19; conventional P-value=0.016; spatially adjusted P-value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS=−0.18; conventional P-value=0.019; spatially adjusted P-value=0.180). While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed. PMID:29354668
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
Tree STEM and Canopy Biomass Estimates from Terrestrial Laser Scanning Data
NASA Astrophysics Data System (ADS)
Olofsson, K.; Holmgren, J.
2017-10-01
In this study an automatic method for estimating both the tree stem and the tree canopy biomass is presented. The point cloud tree extraction techniques operate on TLS data and models the biomass using the estimated stem and canopy volume as independent variables. The regression model fit error is of the order of less than 5 kg, which gives a relative model error of about 5 % for the stem estimate and 10-15 % for the spruce and pine canopy biomass estimates. The canopy biomass estimate was improved by separating the models by tree species which indicates that the method is allometry dependent and that the regression models need to be recomputed for different areas with different climate and different vegetation.
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.
Chung, Yi-Shih
2013-12-01
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide improved transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett
2009-01-01
Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Presence of indicator plant species as a predictor of wetland vegetation integrity
Stapanian, Martin A.; Adams, Jean V.; Gara, Brian
2013-01-01
We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted OVIBI score based on the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands based on the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate from this tree was 13 %. Again, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species using the classification tree model. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality, and requiring more intensive field assessments. Alternatively, it can be used for advanced determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence/absence of such “indicator” species before issuing permits.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
Decision Tree Approach for Soil Liquefaction Assessment
Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498
Decision tree approach for soil liquefaction assessment.
Gandomi, Amir H; Fridline, Mark M; Roke, David A
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view.
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
2017-10-11
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fitting. Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings comparing to existing methods. We also demonstrate an application of our method in a real clinical trial.
Singh, Gyanendra; Sachdeva, S N; Pal, Mahesh
2016-11-01
This work examines the application of M5 model tree and conventionally used fixed/random effect negative binomial (FENB/RENB) regression models for accident prediction on non-urban sections of highway in Haryana (India). Road accident data for a period of 2-6 years on different sections of 8 National and State Highways in Haryana was collected from police records. Data related to road geometry, traffic and road environment related variables was collected through field studies. Total two hundred and twenty two data points were gathered by dividing highways into sections with certain uniform geometric characteristics. For prediction of accident frequencies using fifteen input parameters, two modeling approaches: FENB/RENB regression and M5 model tree were used. Results suggest that both models perform comparably well in terms of correlation coefficient and root mean square error values. M5 model tree provides simple linear equations that are easy to interpret and provide better insight, indicating that this approach can effectively be used as an alternative to RENB approach if the sole purpose is to predict motor vehicle crashes. Sensitivity analysis using M5 model tree also suggests that its results reflect the physical conditions. Both models clearly indicate that to improve safety on Indian highways minor accesses to the highways need to be properly designed and controlled, the service roads to be made functional and dispersion of speeds is to be brought down. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust models in terms of selected predictors and coefficients, as well as of dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in the behaviour could be interpreted as the result of overfitting effects, which heavily affect decision tree classification more than logistic regression techniques.
Using Baidu Search Index to Predict Dengue Outbreak in China
NASA Astrophysics Data System (ADS)
Liu, Kangkang; Wang, Tao; Yang, Zhicong; Huang, Xiaodong; Milinovich, Gabriel J.; Lu, Yi; Jing, Qinlong; Xia, Yao; Zhao, Zhengyang; Yang, Yang; Tong, Shilu; Hu, Wenbiao; Lu, Jiahai
2016-12-01
This study identified the possible threshold to predict dengue fever (DF) outbreaks using Baidu Search Index (BSI). Time-series classification and regression tree models based on BSI were used to develop a predictive model for DF outbreak in Guangzhou and Zhongshan, China. In the regression tree models, the mean autochthonous DF incidence rate increased approximately 30-fold in Guangzhou when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 382. When the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 91.8, there was approximately 9-fold increase of the mean autochthonous DF incidence rate in Zhongshan. In the classification tree models, the results showed that when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 99.3, there was 89.28% chance of DF outbreak in Guangzhou, while, in Zhongshan, when the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 68.1, the chance of DF outbreak rose up to 100%. The study indicated that less cost internet-based surveillance systems can be the valuable complement to traditional DF surveillance in China.
Modeling individual tree survial
Quang V. Cao
2016-01-01
Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...
NASA Astrophysics Data System (ADS)
Mangla, Rohit; Kumar, Shashi; Nandy, Subrata
2016-05-01
SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. SAR sensor in its fully polarimetric mode has an advantage to retrieve scattering property of different component of forest structure and LiDAR has the capability to measure structural information with very high accuracy. This study was focused on retrieval of forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) based point clouds and scattering property of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data was acquired for 14 plots of Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees and random sampling with plot size of 0.1 ha (31.62m*31.62m) was adopted for TLS and field data collection. RISAT-1 data was processed to retrieve SAR data based variables and TLS point clouds based 3D imaging was done to retrieve LiDAR based variables. Surface scattering, double-bounce scattering, volume scattering, helix and wire scattering were the SAR based variables retrieved from polarimetric decomposition. Tree heights and stem diameters were used as LiDAR based variables retrieved from single tree vertical height and least square circle fit methods respectively. All the variables obtained for forest plots were used as an input in a machine learning based Random Forest Regression Model, which was developed in this study for forest AGB estimation. Modelled output for forest AGB showed reliable accuracy (RMSE = 27.68 t/ha) and a good coefficient of determination (0.63) was obtained through the linear regression between modelled AGB and field-estimated AGB. The sensitivity analysis showed that the model was more sensitive for the major contributed variables (stem diameter and volume scattering) and these variables were measured from two different remote sensing techniques. This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz; Mark W. Schwartz
2005-01-01
We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...
Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif
2017-01-01
Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
Towards lidar-based mapping of tree age at the Arctic forest tundra ecotone.
NASA Astrophysics Data System (ADS)
Jensen, J.; Maguire, A.; Oelkers, R.; Andreu-Hayles, L.; Boelman, N.; D'Arrigo, R.; Griffin, K. L.; Jennewein, J. S.; Hiers, E.; Meddens, A. J.; Russell, M.; Vierling, L. A.; Eitel, J.
2017-12-01
Climate change may cause spatial shifts in the forest-tundra ecotone (FTE). To improve our ability to study these spatial shifts, information on tree demography along the FTE is needed. The objective of this study was to assess the suitability of lidar derived tree heights as a surrogate for tree age. We calculated individual tree age from 48 tree cores collected at basal height from white spruce (Picea glauca) within the FTE in northern Alaska. Tree height was obtained from terrestrial lidar scans (<1cm spatial resolution). The relationship between age and height was examined using a linear regression model forced through the origin. We found a very strong predictive relationship between tree height and age (R2 = 0.90, RMSE = 19.34 years) for trees that ranged between 14 to 230 years. Separate regression models were also developed for small (height < 3 m) and large trees (height >= 3 m), yielding strong predictive relationships between height and age (R2 = 0.86, RMSE 12.21 years, and R2 = 0.93, RMSE = 25.16 years, respectively). The slope coefficient for small and large tree models (16.83 and 12.98 years/m, respectively) indicate that small trees grow 1.3 times faster than large trees at these FTE study sites. Although a strong, predictive relationship between age and height is uncommon in light-limited forest environments, our findings suggest that the sparseness of trees within the FTE may explain the strong tree height-age relationships found herein. Further analysis of 36 additional tree cores recently collected within the FTE near Inuvik, Canada will be performed. Our preliminary analysis suggests that lidar derived tree height could be a reliable proxy for tree age at the FTE, thereby establishing a new technique for scaling tree structure and demographics across larger portions of this sensitive ecotone.
Predicting Error Bars for QSAR Models
NASA Astrophysics Data System (ADS)
Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-09-01
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniques for the other modelling approaches.
Modelling spruce bark beetle infestation probability
Paulius Zolubas; Jose Negron; A. Steven Munson
2009-01-01
Spruce bark beetle (Ips typographus L.) risk model, based on pure Norway spruce (Picea abies Karst.) stand characteristics in experimental and control plots was developed using classification and regression tree statistical technique under endemic pest population density. The most significant variable in spruce bark beetle...
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
Boyte, Stephen P.; Wylie, Bruce K.; Major, Donald J.; Brown, Jesslyn F.
2015-01-01
Cheatgrass exhibits spatial and temporal phenological variability across the Great Basin as described by ecological models formed using remote sensing and other spatial data-sets. We developed a rule-based, piecewise regression-tree model trained on 99 points that used three data-sets – latitude, elevation, and start of season time based on remote sensing input data – to estimate cheatgrass beginning of spring growth (BOSG) in the northern Great Basin. The model was then applied to map the location and timing of cheatgrass spring growth for the entire area. The model was strong (R2 = 0.85) and predicted an average cheatgrass BOSG across the study area of 29 March–4 April. Of early cheatgrass BOSG areas, 65% occurred at elevations below 1452 m. The highest proportion of cheatgrass BOSG occurred between mid-April and late May. Predicted cheatgrass BOSG in this study matched well with previous Great Basin cheatgrass green-up studies.
A hierarchical linear model for tree height prediction.
Vicente J. Monleon
2003-01-01
Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Data mining of tree-based models to analyze freeway accident frequency.
Chang, Li-Yen; Chen, Wen-Chieh
2005-01-01
Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.
Stemflow estimation in a redwood forest using model-based stratified random sampling
Jack Lewis
2003-01-01
Model-based stratified sampling is illustrated by a case study of stemflow volume in a redwood forest. The approach is actually a model-assisted sampling design in which auxiliary information (tree diameter) is utilized in the design of stratum boundaries to optimize the efficiency of a regression or ratio estimator. The auxiliary information is utilized in both the...
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
NASA Astrophysics Data System (ADS)
Shao, G.; Gallion, J.; Fei, S.
2016-12-01
Sound forest aboveground biomass estimation is required to monitor diverse forest ecosystems and their impacts on the changing climate. Lidar-based regression models provided promised biomass estimations in most forest ecosystems. However, considerable uncertainties of biomass estimations have been reported in the temperate hardwood and hardwood-dominated mixed forests. Varied site productivities in temperate hardwood forests largely diversified height and diameter growth rates, which significantly reduced the correlation between tree height and diameter at breast height (DBH) in mature and complex forests. It is, therefore, difficult to utilize height-based lidar metrics to predict DBH-based field-measured biomass through a simple regression model regardless the variation of site productivity. In this study, we established a multi-dimension nonlinear regression model incorporating lidar metrics and site productivity classes derived from soil features. In the regression model, lidar metrics provided horizontal and vertical structural information and productivity classes differentiated good and poor forest sites. The selection and combination of lidar metrics were discussed. Multiple regression models were employed and compared. Uncertainty analysis was applied to the best fit model. The effects of site productivity on the lidar-based biomass model were addressed.
Extensions and applications of ensemble-of-trees methods in machine learning
NASA Astrophysics Data System (ADS)
Bleich, Justin
Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
Regeneration of Douglas-fir cutblocks on the Six Rivers National Forest in northwestern California
R. O. Strothmann
1979-01-01
A survey of 61 cutblocks planted since 1964 evaluated stocking of conifers (trees 1 foot tall or taller) on 2-milacre quadrats. Overall stocking percentage averaged 42.2 and ranged from 15 to 8 1. Overall number of trees per acre averaged 396. In the regression model, based on 36 cutblocks, better stocking was associated with high site class, northerly aspect,...
Tree allometry and improved estimation of carbon stocks and balance in tropical forests.
Chave, J; Andalo, C; Brown, S; Cairns, M A; Chambers, J Q; Eamus, D; Fölster, H; Fromard, F; Higuchi, N; Kira, T; Lescure, J-P; Nelson, B W; Ogawa, H; Puig, H; Riéra, B; Yamakura, T
2005-08-01
Tropical forests hold large stores of carbon, yet uncertainty remains regarding their quantitative contribution to the global carbon cycle. One approach to quantifying carbon biomass stores consists in inferring changes from long-term forest inventory plots. Regression models are used to convert inventory data into an estimate of aboveground biomass (AGB). We provide a critical reassessment of the quality and the robustness of these models across tropical forest types, using a large dataset of 2,410 trees >or= 5 cm diameter, directly harvested in 27 study sites across the tropics. Proportional relationships between aboveground biomass and the product of wood density, trunk cross-sectional area, and total height are constructed. We also develop a regression model involving wood density and stem diameter only. Our models were tested for secondary and old-growth forests, for dry, moist and wet forests, for lowland and montane forests, and for mangrove forests. The most important predictors of AGB of a tree were, in decreasing order of importance, its trunk diameter, wood specific gravity, total height, and forest type (dry, moist, or wet). Overestimates prevailed, giving a bias of 0.5-6.5% when errors were averaged across all stands. Our regression models can be used reliably to predict aboveground tree biomass across a broad range of tropical forests. Because they are based on an unprecedented dataset, these models should improve the quality of tropical biomass estimates, and bring consensus about the contribution of the tropical forest biome and tropical deforestation to the global carbon cycle.
Balk, Benjamin; Elder, Kelly
2000-01-01
We model the spatial distribution of snow across a mountain basin using an approach that combines binary decision tree and geostatistical techniques. In April 1997 and 1998, intensive snow surveys were conducted in the 6.9‐km2 Loch Vale watershed (LVWS), Rocky Mountain National Park, Colorado. Binary decision trees were used to model the large‐scale variations in snow depth, while the small‐scale variations were modeled through kriging interpolation methods. Binary decision trees related depth to the physically based independent variables of net solar radiation, elevation, slope, and vegetation cover type. These decision tree models explained 54–65% of the observed variance in the depth measurements. The tree‐based modeled depths were then subtracted from the measured depths, and the resulting residuals were spatially distributed across LVWS through kriging techniques. The kriged estimates of the residuals were added to the tree‐based modeled depths to produce a combined depth model. The combined depth estimates explained 60–85% of the variance in the measured depths. Snow densities were mapped across LVWS using regression analysis. Snow‐covered area was determined from high‐resolution aerial photographs. Combining the modeled depths and densities with a snow cover map produced estimates of the spatial distribution of snow water equivalence (SWE). This modeling approach offers improvement over previous methods of estimating SWE distribution in mountain basins.
NASA Astrophysics Data System (ADS)
Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik
2016-03-01
A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
Tests of a habitat suitability model for black-capped chickadees
Schroeder, Richard L.
1990-01-01
The black-capped chickadee (Parus atricapillus) Habitat Suitability Index (HSI) model provides a quantitative rating of the capability of a habitat to support breeding, based on measures related to food and nest site availability. The model assumption that tree canopy volume can be predicted from measures of tree height and canopy closure was tested using data from foliage volume studies conducted in the riparian cottonwood habitat along the South Platte River in Colorado. Least absolute deviations (LAD) regression showed that canopy cover and over story tree height yielded volume predictions significantly lower than volume estimated by more direct methods. Revisions to these model relations resulted in improved predictions of foliage volume. The relation between the HSI and estimates of black-capped chickadee population densities was examined using LAD regression for both the original model and the model with the foliage volume revisions. Residuals from these models were compared to residuals from both a zero slope model and an ideal model. The fit model for the original HSI differed significantly from the ideal model, whereas the fit model for the original HSI did not differ significantly from the ideal model. However, both the fit model for the original HSI and the fit model for the revised HSI did not differ significantly from a model with a zero slope. Although further testing of the revised model is needed, its use is recommended for more realistic estimates of tree canopy volume and habitat suitability.
Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C
2015-01-01
We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.
Kim, So-Ra; Kwak, Doo-Ahn; Lee, Woo-Kyun; oLee, Woo-Kyun; Son, Yowhan; Bae, Sang-Won; Kim, Choonsig; Yoo, Seongjin
2010-07-01
The objective of this study was to estimate the carbon storage capacity of Pinus densiflora stands using remotely sensed data by combining digital aerial photography with light detection and ranging (LiDAR) data. A digital canopy model (DCM), generated from the LiDAR data, was combined with aerial photography for segmenting crowns of individual trees. To eliminate errors in over and under-segmentation, the combined image was smoothed using a Gaussian filtering method. The processed image was then segmented into individual trees using a marker-controlled watershed segmentation method. After measuring the crown area from the segmented individual trees, the individual tree diameter at breast height (DBH) was estimated using a regression function developed from the relationship observed between the field-measured DBH and crown area. The above ground biomass of individual trees could be calculated by an image-derived DBH using a regression function developed by the Korea Forest Research Institute. The carbon storage, based on individual trees, was estimated by simple multiplication using the carbon conversion index (0.5), as suggested in guidelines from the Intergovernmental Panel on Climate Change. The mean carbon storage per individual tree was estimated and then compared with the field-measured value. This study suggested that the biomass and carbon storage in a large forest area can be effectively estimated using aerial photographs and LiDAR data.
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered as one of the most promising in medical data classification and the performance of deceision tree classifier can be increased by the ensemble method as it is proven to be better than single classifiers. However, in a ensemble settings the performance depends on the selection of suitable base classifier. This research employed two prominent esemble s namely Adaboost and Bagging with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varries when different base classifiers are selected and even some places overfitting issue also been noted. The evidence shows that ensemble decision tree classfiers using Adaboost and Bagging improves the performance of selected medical data sets.
NASA Astrophysics Data System (ADS)
Jurasinski, Gerald; Scharnweber, Tobias; Schröder, Christian; Lennartz, Bernd; Bauwe, Andreas
2017-04-01
Tree growth depends, among other factors, largely on the prevailing climatic conditions. Therefore, tree growth patterns are to be expected under climate change. Here, we analyze the tree-ring growth response of three major European tree species to projected future climate across a climatic (mostly precipitation) gradient in northeastern Germany. We used monthly data for temperature, precipitation, and the standardized precipitation evapotranspiration index (SPEI) over multiple time scales (1, 3, 6, 12, and 24 months) to construct models of tree-ring growth for Scots pine (Pinus syl- vestris L.) at three pure stands, and for Common beech (Fagus sylvatica L.) and Pedunculate oak (Quercus robur L.) at three mature mixed stands. The regression models were derived using a two-step approach based on partial least squares regression (PLSR) to extract potentially well explaining variables followed by ordinary least squares regression (OLSR) to consolidate the models to the least number of variables while retaining high explanatory power. The stability of the models was tested with a comprehensive calibration-verification scheme. All models were successfully verified with R2s ranging from 0.21 for the western pine stand to 0.62 for the beech stand in the east. For growth prediction, climate data forecasted until 2100 by the regional climate model WETTREG2010 based on the A1B Intergovernmental Panel on Climate Change (IPCC) emission scenario was used. For beech and oak, growth rates will likely decrease until the end of the 21st century. For pine, modeled growth trends vary and range from a slight growth increase to a weak decrease in growth rates depending on the position along the climatic gradient. The climatic gradient across the study area will possibly affect the future growth of oak with larger growth reductions towards the drier east. For beech, site-specific adaptations seem to override the influence of the climatic gradient. We conclude that in Northeastern Germany Scots pine has great potential to remain resilient to projected climate change without any greater impairment, whereas Common beech and Pedunculate oak will likely face lesser growth under the expected warmer and dryer climate conditions. The results call for an adaptation of forest management to mitigate the negative effects of climate change for beech and oak in the region.
Trees grow on money: urban tree canopy cover and environmental justice.
Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G; Zhou, Weiqi; McHale, Melissa; Grove, J Morgan; O'Neil-Dunne, Jarlath; McFadden, Joseph P; Buckley, Geoffrey L; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L
2015-01-01
This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman's correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns.
Predicting Error Bars for QSAR Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schroeter, Timon; Technische Universitaet Berlin, Department of Computer Science, Franklinstrasse 28/29, 10587 Berlin; Schwaighofer, Anton
2007-09-18
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D{sub 7} models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniquesmore » for the other modelling approaches.« less
NASA Astrophysics Data System (ADS)
Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph
2018-03-01
Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained for each regression tree by taxa were: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxa and assess validity of resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.
Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan
2006-01-01
We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...
Tree-based modeling of complex interactions of phosphorus loadings and environmental factors.
Grunwald, S; Daroub, S H; Lang, T A; Diaz, O A
2009-06-01
Phosphorus (P) enrichment has been observed in the historic oligotrophic Greater Everglades in Florida mainly due to P influx from upstream, agriculturally dominated, low relief drainage basins of the Everglades Agricultural Area (EAA). Our specific objectives were to: (1) investigate relationships between various environmental factors and P loads in 10 farm basins within the EAA, (2) identify those environmental factors that impart major effects on P loads using three different tree-based modeling approaches, and (3) evaluate predictive models to assess P loads. We assembled thirteen environmental variable sets for all 10 sub-basins characterizing water level management, cropping practices, soils, hydrology, and farm-specific properties. Drainage flow and P concentrations were measured at each sub-basin outlet from 1992-2002 and aggregated to derive monthly P loads. We used three different tree-based models including single regression trees (ST), committee trees in Bagging (CTb) and ARCing (CTa) modes and ten-fold cross-validation to test prediction performances. The monthly P loads (MPL) during the monitoring period showed a maximum of 2528 kg (mean: 103 kg) and maximum monthly unit area P loads (UAL) of 4.88 kg P ha(-1) (mean: 0.16 kg P ha(-1)). Our results suggest that hydrologic/water management properties are the major controlling variables to predict MPL and UAL in the EAA. Tree-based modeling was successful in identifying relationships between P loads and environmental predictor variables on 10 farms in the EAA indicated by high R(2) (>0.80) and low prediction errors. Committee trees in ARCing mode generated the best performing models to predict P loads and P loads per unit area. Tree-based models had the ability to analyze complex, non-linear relationships between P loads and multiple variables describing hydrologic/water management, cropping practices, soil and farm-specific properties within the EAA.
Jon C. Regelbrugge
1993-01-01
Abstract. We modeled tree mortality occurring two years following wildfire in Pinus ponderosa forests using data from 1275 trees in 25 stands burned during the 1987 Stanislaus Complex fires. We used logistic regression analysis to develop models relating the probability of wildfire-induced mortality with tree size and fire severity for Pinus ponderosa, Calocedrus...
Forest type mapping of the Interior West
Bonnie Ruefenacht; Gretchen G. Moisen; Jock A. Blackard
2004-01-01
This paper develops techniques for the mapping of forest types in Arizona, New Mexico, and Wyoming. The methods involve regression-tree modeling using a variety of remote sensing and GIS layers along with Forest Inventory Analysis (FIA) point data. Regression-tree modeling is a fast and efficient technique of estimating variables for large data sets with high accuracy...
The process and utility of classification and regression tree methodology in nursing research
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-01-01
Aim This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design Discussion paper. Data sources English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048
The process and utility of classification and regression tree methodology in nursing research.
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-06-01
This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
Red-shouldered hawk nesting habitat preference in south Texas
Strobel, Bradley N.; Boal, Clint W.
2010-01-01
We examined nesting habitat preference by red-shouldered hawks Buteo lineatus using conditional logistic regression on characteristics measured at 27 occupied nest sites and 68 unused sites in 2005–2009 in south Texas. We measured vegetation characteristics of individual trees (nest trees and unused trees) and corresponding 0.04-ha plots. We evaluated the importance of tree and plot characteristics to nesting habitat selection by comparing a priori tree-specific and plot-specific models using Akaike's information criterion. Models with only plot variables carried 14% more weight than models with only center tree variables. The model-averaged odds ratios indicated red-shouldered hawks selected to nest in taller trees and in areas with higher average diameter at breast height than randomly available within the forest stand. Relative to randomly selected areas, each 1-m increase in nest tree height and 1-cm increase in the plot average diameter at breast height increased the probability of selection by 85% and 10%, respectively. Our results indicate that red-shouldered hawks select nesting habitat based on vegetation characteristics of individual trees as well as the 0.04-ha area surrounding the tree. Our results indicate forest management practices resulting in tall forest stands with large average diameter at breast height would benefit red-shouldered hawks in south Texas.
Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa
2017-01-01
Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, K i of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted K i values from the proposed model show a strong coefficient of determination, R 2 =0.996, to experimental K i values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low K i values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted K i should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low K i . Copyright © 2016 Elsevier Inc. All rights reserved.
Anderson, S.C.; Kupfer, J.A.; Wilson, R.R.; Cooper, R.J.
2000-01-01
The purpose of this research was to develop a model that could be used to provide a spatial representation of uneven-aged silvicultural treatments on forest crown area. We began by developing species-specific linear regression equations relating tree DBH to crown area for eight bottomland tree species at White River National Wildlife Refuge, Arkansas, USA. The relationships were highly significant for all species, with coefficients of determination (r(2)) ranging from 0.37 for Ulmus crassifolia to nearly 0.80 for Quercus nuttalliii and Taxodium distichum. We next located and measured the diameters of more than 4000 stumps from a single tree-group selection timber harvest. Stump locations were recorded with respect to an established gl id point system and entered into a Geographic Information System (ARC/INFO). The area occupied by the crown of each logged individual was then estimated by using the stump dimensions (adjusted to DBHs) and the regression equations relating tree DBH to crown area. Our model projected that the selection cuts removed roughly 300 m(2) of basal area from the logged sites resulting in the loss of approximate to 55 000 m(2) of crown area. The model developed in this research represents a tool that can be used in conjunction with remote sensing applications to assist in forest inventory and management, as well as to estimate the impacts of selective timber harvest on wildlife.
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
2015-01-01
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
2015-12-01
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Wang, Ling; Zhao, Geng-Xing; Zhu, Xi-Cun; Wang, Rui-Yan; Chang, Chun-Yan
2013-10-01
Taking Qixia City of Shandong, China as the study area, and based on the Landsat-5 TM and ALOS AVNIR-2 images, the canopy retrieval reflectance of apple trees at blossom stage was acquired. In combining with the measured reflectance of sample trees, the nitrogen-sensitive spectral indices were constructed and selected. By using the sensitive spectral indices as the independent variables, the nitrogen retrieval models were established, and the model with the best accuracy was used for spatial retrieve. The correlations between the spectral indices and the nitrogen nutritional status were in the order of canopy > leaf > flower. The sensitive indices were mainly composed of green, red, and near infrared bands. The accuracy of the retrieval models was in the order of support vector regression > multi-variable stepwise regression > one-variable regression. The retrieval results based on different images were similar, and showed that the leaf nitrogen content was mainly of grades 3-4 (27-33 g x kg(-1)), and the canopy nitrogen nutrient indices were mainly of grades 2-4 (TM: 38-47 g x kg(-1); ALOS: 32-41 g x kg(-1)). The spatial distribution of the retrieval nitrogen nutritional status based on different images also showed the similar trend, i. e., the nitrogen nutritional status was higher in the north and south than that in the middle part of the study area, and the areas with the high grades of leaf nitrogen and canopy nitrogen were mainly located in Sujiadian Town and Songshan subdistrict in the northwest, Zangjiazhuang Town and Tingkou Town in the northeast, and Shewopo Town in the south, which were consistent with the distribution of the key towns for apple production in Qixia City. This study provided a feasible method for the acquisition of nitrogen nutritional status of apple trees on macroscopic scale, and also, provided reference for other similar remote sensing retrievals.
New flux based dose-response relationships for ozone for European forest tree species.
Büker, P; Feng, Z; Uddling, J; Briolat, A; Alonso, R; Braun, S; Elvira, S; Gerosa, G; Karlsson, P E; Le Thiec, D; Marzuoli, R; Mills, G; Oksanen, E; Wieser, G; Wilkinson, M; Emberson, L D
2015-11-01
To derive O3 dose-response relationships (DRR) for five European forest trees species and broadleaf deciduous and needleleaf tree plant functional types (PFTs), phytotoxic O3 doses (PODy) were related to biomass reductions. PODy was calculated using a stomatal flux model with a range of cut-off thresholds (y) indicative of varying detoxification capacities. Linear regression analysis showed that DRR for PFT and individual tree species differed in their robustness. A simplified parameterisation of the flux model was tested and showed that for most non-Mediterranean tree species, this simplified model led to similarly robust DRR as compared to a species- and climate region-specific parameterisation. Experimentally induced soil water stress was not found to substantially reduce PODy, mainly due to the short duration of soil water stress periods. This study validates the stomatal O3 flux concept and represents a step forward in predicting O3 damage to forests in a spatially and temporally varying climate. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Trees Grow on Money: Urban Tree Canopy Cover and Environmental Justice
Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G.; Zhou, Weiqi; McHale, Melissa; Grove, J. Morgan; O’Neil-Dunne, Jarlath; McFadden, Joseph P.; Buckley, Geoffrey L.; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L.
2015-01-01
This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman’s correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns. PMID:25830303
A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities.
Mbogning, Cyprien; Perdry, Hervé; Toussile, Wilson; Broët, Philippe
2014-01-01
Dissecting the genomic spectrum of clinical disease entities is a challenging task. Recursive partitioning (or classification trees) methods provide powerful tools for exploring complex interplay among genomic factors, with respect to a main factor, that can reveal hidden genomic patterns. To take confounding variables into account, the partially linear tree-based regression (PLTR) model has been recently published. It combines regression models and tree-based methodology. It is however computationally burdensome and not well suited for situations for which a large number of exploratory variables is expected. We developed a novel procedure that represents an alternative to the original PLTR procedure, and considered different selection criteria. A simulation study with different scenarios has been performed to compare the performances of the proposed procedure to the original PLTR strategy. The proposed procedure with a Bayesian Information Criterion (BIC) achieved good performances to detect the hidden structure as compared to the original procedure. The novel procedure was used for analyzing patterns of copy-number alterations in lung adenocarcinomas, with respect to Kirsten Rat Sarcoma Viral Oncogene Homolog gene (KRAS) mutation status, while controlling for a cohort effect. Results highlight two subgroups of pure or nearly pure wild-type KRAS tumors with particular copy-number alteration patterns. The proposed procedure with a BIC criterion represents a powerful and practical alternative to the original procedure. Our procedure performs well in a general framework and is simple to implement.
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Fomst modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographicalIy weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Rifai, Sami W; Urquiza Muñoz, José D; Negrón-Juárez, Robinson I; Ramírez Arévalo, Fredy R; Tello-Espinoza, Rodil; Vanderwel, Mark C; Lichstein, Jeremy W; Chambers, Jeffrey Q; Bohlman, Stephanie A
2016-10-01
Wind disturbance can create large forest blowdowns, which greatly reduces live biomass and adds uncertainty to the strength of the Amazon carbon sink. Observational studies from within the central Amazon have quantified blowdown size and estimated total mortality but have not determined which trees are most likely to die from a catastrophic wind disturbance. Also, the impact of spatial dependence upon tree mortality from wind disturbance has seldom been quantified, which is important because wind disturbance often kills clusters of trees due to large treefalls killing surrounding neighbors. We examine (1) the causes of differential mortality between adult trees from a 300-ha blowdown event in the Peruvian region of the northwestern Amazon, (2) how accounting for spatial dependence affects mortality predictions, and (3) how incorporating both differential mortality and spatial dependence affect the landscape level estimation of necromass produced from the blowdown. Standard regression and spatial regression models were used to estimate how stem diameter, wood density, elevation, and a satellite-derived disturbance metric influenced the probability of tree death from the blowdown event. The model parameters regarding tree characteristics, topography, and spatial autocorrelation of the field data were then used to determine the consequences of non-random mortality for landscape production of necromass through a simulation model. Tree mortality was highly non-random within the blowdown, where tree mortality rates were highest for trees that were large, had low wood density, and were located at high elevation. Of the differential mortality models, the non-spatial models overpredicted necromass, whereas the spatial model slightly underpredicted necromass. When parameterized from the same field data, the spatial regression model with differential mortality estimated only 7.5% more dead trees across the entire blowdown than the random mortality model, yet it estimated 51% greater necromass. We suggest that predictions of forest carbon loss from wind disturbance are sensitive to not only the underlying spatial dependence of observations, but also the biological differences between individuals that promote differential levels of mortality. © 2016 by the Ecological Society of America.
USDA-ARS?s Scientific Manuscript database
Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
USDA-ARS?s Scientific Manuscript database
Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...
Dispersion patterns and sampling plans for Diaphorina citri (Hemiptera: Psyllidae) in citrus.
Sétamou, Mamoudou; Flores, Daniel; French, J Victor; Hall, David G
2008-08-01
The abundance and spatial dispersion of Diaphorina citri Kuwayama (Hemiptera: Psyllidae) were studied in 34 grapefruit (Citrus paradisi Macfad.) and six sweet orange [Citrus sinensis (L.) Osbeck] orchards from March to August 2006 when the pest is more abundant in southern Texas. Although flush shoot infestation levels did not vary with host plant species, densities of D. citri eggs, nymphs, and adults were significantly higher on sweet orange than on grapefruit. D. citri immatures also were found in significantly higher numbers in the southeastern quadrant of trees than other parts of the canopy. The spatial distribution of D. citri nymphs and adults was analyzed using Iowa's patchiness regression and Taylor's power law. Taylor's power law fitted the data better than Iowa's model. Based on both regression models, the field dispersion patterns of D. citri nymphs and adults were aggregated among flush shoots in individual trees as indicated by the regression slopes that were significantly >1. For the average density of each life stage obtained during our surveys, the minimum number of flush shoots per tree needed to estimate D. citri densities varied from eight for eggs to four flush shoots for adults. Projections indicated that a sampling plan consisting of 10 trees and eight flush shoots per tree would provide density estimates of the three developmental stages of D. citri acceptable enough for population studies and management decisions. A presence-absence sampling plan with a fixed precision level was developed and can be used to provide a quick estimation of D. citri populations in citrus orchards.
An introduction to tree-structured modeling with application to quality of life data.
Su, Xiaogang; Azuero, Andres; Cho, June; Kvale, Elizabeth; Meneses, Karen M; McNees, M Patrick
2011-01-01
Investigators addressing nursing research are faced increasingly with the need to analyze data that involve variables of mixed types and are characterized by complex nonlinearity and interactions. Tree-based methods, also called recursive partitioning, are gaining popularity in various fields. In addition to efficiency and flexibility in handling multifaceted data, tree-based methods offer ease of interpretation. The aims of this study were to introduce tree-based methods, discuss their advantages and pitfalls in application, and describe their potential use in nursing research. In this article, (a) an introduction to tree-structured methods is presented, (b) the technique is illustrated via quality of life (QOL) data collected in the Breast Cancer Education Intervention study, and (c) implications for their potential use in nursing research are discussed. As illustrated by the QOL analysis example, tree methods generate interesting and easily understood findings that cannot be uncovered via traditional linear regression analysis. The expanding breadth and complexity of nursing research may entail the use of new tools to improve efficiency and gain new insights. In certain situations, tree-based methods offer an attractive approach that help address such needs.
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Liu, Feng; Tan, Chang; Lei, Pi-Feng
2014-11-01
Taking Wugang forest farm in Xuefeng Mountain as the research object, using the airborne light detection and ranging (LiDAR) data under leaf-on condition and field data of concomitant plots, this paper assessed the ability of using LiDAR technology to estimate aboveground biomass of the mid-subtropical forest. A semi-automated individual tree LiDAR cloud point segmentation was obtained by using condition random fields and optimization methods. Spatial structure, waveform characteristics and topography were calculated as LiDAR metrics from the segmented objects. Then statistical models between aboveground biomass from field data and these LiDAR metrics were built. The individual tree recognition rates were 93%, 86% and 60% for coniferous, broadleaf and mixed forests, respectively. The adjusted coefficients of determination (R(2)adj) and the root mean squared errors (RMSE) for the three types of forest were 0.83, 0.81 and 0.74, and 28.22, 29.79 and 32.31 t · hm(-2), respectively. The estimation capability of model based on canopy geometric volume, tree percentile height, slope and waveform characteristics was much better than that of traditional regression model based on tree height. Therefore, LiDAR metrics from individual tree could facilitate better performance in biomass estimation.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictor identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
A Ranking Approach to Genomic Selection.
Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
2015-01-01
Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
Louis R. Iverson; Anantha Prasad; Mark W. Schwartz; Mark W. Schwartz
1999-01-01
We are using a deterministic regression tree analysis model (DISTRIB) and a stochastic migration model (SHIFT) to examine potential distributions of ~66 individual species of eastern US trees under a 2 x CO2 climate change scenario. This process is demonstrated for Virginia pine (Pinus virginiana).
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Marcus V. Warwell; Gerald E. Rehfeldt; Nicholas L. Crookston
2010-01-01
The Random Forests multiple regression tree was used to develop an empirically based bioclimatic model of the presence-absence of species occupying small geographic distributions in western North America. The species assessed were subalpine larch (Larix lyallii), smooth Arizona cypress (Cupressus arizonica ssp. glabra...
THOMAS J. BRANDEIS; MARIA DEL ROCIO SUAREZ ROZO
2005-01-01
Total aboveground live tree biomass in Puerto Rican lower montane wet, subtropical wet, subtropical moist and subtropical dry forests was estimated using data from two forest inventories and published regression equations. Multiple potentially-applicable published biomass models existed for some forested life zones, and their estimates tended to diverge with increasing...
Thomas J. Brandeis; Maria Del Rocio; Suarez Rozo
2005-01-01
Total aboveground live tree biomass in Puerto Rican lower montane wet, subtropical wet, subtropical moist and subtropical dry forests was estimated using data from two forest inventories and published regression equations. Multiple potentially-applicable published biomass models existed for some forested life zones, and their estimates tended to diverge with increasing...
Developing Models to Forcast Sales of Natural Christmas Trees
Lawrence D. Garrett; Thomas H. Pendleton
1977-01-01
A study of practices for marketing Christmas trees in Winston-Salem, North Carolina, and Denver, Colorado, revealed that such factors as retail lot competition, tree price, consumer traffic, and consumer income were very important in determining a particular retailer's sales. Analyses of 4 years of market data were used in developing regression models for...
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Seligman, D A; Pullinger, A G
2000-01-01
Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R(2) = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox Snell R(2) = 30.3%, mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.
Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard
2016-02-28
Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.
2005-05-01
Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.
Huang, C.; Townshend, J.R.G.
2003-01-01
A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al . (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion interval ranging from 0 to 100%. This method is appealing to estimating subpixel land cover over large areas.
Additivity of nonlinear biomass equations
Bernard R. Parresol
2001-01-01
Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
NASA Astrophysics Data System (ADS)
Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung
2017-04-01
Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural processes as well as hydrological processes because soil moisture highly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is important. Soil moisture has been generally provided through in situ measurements at stations. Although field survey from in situ measurements provides accurate soil moisture with high temporal resolution, it requires high cost and does not provide the spatial distribution of soil moisture over large areas. Microwave satellite (e.g., advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) -based approaches and numerical models such as Global Land Data Assimilation System (GLDAS) and Modern- Era Retrospective Analysis for Research and Applications (MERRA) provide spatial-temporalspatiotemporally continuous soil moisture products at global scale. However, since those global soil moisture products have coarse spatial resolution ( 25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limitation of the spatial resolution of soil moisture products. In this study, GLDAS soil moisture data were downscaled up to 1 km spatial resolution through the integration of AMSR2 and ASCAT soil moisture data, Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data—Land Surface Temperature, Normalized Difference Vegetation Index, and Land cover—using modified regression trees over East Asia from 2013 to 2015. Modified regression trees were implemented using Cubist, a commercial software tool based on machine learning. An optimization based on pruning of rules derived from the modified regression trees was conducted. Root Mean Square Error (RMSE) and Correlation coefficients (r) were used to optimize the rules, and finally 59 rules from modified regression trees were produced. The results show high validation r (0.79) and low validation RMSE (0.0556m3/m3). The 1 km downscaled soil moisture was evaluated using ground soil moisture data at 14 stations, and both soil moisture data showed similar temporal patterns (average r=0.51 and average RMSE=0.041). The spatial distribution of the 1 km downscaled soil moisture well corresponded with GLDAS soil moisture that caught both extremely dry and wet regions. Correlation between GLDAS and the 1 km downscaled soil moisture during growing season was positive (mean r=0.35) in most regions.
Louis R. Iverson; Anantha M. Prasad; Anantha M. Prasad
2002-01-01
Global climate change could have profound effects on the Earth's biota, including large redistributions of tree species and forest types. We used DISTRIB, a deterministic regression tree analysis model, to examine environmental drivers related to current forest-species distributions and then model potential suitable habitat under five climate change scenarios...
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.
Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung
2015-01-01
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression , classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S
Identification of chilling and heat requirements of cherry trees--a statistical approach.
Luedeling, Eike; Kunz, Achim; Blanke, Michael M
2013-09-01
Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative, to determine climatic requirements statistically. Bloom dates of cherry cv. 'Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. 'Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package ('chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. 'Payne') at Davis, California.
Method for estimating potential tree-grade distributions for northeastern forest species
Daniel A. Yaussy; Daniel A. Yaussy
1993-01-01
Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...
Facchinello, Yann; Beauséjour, Marie; Richard-Denis, Andreane; Thompson, Cynthia; Mac-Thiong, Jean-Marc
2017-10-25
Predicting the long-term functional outcome following traumatic spinal cord injury is needed to adapt medical strategies and to plan an optimized rehabilitation. This study investigates the use of regression tree for the development of predictive models based on acute clinical and demographic predictors. This prospective study was performed on 172 patients hospitalized following traumatic spinal cord injury. Functional outcome was quantified using the Spinal Cord Independence Measure collected within the first-year post injury. Age, delay prior to surgery and Injury Severity Score were considered as continuous predictors while energy of injury, trauma mechanisms, neurological level of injury, injury severity, occurrence of early spasticity, urinary tract infection, pressure ulcer and pneumonia were coded as categorical inputs. A simplified model was built using only injury severity, neurological level, energy and age as predictor and was compared to a more complex model considering all 11 predictors mentioned above The models built using 4 and 11 predictors were found to explain 51.4% and 62.3% of the variance of the Spinal Cord Independence Measure total score after validation, respectively. The severity of the neurological deficit at admission was found to be the most important predictor. Other important predictors were the Injury Severity Score, age, neurological level and delay prior to surgery. Regression trees offer promising performances for predicting the functional outcome after a traumatic spinal cord injury. It could help to determine the number and type of predictors leading to a prediction model of the functional outcome that can be used clinically in the future.
Lisa M. Ganio; Robert A. Progar
2017-01-01
Wild and prescribed fire-induced injury to forest trees can produce immediate or delayed tree mortality but fire-injured trees can also survive. Land managers use logistic regression models that incorporate tree-injury variables to discriminate between fatally injured trees and those that will survive. We used data from 4024 ponderosa pine (Pinus ponderosa...
Censored quantile regression with recursive partitioning-based weights
Wey, Andrew; Wang, Lan; Rudser, Kyle
2014-01-01
Censored quantile regression provides a useful alternative to the Cox proportional hazards model for analyzing survival data. It directly models the conditional quantile of the survival time and hence is easy to interpret. Moreover, it relaxes the proportionality constraint on the hazard function associated with the popular Cox model and is natural for modeling heterogeneity of the data. Recently, Wang and Wang (2009. Locally weighted censored quantile regression. Journal of the American Statistical Association 103, 1117–1128) proposed a locally weighted censored quantile regression approach that allows for covariate-dependent censoring and is less restrictive than other censored quantile regression methods. However, their kernel smoothing-based weighting scheme requires all covariates to be continuous and encounters practical difficulty with even a moderate number of covariates. We propose a new weighting approach that uses recursive partitioning, e.g. survival trees, that offers greater flexibility in handling covariate-dependent censoring in moderately high dimensions and can incorporate both continuous and discrete covariates. We prove that this new weighting scheme leads to consistent estimation of the quantile regression coefficients and demonstrate its effectiveness via Monte Carlo simulations. We also illustrate the new method using a widely recognized data set from a clinical trial on primary biliary cirrhosis. PMID:23975800
Hu, Chen; Steingrimsson, Jon Arni
2018-01-01
A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupšys, P.
A system of stochastic differential equations (SDE) with mixed-effects parameters and multivariate normal copula density function were used to develop tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside bark diameter at breast height, and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid
2017-04-01
Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming process. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R training 2 and R test 2 , respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned to several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R training 2 and R test 2 , respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R training 2 and R test 2 were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.
Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao
2015-01-01
Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, performances of these algorithms in racially diverse group have never been objectively evaluated and compared. In this literature-based study, we compared the performances of eight machine learning techniques with those of MLR in a large, racially-diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performances of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performances of these techniques in different races, as well as the dose ranges of therapeutic warfarin were compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly out performed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR optimally performed among the Black population. When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high- dose ranges. Conclusion Overall, machine learning-based techniques, BART, MARS and SVR performed superior than MLR in warfarin pharmacogenetic dosing. Differences of algorithms’ performances exist among the races. Moreover, machine learning-based algorithms tended to perform better in the low- and high- dose ranges than MLR. PMID:26305568
NASA Astrophysics Data System (ADS)
Park, J.; Yoo, K.
2013-12-01
For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe TCE sensitivity data with a poor correlation with hydrogeological properties. Among the different data mining methods such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Base Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of Decision Tree (DT) was the best. According to the following tree analyses with the optimal DT model, the failure of the conventional DRASTIC method in fitting with TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that DT based data mining approach can be used in predicting and rule induction of groundwater TCE sensitivity without pre-existing information on weights of hydrogeological properties.
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
Identification of extremely premature infants at high risk of rehospitalization.
Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D
2011-11-01
Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
Identification of Extremely Premature Infants at High Risk of Rehospitalization
Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.
2011-01-01
OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016
NASA Astrophysics Data System (ADS)
Kisi, Ozgur; Parmar, Kulwinder Singh
2016-03-01
This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH) monitored at Nizamuddin, Delhi Yamuna River in India were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost same accuracy and they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). The average root mean square error (RMSE) of the LSSVM and M5Tree models was decreased by 1.47% and 19.1% using MARS model, respectively. Adding TC input to the models did not increase their accuracy in modeling COD while adding FC and pH inputs to the models generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.
Distribution of cavity trees in midwestern old-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Distribution of cavity trees in midwesternold-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Synchrony of forest responses to climate from the aspect of tree mortality in South Korea
NASA Astrophysics Data System (ADS)
Kim, M.; Lee, W. K.; Piao, D.; Choi, G. M.; Gang, H. U.
2016-12-01
Mortality is a key process in forest-stand dynamics. However, tree mortality is not well understood, particularly in relation to climatic factors. The objectives of this study were to: (i) determine the patterns of maximum stem number (MSN) per ha over dominant tree height from 5-year remeasurements of the permanent sample plots for temperate forests [Red pine (Pinus densiflora), Japanese larch (Larix kaempferi), Korean pine (Pinus koraiensis), Chinese cork oak (Quercus variabilis), and Mongolian oak (Quercus mongolica)] using Sterba's theory and Korean National Forest Inventory (NFI) data, (ii) develop a stand-level mortality (self-thinning) model using the MSN curve, and (iii) assess the impact of temperature on tree mortality in semi-variogram and linear regression models. The MSN curve represents the upper range of observed stem numbers per ha. The mortality model and validation statistic reveal significant differences between the observed data and the model predictions (R2 = 0.55-0.81), and no obvious dependencies or patterns that indicate systematic trends between the residuals and the independent variable. However, spatial autocorrelation was detected from residuals of coniferous species (Red pine, Japanese larch and Korean pine), but not of oak species (Chinese cork oak and Mongolian oak). Based on linear regression from residuals, we found that the mortality of coniferous forests tended to increase when the annual mean temperature increased. Conversely, oak mortality nonsignificantly decreased with increasing temperature. These findings indicate that enhanced tree mortality due to rising temperatures in response to climate change is possible, especially in coniferous forests, and are expected to contribute to policy decisions to support and forest management practices.
A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.
Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A
2018-01-01
In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with features selection based on decision tree has a better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08 % with false positive rate of 0.75 % was observed.
Comparison of stream invertebrate response models for bioassessment metric
Waite, Ian R.; Kennen, Jonathan G.; May, Jason T.; Brown, Larry R.; Cuffney, Thomas F.; Jones, Kimberly A.; Orlando, James L.
2012-01-01
We aggregated invertebrate data from various sources to assemble data for modeling in two ecoregions in Oregon and one in California. Our goal was to compare the performance of models developed using multiple linear regression (MLR) techniques with models developed using three relatively new techniques: classification and regression trees (CART), random forest (RF), and boosted regression trees (BRT). We used tolerance of taxa based on richness (RICHTOL) and ratio of observed to expected taxa (O/E) as response variables and land use/land cover as explanatory variables. Responses were generally linear; therefore, there was little improvement to the MLR models when compared to models using CART and RF. In general, the four modeling techniques (MLR, CART, RF, and BRT) consistently selected the same primary explanatory variables for each region. However, results from the BRT models showed significant improvement over the MLR models for each region; increases in R2 from 0.09 to 0.20. The O/E metric that was derived from models specifically calibrated for Oregon consistently had lower R2 values than RICHTOL for the two regions tested. Modeled O/E R2 values were between 0.06 and 0.10 lower for each of the four modeling methods applied in the Willamette Valley and were between 0.19 and 0.36 points lower for the Blue Mountains. As a result, BRT models may indeed represent a good alternative to MLR for modeling species distribution relative to environmental variables.
Marcus V. Warwell; Gerald E. Rehfeldt; Nicholas L. Crookston
2006-01-01
The Random Forests multiple regression tree was used to develop an empirically-based bioclimate model for the distribution of Pinus albicaulis (whitebark pine) in western North America, latitudes 31° to 51° N and longitudes 102° to 125° W. Independent variables included 35 simple expressions of temperature and precipitation and their interactions....
Leaf aging of Amazonian canopy trees as revealed by spectral and physiochemical measurements.
Chavana-Bryant, Cecilia; Malhi, Yadvinder; Wu, Jin; Asner, Gregory P; Anastasiou, Athanasios; Enquist, Brian J; Cosio Caravasi, Eric G; Doughty, Christopher E; Saleska, Scott R; Martin, Roberta E; Gerard, France F
2017-05-01
Leaf aging is a fundamental driver of changes in leaf traits, thereby regulating ecosystem processes and remotely sensed canopy dynamics. We explore leaf reflectance as a tool to monitor leaf age and develop a spectra-based partial least squares regression (PLSR) model to predict age using data from a phenological study of 1099 leaves from 12 lowland Amazonian canopy trees in southern Peru. Results demonstrated monotonic decreases in leaf water (LWC) and phosphorus (P mass ) contents and an increase in leaf mass per unit area (LMA) with age across trees; leaf nitrogen (N mass ) and carbon (C mass ) contents showed monotonic but tree-specific age responses. We observed large age-related variation in leaf spectra across trees. A spectra-based model was more accurate in predicting leaf age (R 2 = 0.86; percent root mean square error (%RMSE) = 33) compared with trait-based models using single (R 2 = 0.07-0.73; %RMSE = 7-38) and multiple (R 2 = 0.76; %RMSE = 28) predictors. Spectra- and trait-based models established a physiochemical basis for the spectral age model. Vegetation indices (VIs) including the normalized difference vegetation index (NDVI), enhanced vegetation index 2 (EVI2), normalized difference water index (NDWI) and photosynthetic reflectance index (PRI) were all age-dependent. This study highlights the importance of leaf age as a mediator of leaf traits, provides evidence of age-related leaf reflectance changes that have important impacts on VIs used to monitor canopy dynamics and productivity and proposes a new approach to predicting and monitoring leaf age with important implications for remote sensing. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
NASA Astrophysics Data System (ADS)
Carisi, Francesca; Domeneghetti, Alessio; Kreibich, Heidi; Schröter, Kai; Castellarin, Attilio
2017-04-01
Flood risk is function of flood hazard and vulnerability, therefore its accurate assessment depends on a reliable quantification of both factors. The scientific literature proposes a number of objective and reliable methods for assessing flood hazard, yet it highlights a limited understanding of the fundamental damage processes. Loss modelling is associated with large uncertainty which is, among other factors, due to a lack of standard procedures; for instance, flood losses are often estimated based on damage models derived in completely different contexts (i.e. different countries or geographical regions) without checking its applicability, or by considering only one explanatory variable (i.e. typically water depth). We consider the Secchia river flood event of January 2014, when a sudden levee-breach caused the inundation of nearly 200 km2 in Northern Italy. In the aftermath of this event, local authorities collected flood loss data, together with additional information on affected private households and industrial activities (e.g. buildings surface and economic value, number of company's employees and others). Based on these data we implemented and compared a quadratic-regression damage function, with water depth as the only explanatory variable, and a multi-variable model that combines multiple regression trees and considers several explanatory variables (i.e. bagging decision trees). Our results show the importance of data collection revealing that (1) a simple quadratic regression damage function based on empirical data from the study area can be significantly more accurate than literature damage-models derived for a different context and (2) multi-variable modelling may outperform the uni-variable approach, yet it is more difficult to develop and apply due to a much higher demand of detailed data.
E. Freeman; G. Moisen; J. Coulston; B. Wilson
2014-01-01
Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
Brito-Rocha, E; Schilling, A C; Dos Anjos, L; Piotto, D; Dalmolin, A C; Mielke, M S
2016-01-01
Individual leaf area (LA) is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length - L and width - W) of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx) with the objective to test the feasibility of single regression models to estimate LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. From the seven species analyzed, only two (P. cattleyanum and T. stenocalyx) had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014, in order to validate the single models encompassing adult and seedlings. Our results show the possibility of development of models for predicting individual leaf area encompassing different ontogenetic stages for tropical tree species. The development of these models was more dependent on the species than the differences in leaf size between seedlings and adults.
Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico
Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.
2003-01-01
Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire that could substantially reduce habitat of chipmunks over a mountain range.
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
2017-11-01
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
NASA Astrophysics Data System (ADS)
Hapca, Simona
2015-04-01
Many soil properties and functions emerge from interactions of physical, chemical and biological processes at microscopic scales, which can be understood only by integrating techniques that traditionally are developed within separate disciplines. While recent advances in imaging techniques, such as X-ray computed tomography (X-ray CT), offer the possibility to reconstruct the 3D physical structure at fine resolutions, for the distribution of chemicals in soil, existing methods, based on scanning electron microscope (SEM) and energy dispersive X-ray detection (EDX), allow for characterization of the chemical composition only on 2D surfaces. At present, direct 3D measurement techniques are still lacking, sequential sectioning of soils, followed by 2D mapping of chemical elements and interpolation to 3D, being an alternative which is explored in this study. Specifically, we develop an integrated experimental and theoretical framework which combines 3D X-ray CT imaging technique with 2D SEM-EDX and use spatial statistics methods to map the chemical composition of soil in 3D. The procedure involves three stages 1) scanning a resin impregnated soil cube by X-ray CT, followed by precision cutting to produce parallel thin slices, the surfaces of which are scanned by SEM-EDX, 2) alignment of the 2D chemical maps within the internal 3D structure of the soil cube, and 3) development, of spatial statistics methods to predict the chemical composition of 3D soil based on the observed 2D chemical and 3D physical data. Specifically, three statistical models consisting of a regression tree, a regression tree kriging and cokriging model were used to predict the 3D spatial distribution of carbon, silicon, iron and oxygen in soil, these chemical elements showing a good spatial agreement between the X-ray grayscale intensities and the corresponding 2D SEM-EDX data. Due to the spatial correlation between the physical and chemical data, the regression-tree model showed a great potential in predicting chemical composition in particular for iron, which is generally sparsely distributed in soil. For carbon, silicon and oxygen, which are more densely distributed, the additional kriging of the regression tree residuals improved significantly the prediction, whereas prediction based on co-kriging was less consistent across replicates, underperforming regression-tree kriging. The present study shows a great potential in integrating geo-statistical methods with imaging techniques to unveil the 3D chemical structure of soil at very fine scales, the framework being suitable to be further applied to other types of imaging data such as images of biological thin sections for characterization of microbial distribution. Key words: X-ray CT, SEM-EDX, segmentation techniques, spatial correlation, 3D soil images, 2D chemical maps.
Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei
2014-01-01
Tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations show variations on the four sides of the mountains, especially on the northern and western sides, which has not been fully explained. Previous studies attributed such variations to the variations in temperature. However, in this study, we hypothesized that topographic controls were responsible for causing the variations in the tree locations in tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were dominant controls for tree locations on western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on northern side. The upmost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on western side. The model predicted results showed that habitats above the current tree line on the both sides were available for trees. Tree recruitments under the current tree line may take advantage of the available habitats at higher elevations based on the current tree location. Our research confirmed the controlling effects of topography on the tree locations in the tree line ecotone of Changbai Mountains and suggested that it was essential to assess the tree response to topography in the research of tree line ecotone. PMID:25170918
Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei
2014-01-01
Tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations show variations on the four sides of the mountains, especially on the northern and western sides, which has not been fully explained. Previous studies attributed such variations to the variations in temperature. However, in this study, we hypothesized that topographic controls were responsible for causing the variations in the tree locations in tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were dominant controls for tree locations on western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on northern side. The upmost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on western side. The model predicted results showed that habitats above the current tree line on the both sides were available for trees. Tree recruitments under the current tree line may take advantage of the available habitats at higher elevations based on the current tree location. Our research confirmed the controlling effects of topography on the tree locations in the tree line ecotone of Changbai Mountains and suggested that it was essential to assess the tree response to topography in the research of tree line ecotone.
Online adaptive decision trees: pattern classification and function approximation.
Basak, Jayanta
2006-09-01
Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to better generalization score. OADTs were bottlenecked by the fact that they are able to handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and is able to perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential of smoothly changing the structure of the tree depending on the data samples. We provide the learning rules based on steepest gradient descent for the new model ExOADT. Experimentally we demonstrate the effectiveness of ExOADT in the pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.
Preliminary Survey on TRY Forest Traits and Growth Index Relations - New Challenges
NASA Astrophysics Data System (ADS)
Lyubenova, Mariyana; Kattge, Jens; van Bodegom, Peter; Chikalanov, Alexandre; Popova, Silvia; Zlateva, Plamena; Peteva, Simona
2016-04-01
Forest ecosystems provide critical ecosystem goods and services, including food, fodder, water, shelter, nutrient cycling, and cultural and recreational value. Forests also store carbon, provide habitat for a wide range of species and help alleviate land degradation and desertification. Thus they have a potentially significant role to play in climate change adaptation planning through maintaining ecosystem services and providing livelihood options. Therefore the study of forest traits is such an important issue not just for individual countries but for the planet as a whole. We need to know what functional relations between forest traits exactly can express TRY data base and haw it will be significant for the global modeling and IPBES. The study of the biodiversity characteristics at all levels and functional links between them is extremely important for the selection of key indicators for assessing biodiversity and ecosystem services for sustainable natural capital control. By comparing the available information in tree data bases: TRY, ITR (International Tree Ring) and SP-PAM the 42 tree species are selected for the traits analyses. The dependence between location characteristics (latitude, longitude, altitude, annual precipitation, annual temperature and soil type) and forest traits (specific leaf area, leaf weight ratio, wood density and growth index) is studied by by multiply regression analyses (RDA) using the statistical software package Canoco 4.5. The Pearson correlation coefficient (measure of linear correlation), Kendal rank correlation coefficient (non parametric measure of statistical dependence) and Spearman correlation coefficient (monotonic function relationship between two variables) are calculated for each pair of variables (indexes) and species. After analysis of above mentioned correlation coefficients the dimensional linear regression models, multidimensional linear and nonlinear regression models and multidimensional neural networks models are built. The strongest dependence between It and WD was obtained. The research will support the work on: Strategic Plan for Biodiversity 2011-2020, modelling and implementation of ecosystem-based approaches to climate change adaptation and disaster risk reduction. Key words: Specific leaf area (SLA), Leaf weight ratio (LWR), Wood density (WD), Growth index (It)
2017-01-01
The potential benefits of planting trees have generated significant interest with respect to sequestering carbon and restoring other forest based ecosystem services. Reliable estimates of carbon stocks are pivotal for understanding the global carbon balance and for promoting initiatives to mitigate CO2 emissions through forest management. There are numerous studies employing allometric regression models that convert inventory into aboveground biomass (AGB) and carbon (C). Yet the majority of allometric regression models do not consider the root system nor do these equations provide detail on the architecture and shape of different species. The root system is a vital piece toward understanding the hidden form and function roots play in carbon accumulation, nutrient and plant water uptake, and groundwater infiltration. Work that estimates C in forests as well as models that are used to better understand the hydrologic function of trees need better characterization of tree roots. We harvested 40 trees of six different species, including their roots down to 2 mm in diameter and created species-specific and multi-species models to calculate aboveground (AGB), coarse root belowground biomass (BGB), and total biomass (TB). We also explore the relationship between crown structure and root structure. We found that BGB contributes ~27.6% of a tree’s TB, lateral roots extend over 1.25 times the distance of crown extent, root allocation patterns varied among species, and that AGB is a strong predictor of TB. These findings highlight the potential importance of including the root system in C estimates and lend important insights into the function roots play in water cycling. PMID:29023553
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. ?? 2010 Society for Risk Analysis.
Boosted Regression Tree Models to Explain Watershed ...
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
Assessing visual green effects of individual urban trees using airborne Lidar data.
Chen, Ziyue; Xu, Bing; Gao, Bingbo
2015-12-01
Urban trees benefit people's daily life in terms of air quality, local climate, recreation and aesthetics. Among these functions, a growing number of studies have been conducted to understand the relationship between residents' preference towards local environments and visual green effects of urban greenery. However, except for on-site photography, there are few quantitative methods to calculate green visibility, especially tree green visibility, from viewers' perspectives. To fill this research gap, a case study was conducted in the city of Cambridge, which has a diversity of tree species, sizes and shapes. Firstly, a photograph-based survey was conducted to approximate the actual value of visual green effects of individual urban trees. In addition, small footprint airborne Lidar (Light detection and ranging) data was employed to measure the size and shape of individual trees. Next, correlations between visual tree green effects and tree structural parameters were examined. Through experiments and gradual refinement, a regression model with satisfactory R2 and limited large errors is proposed. Considering the diversity of sample trees and the result of cross-validation, this model has the potential to be applied to other study sites. This research provides urban planners and decision makers with an innovative method to analyse and evaluate landscape patterns in terms of tree greenness. Copyright © 2015 Elsevier B.V. All rights reserved.
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
NASA Astrophysics Data System (ADS)
Hadley, Brian Christopher
This dissertation assessed remotely sensed data and geospatial modeling technique(s) to map the spatial distribution of total above-ground biomass present on the surface of the Savannah River National Laboratory's (SRNL) Mixed Waste Management Facility (MWMF) hazardous waste landfill. Ordinary least squares (OLS) regression, regression kriging, and tree-structured regression were employed to model the empirical relationship between in-situ measured Bahia (Paspalum notatum Flugge) and Centipede [Eremochloa ophiuroides (Munro) Hack.] grass biomass against an assortment of explanatory variables extracted from fine spatial resolution passive optical and LIDAR remotely sensed data. Explanatory variables included: (1) discrete channels of visible, near-infrared (NIR), and short-wave infrared (SWIR) reflectance, (2) spectral vegetation indices (SVI), (3) spectral mixture analysis (SMA) modeled fractions, (4) narrow-band derivative-based vegetation indices, and (5) LIDAR derived topographic variables (i.e. elevation, slope, and aspect). Results showed that a linear combination of the first- (1DZ_DGVI), second- (2DZ_DGVI), and third-derivative of green vegetation indices (3DZ_DGVI) calculated from hyperspectral data recorded over the 400--960 nm wavelengths of the electromagnetic spectrum explained the largest percentage of statistical variation (R2 = 0.5184) in the total above-ground biomass measurements. In general, the topographic variables did not correlate well with the MWMF biomass data, accounting for less than five percent of the statistical variation. It was concluded that tree-structured regression represented the optimum geospatial modeling technique due to a combination of model performance and efficiency/flexibility factors.
Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini
1999-01-01
Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-01-01
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-05-20
In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P
2010-03-19
This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Liang, Zhaohui; Liu, Jun; Huang, Jimmy X; Zeng, Xing
2018-01-01
The genetic polymorphism of Cytochrome P450 (CYP 450) is considered as one of the main causes for adverse drug reactions (ADRs). In order to explore the latent correlations between ADRs and potentially corresponding single-nucleotide polymorphism (SNPs) in CYP450, three algorithms based on information theory are used as the main method to predict the possible relation. The study uses a retrospective case-control study to explore the potential relation of ADRs to specific genomic locations and single-nucleotide polymorphism (SNP). The genomic data collected from 53 healthy volunteers are applied for the analysis, another group of genomic data collected from 30 healthy volunteers excluded from the study are used as the control group. The SNPs respective on five loci of CYP2D6*2,*10,*14 and CYP1A2*1C, *1F are detected by the Applied Biosystem 3130xl. The raw data is processed by ChromasPro to detect the specific alleles on the above loci from each sample. The secondary data are reorganized and processed by R combined with the reports of ADRs from clinical reports. Three information theory based algorithms are implemented for the screening task: JMI, CMIM, and mRMR. If a SNP is selected by more than two algorithms, we are confident to conclude that it is related to the corresponding ADR. The selection results are compared with the control decision tree + LASSO regression model. In the study group where ADRs occur, 10 SNPs are considered relevant to the occurrence of a specific ADR by the combined information theory model. In comparison, only 5 SNPs are considered relevant to a specific ADR by the decision tree + LASSO regression model. In addition, the new method detects more relevant pairs of SNP and ADR which are affected by both SNP and dosage. This implies that the new information theory based model is effective to discover correlations of ADRs and CYP 450 SNPs and is helpful in predicting the potential vulnerable genotype for some ADRs. The newly proposed information theory based model has superiority performance in detecting the relation between SNP and ADR compared to the decision tree + LASSO regression model. The new model is more sensitive to detect ADRs compared to the old method, while the old method is more reliable. Therefore, the selection criteria for selecting algorithms should depend on the pragmatic needs. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Prediction model of critical weight loss in cancer patients during particle therapy.
Zhang, Zhihong; Zhu, Yu; Zhang, Lijuan; Wang, Ziying; Wan, Hongwei
2018-01-01
The objective of this study is to investigate the predictors of critical weight loss in cancer patients receiving particle therapy, and build a prediction model based on its predictive factors. Patients receiving particle therapy were enroled between June 2015 and June 2016. Body weight was measured at the start and end of particle therapy. Association between critical weight loss (defined as >5%) during particle therapy and patients' demographic, clinical characteristic, pre-therapeutic nutrition risk screening (NRS 2002) and BMI were evaluated by logistic regression and decision tree analysis. Finally, 375 cancer patients receiving particle therapy were included. Mean weight loss was 0.55 kg, and 11.5% of patients experienced critical weight loss during particle therapy. The main predictors of critical weight loss during particle therapy were head and neck tumour location, total radiation dose ≥70 Gy on the primary tumour, and without post-surgery, as indicated by both logistic regression and decision tree analysis. Prediction model that includes tumour locations, total radiation dose and post-surgery had a good predictive ability, with the area under receiver operating characteristic curve 0.79 (95% CI: 0.71-0.88) and 0.78 (95% CI: 0.69-0.86) for decision tree and logistic regression model, respectively. Cancer patients with head and neck tumour location, total radiation dose ≥70 Gy and without post-surgery were at higher risk of critical weight loss during particle therapy, and early intensive nutrition counselling or intervention should be target at this population. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Barzegar, Rahim; Moghaddam, Asghar Asghari; Deo, Ravinesh; Fijani, Elham; Tziritis, Evangelos
2018-04-15
Constructing accurate and reliable groundwater risk maps provide scientifically prudent and strategic measures for the protection and management of groundwater. The objectives of this paper are to design and validate machine learning based-risk maps using ensemble-based modelling with an integrative approach. We employ the extreme learning machines (ELM), multivariate regression splines (MARS), M5 Tree and support vector regression (SVR) applied in multiple aquifer systems (e.g. unconfined, semi-confined and confined) in the Marand plain, North West Iran, to encapsulate the merits of individual learning algorithms in a final committee-based ANN model. The DRASTIC Vulnerability Index (VI) ranged from 56.7 to 128.1, categorized with no risk, low and moderate vulnerability thresholds. The correlation coefficient (r) and Willmott's Index (d) between NO 3 concentrations and VI were 0.64 and 0.314, respectively. To introduce improvements in the original DRASTIC method, the vulnerability indices were adjusted by NO 3 concentrations, termed as the groundwater contamination risk (GCR). Seven DRASTIC parameters utilized as the model inputs and GCR values utilized as the outputs of individual machine learning models were served in the fully optimized committee-based ANN-predictive model. The correlation indicators demonstrated that the ELM and SVR models outperformed the MARS and M5 Tree models, by virtue of a larger d and r value. Subsequently, the r and d metrics for the ANN-committee based multi-model in the testing phase were 0.8889 and 0.7913, respectively; revealing the superiority of the integrated (or ensemble) machine learning models when compared with the original DRASTIC approach. The newly designed multi-model ensemble-based approach can be considered as a pragmatic step for mapping groundwater contamination risks of multiple aquifer systems with multi-model techniques, yielding the high accuracy of the ANN committee-based model. Copyright © 2017 Elsevier B.V. All rights reserved.
Price, B; Gomez, A; Mathys, L; Gardi, O; Schellenberger, A; Ginzler, C; Thürig, E
2017-03-01
Trees outside forest (TOF) can perform a variety of social, economic and ecological functions including carbon sequestration. However, detailed quantification of tree biomass is usually limited to forest areas. Taking advantage of structural information available from stereo aerial imagery and airborne laser scanning (ALS), this research models tree biomass using national forest inventory data and linear least-square regression and applies the model both inside and outside of forest to create a nationwide model for tree biomass (above ground and below ground). Validation of the tree biomass model against TOF data within settlement areas shows relatively low model performance (R 2 of 0.44) but still a considerable improvement on current biomass estimates used for greenhouse gas inventory and carbon accounting. We demonstrate an efficient and easily implementable approach to modelling tree biomass across a large heterogeneous nationwide area. The model offers significant opportunity for improved estimates on land use combination categories (CC) where tree biomass has either not been included or only roughly estimated until now. The ALS biomass model also offers the advantage of providing greater spatial resolution and greater within CC spatial variability compared to the current nationwide estimates.
Nunes, Matheus Henrique
2016-01-01
Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects. PMID:27187074
Nunes, Matheus Henrique; Görgens, Eric Bastos
2016-01-01
Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects.
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction
Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip
2015-01-01
Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G
2012-01-01
To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
Song, Huiming; Liu, Yu; Li, Qiang; Gao, Na; Ma, Yongyong; Zhang, Yanhua
2014-01-01
Tree-ring samples from Chinese Pine (Pinus tabulaeformis Carr.) collected at Mt. Shimen on the western Loess Plateau, China, were used to reconstruct the mean May–July temperature during AD 1630–2011. The regression model explained 48% of the adjusted variance in the instrumentally observed mean May–July temperature. The reconstruction revealed significant temperature variations at interannual to decadal scales. Cool periods observed in the reconstruction coincided with reduced solar activities. The reconstructed temperature matched well with two other tree-ring based temperature reconstructions conducted on the northern slope of the Qinling Mountains (on the southern margin of the Loess Plateau of China) for both annual and decadal scales. In addition, this study agreed well with several series derived from different proxies. This reconstruction improves upon the sparse network of high-resolution paleoclimatic records for the western Loess Plateau, China. PMID:24690885
A method for reconstructing the development of the sapwood area of balsam fir.
Coyea, M R; Margolis, H A; Gagnon, R R
1990-09-01
Leaf area is commonly estimated as a function of sapwood area. However, because sapwood changes to heartwood over time, it has not previously been possible to reconstruct either the sapwood area or the leaf area of older trees into the past. In this study, we report a method for reconstructing the development of the sapwood area of dominant and codominant balsam fir (Abies balsamea (L.) Mill.). The technique is based on establishing a species-specific relationship between the number of annual growth rings in the sapwood area and tree age. Because the number of annual growth rings in the sapwood of balsam fir at a given age was found to be independent of site quality and stand density, the number of rings in sapwood (NRS) can be predicted from the age of a tree thus: NRS = 14.818 (1 - e(-0.031 age)), unweighted R(2) = 0.80, and NRS = 2.490 (1 - e(-0.038 age)), unweighted R(2) = 0.64, for measurements at breast height and at the base of the live crown, respectively. These nonlinear asymptotic regression models based only on age, were not improved by adding other tree variables such as diameter at breast height, diameter at the base of the live crown, total tree height or percent live crown.
Stemflow in low-density and hedgerow olive orchards in Portugal
NASA Astrophysics Data System (ADS)
Dias, Pedro D.; Valente, Fernanda; Pereira, Fernando L.; Abreu, Francisco G.
2015-04-01
Stemflow (Sf) is responsible for a localized water and solute input to soil around tree's trunks, playing an important eco-hydrological role in forest and agricultural ecosystems. Sf was monitored for seven months in 25 Olea europaea L. trees distributed in three orchards managed in two different ways, traditional low-density and super high density hedgerow. The orchards were located in central Portugal in the regions of Santarém (Várzea and Azóia) and Lisboa (Tapada). Seven olive varieties were analysed: Arbequina, Galega, Picual, Maçanilha, Cordovil, Azeiteira, Negrinha and Blanqueta. Measured Sf ranged from 7.5 to 87.2 mm (relative to crown-projected area), corresponding to 1.2 and 16.7% of gross rainfall (Pg). To understand better the variables that affect Sf and to be able to predict its value, linear regression models were fitted to these data. Whenever possible, the linear models were simplified using the backward stepwise algorithm based on the Akaike information criterion. For each tree, multiple linear regressions were adjusted between Sf and the duration, volume and intensity of rainfall episodes and maximum evaporation rate. In the low-density Várzea grove the more relevant explanatory variables were the three rainfall characteristics. In the super high density Azóia orchard only rainfall volume and intensity were considered relevant. In the low-density Tapada's grove all trees had a different sub-model with Pg being the only common variable. To try to explain differences between trees and to improve the quality of the modeling in each orchard, another set of explanatory variables was added: canopy volume, tree and trunk heights and trunk perimeter at the height of the first branches. The variables present in all sub-models were rainfall volume and intensity and the tree and trunk heights. Canopy volume and rainfall duration were also present in the sub-models of the two low-density groves (Tapada and Várzea). The determination coefficient (R2) of all models ranged from 0.5 to 0.76. The size of leaves was also analysed. Although there were significant differences between varieties and between trees of the same variety, they did not seem to affect the amount of Sf generated. Through analysis of bark storage capacity, it was found that older trees, with rough and thick bark, had higher trunk storage capacity and, therefore, originated less Sf. The results confirm the need for considering the contribution of stemflow when trying to correctly assess interception loss in olive orchards. Although the use of simple and general statistical models may be an attractive option, their precision may be small, making direct measurements or conceptual modelling preferable methods.
Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.
2013-01-01
Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB). Several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively-collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsies rates that were lower 87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively. Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Development of statistical predictive models must be created in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550
López-Sampson, Arlene; Cernusak, Lucas A; Page, Tony
2017-05-01
Physiological traits are frequently used as indicators of tree productivity. Aquilaria species growing in a research planting were studied to investigate relationships between leaf-productivity traits and tree growth. Twenty-eight trees were selected to measure isotopic composition of carbon (δ13C) and nitrogen (δ15N) and monitor six leaf attributes. Trees were sampled randomly within each of four diametric classes (at 150 mm above ground level) ensuring the variability in growth of the whole population was represented. A model averaging technique based on the Akaike's information criterion was computed to identify whether leaf traits could assist in diameter prediction. Regression analysis was performed to test for relationships between carbon isotope values and diameter and leaf traits. Approximately one new leaf per week was produced by a shoot. The rate of leaf expansion was estimated as 1.45 mm day-1. The range of δ13C values in leaves of Aquilaria species was from -25.5‰ to -31‰, with an average of -28.4 ‰ (±1.5‰ SD). A moderate negative correlation (R2 = 0.357) between diameter and δ13C in leaf dry matter indicated that individuals with high intercellular CO2 concentrations (low δ13C) and associated low water-use efficiency sustained rapid growth. Analysis of the 95% confidence of best-ranked regression models indicated that the predictors that could best explain growth in Aquilaria species were δ13C, δ15N, petiole length, number of new leaves produced per week and specific leaf area. The model constructed with these variables explained 55% (R2 = 0.55) of the variability in stem diameter. This demonstrates that leaf traits can assist in the early selection of high-productivity trees in Aquilaria species. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Sonja N. Oswalt; Christopher M. Oswalt
2008-01-01
This paper compares and contrasts hurricane-related damage recorded across the Mississippi landscape in the 2 years following Katrina with initial damage assessments based on modeled parameters by the USDA Forest Service. Logistic and multiple regressions are used to evaluate the influence of stand characteristics on tree damage probability. Specifically, this paper...
Empirical analyses of plant-climate relationships for the western United States
Gerald E. Rehfeldt; Nicholas L. Crookston; Marcus V. Warwell; Jeffrey S. Evans
2006-01-01
The Random Forests multiple-regression tree was used to model climate profiles of 25 biotic communities of the western United States and nine of their constituent species. Analyses of the communities were based on a gridded sample of ca. 140,000 points, while those for the species used presence-absence data from ca. 120,000 locations. Independent variables included 35...
Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh
NASA Astrophysics Data System (ADS)
Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.
2017-12-01
Elevated concentrations of geogenic arsenic in groundwater have been found in many countries to exceed 10 μg/L, the WHO's guideline value for drinking water. A common yet unexplained characteristic of groundwater arsenic spatial distribution is the extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh to improve the accuracy of models predicting arsenic exceedance rate spatially. A novel boosted regression tree method is used to establish a weak-learning ensemble model, which is compared to a linear model using a conventional stepwise logistic regression method. The boosted regression tree models offer the advantage of parametric interaction when big datasets are analyzed in comparison to the logistic regression. The point data set (n=3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial data sets of geological parameters (n=13) were from the Consortium for Spatial Information, Technical University of Denmark, University of East Anglia and the FAO, while the soil parameters (n=42) were from the Harmonized World Soil Database. The aforementioned parameters were regressed to categorical groundwater arsenic concentrations below or above three thresholds: 5 μg/L, 10 μg/L and 50 μg/L to identify respective controlling factors. Boosted regression tree method outperformed logistic regression methods in all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improvement of spatial distribution map of probability of groundwater arsenic exceeding all three thresholds when compared to disjunctive-kriging interpolated spatial arsenic map using the same groundwater arsenic dataset. Boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds. The probability of a well with iron content higher than 5mg/L to contain greater than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability of a well from depth more than 160m to contain more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.
Delayed conifer tree mortality following fire in California
Sharon M. Hood; Sheri L. Smith; Daniel R. Cluck
2007-01-01
Fire injury was characterized and survival monitored for 5,246 trees from five wildfires in California that occurred between 1999 and 2002. Logistic regression models for predicting the probability of mortality were developed for incense-cedar, Jeffrey pine, ponderosa pine, red fir and white fir. Two-year post-fire preliminary models were developed for incense-cedar,...
Diameter-growth model across shortleaf pine range using regression tree analysis
Daniel Yaussy; Louis Iverson; Anantha Prasad
1999-01-01
Diameter growth of a tree in most gap-phase models is limited by light, nutrients, moisture, and temperature. Growing-season temperature is represented by growing degree days (gdd), which is the sum of the average daily temperatures above a baseline temperature. Gap-phase models determine the north-south range of a species by the gdd limits at the north and south...
Comprehensive database of diameter-based biomass regressions for North American tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2004-01-01
A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...
Identification of chilling and heat requirements of cherry trees—a statistical approach
NASA Astrophysics Data System (ADS)
Luedeling, Eike; Kunz, Achim; Blanke, Michael M.
2013-09-01
Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative, to determine climatic requirements statistically. Bloom dates of cherry cv. `Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. `Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package (`chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. `Payne') at Davis, California.
Pestana, Maribela; Beja, Pedro; Correia, Pedro José; de Varennes, Amarilis; Faria, Eugénio Araújo
2005-06-01
To determine if flower nutrient composition can be used to predict fruit quality, a field experiment was conducted over three seasons (1996-1999) in a commercial orange orchard (Citrus sinensis (L.) Osbeck cv. 'Valencia Late', budded on Troyer citrange rootstock) established on a calcareous soil in southern Portugal. Flowers were collected from 20 trees during full bloom in April and their nutrient composition determined, and fruits were harvested the following March and their quality evaluated. Patterns of covariation in flower nutrient concentrations and in fruit quality variables were evaluated by principal component analysis. Regression models relating fruit quality variables to flower nutrient composition were developed by stepwise selection procedures. The predictive power of the regression models was evaluated with an independent data set. Nutrient composition of flowers at full bloom could be used to predict the fruit quality variables fresh fruit mass and maturation index in the following year. Magnesium, Ca and Zn concentrations measured in flowers were related to fruit fresh mass estimations and N, P, Mg and Fe concentrations were related to fruit maturation index. We also established reference values for the nutrient composition of flowers based on measurements made in trees that produced large (> 76 mm in diameter) fruit.
NASA Astrophysics Data System (ADS)
Ghosh, S. M.; Behera, M. D.
2017-12-01
Forest aboveground biomass (AGB) is an important factor for preparation of global policy making decisions to tackle the impact of climate change. Several previous studies has concluded that remote sensing methods are more suitable for estimating forest biomass on regional scale. Among all available remote sensing data and methods, Synthetic Aperture Radar (SAR) data in combination with decision tree based machine learning algorithms has shown better promise in estimating higher biomass values. There aren't many studies done for biomass estimation of dense Indian tropical forests with high biomass density. In this study aboveground biomass was estimated for two major tree species, Sal (Shorea robusta) and Teak (Tectona grandis), of Katerniaghat Wildlife Sanctuary, a tropical forest situated in northern India. Biomass was estimated by combining C-band SAR data from Sentinel-1A satellite, vegetation indices produced using Sentinel-2A data and ground inventory plots. Along with SAR backscatter value, SAR texture images were also used as input as earlier studies had found that image texture has a correlation with vegetation biomass. Decision tree based nonlinear machine learning algorithms were used in place of parametric regression models for establishing relationship between fields measured values and remotely sensed parameters. Using random forest model with a combination of vegetation indices with SAR backscatter as predictor variables shows best result for Sal forest, with a coefficient of determination value of 0.71 and a RMSE value of 105.027 t/ha. In teak forest also best result can be found in the same combination but for stochastic gradient boosted model with a coefficient of determination value of 0.6 and a RMSE value of 79.45 t/ha. These results are mostly better than the results of other studies done for similar kind of forests. This study shows that Sentinel series satellite data has exceptional capabilities in estimating dense forest AGB and machine learning algorithms are better means to do so than parametric regression models.
Potential Changes in Tree Species Richness and Forest Community Types following Climate Change
Louis R. Iverson; Anantha M. Prasad
2001-01-01
Potential changes in tree species richness and forest community types were evaluated for the eastern United States according to five scenarios of future climate change resulting from a doubling of atmospheric carbon dioxide (CO2). DISTRIB, an empirical model that uses a regression tree analysis approach, was used to generate suitable habitat, or potential future...
Luo, Laiping; Zhai, Qiuping; Su, Yanjun; Ma, Qin; Kelly, Maggi; Guo, Qinghua
2018-05-14
Crown base height (CBH) is an essential tree biophysical parameter for many applications in forest management, forest fuel treatment, wildfire modeling, ecosystem modeling and global climate change studies. Accurate and automatic estimation of CBH for individual trees is still a challenging task. Airborne light detection and ranging (LiDAR) provides reliable and promising data for estimating CBH. Various methods have been developed to calculate CBH indirectly using regression-based means from airborne LiDAR data and field measurements. However, little attention has been paid to directly calculate CBH at the individual tree scale in mixed-species forests without field measurements. In this study, we propose a new method for directly estimating individual-tree CBH from airborne LiDAR data. Our method involves two main strategies: 1) removing noise and understory vegetation for each tree; and 2) estimating CBH by generating percentile ranking profile for each tree and using a spline curve to identify its inflection points. These two strategies lend our method the advantages of no requirement of field measurements and being efficient and effective in mixed-species forests. The proposed method was applied to a mixed conifer forest in the Sierra Nevada, California and was validated by field measurements. The results showed that our method can directly estimate CBH at individual tree level with a root-mean-squared error of 1.62 m, a coefficient of determination of 0.88 and a relative bias of 3.36%. Furthermore, we systematically analyzed the accuracies among different height groups and tree species by comparing with field measurements. Our results implied that taller trees had relatively higher uncertainties than shorter trees. Our findings also show that the accuracy for CBH estimation was the highest for black oak trees, with an RMSE of 0.52 m. The conifer species results were also good with uniformly high R 2 ranging from 0.82 to 0.93. In general, our method has demonstrated high accuracy for individual tree CBH estimation and strong potential for applications in mixed species over large areas.
Bisseleua, D H B; Vidal, Stefan
2011-02-01
The spatio-temporal distribution of Sahlbergella singularis Haglung, a major pest of cacao trees (Theobroma cacao) (Malvaceae), was studied for 2 yr in traditional cacao forest gardens in the humid forest area of southern Cameroon. The first objective was to analyze the dispersion of this insect on cacao trees. The second objective was to develop sampling plans based on fixed levels of precision for estimating S. singularis populations. The following models were used to analyze the data: Taylor's power law, Iwao's patchiness regression, the Nachman model, and the negative binomial distribution. Our results document that Taylor's power law was a better fit for the data than the Iwao and Nachman models. Taylor's b and Iwao's β were both significantly >1, indicating that S. singularis aggregated on specific trees. This result was further supported by the calculated common k of 1.75444. Iwao's α was significantly <0, indicating that the basic distribution component of S. singularis was the individual insect. Comparison of negative binomial (NBD) and Nachman models indicated that the NBD model was appropriate for studying S. singularis distribution. Optimal sample sizes for fixed precision levels of 0.10, 0.15, and 0.25 were estimated with Taylor's regression coefficients. Required sample sizes increased dramatically with increasing levels of precision. This is the first study on S. singularis dispersion in cacao plantations. Sampling plans, presented here, should be a tool for research on population dynamics and pest management decisions of mirid bugs on cacao. © 2011 Entomological Society of America
Modelling daily water temperature from air temperature for the Missouri River.
Zhu, Senlin; Nyarko, Emmanuel Karlo; Hadzima-Nyarko, Marijana
2018-01-01
The bio-chemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. It is a complex problem to accurately estimate water temperature. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air-water temperature relationship of the Missouri River is investigated by developing three different machine learning models (Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT)). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared to machine learning models. Analyzing the three standard models, the stochastic model clearly outperforms the standard linear model and nonlinear model. All the three machine learning models have comparable results and outperform the stochastic model, with GPR having slightly better results for stations No. 2 and 3, while BA-DT has slightly better results for station No. 1. The machine learning models are very effective tools which can be used for the prediction of daily river temperature.
Characteristics of the tree-drawing test in chronic schizophrenia.
Kaneda, Ayako; Yasui-Furukori, Norio; Saito, Manabu; Sugawara, Norio; Nakagami, Taku; Furukori, Hanako; Kaneko, Sunao
2010-04-01
A tree-drawing test acts as both a projective psychological examination as well as a supplementary psychodiagnostic tool. There is little information relating the characteristics of schizophrenia and the tree-drawing test. The present study compared the structural and morphological differences in the results of the tree-drawing test between schizophrenic patients and healthy individuals, as well as between schizophrenic patients who responded well to treatment and those who responded poorly. The subjects included 202 chronic schizophrenic patients and 113 healthy individuals. The schizophrenic patients were categorized as 'good responders' or 'poor responders' based on their response to medical treatments. The tree-drawing test was performed on all subjects. The tree drawn by each subject was analyzed structurally and morphologically. There were significant differences between the trunk and branches drawn by schizophrenic patients and those drawn by healthy controls. There were no significant differences between the good responders and the poor responders in any aspect of the tree drawings. Multiple regression models showed that the ratio of the tree area to the total area of the drawing paper, the width of the trunk, the trunk base opening, and the size of the branch ends were significantly associated with schizophrenia. The present study suggests that the trees drawn by schizophrenic patients are significantly different from those drawn by healthy individuals, but among schizophrenic patients, it is difficult to distinguish between good responders and poor responders using the tree-drawing test.
Unravelling the limits to tree height: a major role for water and nutrient trade-offs.
Cramer, Michael D
2012-05-01
Competition for light has driven forest trees to grow exceedingly tall, but the lack of a single universal limit to tree height indicates multiple interacting environmental limitations. Because soil nutrient availability is determined by both nutrient concentrations and soil water, water and nutrient availabilities may interact in determining realised nutrient availability and consequently tree height. In SW Australia, which is characterised by nutrient impoverished soils that support some of the world's tallest forests, total [P] and water availability were independently correlated with tree height (r = 0.42 and 0.39, respectively). However, interactions between water availability and each of total [P], pH and [Mg] contributed to a multiple linear regression model of tree height (r = 0.72). A boosted regression tree model showed that maximum tree height was correlated with water availability (24%), followed by soil properties including total P (11%), Mg (10%) and total N (9%), amongst others, and that there was an interaction between water availability and total [P] in determining maximum tree height. These interactions indicated a trade-off between water and P availability in determining maximum tree height in SW Australia. This is enabled by a species assemblage capable of growing tall and surviving (some) disturbances. The mechanism for this trade-off is suggested to be through water enabling mass-flow and diffusive mobility of P, particularly of relatively mobile organic P, although water interactions with microbial activity could also play a role.
A Clinical Decision Support System for Breast Cancer Patients
NASA Astrophysics Data System (ADS)
Fernandes, Ana S.; Alves, Pedro; Jarman, Ian H.; Etchells, Terence A.; Fonseca, José M.; Lisboa, Paulo J. G.
This paper proposes a Web clinical decision support system for clinical oncologists and for breast cancer patients making prognostic assessments, using the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the clinically widely used Nottingham prognostic index (NPI); the Cox regression modelling and a partial logistic artificial neural network with automatic relevance determination (PLANN-ARD). All three models yield a different prognostic index that can be analysed together in order to obtain a more accurate prognostic assessment of the patient. Missing data is incorporated in the mentioned models, a common issue in medical data that was overcome using multiple imputation techniques. Risk group assignments are also provided through a methodology based on regression trees, where Boolean rules can be obtained expressed with patient characteristics.
Gbm.auto: A software tool to simplify spatial modelling and Marine Protected Area planning
Officer, Rick; Clarke, Maurice; Reid, David G.; Brophy, Deirdre
2017-01-01
Boosted Regression Trees. Excellent for data-poor spatial management but hard to use Marine resource managers and scientists often advocate spatial approaches to manage data-poor species. Existing spatial prediction and management techniques are either insufficiently robust, struggle with sparse input data, or make suboptimal use of multiple explanatory variables. Boosted Regression Trees feature excellent performance and are well suited to modelling the distribution of data-limited species, but are extremely complicated and time-consuming to learn and use, hindering access for a wide potential user base and therefore limiting uptake and usage. BRTs automated and simplified for accessible general use with rich feature set We have built a software suite in R which integrates pre-existing functions with new tailor-made functions to automate the processing and predictive mapping of species abundance data: by automating and greatly simplifying Boosted Regression Tree spatial modelling, the gbm.auto R package suite makes this powerful statistical modelling technique more accessible to potential users in the ecological and modelling communities. The package and its documentation allow the user to generate maps of predicted abundance, visualise the representativeness of those abundance maps and to plot the relative influence of explanatory variables and their relationship to the response variables. Databases of the processed model objects and a report explaining all the steps taken within the model are also generated. The package includes a previously unavailable Decision Support Tool which combines estimated escapement biomass (the percentage of an exploited population which must be retained each year to conserve it) with the predicted abundance maps to generate maps showing the location and size of habitat that should be protected to conserve the target stocks (candidate MPAs), based on stakeholder priorities, such as the minimisation of fishing effort displacement. Gbm.auto for management in various settings By bridging the gap between advanced statistical methods for species distribution modelling and conservation science, management and policy, these tools can allow improved spatial abundance predictions, and therefore better management, decision-making, and conservation. Although this package was built to support spatial management of a data-limited marine elasmobranch fishery, it should be equally applicable to spatial abundance modelling, area protection, and stakeholder engagement in various scenarios. PMID:29216310
Evaluation of Oil-Palm Fungal Disease Infestation with Canopy Hyperspectral Reflectance Data
Lelong, Camille C. D.; Roger, Jean-Michel; Brégand, Simon; Dubertret, Fabrice; Lanore, Mathieu; Sitorus, Nurul A.; Raharjo, Doni A.; Caliman, Jean-Pierre
2010-01-01
Fungal disease detection in perennial crops is a major issue in estate management and production. However, nowadays such diagnostics are long and difficult when only made from visual symptom observation, and very expensive and damaging when based on root or stem tissue chemical analysis. As an alternative, we propose in this study to evaluate the potential of hyperspectral reflectance data to help detecting the disease efficiently without destruction of tissues. This study focuses on the calibration of a statistical model of discrimination between several stages of Ganoderma attack on oil palm trees, based on field hyperspectral measurements at tree scale. Field protocol and measurements are first described. Then, combinations of pre-processing, partial least square regression and linear discriminant analysis are tested on about hundred samples to prove the efficiency of canopy reflectance in providing information about the plant sanitary status. A robust algorithm is thus derived, allowing classifying oil-palm in a 4-level typology, based on disease severity from healthy to critically sick stages, with a global performance close to 94%. Moreover, this model discriminates sick from healthy trees with a confidence level of almost 98%. Applications and further improvements of this experiment are finally discussed. PMID:22315565
O’Connor, Christopher D.; Lynch, Ann M.
2016-01-01
A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition were no different than MST (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). Exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increase trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest height to diameter ratios below MST relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents were consistent with the predictions of MST. PMID:27391084
Swetnam, Tyson L; O'Connor, Christopher D; Lynch, Ann M
2016-01-01
A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition were no different than MST (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). Exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increase trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest height to diameter ratios below MST relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents were consistent with the predictions of MST.
Error analysis of leaf area estimates made from allometric regression models
NASA Technical Reports Server (NTRS)
Feiveson, A. H.; Chhikara, R. S.
1986-01-01
Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
Extending the Distributed Lag Model framework to handle chemical mixtures.
Bello, Ghalib A; Arora, Manish; Austin, Christine; Horton, Megan K; Wright, Robert O; Gennings, Chris
2017-07-01
Distributed Lag Models (DLMs) are used in environmental health studies to analyze the time-delayed effect of an exposure on an outcome of interest. Given the increasing need for analytical tools for evaluation of the effects of exposure to multi-pollutant mixtures, this study attempts to extend the classical DLM framework to accommodate and evaluate multiple longitudinally observed exposures. We introduce 2 techniques for quantifying the time-varying mixture effect of multiple exposures on an outcome of interest. Lagged WQS, the first technique, is based on Weighted Quantile Sum (WQS) regression, a penalized regression method that estimates mixture effects using a weighted index. We also introduce Tree-based DLMs, a nonparametric alternative for assessment of lagged mixture effects. This technique is based on the Random Forest (RF) algorithm, a nonparametric, tree-based estimation technique that has shown excellent performance in a wide variety of domains. In a simulation study, we tested the feasibility of these techniques and evaluated their performance in comparison to standard methodology. Both methods exhibited relatively robust performance, accurately capturing pre-defined non-linear functional relationships in different simulation settings. Further, we applied these techniques to data on perinatal exposure to environmental metal toxicants, with the goal of evaluating the effects of exposure on neurodevelopment. Our methods identified critical neurodevelopmental windows showing significant sensitivity to metal mixtures. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, S.; Zhuang, Q.
2016-12-01
Climatic change affects the plant physiological and biogeochemistry processes, and therefore on the ecosystem water use efficiency (WUE). Therefore, a comprehensive understanding of WUE would help us understand the adaptability of ecosystem to variable climate conditions. Tree ring data have great potential in addressing the forest response to climatic changes compared with mechanistic model simulations, eddy flux measurement and manipulative experiments. Here, we collected the tree ring isotopic carbon data in 12 boreal forest sites to develop a multiple linear regression model, and the model was extrapolated to the whole boreal region to obtain the WUE spatial and temporal variation from 1948 to 2010. Two algorithms were also used to estimate the inter-annual gross primary productivity (GPP) based on our derived WUE. Our results demonstrated that most of boreal regions showed significant increasing WUE trend during the period except parts of Alaska. The spatial averaged annual mean WUE was predicted to increase by 13%, from 2.3±0.4 g C kg-1 H2O at 1948 to 2.6±0.7 g C kg-1 H2O at 2012, which was much higher than other land surface models. Our predicted GPP by the WUE definition algorithm was comparable with site observation, while for the revised light use efficiency algorithm, GPP estimation was higher than site observation as well as than land surface models. In addition, the increasing GPP trends by two algorithms were similar with land surface model simulations. This is the first study to evaluate regional WUE and GPP in forest ecosystem based on tree ring data and future work should consider other variables (elevation, nitrogen deposition) that influence tree ring isotopic signals and the dual-isotope approach may help improve predicting the inter-annual WUE variation.
Predicting species' range limits from functional traits for the tree flora of North America.
Stahl, Ulrike; Reu, Björn; Wirth, Christian
2014-09-23
Using functional traits to explain species' range limits is a promising approach in functional biogeography. It replaces the idiosyncrasy of species-specific climate ranges with a generic trait-based predictive framework. In addition, it has the potential to shed light on specific filter mechanisms creating large-scale vegetation patterns. However, its application to a continental flora, spanning large climate gradients, has been hampered by a lack of trait data. Here, we explore whether five key plant functional traits (seed mass, wood density, specific leaf area (SLA), maximum height, and longevity of a tree)--indicative of life history, mechanical, and physiological adaptations--explain the climate ranges of 250 North American tree species distributed from the boreal to the subtropics. Although the relationship between traits and the median climate across a species range is weak, quantile regressions revealed strong effects on range limits. Wood density and seed mass were strongly related to the lower but not upper temperature range limits of species. Maximum height affects the species range limits in both dry and humid climates, whereas SLA and longevity do not show clear relationships. These results allow the definition and delineation of climatic "no-go areas" for North American tree species based on key traits. As some of these key traits serve as important parameters in recent vegetation models, the implementation of trait-based climatic constraints has the potential to predict both range shifts and ecosystem consequences on a more functional basis. Moreover, for future trait-based vegetation models our results provide a benchmark for model evaluation.
Jose Negron
1997-01-01
Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project. PMID:26339227
Shin, Yoonseok
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
NASA Astrophysics Data System (ADS)
Kwon, Y.
2013-12-01
As evidence of global warming continue to increase, being able to predict forest response to climate changes, such as expected rise of temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. To map forest species redistribution by climate change scenario has been successful, however, most species redistribution maps lack mechanistic understanding to explain why trees grow under the novel conditions of chaining climate. Distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid prediction on the transition stage to new vegetation-climate equilibrium as it represents changes in structure of forest reflecting site conditions and climate factors. The objective of this study is to develop forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data set. The major issue addressed in this approach is non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. It calls into question some traditional underlying assumptions of spatial trends that uncritically accepted in forest data. To solve the controversy surrounding the suitability of forest data, regression tree analysis are performed using Software See5 and Cubist. Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA, Forest Service) data set for the 31 eastern most United States. Second, 8-day composite of MODIS Land Cover, FPAR, LAI and GPP/NPP data were obtained from Jan 2001 to Dec 2004 (total 182 composite) and each product were filtered by pixel-level quality assurance data to select best quality pixels. Third, 30-year averaged climate data were collected from National Oceanic and Atmospheric Administration (NOAA) and five climatic variables were obtained: Monthly temperature, precipitation, annual heating and cooling days, and annual frost-free days. Forth, topographic data were obtained from digital elevation model (1km by 1km). This research will provide a better understanding of large-scale forest responses to environmental factors that will be beneficial for the development of important forest management applications.
Zhao, Cai-Yun; Xu, Jing; Liu, Xiao-Yan
2017-01-01
Abstract Globalization increases the opportunities for unintentionally introduced invasive alien species, especially for insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province in mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors on the number distribution and similarity of species composition of these insects. Classification and regression trees indicated climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are primary drivers for the number distribution pattern of unintentionally introduced invasive alien insects at provincial scale, while only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of them based on the multivariate regression trees. PMID:28973576
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Zigler, S.J.; Newton, T.J.; Steuer, J.J.; Bartsch, M.R.; Sauer, J.S.
2008-01-01
Interest in understanding physical and hydraulic factors that might drive distribution and abundance of freshwater mussels has been increasing due to their decline throughout North America. We assessed whether the spatial distribution of unionid mussels could be predicted from physical and hydraulic variables in a reach of the Upper Mississippi River. Classification and regression tree (CART) models were constructed using mussel data compiled from various sources and explanatory variables derived from GIS coverages. Prediction success of CART models for presence-absence of mussels ranged from 71 to 76% across three gears (brail, sled-dredge, and dive-quadrat) and 51% of the deviance in abundance. Models were largely driven by shear stress and substrate stability variables, but interactions with simple physical variables, especially slope, were also important. Geospatial models, which were based on tree model results, predicted few mussels in poorly connected backwater areas (e.g., floodplain lakes) and the navigation channel, whereas main channel border areas with high geomorphic complexity (e.g., river bends, islands, side channel entrances) and small side channels were typically favorable to mussels. Moreover, bootstrap aggregation of discharge-specific regression tree models of dive-quadrat data indicated that variables measured at low discharge were about 25% more predictive (PMSE = 14.8) than variables measured at median discharge (PMSE = 20.4) with high discharge (PMSE = 17.1) variables intermediate. This result suggests that episodic events such as droughts and floods were important in structuring mussel distributions. Although the substantial mussel and ancillary data in our study reach is unusual, our approach to develop exploratory statistical and geospatial models should be useful even when data are more limited. ?? 2007 Springer Science+Business Media B.V.
Identifying pollution sources and predicting urban air quality using ensemble learning methods
NASA Astrophysics Data System (ADS)
Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali
2013-12-01
In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed the air quality unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, factors responsible for discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality rendering misclassification rate (MR) of 8.32% (SDT); 4.12% (DTF); 5.62% (DTB), and 6.18% (SVM), respectively in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM both in classification and regression which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W
2016-10-01
Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVM) and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and support vector machine models (median validated AUC value, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (inter quartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Deo, Ravinesh C.; Kisi, Ozgur; Singh, Vijay P.
2017-02-01
Drought forecasting using standardized metrics of rainfall is a core task in hydrology and water resources management. Standardized Precipitation Index (SPI) is a rainfall-based metric that caters for different time-scales at which the drought occurs, and due to its standardization, is well-suited for forecasting drought at different periods in climatically diverse regions. This study advances drought modelling using multivariate adaptive regression splines (MARS), least square support vector machine (LSSVM), and M5Tree models by forecasting SPI in eastern Australia. MARS model incorporated rainfall as mandatory predictor with month (periodicity), Southern Oscillation Index, Pacific Decadal Oscillation Index and Indian Ocean Dipole, ENSO Modoki and Nino 3.0, 3.4 and 4.0 data added gradually. The performance was evaluated with root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (r2). Best MARS model required different input combinations, where rainfall, sea surface temperature and periodicity were used for all stations, but ENSO Modoki and Pacific Decadal Oscillation indices were not required for Bathurst, Collarenebri and Yamba, and the Southern Oscillation Index was not required for Collarenebri. Inclusion of periodicity increased the r2 value by 0.5-8.1% and reduced RMSE by 3.0-178.5%. Comparisons showed that MARS superseded the performance of the other counterparts for three out of five stations with lower MAE by 15.0-73.9% and 7.3-42.2%, respectively. For the other stations, M5Tree was better than MARS/LSSVM with lower MAE by 13.8-13.4% and 25.7-52.2%, respectively, and for Bathurst, LSSVM yielded more accurate result. For droughts identified by SPI ≤ - 0.5, accurate forecasts were attained by MARS/M5Tree for Bathurst, Yamba and Peak Hill, whereas for Collarenebri and Barraba, M5Tree was better than LSSVM/MARS. Seasonal analysis revealed disparate results where MARS/M5Tree was better than LSSVM. The results highlight the importance of periodicity in drought forecasting and also ascertains that model accuracy scales with geographic/seasonal factors due to complexity of drought and its relationship with inputs and data attributes that can affect the evolution of drought events.
Eric J. Gustafson
2014-01-01
Regression models developed in the upper Midwest (United States) to predict drought-induced tree mortality from measures of drought (Palmer Drought Severity Index) were tested in the northeastern United States and found inadequate. The most likely cause of this result is that long drought events were rare in the Northeast during the period when inventory data were...
Equations relating compacted and uncompacted live crown ratio for common tree species in the South
KaDonna C. Randolph
2010-01-01
Species-specific equations to predict uncompacted crown ratio (UNCR) from compacted live crown ratio (CCR), tree length, and stem diameter were developed for 24 species and 12 genera in the southern United States. Using data from the US Forest Service Forest Inventory and Analysis program, nonlinear regression was used to model UNCR with a logistic function. Model...
Estimating leaf area and leaf biomass of open-grown deciduous urban trees
David J. Nowak
1996-01-01
Logarithmic regression equations were developed to predict leaf area and leaf biomass for open-grown deciduous urban trees based on stem diameter and crown parameters. Equations based on crown parameters produced more reliable estimates. The equations can be used to help quantify forest structure and functions, particularly in urbanizing and urban/suburban areas.
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value
Evaluation and prediction of shrub cover in coastal Oregon forests (USA)
Becky K. Kerns; Janet L. Ohmann
2004-01-01
We used data from regional forest inventories and research programs, coupled with mapped climatic and topographic information, to explore relationships and develop multiple linear regression (MLR) and regression tree models for total and deciduous shrub cover in the Oregon coastal province. Results from both types of models indicate that forest structure variables were...
National scale biomass estimators for United States tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2003-01-01
Estimates of national-scale forest carbon (C) stocks and fluxes are typically based on allometric regression equations developed using dimensional analysis techniques. However, the literature is inconsistent and incomplete with respect to large-scale forest C estimation. We compiled all available diameter-based allometric regression equations for estimating total...
James W. Flewelling
2009-01-01
Remotely sensed data can be used to make digital maps showing individual tree crowns (ITC) for entire forests. Attributes of the ITCs may include area, shape, height, and color. The crown map is sampled in a way that provides an unbiased linkage between ITCs and identifiable trees measured on the ground. Methods of avoiding edge bias are given. In an example from a...
Ryberg, Karen R.; Vecchia, Aldo V.; Akyüz, F. Adnan; Lin, Wei
2016-01-01
Historically unprecedented flooding occurred in the Souris River Basin of Saskatchewan, North Dakota and Manitoba in 2011, during a longer term period of wet conditions in the basin. In order to develop a model of future flows, there is a need to evaluate effects of past multidecadal climate variability and/or possible climate change on precipitation. In this study, tree-ring chronologies and historical precipitation data in a four-degree buffer around the Souris River Basin were analyzed to develop regression models that can be used for predicting long-term variations of precipitation. To focus on longer term variability, 12-year moving average precipitation was modeled in five subregions (determined through cluster analysis of measures of precipitation) of the study area over three seasons (November–February, March–June and July–October). The models used multiresolution decomposition (an additive decomposition based on powers of two using a discrete wavelet transform) of tree-ring chronologies from Canada and the US and seasonal 12-year moving average precipitation based on Adjusted and Homogenized Canadian Climate Data and US Historical Climatology Network data. Results show that precipitation varies on long-term (multidecadal) time scales of 16, 32 and 64 years. Past extended pluvial and drought events, which can vary greatly with season and subregion, were highlighted by the models. Results suggest that the recent wet period may be a part of natural variability on a very long time scale.
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R 2 from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R 2 improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
Tree-ring reconstructions of hydroclimatic variability in the Upper Colorado River Basin
NASA Astrophysics Data System (ADS)
Hidalgo-Leon, Hugo
Three major sources of improvements in tree-ring analysis and reconstruction of hydroclimatic variables are presented for the Upper Colorado River Basin (UCRB) in the southwestern U.S.: (1) Cross validation statistics are used for identifying optimal reconstruction models based on different alternatives of PCA-based regression. Results showed that a physically-consistent parsimonious model with low mean square error can be obtained by using strict rules for principal component selection and cross validation statistics. The improved methods were used to produce a ˜500 year high-resolution reconstruction of the UCRB's streamflow and compared with results of a previous reconstruction based on traditional procedures. (2) Tree-species' type was found to be a factor for determining chronology selection from dendrohydroclimatic models. The relative sensitivity of six tree species (Pinus edulis, Pseudotsuga menziesii, Pinus ponderosa, Pinus flexilis, Pinus aristata, and Picea engelmanni) to hydroclimatic extreme variations was determined using contingency table scores of tree-ring growth (at different lags) against hydroclimatic observations. Pinus edulis and Pseudotsuga menziesii were found to be the species most sensitive to low water. Results showed that tree-rings are biased towards greater sensitivity to hot-dry conditions and less responsive to cool-moist conditions. Resulted also showed higher streamflow response scores compared to precipitation implying a good integration and persistence representation of the basin through normal hydrological processes. (3) Previous reconstructions on the basin used data extending only up to 1963. This is an important limitation since hydroclimatic records from 1963 to the present show significantly different variation than prior to 1963. The changes are caused by variations in the strength of forcing mechanisms from the Pacific Ocean. A comparative analysis of the influence of North Pacific variation and El Nino/Southern Oscillation (ENSO) showed that the responses of Tropical and North Pacific forcing in UCRB's hydroclimate are different for annual precipitation and total streamflow and that these relationships have changed at decadal time scales. Furthermore, most of the few tree-rings available up to 1985, present the same shifts as the hydroclimatic variables studied. To capture the full range of variability observed in instrumental data is necessary to collect new tree-ring samples.
Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding
2010-05-01
Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan' an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined by using regression tree. It was found that the effects of test meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the inter-annual variation of NDVI. Regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.
NASA Astrophysics Data System (ADS)
Lilly, P.; Yanai, R. D.; Buckley, H. L.; Case, B. S.; Woollons, R. C.; Holdaway, R. J.; Johnson, J.
2016-12-01
Calculations of forest biomass and elemental content require many measurements and models, each contributing uncertainty to the final estimates. While sampling error is commonly reported, based on replicate plots, error due to uncertainty in the regression used to estimate biomass from tree diameter is usually not quantified. Some published estimates of uncertainty due to the regression models have used the uncertainty in the prediction of individuals, ignoring uncertainty in the mean, while others have propagated uncertainty in the mean while ignoring individual variation. Using the simple case of the calcium concentration of sugar maple leaves, we compare the variation among individuals (the standard deviation) to the uncertainty in the mean (the standard error) and illustrate the declining importance in the prediction of individual concentrations as the number of individuals increases. For allometric models, the analogous statistics are the prediction interval (or the residual variation in the model fit) and the confidence interval (describing the uncertainty in the best fit model). The effect of propagating these two sources of error is illustrated using the mass of sugar maple foliage. The uncertainty in individual tree predictions was large for plots with few trees; for plots with 30 trees or more, the uncertainty in individuals was less important than the uncertainty in the mean. Authors of previously published analyses have reanalyzed their data to show the magnitude of these two sources of uncertainty in scales ranging from experimental plots to entire countries. The most correct analysis will take both sources of uncertainty into account, but for practical purposes, country-level reports of uncertainty in carbon stocks, as required by the IPCC, can ignore the uncertainty in individuals. Ignoring the uncertainty in the mean will lead to exaggerated estimates of confidence in estimates of forest biomass and carbon and nutrient contents.
Modeling non-linear growth responses to temperature and hydrology in wetland trees
NASA Astrophysics Data System (ADS)
Keim, R.; Allen, S. T.
2016-12-01
Growth responses of wetland trees to flooding and climate variations are difficult to model because they depend on multiple, apparently interacting factors, but are a critical link in hydrological control of wetland carbon budgets. To more generally understand tree growth to hydrological forcing, we modeled non-linear responses of tree ring growth to flooding and climate at sub-annual time steps, using Vaganov-Shashkin response functions. We calibrated the model to six baldcypress tree-ring chronologies from two hydrologically distinct sites in southern Louisiana, and tested several hypotheses of plasticity in wetlands tree responses to interacting environmental variables. The model outperformed traditional multiple linear regression. More importantly, optimized response parameters were generally similar among sites with varying hydrological conditions, suggesting generality to the functions. Model forms that included interacting responses to multiple forcing factors were more effective than were single response functions, indicating the principle of a single limiting factor is not correct in wetlands and both climatic and hydrological variables must be considered in predicting responses to hydrological or climate change.
Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu
2016-01-01
We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation from Fujian province in southeast China. In general, compatibility requires that the sum of components equal the whole tree, meaning that the sum of percentages calculated from component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage leaves, roots and the whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D2H, DH and D&H (where D&H means two separate variables in bivariate model). Power, exponential and polynomial functions were tested as well as a new general function model was proposed by this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performances were evaluated by using mean residuals, residual variance, mean square error and the determination coefficient. The results indicated that models with two dimensional variables (DH, D2H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model separately, but there were deviations in estimating results due to existing incompatibilities, while NAP and NSUR could ensure predictions compatibility. Simultaneously, we found that the new general model had better accuracy than others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and considered for other vegetation types as well. PMID:26982054
Liu, Pei-Yang
2014-01-01
Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both CHAID model with no user-specified first level and logistic regression based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3% with its ability to detect MetS at 71.8% which outperformed comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further followup based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904
Speiser, Jaime Lynn; Lee, William M; Karvellas, Constantine J
2015-01-01
Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. King's College (KCC) has been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models. CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission) and post-admission (days 3-7) for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC), sensitivity (SN), specificity (SP), and area under receiver-operating curve (AUROC) were compared between 3 models: KCC (INR, creatinine, coma grade, pH), CART analysis using only KCC variables (KCC-CART) and a CART model using new variables (NEW-CART). Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission offered predictive 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had predictive 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for end stage liver disease), lactate and mechanical ventilation on admission yielded predictive 72% AC, 71% SP, 77% SN and AUROC 0.79. For later stages, NEW-CART (MELD, lactate, coma grade) offered predictive AC 86%, SP 91%, SN 46%, AUROC 0.73. CARTs offer simple prognostic models for APAP-ALF patients, which have higher AUROC and SN than KCC, with similar AC and negligibly worse SP. Admission and post-admission predictions were developed. • Prognostication in acetaminophen-induced acute liver failure (APAP-ALF) is challenging beyond admission • Little has been published regarding the use of King's College Criteria (KCC) beyond admission and KCC has shown limited sensitivity in subsequent studies • Classification and Regression Tree (CART) methodology allows the development of predictive models using binary splits and offers an intuitive method for predicting outcome, using processes familiar to clinicians • Data from the ALFSG registry suggested that CART prognosis models for the APAP population offer improved sensitivity and model performance over traditional regression-based KCC, while maintaining similar accuracy and negligibly worse specificity • KCC-CART models offered modest improvement over traditional KCC, with NEW-CART models performing better than KCC-CART particularly at late time points.
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
Hanrahan, Kirsten; McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W Nick; Zimmerman, M Bridget; Ersig, Anne L
2012-10-01
This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children's responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, titled Children, Parents and Distraction, is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure.
NASA Astrophysics Data System (ADS)
Freeman, Mary Pyott
ABSTRACT An Analysis of Tree Mortality Using High Resolution Remotely-Sensed Data for Mixed-Conifer Forests in San Diego County by Mary Pyott Freeman The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, which is defined as dieback where whole stands are affected. This mortality is likely the result of the complex interaction of many variables, such as altered fire regimes, climatic conditions such as drought, as well as forest pathogens and past management strategies. Conifer tree mortality and its spatial pattern and change over time were examined in three components. In component 1, two remote sensing approaches were compared for their effectiveness in delineating dead trees, a spatial contextual approach and an OBIA (object based image analysis) approach, utilizing various dates and spatial resolutions of airborne image data. For each approach transforms and masking techniques were explored, which were found to improve classifications, and an object-based assessment approach was tested. In component 2, dead tree maps produced by the most effective techniques derived from component 1 were utilized for point pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 for three study areas: Palomar, Volcan and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter class sizes. In component 3, multiple environmental variables were used in Generalized Linear Model (GLM-logistic regression) and decision tree classifier model development, revealing the importance of climate and topographic factors such as precipitation and elevation, in being able to predict areas of high risk for tree mortality. The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.
Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo
2016-08-31
Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo
2018-01-01
Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
Ahn, Jae Joon; Kim, Young Min; Yoo, Keunje; Park, Joonhong; Oh, Kyong Joo
2012-11-01
For groundwater conservation and management, it is important to accurately assess groundwater pollution vulnerability. This study proposed an integrated model using ridge regression and a genetic algorithm (GA) to effectively select the major hydro-geological parameters influencing groundwater pollution vulnerability in an aquifer. The GA-Ridge regression method determined that depth to water, net recharge, topography, and the impact of vadose zone media were the hydro-geological parameters that influenced trichloroethene pollution vulnerability in a Korean aquifer. When using these selected hydro-geological parameters, the accuracy was improved for various statistical nonlinear and artificial intelligence (AI) techniques, such as multinomial logistic regression, decision trees, artificial neural networks, and case-based reasoning. These results provide a proof of concept that the GA-Ridge regression is effective at determining influential hydro-geological parameters for the pollution vulnerability of an aquifer, and in turn, improves the AI performance in assessing groundwater pollution vulnerability.
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.
ERIC Educational Resources Information Center
Kim, Sung-Ho
Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
STX--Fortran-4 program for estimates of tree populations from 3P sample-tree-measurements
L. R. Grosenbaugh
1967-01-01
Describes how to use an improved and greatly expanded version of an earlier computer program (1964) that converts dendrometer measurements of 3P-sample trees to population values in terms of whatever units user desires. Many new options are available, including that of obtaining a product-yield and appraisal report based on regression coefficients supplied by user....
Bayesian averaging over Decision Tree models for trauma severity scoring.
Schetinin, V; Jakaite, L; Krzanowski, W
2018-01-01
Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the "gold" standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about data and uncertainties. Models induced within such an approach have exposed a number of problems, providing unexplained fluctuation of predicted survival and low accuracy of estimating uncertainty intervals within which predictions are made. Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions. Copyright © 2017 Elsevier B.V. All rights reserved.
Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping
This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...
Beating the Odds: Trees to Success in Different Countries
ERIC Educational Resources Information Center
Finch, W. Holmes; Marchant, Gregory J.
2017-01-01
A recursive partitioning model approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the Unites States). The objective of the study was to determine demographic and educational variables that differentiated between low SES student that were…
NASA Astrophysics Data System (ADS)
Zack, J. W.
2015-12-01
Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors including the limited space-time resolution of the NWP models and shortcomings in the model's representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques often referred to as "machine learning methods" a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day ahead forecasts of the hourly wind-based generation. The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon an decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
Quantifying post-fire fallen trees using multi-temporal lidar
NASA Astrophysics Data System (ADS)
Bohlin, Inka; Olsson, Håkan; Bohlin, Jonas; Granström, Anders
2017-12-01
Massive tree-felling due to root damage is a common fire effect on burnt areas in Scandinavia, but has so far not been analyzed in detail. Here we explore if pre- and post-fire lidar data can be used to estimate the proportion of fallen trees. The study was carried out within a large (14,000 ha) area in central Sweden burnt in August 2014, where we had access to airborne lidar data from both 2011 and 2015. Three data-sets of predictor variables were tested: POST (post-fire lidar metrics), DIF (difference between post- and pre-fire lidar metrics) and combination of those two (POST_DIF). Fractional logistic regression was used to predict the proportion of fallen trees. Training data consisted of 61 plots, where the number of fallen and standing trees was calculated both in the field and with interpretation of drone images. The accuracy of the best model was tested based on 100 randomly selected validation plots with a size of 25 × 25 m. Our results showed that multi-temporal lidar together with field-collected training data can be used for quantifying post-fire tree felling over large areas. Several height-, density- and intensity metrics correlated with the proportion of fallen trees. The best model combined metrics from both datasets (POST_DIF), resulting in a RMSE of 0.11. Results were slightly poorer in the validation plots with RMSE of 0.18 using pixel size of 12.5 m and RMSE of 0.15 using pixel size of 6.25 m. Our model performed least well for stands that had been exposed to high-intensity crown fire. This was likely due to the low amount of echoes from the standing black tree skeletons. Wall-to-wall maps produced with this model can be used for landscape level analysis of fire effects and to explore the relationship between fallen trees and forest structure, soil type, fire intensity or topography.
Marek K. Jakubowksi; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly
2013-01-01
We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
Estimating tree species diversity in the savannah using NDVI and woody canopy cover
NASA Astrophysics Data System (ADS)
Madonsela, Sabelo; Cho, Moses Azong; Ramoelo, Abel; Mutanga, Onisimo; Naidoo, Laven
2018-04-01
Remote sensing applications in biodiversity research often rely on the establishment of relationships between spectral information from the image and tree species diversity measured in the field. Most studies have used normalized difference vegetation index (NDVI) to estimate tree species diversity on the basis that it is sensitive to primary productivity which defines spatial variation in plant diversity. The NDVI signal is influenced by photosynthetically active vegetation which, in the savannah, includes woody canopy foliage and grasses. The question is whether the relationship between NDVI and tree species diversity in the savanna depends on the woody cover percentage. This study explored the relationship between woody canopy cover (WCC) and tree species diversity in the savannah woodland of southern Africa and also investigated whether there is a significant interaction between seasonal NDVI and WCC in the factorial model when estimating tree species diversity. To fulfil our aim, we followed stratified random sampling approach and surveyed tree species in 68 plots of 90 m × 90 m across the study area. Within each plot, all trees with diameter at breast height of >10 cm were sampled and Shannon index - a common measure of species diversity which considers both species richness and abundance - was used to quantify tree species diversity. We then extracted WCC in each plot from existing fractional woody cover product produced from Synthetic Aperture Radar (SAR) data. Factorial regression model was used to determine the interaction effect between NDVI and WCC when estimating tree species diversity. Results from regression analysis showed that (i) WCC has a highly significant relationship with tree species diversity (r2 = 0.21; p < 0.01), (ii) the interaction between the NDVI and WCC is not significant, however, the factorial model significantly reduced the error of prediction (RMSE = 0.47, p < 0.05) compared to NDVI (RMSE = 0.49) or WCC (RMSE = 0.49) model during the senescence period. The result justifies our assertion that combining NDVI with WCC will be optimal for biodiversity estimation during the senescence period.
Using decision trees to understand structure in missing data
Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L
2015-01-01
Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509
Zhong, Buqing; Liang, Tao; Wang, Lingqing; Li, Kexin
2014-08-15
An extensive soil survey was conducted to study pollution sources and delineate contamination of heavy metals in one of the metalliferous industrial bases, in the karst areas of southwest China. A total of 597 topsoil samples were collected and the concentrations of five heavy metals, namely Cd, As (metalloid), Pb, Hg and Cr were analyzed. Stochastic models including a conditional inference tree (CIT) and a finite mixture distribution model (FMDM) were applied to identify the sources and partition the contribution from natural and anthropogenic sources for heavy metal in topsoils of the study area. Regression trees for Cd, As, Pb and Hg were proved to depend mostly on indicators of anthropogenic activities such as industrial type and distance from urban area, while the regression tree for Cr was found to be mainly influenced by the geogenic characteristics. The FMDM analysis showed that the geometric means of modeled background values for Cd, As, Pb, Hg and Cr were close to their background values previously reported in the study area, while the contamination of Cd and Hg were widespread in the study area, imposing potentially detrimental effects on organisms through the food chain. Finally, the probabilities of single and multiple heavy metals exceeding the threshold values derived from the FMDM were estimated using indicator kriging (IK) and multivariate indicator kriging (MVIK). The high probabilities exceeding the thresholds of heavy metals were associated with metalliferous production and atmospheric deposition of heavy metals transported from the urban and industrial areas. Geostatistics coupled with stochastic models provide an effective way to delineate multiple heavy metal pollution to facilitate improved environmental management. Copyright © 2014 Elsevier B.V. All rights reserved.
A prognostic pollen emissions model for climate models (PECM1.0)
NASA Astrophysics Data System (ADS)
Wozniak, Matthew C.; Steiner, Allison L.
2017-11-01
We develop a prognostic model called Pollen Emissions for Climate Models (PECM) for use within regional and global climate models to simulate pollen counts over the seasonal cycle based on geography, vegetation type, and meteorological parameters. Using modern surface pollen count data, empirical relationships between prior-year annual average temperature and pollen season start dates and end dates are developed for deciduous broadleaf trees (Acer, Alnus, Betula, Fraxinus, Morus, Platanus, Populus, Quercus, Ulmus), evergreen needleleaf trees (Cupressaceae, Pinaceae), grasses (Poaceae; C3, C4), and ragweed (Ambrosia). This regression model explains as much as 57 % of the variance in pollen phenological dates, and it is used to create a climate-flexible phenology that can be used to study the response of wind-driven pollen emissions to climate change. The emissions model is evaluated in the Regional Climate Model version 4 (RegCM4) over the continental United States by prescribing an emission potential from PECM and transporting pollen as aerosol tracers. We evaluate two different pollen emissions scenarios in the model using (1) a taxa-specific land cover database, phenology, and emission potential, and (2) a plant functional type (PFT) land cover, phenology, and emission potential. The simulated surface pollen concentrations for both simulations are evaluated against observed surface pollen counts in five climatic subregions. Given prescribed pollen emissions, the RegCM4 simulates observed concentrations within an order of magnitude, although the performance of the simulations in any subregion is strongly related to the land cover representation and the number of observation sites used to create the empirical phenological relationship. The taxa-based model provides a better representation of the phenology of tree-based pollen counts than the PFT-based model; however, we note that the PFT-based version provides a useful and climate-flexible emissions model for the general representation of the pollen phenology over the United States.
Mechanisms behind the estimation of photosynthesis traits from leaf reflectance observations
NASA Astrophysics Data System (ADS)
Dechant, Benjamin; Cuntz, Matthias; Doktor, Daniel; Vohland, Michael
2016-04-01
Many studies have investigated the reflectance-based estimation of leaf chlorophyll, water and dry matter contents of plants. Only few studies focused on photosynthesis traits, however. The maximum potential uptake of carbon dioxide under given environmental conditions is determined mainly by RuBisCO activity, limiting carboxylation, or the speed of photosynthetic electron transport. These two main limitations are represented by the maximum carboxylation capacity, V cmax,25, and the maximum electron transport rate, Jmax,25. These traits were estimated from leaf reflectance before but the mechanisms underlying the estimation remain rather speculative. The aim of this study was therefore to reveal the mechanisms behind reflectance-based estimation of V cmax,25 and Jmax,25. Leaf reflectance, photosynthetic response curves as well as nitrogen content per area, Narea, and leaf mass per area, LMA, were measured on 37 deciduous tree species. V cmax,25 and Jmax,25 were determined from the response curves. Partial Least Squares (PLS) regression models for the two photosynthesis traits V cmax,25 and Jmax,25 as well as Narea and LMA were studied using a cross-validation approach. Analyses of linear regression models based on Narea and other leaf traits estimated via PROSPECT inversion, PLS regression coefficients and model residuals were conducted in order to reveal the mechanisms behind the reflectance-based estimation. We found that V cmax,25 and Jmax,25 can be estimated from leaf reflectance with good to moderate accuracy for a large number of species and different light conditions. The dominant mechanism behind the estimations was the strong relationship between photosynthesis traits and leaf nitrogen content. This was concluded from very strong relationships between PLS regression coefficients, the model residuals as well as the prediction performance of Narea- based linear regression models compared to PLS regression models. While the PLS regression model for V cmax,25 was fully based on the correlation to Narea, the PLS regression model for Jmax,25 was not entirely based on it. Analyses of the contributions of different parts of the reflectance spectrum revealed that the information contributing to the Jmax,25 PLS regression model in addition to the main source of information, Narea, was mainly located in the visible part of the spectrum (500-900 nm). Estimated chlorophyll content could be excluded as potential source of this extra information. The PLS regression coefficients of the Jmax,25 model indicated possible contributions from chlorophyll fluorescence and cytochrome f content. In summary, we found that the main mechanism behind the estimation of V cmax,25 and Jmax,25 from leaf reflectance observations is the correlation to Narea but that there is additional information related to Jmax,25 mainly in the visible part of the spectrum.
NASA Astrophysics Data System (ADS)
Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise
2017-11-01
The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fit empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As sample sizes, to train and validate empirical models, seem not to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship, and also through fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, by creating latent variables or by shrinking parameter coefficients. This paper is proposing a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra, to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that present accurate predictions that are only valid for the data used, and too complex to make inferences about the underlying process.
Finding structure in data using multivariate tree boosting
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.
2016-01-01
Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
NASA Astrophysics Data System (ADS)
Herguido, Estela; Pulido, Manuel; Francisco Lavado Contador, Joaquín; Schnabel, Susanne
2017-04-01
In Iberian dehesas and montados, the lack of tree recruitment compromises its long-term sustainability. However, in marginal areas of dehesas shrub encroachment facilitates tree recruitment while altering the distinctive physiognomic and cultural characteristics of the system. These are ongoing processes that should be considered when designing afforestation measures and policies. Based on spatial variables, we modeled the proneness of a piece of land to undergo tree recruitment and the results were related with the afforestation measures carried out under the UE First Afforestation Agricultural Land Program between 1992 and 2008. We analyzed the temporal tree population dynamics in 800 randomly selected plots of 100 m radius (2,510 ha in total) in dehesas and treeless pasturelands of Extremadura (hereafter rangelands). Tree changes were revealed by comparing aerial images taken in 1956 with orthophotographs and infrared ones from 2012. Spatial models that predict the areas prone either to lack tree recruitment or with recruitment were developed and based on three data mining algorithms: MARS (Multivariate Adaptive Regression Splines), Random Forest (RF) and Stochastic Gradient Boosting (Tree-Net, TN). Recruited-tree locations (1) vs. locations of places with no recruitment (0) (randomly selected from the study areas) were used as the binary dependent variable. A 5% of the data were used as test data set. As candidate explanatory variables we used 51 different topographic, climatic, bioclimatic, land cover-related and edaphic ones. The statistical models developed were extrapolated to the spatial context of the afforested areas in the region and also to the whole Extremenian rangelands, and the percentage of area modelled as prone to tree recruitment was calculated for each case. A total of 46,674.63 ha were afforested with holm oak (Quercus ilex) or cork oak (Quercus suber) in the studied rangelands under the UE First Afforestation Agricultural Land Program. In the sampled plots, 16,747 trees were detected as recruited, while 47,058 and 12,803 were present in both dates and lost during the studied period, respectively. Based on the Area Under the ROC Curve (AUC), all the data mining models considered showed a high fitness (MARS AUC= 0.86; TN AUC= 0.92; RF AUC= 0.95) and low misclassification rates. Correctly predicted test samples for absence and presence of tree recruitment accounted respectively to 78.3% and 76.8% when using MARS, 90.8% and 90.8% using TN and 88.9% and 89.1% using RF. The spatial patterns of the different models were similar. However, attending only the percentage of area prone to tree recruitment, outstanding differences were observed among models considering the total surface of rangelands (36.03% in MARS, 22.88% in TN and 6.72 % in RF). Despite these differences, when comparing the results with those of the afforested surfaces (31.73% in MARS, 20.70% in TN and 5.63 % in RF) the three algorithms pointed to similar conclusions, i.e. the afforestations performed in rangelands of Extremadura under UE First Afforestation Agricultural Land Program, barely discriminate between areas with or without natural regeneration. In conclusion, data mining technics are suitable to develop high-performance spatial models of vegetation dynamics. These models could be useful for policy and decision makers aimed at assessing the implementation of afforestation measures and the selection of more adequate locations.
Zhao, Cai-Yun; Li, Jun-Sheng; Xu, Jing; Liu, Xiao-Yan
2017-05-01
Globalization increases the opportunities for unintentionally introduced invasive alien species, especially for insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province in mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors on the number distribution and similarity of species composition of these insects. Classification and regression trees indicated climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are primary drivers for the number distribution pattern of unintentionally introduced invasive alien insects at provincial scale, while only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of them based on the multivariate regression trees. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
Digression and Value Concatenation to Enable Privacy-Preserving Regression.
Li, Xiao-Bai; Sarkar, Sumit
2014-09-01
Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression , which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.
Jose M. Iniguez; Joseph L. Ganey; Peter J. Daughtery; John D. Bailey
2005-01-01
The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Wardâs) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...
Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey
2005-01-01
The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such system we qualitatively and quantitatively compared a hierarchical (Wardâs) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...
Stratification of the severity of critically ill patients with classification trees
2009-01-01
Background Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15) and C4.5 (1.08(0.98-1.16)). Conclusion With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients. PMID:20003229
Salas, Eric Ariel L; Valdez, Raul; Michel, Stefan
2017-11-01
We modeled summer and winter habitat suitability of Marco Polo argali in the Pamir Mountains in southeastern Tajikistan using these statistical algorithms: Generalized Linear Model, Random Forest, Boosted Regression Tree, Maxent, and Multivariate Adaptive Regression Splines. Using sheep occurrence data collected from 2009 to 2015 and a set of selected habitat predictors, we produced summer and winter habitat suitability maps and determined the important habitat suitability predictors for both seasons. Our results demonstrated that argali selected proximity to riparian areas and greenness as the two most relevant variables for summer, and the degree of slope (gentler slopes between 0° to 20°) and Landsat temperature band for winter. The terrain roughness was also among the most important variables in summer and winter models. Aspect was only significant for winter habitat, with argali preferring south-facing mountain slopes. We evaluated various measures of model performance such as the Area Under the Curve (AUC) and the True Skill Statistic (TSS). Comparing the five algorithms, the AUC scored highest for Boosted Regression Tree in summer (AUC = 0.94) and winter model runs (AUC = 0.94). In contrast, Random Forest underperformed in both model runs.
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
NASA Astrophysics Data System (ADS)
Heddam, Salim; Kisi, Ozgur
2018-04-01
In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T) are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality variables data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The water quality data selected consisted of daily measured of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.
Annual Tree Growth Predictions From Periodic Measurements
Quang V. Cao
2004-01-01
Data from annual measurements of a loblolly pine (Pinus taeda L.) plantation were available for this study. Regression techniques were employed to model annual changes of individual trees in terms of diameters, heights, and survival probabilities. Subsets of the data that include measurements every 2, 3, 4, 5, and 6 years were used to fit the same...
Mapping and spatial-temporal modeling of Bromus tectorum invasion in central Utah
NASA Astrophysics Data System (ADS)
Jin, Zhenyu
Cheatgrass, or Downy Brome, is an exotic winter annual weed native to the Mediterranean region. Since its introduction to the U.S., it has become a significant weed and aggressive invader of sagebrush, pinion-juniper, and other shrub communities, where it can completely out-compete native grasses and shrubs. In this research, remotely sensed data combined with field collected data are used to investigate the distribution of the cheatgrass in Central Utah, to characterize the trend of the NDVI time-series of cheatgrass, and to construct a spatially explicit population-based model to simulate the spatial-temporal dynamics of the cheatgrass. This research proposes a method for mapping the canopy closure of invasive species using remotely sensed data acquired at different dates. Different invasive species have their own distinguished phenologies and the satellite images in different dates could be used to capture the phenology. The results of cheatgrass abundance prediction have a good fit with the field data for both linear regression and regression tree models, although the regression tree model has better performance than the linear regression model. To characterize the trend of NDVI time-series of cheatgrass, a novel smoothing algorithm named RMMEH is presented in this research to overcome some drawbacks of many other algorithms. By comparing the performance of RMMEH in smoothing a 16-day composite of the MODIS NDVI time-series with that of two other methods, which are the 4253EH, twice and the MVI, we have found that RMMEH not only keeps the original valid NDVI points, but also effectively removes the spurious spikes. The reconstructed NDVI time-series of different land covers are of higher quality and have smoother temporal trend. To simulate the spatial-temporal dynamics of cheatgrass, a spatially explicit population-based model is built applying remotely sensed data. The comparison between the model output and the ground truth of cheatgrass closure demonstrates that the model could successfully simulate the spatial-temporal dynamics of cheatgrass in a simple cheatgrass-dominant environment. The simulation of the functional response of different prescribed fire rates also shows that this model is helpful to answer management questions like, "What are the effects of prescribed fire to invasive species?" It demonstrates that a medium fire rate of 10% can successfully prevent cheatgrass invasion.
NASA Astrophysics Data System (ADS)
Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.
2018-01-01
Comparative analysis of soils geospatial modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machines algorithms was conducted. The visual interpretation of the digital maps obtained and their comparison with the existing map, as well as the quantitative assessment of the individual soil groups detection overall accuracy and of the models kappa showed that multiple logistic regression, support vector method, and random forest models application with spatial prediction of the conditional soil groups distribution can be reliably used for mapping of the study area. It has shown the most accurate detection for sod-podzolics soils (Phaeozems Albic) lightly eroded and moderately eroded soils. In second place, according to the mean overall accuracy of the prediction, there are sod-podzolics soils - non-eroded and warp one, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavy eroded sod-podzolics and gray forest soils (Phaeozems Albic) were detected by methods of automatic classification worst of all.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics
NASA Astrophysics Data System (ADS)
Iyer, V.; Shetty, S.; Iyengar, S. S.
2015-07-01
Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
NASA Astrophysics Data System (ADS)
Yang, Guang; Ye, Xujiong; Slabaugh, Greg; Keegan, Jennifer; Mohiaddin, Raad; Firmin, David
2016-03-01
In this paper, we propose a novel self-learning based single-image super-resolution (SR) method, which is coupled with dual-tree complex wavelet transform (DTCWT) based denoising to better recover high-resolution (HR) medical images. Unlike previous methods, this self-learning based SR approach enables us to reconstruct HR medical images from a single low-resolution (LR) image without extra training on HR image datasets in advance. The relationships between the given image and its scaled down versions are modeled using support vector regression with sparse coding and dictionary learning, without explicitly assuming reoccurrence or self-similarity across image scales. In addition, we perform DTCWT based denoising to initialize the HR images at each scale instead of simple bicubic interpolation. We evaluate our method on a variety of medical images. Both quantitative and qualitative results show that the proposed approach outperforms bicubic interpolation and state-of-the-art single-image SR methods while effectively removing noise.
Research on complex 3D tree modeling based on L-system
NASA Astrophysics Data System (ADS)
Gang, Chen; Bin, Chen; Yuming, Liu; Hui, Li
2018-03-01
L-system as a fractal iterative system could simulate complex geometric patterns. Based on the field observation data of trees and knowledge of forestry experts, this paper extracted modeling constraint rules and obtained an L-system rules set. Using the self-developed L-system modeling software the L-system rule set was parsed to generate complex tree 3d models.The results showed that the geometrical modeling method based on l-system could be used to describe the morphological structure of complex trees and generate 3D tree models.
Cheaib, Alissar; Badeau, Vincent; Boe, Julien; Chuine, Isabelle; Delire, Christine; Dufrêne, Eric; François, Christophe; Gritti, Emmanuel S; Legay, Myriam; Pagé, Christian; Thuiller, Wilfried; Viovy, Nicolas; Leadley, Paul
2012-06-01
Model-based projections of shifts in tree species range due to climate change are becoming an important decision support tool for forest management. However, poorly evaluated sources of uncertainty require more scrutiny before relying heavily on models for decision-making. We evaluated uncertainty arising from differences in model formulations of tree response to climate change based on a rigorous intercomparison of projections of tree distributions in France. We compared eight models ranging from niche-based to process-based models. On average, models project large range contractions of temperate tree species in lowlands due to climate change. There was substantial disagreement between models for temperate broadleaf deciduous tree species, but differences in the capacity of models to account for rising CO(2) impacts explained much of the disagreement. There was good quantitative agreement among models concerning the range contractions for Scots pine. For the dominant Mediterranean tree species, Holm oak, all models foresee substantial range expansion. © 2012 Blackwell Publishing Ltd/CNRS.
Real-time quality monitoring in debutanizer column with regression tree and ANFIS
NASA Astrophysics Data System (ADS)
Siddharth, Kumar; Pathak, Amey; Pani, Ajaya Kumar
2018-05-01
A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the output is the butane concentration in the debutanizer column bottom product. The input-output dataset is divided equally into a training (calibration) set and a validation (testing) set. The training set data were used to develop fuzzy inference, adaptive neuro fuzzy (ANFIS) and regression tree models for the debutanizer column. The accuracy of the developed models were evaluated by simulation of the models with the validation dataset. It is observed that the ANFIS model has better estimation accuracy than other models developed in this work and many data-driven models proposed so far in the literature for the debutanizer column.
Proxies for soil organic carbon derived from remote sensing
NASA Astrophysics Data System (ADS)
Rasel, S. M. M.; Groen, T. A.; Hussin, Y. A.; Diti, I. J.
2017-07-01
The possibility of carbon storage in soils is of interest because compared to vegetation it contains more carbon. Estimation of soil carbon through remote sensing based techniques can be a cost effective approach, but is limited by available methods. This study aims to develop a model based on remotely sensed variables (elevation, forest type and above ground biomass) to estimate soil carbon stocks. Field observations on soil organic carbon, species composition, and above ground biomass were recorded in the subtropical forest of Chitwan, Nepal. These variables were also estimated using LiDAR data and a WorldView 2 image. Above ground biomass was estimated from the LiDAR image using a novel approach where the image was segmented to identify individual trees, and for these trees estimates of DBH and Height were made. Based on AIC (Akaike Information Criterion) a regression model with above ground biomass derived from LiDAR data, and forest type derived from WorldView 2 imagery was selected to estimate soil organic carbon (SOC) stocks. The selected model had a coefficient of determination (R2) of 0.69. This shows the scope of estimating SOC with remote sensing derived variables in sub-tropical forests.
The effect of using genealogy-based haplotypes for genomic prediction
2013-01-01
Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971
The effect of using genealogy-based haplotypes for genomic prediction.
Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt
2013-03-06
Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.
GTM-Based QSAR Models and Their Applicability Domains.
Gaspar, H A; Baskin, I I; Marcou, G; Horvath, D; Varnek, A
2015-06-01
In this paper we demonstrate that Generative Topographic Mapping (GTM), a machine learning method traditionally used for data visualisation, can be efficiently applied to QSAR modelling using probability distribution functions (PDF) computed in the latent 2-dimensional space. Several different scenarios of the activity assessment were considered: (i) the "activity landscape" approach based on direct use of PDF, (ii) QSAR models involving GTM-generated on descriptors derived from PDF, and, (iii) the k-Nearest Neighbours approach in 2D latent space. Benchmarking calculations were performed on five different datasets: stability constants of metal cations Ca(2+) , Gd(3+) and Lu(3+) complexes with organic ligands in water, aqueous solubility and activity of thrombin inhibitors. It has been shown that the performance of GTM-based regression models is similar to that obtained with some popular machine-learning methods (random forest, k-NN, M5P regression tree and PLS) and ISIDA fragment descriptors. By comparing GTM activity landscapes built both on predicted and experimental activities, we may visually assess the model's performance and identify the areas in the chemical space corresponding to reliable predictions. The applicability domain used in this work is based on data likelihood. Its application has significantly improved the model performances for 4 out of 5 datasets. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Vergara, Pablo M.; Soto, Gerardo E.; Rodewald, Amanda D.; Meneses, Luis O.; Pérez-Hernández, Christian G.
2016-01-01
Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox’s proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales. PMID:27416115
Vergara, Pablo M; Soto, Gerardo E; Moreira-Arce, Darío; Rodewald, Amanda D; Meneses, Luis O; Pérez-Hernández, Christian G
2016-01-01
Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox's proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales.
NASA Astrophysics Data System (ADS)
Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine
2014-10-01
The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on the automatization of the process of identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is the underlying parameters which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, we called Random PRiF, is proposed. It relies on random sampling in feature space, carefully addresses the multicollinearity issue in multiple-linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ˜ 1.1 m is expected on crown diameter, ˜ 0.9 m on tree spacing, ˜ 3 m on height and ˜ 0.06 m on diameter at breast height.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrew G. Peterson; J. Timothy Ball; Yiqi Luo
1998-09-25
Estimation of leaf photosynthetic rate (A) from leaf nitrogen content (N) is both conceptually and numerically important in models of plant, ecosystem and biosphere responses to global change. The relationship between A and N has been studied extensively at ambient CO{sub 2} but much less at elevated CO{sub 2}. This study was designed to (1) assess whether the A-N relationship was more similar for species within than between community and vegetation types, and (2) examine how growth at elevated CO{sub 2} affects the A-N relationship. Data were obtained for 39 C{sub 3} species grown at ambient CO{sub 2} and 10more » C{sub 3} species grown at ambient and elevated CO{sub 2}. A regression model was applied to each species as well as to species pooled within different community and vegetation types. Cluster analysis of the regression coefficients indicated that species measured at ambient CO{sub 2} did not separate into distinct groups matching community or vegetation type. Instead, most community and vegetation types shared the same general parameter space for regression coefficients. Growth at elevated CO{sub 2} increased photosynthetic nitrogen use efficiency for pines and deciduous trees. When species were pooled by vegetation type, the A-N relationship for deciduous trees expressed on a leaf-mass bask was not altered by elevated CO{sub 2}, while the intercept increased for pines. When regression coefficients were averaged to give mean responses for different vegetation types, elevated CO{sub 2} increased the intercept and the slope for deciduous trees but increased only the intercept for pines. There were no statistical differences between the pines and deciduous trees for the effect of CO{sub 2}. Generalizations about the effect of elevated CO{sub 2} on the A-N relationship, and differences between pines and deciduous trees will be enhanced as more data become available.« less
Stand basal-area and tree-diameter growth in red spruce-fir forests in Maine, 1960-80
S.J. Zarnoch; D.A. Gansner; D.S. Powell; T.A. Birch; T.A. Birch
1990-01-01
Stand basal-area change and individual surviving red spruce d.b.h. growth from 1960 to 1980 were analyzed for red spruce-fir stands in Maine. Regression modeling was used to relate these measures of growth to stand and tree conditions and to compare growth throughout the period. Results indicate a decline in growth.
NASA Technical Reports Server (NTRS)
Porter, Trevor J.; Pisaric, Michael F. J.; Field, Robert D.; Kokelj, Steven V.; Edwards, Thomas W. D.; deMontigny, Peter; Healy, Richard; LeGrande, Allegra N.
2013-01-01
High-latitude delta(exp 18)O archives deriving from meteoric water (e.g., tree-rings and ice-cores) can provide valuable information on past temperature variability, but stationarity of temperature signals in these archives depends on the stability of moisture source/trajectory and precipitation seasonality, both of which can be affected by atmospheric circulation changes. A tree-ring delta(exp 18)O record (AD 1780-2003) from the Mackenzie Delta is evaluated as a temperature proxy based on linear regression diagnostics. The primary source of moisture for this region is the North Pacific and, thus, North Pacific atmospheric circulation variability could potentially affect the tree-ring delta(exp 18)O-temperature signal. Over the instrumental period (AD 1892-2003), tree-ring delta(exp 18)O explained 29% of interannual variability in April-July minimum temperatures, and the explained variability increases substantially at lower-frequencies. A split-period calibration/verification analysis found the delta(exp 18)O-temperature relation was time-stable, which supported a temperature reconstruction back to AD 1780. The stability of the delta(exp 18)O-temperature signal indirectly implies the study region is insensitive to North Pacific circulation effects, since North Pacific circulation was not constant over the calibration period. Simulations from the NASA-GISS ModelE isotope-enabled general circulation model confirm that meteoric delta(exp 18)O and precipitation seasonality in the study region are likely insensitive to North Pacific circulation effects, highlighting the paleoclimatic value of tree-ring and possibly other delta(exp 18)O records from this region. Our delta(exp 18)O-based temperature reconstruction is the first of its kind in northwestern North America, and one of few worldwide, and provides a long-term context for evaluating recent climate warming in the Mackenzie Delta region.
Model-based conifer crown surface reconstruction from multi-ocular high-resolution aerial imagery
NASA Astrophysics Data System (ADS)
Sheng, Yongwei
2000-12-01
Tree crown parameters such as width, height, shape and crown closure are desirable in forestry and ecological studies, but they are time-consuming and labor intensive to measure in the field. The stereoscopic capability of high-resolution aerial imagery provides a way to crown surface reconstruction. Existing photogrammetric algorithms designed to map terrain surfaces, however, cannot adequately extract crown surfaces, especially for steep conifer crowns. Considering crown surface reconstruction in a broader context of tree characterization from aerial images, we develop a rigorous perspective tree image formation model to bridge image-based tree extraction and crown surface reconstruction, and an integrated model-based approach to conifer crown surface reconstruction. Based on the fact that most conifer crowns are in a solid geometric form, conifer crowns are modeled as a generalized hemi-ellipsoid. Both the automatic and semi-automatic approaches are investigated to optimal tree model development from multi-ocular images. The semi-automatic 3D tree interpreter developed in this thesis is able to efficiently extract reliable tree parameters and tree models in complicated tree stands. This thesis starts with a sophisticated stereo matching algorithm, and incorporates tree models to guide stereo matching. The following critical problems are addressed in the model-based surface reconstruction process: (1) the problem of surface model composition from tree models, (2) the occlusion problem in disparity prediction from tree models, (3) the problem of integrating the predicted disparities into image matching, (4) the tree model edge effect reduction on the disparity map, (5) the occlusion problem in orthophoto production, and (6) the foreshortening problem in image matching, which is very serious for conifer crown surfaces. Solutions to the above problems are necessary for successful crown surface reconstruction. The model-based approach was applied to recover the canopy surface of a dense redwood stand using tri-ocular high-resolution images scanned from 1:2,400 aerial photographs. The results demonstrate the approach's ability to reconstruct complicated stands. The model-based approach proposed in this thesis is potentially applicable to other surfaces recovering problems with a priori knowledge about objects.
Future credible precipitation occurrences in Los Alamos, New Mexico
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abeele, W.V.
1980-09-01
I have studied many factors thought to have influenced past climatic change. Because they might recur, they are possible suspects for future climatic alterations. Most of these factors are totally unpredictable; therefore, they cast a shadow on the validity of derived climatic predictions. Changes in atmospheric conditions and in continental surfaces, variations in solar radiation, and in the earth's orbit around the sun are among the influential mechanisms investigated. Even when models are set up that include the above parameters, their reliability will depend on unpredictable variables totally alien to the model (like volcanic eruptions). Based on climatic records, however,more » maximum precipitation amounts have been calculated for different probability levels. These seem to correspond well to past precipitation occurrences, derived from tree ring indices. The link between tree ring indices and local climate has been established through regression analysis.« less
Bayesian models for comparative analysis integrating phylogenetic uncertainty.
de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P
2012-06-28
Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.
Bayesian models for comparative analysis integrating phylogenetic uncertainty
2012-01-01
Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language. PMID:22741602
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
Olivier, Pieter I.; van Aarde, Rudi J.
2017-01-01
The peninsula effect predicts that the number of species should decline from the base of a peninsula to the tip. However, evidence for the peninsula effect is ambiguous, as different analytical methods, study taxa, and variations in local habitat or regional climatic conditions influence conclusions on its presence. We address this uncertainty by using two analytical methods to investigate the peninsula effect in three taxa that occupy different trophic levels: trees, millipedes, and birds. We surveyed 81 tree quadrants, 102 millipede transects, and 152 bird points within 150 km of coastal dune forest that resemble a habitat peninsula along the northeast coast of South Africa. We then used spatial (trend surface analyses) and non-spatial regressions (generalized linear mixed models) to test for the presence of the peninsula effect in each of the three taxa. We also used linear mixed models to test if climate (temperature and precipitation) and/or local habitat conditions (water availability associated with topography and landscape structural variables) could explain gradients in species richness. Non-spatial models suggest that the peninsula effect was present in all three taxa. However, spatial models indicated that only bird species richness declined from the peninsula base to the peninsula tip. Millipede species richness increased near the centre of the peninsula, while tree species richness increased near the tip. Local habitat conditions explained species richness patterns of birds and trees, but not of millipedes, regardless of model type. Our study highlights the idiosyncrasies associated with the peninsula effect—conclusions on the presence of the peninsula effect depend on the analytical methods used and the taxon studied. The peninsula effect might therefore be better suited to describe a species richness pattern where the number of species decline from a broader habitat base to a narrow tip, rather than a process that drives species richness. PMID:28376096
Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne
2017-01-01
Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. Results may support species-specific and locally adapted forest management plans to increase carbon dioxide sequestration from the atmosphere in trees. Copyright © 2016 Elsevier B.V. All rights reserved.
Hapca, Simona; Baveye, Philippe C; Wilson, Clare; Lark, Richard Murray; Otten, Wilfred
2015-01-01
There is currently a significant need to improve our understanding of the factors that control a number of critical soil processes by integrating physical, chemical and biological measurements on soils at microscopic scales to help produce 3D maps of the related properties. Because of technological limitations, most chemical and biological measurements can be carried out only on exposed soil surfaces or 2-dimensional cuts through soil samples. Methods need to be developed to produce 3D maps of soil properties based on spatial sequences of 2D maps. In this general context, the objective of the research described here was to develop a method to generate 3D maps of soil chemical properties at the microscale by combining 2D SEM-EDX data with 3D X-ray computed tomography images. A statistical approach using the regression tree method and ordinary kriging applied to the residuals was developed and applied to predict the 3D spatial distribution of carbon, silicon, iron, and oxygen at the microscale. The spatial correlation between the X-ray grayscale intensities and the chemical maps made it possible to use a regression-tree model as an initial step to predict the 3D chemical composition. For chemical elements, e.g., iron, that are sparsely distributed in a soil sample, the regression-tree model provides a good prediction, explaining as much as 90% of the variability in some of the data. However, for chemical elements that are more homogenously distributed, such as carbon, silicon, or oxygen, the additional kriging of the regression tree residuals improved significantly the prediction with an increase in the R2 value from 0.221 to 0.324 for carbon, 0.312 to 0.423 for silicon, and 0.218 to 0.374 for oxygen, respectively. The present research develops for the first time an integrated experimental and theoretical framework, which combines geostatistical methods with imaging techniques to unveil the 3-D chemical structure of soil at very fine scales. The methodology presented in this study can be easily adapted and applied to other types of data such as bacterial or fungal population densities for the 3D characterization of microbial distribution.
Hapca, Simona; Baveye, Philippe C.; Wilson, Clare; Lark, Richard Murray; Otten, Wilfred
2015-01-01
There is currently a significant need to improve our understanding of the factors that control a number of critical soil processes by integrating physical, chemical and biological measurements on soils at microscopic scales to help produce 3D maps of the related properties. Because of technological limitations, most chemical and biological measurements can be carried out only on exposed soil surfaces or 2-dimensional cuts through soil samples. Methods need to be developed to produce 3D maps of soil properties based on spatial sequences of 2D maps. In this general context, the objective of the research described here was to develop a method to generate 3D maps of soil chemical properties at the microscale by combining 2D SEM-EDX data with 3D X-ray computed tomography images. A statistical approach using the regression tree method and ordinary kriging applied to the residuals was developed and applied to predict the 3D spatial distribution of carbon, silicon, iron, and oxygen at the microscale. The spatial correlation between the X-ray grayscale intensities and the chemical maps made it possible to use a regression-tree model as an initial step to predict the 3D chemical composition. For chemical elements, e.g., iron, that are sparsely distributed in a soil sample, the regression-tree model provides a good prediction, explaining as much as 90% of the variability in some of the data. However, for chemical elements that are more homogenously distributed, such as carbon, silicon, or oxygen, the additional kriging of the regression tree residuals improved significantly the prediction with an increase in the R2 value from 0.221 to 0.324 for carbon, 0.312 to 0.423 for silicon, and 0.218 to 0.374 for oxygen, respectively. The present research develops for the first time an integrated experimental and theoretical framework, which combines geostatistical methods with imaging techniques to unveil the 3-D chemical structure of soil at very fine scales. The methodology presented in this study can be easily adapted and applied to other types of data such as bacterial or fungal population densities for the 3D characterization of microbial distribution. PMID:26372473
Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A
2015-05-09
Genomic selection (GS) in forestry can substantially reduce the length of breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumption about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site were high and better than those of single-sites while multi-site predictability produced the lowest accuracies reflecting type-b genetic correlations and deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates as half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principle component scores as representatives of multi-trait GS prediction models produced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3. The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that single-site GS models ability to predict other sites are unreliable supporting the utilization of multi-site approach. Principle component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W. Nick; Zimmerman, M. Bridget; Ersig, Anne L.
2012-01-01
This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children’s responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, the Children, Parents and Distraction (CPaD), is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure. PMID:22805121
Hao, Xu; Yujun, Sun; Xinjie, Wang; Jin, Wang; Yao, Fu
2015-01-01
A multiple linear model was developed for individual tree crown width of Cunninghamia lanceolata (Lamb.) Hook in Fujian province, southeast China. Data were obtained from 55 sample plots of pure China-fir plantation stands. An Ordinary Linear Least Squares (OLS) regression was used to establish the crown width model. To adjust for correlations between observations from the same sample plots, we developed one level linear mixed-effects (LME) models based on the multiple linear model, which take into account the random effects of plots. The best random effects combinations for the LME models were determined by the Akaike's information criterion, the Bayesian information criterion and the -2logarithm likelihood. Heteroscedasticity was reduced by three residual variance functions: the power function, the exponential function and the constant plus power function. The spatial correlation was modeled by three correlation structures: the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)], and the compound symmetry structure (CS). Then, the LME model was compared to the multiple linear model using the absolute mean residual (AMR), the root mean square error (RMSE), and the adjusted coefficient of determination (adj-R2). For individual tree crown width models, the one level LME model showed the best performance. An independent dataset was used to test the performance of the models and to demonstrate the advantage of calibrating LME models.
NASA Astrophysics Data System (ADS)
Zhang, Wangfei; Chen, Erxue; Li, Zengyuan; Feng, Qi; Zhao, Lei
2016-08-01
DEM Differential Method is an effective and efficient way for forest tree height assessment with Polarimetric and interferometric technology, however, the assessment accuracy of it is based on the accuracy of interferometric results and DEM. Terra-SAR/TanDEM-X, which established the first spaceborne bistatic interferometer, can provide highly accurate cross-track interferometric images in the whole global without inherent accuracy limitations like temporal decorrelation and atmospheric disturbance. These characters of Terra-SAR/TandDEM-X give great potential for global or regional tree height assessment, which have been constraint by the temporal decorrelation in traditional repeat-pass interferometry. Currently, in China, it will be costly to collect high accurate DEM with Lidar. At the same time, it is also difficult to get truly representative ground survey samples to test and verify the assessment results. In this paper, we analyzed the feasibility of using TerraSAR/TanDEM-X data to assess forest tree height with current free DEM data like ASTER-GDEM and archived ground in-suit data like forest management inventory data (FMI). At first, the accuracy and of ASTER-GDEM and forest management inventory data had been assessment according to the DEM and canopy height model (CHM) extracted from Lidar data. The results show the average elevation RMSE between ASTER-GEDM and Lidar-DEM is about 13 meters, but they have high correlation with the correlation coefficient of 0.96. With a linear regression model, we can compensate ASTER-GDEM and improve its accuracy nearly to the Lidar-DEM with same scale. The correlation coefficient between FMI and CHM is 0.40. its accuracy is able to be improved by a linear regression model withinconfidence intervals of 95%. After compensation of ASTER-GDEM and FMI, we calculated the tree height in Mengla test site with DEM Differential Method. The results showed that the corrected ASTER-GDEM can effectively improve the assessment accuracy. The average assessment accuracy before and after corrected is 0.73 and 0.76, the RMSE is 5.5 and 4.4, respectively.
NASA Astrophysics Data System (ADS)
Bouffon, T.; Rice, R.; Bales, R.
2006-12-01
The spatial distributions of snow water equivalent (SWE) and snow depth within a 1, 4, and 16 km2 grid element around two automated snow pillows in a forested and open- forested region of the Upper Merced River Basin (2,800 km2) of Yosemite National Park were characterized using field observations and analyzed using binary regression trees. Snow surveys occurred at the forested site during the accumulation and ablation seasons, while at the open-forest site a survey was performed only during the accumulation season. An average of 130 snow depth and 7 snow density measurements were made on each survey, within the 4 km2 grid. Snow depth was distributed using binary regression trees and geostatistical methods using the physiographic parameters (e.g. elevation, slope, vegetation, aspect). Results in the forest region indicate that the snow pillow overestimated average SWE within the 1, 4, and 16 km2 areas by 34 percent during ablation, but during accumulation the snow pillow provides a good estimate of the modeled mean SWE grid value, however it is suspected that the snow pillow was underestimating SWE. However, at the open forest site, during accumulation, the snow pillow was 28 percent greater than the mean modeled grid element. In addition, the binary regression trees indicate that the independent variables of vegetation, slope, and aspect are the most influential parameters of snow depth distribution. The binary regression tree and multivariate linear regression models explain about 60 percent of the initial variance for snow depth and 80 percent for density, respectively. This short-term study provides motivation and direction for the installation of a distributed snow measurement network to fill the information gap in basin-wide SWE and snow depth measurements. Guided by these results, a distributed snow measurement network was installed in the Fall 2006 at Gin Flat in the Upper Merced River Basin with the specific objective of measuring accumulation and ablation across topographic variables with the aim of providing guidance for future larger scale observation network designs.
Hernandez, J E; Epstein, L D; Rodriguez, M H; Rodriguez, A D; Rejmankova, E; Roberts, D R
1997-03-01
We propose the use of generalized tree models (GTMs) to analyze data from entomological field studies. Generalized tree models can be used to characterize environments with different mosquito breeding capacity. A GTM simultaneously analyzes a set of predictor variables (e.g., vegetation coverage) in relation to a response variable (e.g., counts of Anopheles albimanus larvae), and how it varies with respect to a set of criterion variables (e.g., presence of predators). The algorithm produces a treelike graphical display with its root at the top and 2 branches stemming down from each node. At each node, conditions on the value of predictors partition the observations into subgroups (environments) in which the relation between response and criterion variables is most homogeneous.
Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth
2009-01-01
We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.
Mukherjee, Arideep; Agrawal, Madhoolika
2018-05-15
Responses of urban vegetation to air pollution stress in relation to their tolerance and sensitivity have been extensively studied, however, studies related to air pollution responses based on different leaf functional traits and tree characteristics are limited. In this paper, we have tried to assess combined and individual effects of major air pollutants PM 10 (particulate matter ≤ 10 µm), TSP (total suspended particulate matter), SO 2 (sulphur dioxide), NO 2 (nitrogen dioxide) and O 3 (ozone) on thirteen tropical tree species in relation to fifteen leaf functional traits and different tree characteristics. Stepwise linear regression a general linear modelling approach was used to quantify the pollution response of trees against air pollutants. The study was performed for six successive seasons for two years in three distinct urban areas (traffic, industrial and residential) of Varanasi city in India. At all the study sites, concentrations of air pollutants, specifically PM (particulate matter) and NO 2 were above the specified standards. Distinct variations were recorded in all the fifteen leaf functional traits with pollution load. Caesalpinia sappan was identified as most tolerant species followed by Psidium guajava, Dalbergia sissoo and Albizia lebbeck. Stepwise regression analysis identified maximum response of Eucalyptus citriodora and P. guajava to air pollutants explaining overall 59% and 58% variability's in leaf functional traits, respectively. Among leaf functional traits, maximum effect of air pollutants was observed on non-enzymatic antioxidants followed by photosynthetic pigments and leaf water status. Among the pollutants, PM was identified as the major stress factor followed by O 3 explaining 47% and 33% variability's in leaf functional traits. Tolerance and pollution response were regulated by different tree characteristics such as height, canopy size, leaf from, texture and nature of tree. Outcomes of this study will help in urban forest development by selection of specific pollutant tolerant tree species and leaf traits, which is suitable as air pollution mitigation measure. Copyright © 2018 Elsevier Inc. All rights reserved.
BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)
We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...
Static terrestrial laser scanning of juvenile understory trees for field phenotyping
NASA Astrophysics Data System (ADS)
Wang, Huanhuan; Lin, Yi
2014-11-01
This study was to attempt the cutting-edge 3D remote sensing technique of static terrestrial laser scanning (TLS) for parametric 3D reconstruction of juvenile understory trees. The data for test was collected with a Leica HDS6100 TLS system in a single-scan way. The geometrical structures of juvenile understory trees are extracted by model fitting. Cones are used to model trunks and branches. Principal component analysis (PCA) is adopted to calculate their major axes. Coordinate transformation and orthogonal projection are used to estimate the parameters of the cones. Then, AutoCAD is utilized to simulate the morphological characteristics of the understory trees, and to add secondary branches and leaves in a random way. Comparison of the reference values and the estimated values gives the regression equation and shows that the proposed algorithm of extracting parameters is credible. The results have basically verified the applicability of TLS for field phenotyping of juvenile understory trees.
Measuring urban tree loss dynamics across residential landscapes.
Ossola, Alessandro; Hopton, Matthew E
2018-01-15
The spatial arrangement of urban vegetation depends on urban morphology and socio-economic settings. Urban vegetation changes over time because of human management. Urban trees are removed due to hazard prevention or aesthetic preferences. Previous research attributed tree loss to decreases in canopy cover. However, this provides little information about location and structural characteristics of trees lost, as well as environmental and social factors affecting tree loss dynamics. This is particularly relevant in residential landscapes where access to residential parcels for field surveys is limited. We tested whether multi-temporal airborne LiDAR and multi-spectral imagery collected at a 5-year interval can be used to investigate urban tree loss dynamics across residential landscapes in Denver, CO and Milwaukee, WI, covering 400,705 residential parcels in 444 census tracts. Position and stem height of trees lost were extracted from canopy height models calculated as the difference between final (year 5) and initial (year 0) vegetation height derived from LiDAR. Multivariate regression models were used to predict number and height of tree stems lost in residential parcels in each census tract based on urban morphological and socio-economic variables. A total of 28,427 stems were lost from residential parcels in Denver and Milwaukee over 5years. Overall, 7% of residential parcels lost one stem, averaging 90.87 stems per km 2 . Average stem height was 10.16m, though trees lost in Denver were taller compared to Milwaukee. The number of stems lost was higher in neighborhoods with higher canopy cover and developed before the 1970s. However, socio-economic characteristics had little effect on tree loss dynamics. The study provides a simple method for measuring urban tree loss dynamics within and across entire cities, and represents a further step toward high resolution assessments of the three-dimensional change of urban vegetation at large spatial scales. Published by Elsevier B.V.
A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft engines.
Sánchez Lasheras, Fernando; García Nieto, Paulino José; de Cos Juez, Francisco Javier; Mayo Bayón, Ricardo; González Suárez, Victor Manuel
2015-03-23
Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines.
A Hybrid PCA-CART-MARS-Based Prognostic Approach of the Remaining Useful Life for Aircraft Engines
Lasheras, Fernando Sánchez; Nieto, Paulino José García; de Cos Juez, Francisco Javier; Bayón, Ricardo Mayo; Suárez, Victor Manuel González
2015-01-01
Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines. PMID:25806876
Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam
2012-01-01
Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.
Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling.
Nakas, Christos T; Schütz, Narayan; Werners, Marcus; Leichtle, Alexander B
2016-01-01
Electronic Health Record (EHR) data can be a key resource for decision-making support in clinical practice in the "big data" era. The complete database from early 2012 to late 2015 involving hospital admissions to Inselspital Bern, the largest Swiss University Hospital, was used in this study, involving over 100,000 admissions. Age, sex, and initial laboratory test results were the features/variables of interest for each admission, the outcome being inpatient mortality. Computational decision support systems were utilized for the calculation of the risk of inpatient mortality. We assessed the recently proposed Acute Laboratory Risk of Mortality Score (ALaRMS) model, and further built generalized linear models, generalized estimating equations, artificial neural networks, and decision tree systems for the predictive modeling of the risk of inpatient mortality. The Area Under the ROC Curve (AUC) for ALaRMS marginally corresponded to the anticipated accuracy (AUC = 0.858). Penalized logistic regression methodology provided a better result (AUC = 0.872). Decision tree and neural network-based methodology provided even higher predictive performance (up to AUC = 0.912 and 0.906, respectively). Additionally, decision tree-based methods can efficiently handle Electronic Health Record (EHR) data that have a significant amount of missing records (in up to >50% of the studied features) eliminating the need for imputation in order to have complete data. In conclusion, we show that statistical learning methodology can provide superior predictive performance in comparison to existing methods and can also be production ready. Statistical modeling procedures provided unbiased, well-calibrated models that can be efficient decision support tools for predicting inpatient mortality and assigning preventive measures.
Sung, Sheng-Feng; Hsieh, Cheng-Yang; Kao Yang, Yea-Huei; Lin, Huey-Juan; Chen, Chih-Hung; Chen, Yu-Wei; Hu, Ya-Han
2015-11-01
Case-mix adjustment is difficult for stroke outcome studies using administrative data. However, relevant prescription, laboratory, procedure, and service claims might be surrogates for stroke severity. This study proposes a method for developing a stroke severity index (SSI) by using administrative data. We identified 3,577 patients with acute ischemic stroke from a hospital-based registry and analyzed claims data with plenty of features. Stroke severity was measured using the National Institutes of Health Stroke Scale (NIHSS). We used two data mining methods and conventional multiple linear regression (MLR) to develop prediction models, comparing the model performance according to the Pearson correlation coefficient between the SSI and the NIHSS. We validated these models in four independent cohorts by using hospital-based registry data linked to a nationwide administrative database. We identified seven predictive features and developed three models. The k-nearest neighbor model (correlation coefficient, 0.743; 95% confidence interval: 0.737, 0.749) performed slightly better than the MLR model (0.742; 0.736, 0.747), followed by the regression tree model (0.737; 0.731, 0.742). In the validation cohorts, the correlation coefficients were between 0.677 and 0.725 for all three models. The claims-based SSI enables adjusting for disease severity in stroke studies using administrative data. Copyright © 2015 Elsevier Inc. All rights reserved.
Chilling and heat requirements for flowering in temperate fruit trees
NASA Astrophysics Data System (ADS)
Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike
2014-08-01
Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut ( Castanea mollissima Blume) and jujube ( Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing's cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.
Chilling and heat requirements for flowering in temperate fruit trees.
Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike
2014-08-01
Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing’s cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.
Development of post-fire crown damage mortality thresholds in ponderosa pine
James F. Fowler; Carolyn Hull Sieg; Joel McMillin; Kurt K. Allen; Jose F. Negron; Linda L. Wadleigh; John A. Anhold; Ken E. Gibson
2010-01-01
Previous research has shown that crown scorch volume and crown consumption volume are the major predictors of post-fire mortality in ponderosa pine. In this study, we use piecewise logistic regression models of crown scorch data from 6633 trees in five wildfires from the Intermountain West to locate a mortality threshold at 88% scorch by volume for trees with no crown...
Regression trees modeling and forecasting of PM10 air pollution in urban areas
NASA Astrophysics Data System (ADS)
Stoimenova, M.; Voynikova, D.; Ivanov, A.; Gocheva-Ilieva, S.; Iliev, I.
2017-10-01
Fine particulate matter (PM10) air pollution is a serious problem affecting the health of the population in many Bulgarian cities. As an example, the object of this study is the pollution with PM10 of the town of Pleven, Northern Bulgaria. The measured concentrations of this air pollutant for this city consistently exceeded the permissible limits set by European and national legislation. Based on data for the last 6 years (2011-2016), the analysis shows that this applies both to the daily limit of 50 micrograms per cubic meter and the allowable number of daily concentration exceedances to 35 per year. Also, the average annual concentration of PM10 exceeded the prescribed norm of no more than 40 micrograms per cubic meter. The aim of this work is to build high performance mathematical models for effective prediction and forecasting the level of PM10 pollution. The study was conducted with the powerful flexible data mining technique Classification and Regression Trees (CART). The values of PM10 were fitted with respect to meteorological data such as maximum and minimum air temperature, relative humidity, wind speed and direction and others, as well as with time and autoregressive variables. As a result the obtained CART models demonstrate high predictive ability and fit the actual data with up to 80%. The best models were applied for forecasting the level pollution for 3 to 7 days ahead. An interpretation of the modeling results is presented.
Long-Term Vegetation Trends Detected In Northern Canada Using Landsat Image Stacks
NASA Astrophysics Data System (ADS)
Fraser, R.; Olthof, I.; Carrière, M.; Deschamps, A.; Pouliot, D.
2011-12-01
Evidence of recent productivity increases in arctic vegetation comes from a variety of sources. At local scales, long-term plot measurements in North America are beginning to record increases in vascular plant cover and biomass. At landscape scales, expansion and densification of shrubs has been observed using repeat oblique photographs. Finally, continental-scale increases in vegetation "greenness" have been documented based on analysis of coarse resolution (≥ 1 km) NOAA-AVHRR satellite imagery. In this study we investigated intermediate, regional-level changes occurring in tundra vegetation since 1984 using the Landsat TM and ETM+ satellite image archive. Four study areas averaging 13,619 km2 were located over widely distributed national parks in northern Canada (Ivvavik, Sirmilik, Torngat Mountains, and Wapusk). Time-series image stacks of 16-41 growing-season Landsat scenes from overlapping WRS-2 frames were acquired spanning periods of 17-25 years. Each pixel's unique temporal database of clear-sky values was then analyzed for trends in four indices (NDVI, Tasseled Cap Brightness, Greenness and Wetness) using robust linear regression. The trends were further related to changes in the fractional cover of functional vegetation types using regression tree models trained with plot data and high resolution (≤ 10 m) satellite imagery. We found all four study areas to have a larger proportion of significant (p<0.05) positive greenness trends (range 6.1-25.5%) by comparison to negative trends (range 0.3-4.1%). For the three study areas where regression tree models could be derived, consistent trends of increasing shrub or vascular fractional cover and decreasing bare cover were predicted. The Landsat-based observations were associated with warming trends in each park over the analysis periods. Many of the major changes observed could be corroborated using published studies or field observations.
McLaughlin, Samuel B; Wullschleger, Stan D; Nosal, Miloslav
2003-11-01
To evaluate indicators of whole-tree physiological responses to climate stress, we determined seasonal, daily and diurnal patterns of growth and water use in 10 yellow poplar (Liriodendron tulipifera L.) trees in a stand recently released from competition. Precise measurements of stem increment and sap flow made with automated electronic dendrometers and thermal dissipation probes, respectively, indicated close temporal linkages between water use and patterns of stem shrinkage and swelling during daily cycles of water depletion and recharge of extensible outer-stem tissues. These cycles also determined net daily basal area increment. Multivariate regression models based on a 123-day data series showed that daily diameter increments were related negatively to vapor pressure deficit (VPD), but positively to precipitation and temperature. The same model form with slight changes in coefficients yielded coefficients of determination of about 0.62 (0.57-0.66) across data subsets that included widely variable growth rates and VPDs. Model R2 was improved to 0.75 by using 3-day running mean daily growth data. Rapid recovery of stem diameter growth following short-term, diurnal reductions in VPD indicated that water stored in extensible stem tissues was part of a fast recharge system that limited hydration changes in the cambial zone during periods of water stress. There were substantial differences in the seasonal dynamics of growth among individual trees, and analyses indicated that faster-growing trees were more positively affected by precipitation, solar irradiance and temperature and more negatively affected by high VPD than slower-growing trees. There were no negative effects of ozone on daily growth rates in a year of low ozone concentrations.
Multiplicative Forests for Continuous-Time Processes
Weiss, Jeremy C.; Natarajan, Sriraam; Page, David
2013-01-01
Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability. PMID:25284967
Multiplicative Forests for Continuous-Time Processes.
Weiss, Jeremy C; Natarajan, Sriraam; Page, David
2012-01-01
Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.
Quirke, Michael; Curran, Emma May; O'Kelly, Patrick; Moran, Ruth; Daly, Eimear; Aylward, Seamus; McElvaney, Gerry; Wakai, Abel
2018-01-01
To measure the percentage rate and risk factors for amendment in the type, duration and setting of outpatient parenteral antimicrobial therapy ( OPAT) for the treatment of cellulitis. A retrospective cohort study of adult patients receiving OPAT for cellulitis was performed. Treatment amendment (TA) was defined as hospital admission or change in antibiotic therapy in order to achieve clinical response. Multivariable logistic regression (MVLR) and classification and regression tree (CART) analysis were performed. There were 307 patients enrolled. TA occurred in 36 patients (11.7%). Significant risk factors for TA on MVLR were increased age, increased Numerical Pain Scale Score (NPSS) and immunocompromise. The median OPAT duration was 7 days. Increased age, heart rate and C reactive protein were associated with treatment prolongation. CART analysis selected age <64.5 years, female gender and NPSS <2.5 in the final model, generating a low-sensitivity (27.8%), high-specificity (97.1%) decision tree. Increased age, NPSS and immunocompromise were associated with OPAT amendment. These identified risk factors can be used to support an evidence-based approach to patient selection for OPAT in cellulitis. The CART algorithm has good specificity but lacks sensitivity and is shown to be inferior in this study to logistic regression modelling. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Weighted linear regression using D2H and D2 as the independent variables
Hans T. Schreuder; Michael S. Williams
1998-01-01
Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...
Trends and Tipping Points of Drought-induced Tree Mortality
NASA Astrophysics Data System (ADS)
Huang, K.; Yi, C.; Wu, D.; Zhou, T.; Zhao, X.; Blanford, W. J.; Wei, S.; Wu, H.; Du, L.
2014-12-01
Drought-induced tree mortality worldwide has been recently reported in a review of the literature by Allen et al. (2010). However, a quantitative relationship between widespread loss of forest from mortality and drought is still a key knowledge gap. Specifically, the field lacks quantitative knowledge of tipping point in trees when coping with water stress, which inhibits the assessments of how climate change affects the forest ecosystem. We investigate the statistical relationships for different (seven) conifer species between Ring Width Index (RWI) and Standardized Precipitation Evapotranspiration Index (SPEI), based on 411 chronologies from the International Tree-Ring Data Bank across 11 states of the western United States. We found robust species-specific relationships between RWI and SPEI for all seven conifer species at dry condition. The regression models show that the RWI decreases with SPEI decreasing (drying) and more than 76% variation of tree growth (RWI) can be explained by the drought index (SPEI). However, when soil water is sufficient (i.e., SPEI>SPEIu), soil water is no longer a restrictive factor for tree growth and, therefore, the RWI shows a weak correlation with SPEI. Based on the statistical models, we derived the tipping point of SPEI (SPEItp) where the RWI equals 0, which means the carbon efflux by tree respiration equals carbon influx by tree photosynthesis. When the severity of drought exceeds this tipping point(i.e. SPEI
Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.
Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J
2015-02-01
The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine-learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) a LR model that assumed linearity and additivity (simple LR model) (2) a LR model incorporating restricted cubic splines and interactions (flexible LR model) (3) a support vector machine, (4) a random forest and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.
Non-Parametric Blur Map Regression for Depth of Field Extension.
D'Andres, Laurent; Salvador, Jordi; Kochale, Axel; Susstrunk, Sabine
2016-04-01
Real camera systems have a limited depth of field (DOF) which may cause an image to be degraded due to visible misfocus or too shallow DOF. In this paper, we present a blind deblurring pipeline able to restore such images by slightly extending their DOF and recovering sharpness in regions slightly out of focus. To address this severely ill-posed problem, our algorithm relies first on the estimation of the spatially varying defocus blur. Drawing on local frequency image features, a machine learning approach based on the recently introduced regression tree fields is used to train a model able to regress a coherent defocus blur map of the image, labeling each pixel by the scale of a defocus point spread function. A non-blind spatially varying deblurring algorithm is then used to properly extend the DOF of the image. The good performance of our algorithm is assessed both quantitatively, using realistic ground truth data obtained with a novel approach based on a plenoptic camera, and qualitatively with real images.
Drought impact functions as intermediate step towards drought damage assessment
NASA Astrophysics Data System (ADS)
Bachmair, Sophie; Svensson, Cecilia; Prosdocimi, Ilaria; Hannaford, Jamie; Helm Smith, Kelly; Svoboda, Mark; Stahl, Kerstin
2016-04-01
While damage or vulnerability functions for floods and seismic hazards have gained considerable attention, there is comparably little knowledge on drought damage or loss. On the one hand this is due to the complexity of the drought hazard affecting different domains of the hydrological cycle and different sectors of human activity. Hence, a single hazard indicator is likely not able to fully capture this multifaceted hazard. On the other hand, drought impacts are often non-structural and hard to quantify or monetize. Examples are impaired navigability of streams, restrictions on domestic water use, reduced hydropower production, reduced tree growth, and irreversible deterioration/loss of wetlands. Apart from reduced crop yield, data about drought damage or loss with adequate spatial and temporal resolution is scarce, making the development of drought damage functions difficult. As an intermediate step towards drought damage functions we exploit text-based reports on drought impacts from the European Drought Impact report Inventory and the US Drought Impact Reporter to derive surrogate information for drought damage or loss. First, text-based information on drought impacts is converted into timeseries of absence versus presence of impacts, or number of impact occurrences. Second, meaningful hydro-meteorological indicators characterizing drought intensity are identified. Third, different statistical models are tested as link functions relating drought hazard indicators with drought impacts: 1) logistic regression for drought impacts coded as binary response variable; and 2) mixture/hurdle models (zero-inflated/zero-altered negative binomial regression) and an ensemble regression tree approach for modeling the number of drought impact occurrences. Testing the predictability of (number of) drought impact occurrences based on cross-validation revealed a good agreement between observed and modeled (number of) impacts for regions at the scale of federal states or provinces with good data availability. Impact functions representing localized drought impacts are more challenging to construct given that less data is available, yet may provide information that more directly addresses stakeholders' needs. Overall, our study contributes insights into how drought intensity translates into ecological and socioeconomic impacts, and how such information may be used for enhancing drought monitoring and early warning.
Modified TAROT for cross-selling personal financial products
NASA Astrophysics Data System (ADS)
Tee, Ya-Mei; LEE, Lai-Soon; LEE, Chew-Ging; SEOW, Hsin-Vonn
2014-09-01
The Top Application characteristics Remainder Offer characteristics Tree (TAROT) was first introduced in 2007. This is a modified Classification and Regression Trees (CART) used to help decide which question(s) to ask potential applicants to customise an offer of a personal financial product so that it would have a high probability of take up. In this piece of work the authors are presenting, they have further modified the TAROT to cross TAROT, using its properties and modeling steps to deal with the issue of cross-selling. Since the bank already has ready customers, it would be ideal to cross-sell the financial products seeing that one can ask one (or more) further question(s) based on the initial offer to identify and customise another financial product to offer.
Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K
2017-10-01
Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
The application of data mining techniques to oral cancer prognosis.
Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan
2015-05-01
This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, as to clinician, the trees created by the decision tree models are relatively easier to interpret compared to that of the artificial neural network models. Cluster analysis also discovers that those stage 4 patients whose also possess the following four characteristics are having an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cells mutate situation (G) is moderate.
Mereta, Seid Tiku; Yewhalaw, Delenasaw; Boets, Pieter; Ahmed, Abdulhakim; Duchateau, Luc; Speybroeck, Niko; Vanwambeke, Sophie O; Legesse, Worku; De Meester, Luc; Goethals, Peter L M
2013-11-04
A fundamental understanding of the spatial distribution and ecology of mosquito larvae is essential for effective vector control intervention strategies. In this study, data-driven decision tree models, generalized linear models and ordination analysis were used to identify the most important biotic and abiotic factors that affect the occurrence and abundance of mosquito larvae in Southwest Ethiopia. In total, 220 samples were taken at 180 sampling locations during the years 2010 and 2012. Sampling sites were characterized based on physical, chemical and biological attributes. The predictive performance of decision tree models was evaluated based on correctly classified instances (CCI), Cohen's kappa statistic (κ) and the determination coefficient (R2). A conditional analysis was performed on the regression tree models to test the relation between key environmental and biological parameters and the abundance of mosquito larvae. The decision tree model developed for anopheline larvae showed a good model performance (CCI = 84 ± 2%, and κ = 0.66 ± 0.04), indicating that the genus has clear habitat requirements. Anopheline mosquito larvae showed a widespread distribution and especially occurred in small human-made aquatic habitats. Water temperature, canopy cover, emergent vegetation cover, and presence of predators and competitors were found to be the main variables determining the abundance and distribution of anopheline larvae. In contrast, anopheline mosquito larvae were found to be less prominently present in permanent larval habitats. This could be attributed to the high abundance and diversity of natural predators and competitors suppressing the mosquito population densities. The findings of this study suggest that targeting smaller human-made aquatic habitats could result in effective larval control of anopheline mosquitoes in the study area. Controlling the occurrence of mosquito larvae via drainage of permanent wetlands may not be a good management strategy as it negatively affects the occurrence and abundance of mosquito predators and competitors and promotes an increase in anopheline population densities.
2013-01-01
Background A fundamental understanding of the spatial distribution and ecology of mosquito larvae is essential for effective vector control intervention strategies. In this study, data-driven decision tree models, generalized linear models and ordination analysis were used to identify the most important biotic and abiotic factors that affect the occurrence and abundance of mosquito larvae in Southwest Ethiopia. Methods In total, 220 samples were taken at 180 sampling locations during the years 2010 and 2012. Sampling sites were characterized based on physical, chemical and biological attributes. The predictive performance of decision tree models was evaluated based on correctly classified instances (CCI), Cohen’s kappa statistic (κ) and the determination coefficient (R2). A conditional analysis was performed on the regression tree models to test the relation between key environmental and biological parameters and the abundance of mosquito larvae. Results The decision tree model developed for anopheline larvae showed a good model performance (CCI = 84 ± 2%, and κ = 0.66 ± 0.04), indicating that the genus has clear habitat requirements. Anopheline mosquito larvae showed a widespread distribution and especially occurred in small human-made aquatic habitats. Water temperature, canopy cover, emergent vegetation cover, and presence of predators and competitors were found to be the main variables determining the abundance and distribution of anopheline larvae. In contrast, anopheline mosquito larvae were found to be less prominently present in permanent larval habitats. This could be attributed to the high abundance and diversity of natural predators and competitors suppressing the mosquito population densities. Conclusions The findings of this study suggest that targeting smaller human-made aquatic habitats could result in effective larval control of anopheline mosquitoes in the study area. Controlling the occurrence of mosquito larvae via drainage of permanent wetlands may not be a good management strategy as it negatively affects the occurrence and abundance of mosquito predators and competitors and promotes an increase in anopheline population densities. PMID:24499518
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research
ERIC Educational Resources Information Center
He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne
2018-01-01
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…
NASA Technical Reports Server (NTRS)
Stolzer, Alan J.; Halford, Carl
2007-01-01
In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
Deo, Ravinesh C; Downs, Nathan; Parisi, Alfio V; Adamowski, Jan F; Quilty, John M
2017-05-01
Exposure to erythemally-effective solar ultraviolet radiation (UVR) that contributes to malignant keratinocyte cancers and associated health-risk is best mitigated through innovative decision-support systems, with global solar UV index (UVI) forecast necessary to inform real-time sun-protection behaviour recommendations. It follows that the UVI forecasting models are useful tools for such decision-making. In this study, a model for computationally-efficient data-driven forecasting of diffuse and global very short-term reactive (VSTR) (10-min lead-time) UVI, enhanced by drawing on the solar zenith angle (θ s ) data, was developed using an extreme learning machine (ELM) algorithm. An ELM algorithm typically serves to address complex and ill-defined forecasting problems. UV spectroradiometer situated in Toowoomba, Australia measured daily cycles (0500-1700h) of UVI over the austral summer period. After trialling activations functions based on sine, hard limit, logarithmic and tangent sigmoid and triangular and radial basis networks for best results, an optimal ELM architecture utilising logarithmic sigmoid equation in hidden layer, with lagged combinations of θ s as the predictor data was developed. ELM's performance was evaluated using statistical metrics: correlation coefficient (r), Willmott's Index (WI), Nash-Sutcliffe efficiency coefficient (E NS ), root mean square error (RMSE), and mean absolute error (MAE) between observed and forecasted UVI. Using these metrics, the ELM model's performance was compared to that of existing methods: multivariate adaptive regression spline (MARS), M5 Model Tree, and a semi-empirical (Pro6UV) clear sky model. Based on RMSE and MAE values, the ELM model (0.255, 0.346, respectively) outperformed the MARS (0.310, 0.438) and M5 Model Tree (0.346, 0.466) models. Concurring with these metrics, the Willmott's Index for the ELM, MARS and M5 Model Tree models were 0.966, 0.942 and 0.934, respectively. About 57% of the ELM model's absolute errors were small in magnitude (±0.25), whereas the MARS and M5 Model Tree models generated 53% and 48% of such errors, respectively, indicating the latter models' errors to be distributed in larger magnitude error range. In terms of peak global UVI forecasting, with half the level of error, the ELM model outperformed MARS and M5 Model Tree. A comparison of the magnitude of hourly-cumulated errors of 10-min lead time forecasts for diffuse and global UVI highlighted ELM model's greater accuracy compared to MARS, M5 Model Tree or Pro6UV models. This confirmed the versatility of an ELM model drawing on θ s data for VSTR forecasting of UVI at near real-time horizon. When applied to the goal of enhancing expert systems, ELM-based accurate forecasts capable of reacting quickly to measured conditions can enhance real-time exposure advice for the public, mitigating the potential for solar UV-exposure-related disease. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Kaskhedikar, Apoorva Prakash
According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United State's energy consumption of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy Benchmarking offers initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, correlations which were significant were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers namely, the ENERGY STAR's Portfolio Manager. This tool relies on the standard Linear Regression methods which is only able to handle continuous variables. The model proposed uses data mining technique and was found to perform slightly better than the Portfolio Manager. The broader impacts of the new benchmarking methodology proposed is that it allows for identifying important categorical variables, and then incorporating them in a local, as against a global, model framework for EUI pertinent to the building type. The ability to identify and rank the important variables is of great importance in practical implementation of the benchmarking tools which rely on query-based building and HVAC variable filters specified by the user.
Modeling ready biodegradability of fragrance materials.
Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola
2015-06-01
In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.
Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.
Bridging process-based and empirical approaches to modeling tree growth
Harry T. Valentine; Annikki Makela; Annikki Makela
2005-01-01
The gulf between process-based and empirical approaches to modeling tree growth may be bridged, in part, by the use of a common model. To this end, we have formulated a process-based model of tree growth that can be fitted and applied in an empirical mode. The growth model is grounded in pipe model theory and an optimal control model of crown development. Together, the...
E. Gregory McPherson; Paula J. Peper
2012-01-01
This paper describes three long-term tree growth studies conducted to evaluate tree performance because repeated measurements of the same trees produce critical data for growth model calibration and validation. Several empirical and process-based approaches to modeling tree growth are reviewed. Modeling is more advanced in the fields of forestry and...
Cerasoli, Francesco; Iannella, Mattia; D'Alessandro, Paola; Biondi, Maurizio
2017-01-01
Boosted Regression Trees (BRT) is one of the modelling techniques most recently applied to biodiversity conservation and it can be implemented with presence-only data through the generation of artificial absences (pseudo-absences). In this paper, three pseudo-absences generation techniques are compared, namely the generation of pseudo-absences within target-group background (TGB), testing both the weighted (WTGB) and unweighted (UTGB) scheme, and the generation at random (RDM), evaluating their performance and applicability in distribution modelling and species conservation. The choice of the target group fell on amphibians, because of their rapid decline worldwide and the frequent lack of guidelines for conservation strategies and regional-scale planning, which instead could be provided through an appropriate implementation of SDMs. Bufo bufo, Salamandrina perspicillata and Triturus carnifex were considered as target species, in order to perform our analysis with species having different ecological and distributional characteristics. The study area is the "Gran Sasso-Monti della Laga" National Park, which hosts 15 Natura 2000 sites and represents one of the most important biodiversity hotspots in Europe. Our results show that the model calibration ameliorates when using the target-group based pseudo-absences compared to the random ones, especially when applying the WTGB. Contrarily, model discrimination did not significantly vary in a consistent way among the three approaches with respect to the tree target species. Both WTGB and RDM clearly isolate the highly contributing variables, supplying many relevant indications for species conservation actions. Moreover, the assessment of pairwise variable interactions and their three-dimensional visualization further increase the amount of useful information for protected areas' managers. Finally, we suggest the use of RDM as an admissible alternative when it is not possible to individuate a suitable set of species as a representative target-group from which the pseudo-absences can be generated.
Faults Discovery By Using Mined Data
NASA Technical Reports Server (NTRS)
Lee, Charles
2005-01-01
Fault discovery in the complex systems consist of model based reasoning, fault tree analysis, rule based inference methods, and other approaches. Model based reasoning builds models for the systems either by mathematic formulations or by experiment model. Fault Tree Analysis shows the possible causes of a system malfunction by enumerating the suspect components and their respective failure modes that may have induced the problem. The rule based inference build the model based on the expert knowledge. Those models and methods have one thing in common; they have presumed some prior-conditions. Complex systems often use fault trees to analyze the faults. Fault diagnosis, when error occurs, is performed by engineers and analysts performing extensive examination of all data gathered during the mission. International Space Station (ISS) control center operates on the data feedback from the system and decisions are made based on threshold values by using fault trees. Since those decision-making tasks are safety critical and must be done promptly, the engineers who manually analyze the data are facing time challenge. To automate this process, this paper present an approach that uses decision trees to discover fault from data in real-time and capture the contents of fault trees as the initial state of the trees.
Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios
Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang
2014-01-01
Objectives Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistical better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
NASA Astrophysics Data System (ADS)
Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin
2014-06-01
This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies.
Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush–steppe ecosystem
Wylie, Bruce K.; Johnson, Douglas A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, Tagir G.; Reed, Bradley C.; Tieszen, Larry L.; Worstell, Bruce B.
2003-01-01
The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). The contribution of remote sensing and modeling holds the potential to predict these components and map them spatially and temporally. This has obvious utility to quantify carbon sink and source relationships and to identify improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rnwere measured with a Bowen ratio–energy balance (BREB) technique in a sagebrush (Artemisia spp.)–steppe ecosystem in northeast Idaho, USA, during 1996–1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996–1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday(R2=0.79, n=66, P<0.0001). Inclusion of evapotranspiration in the predictive equation led to improved predictions of Fday (R2=0.82, n=66, P<0.0001). Crossvalidation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R2=0.75–0.77, n=66) with the regression tree model being slightly more robust in crossvalidation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush–steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday. These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models.
Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush-steppe ecosystem
Wylie, B.K.; Johnson, D.A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, T.G.; Reed, B.C.; Tieszen, L.L.; Worstell, B.B.
2003-01-01
The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). The contribution of remote sensing and modeling holds the potential to predict these components and map them spatially and temporally. This has obvious utility to quantify carbon sink and source relationships and to identify improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rn were measured with a Bowen ratio-energy balance (BREB) technique in a sagebrush (Artemisia spp.)-steppe ecosystem in northeast Idaho, USA, during 1996-1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996-1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday (R2 = 0.79, n = 66, P < 0.0001). Inclusion of evapotranspiration in the predictive equation led to improved predictions of Fday (R2= 0.82, n = 66, P < 0.0001). Crossvalidation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R2 = 0.75-0.77, n = 66) with the regression tree model being slightly more robust in crossvalidation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush-steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday. These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models. ?? 2003 Elsevier Science Inc. All rights reserved.
Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J
2009-11-22
Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
Imai, Kenji; Takai, Koji; Watanabe, Satoshi; Hanai, Tatsunori; Suetsugu, Atsushi; Shiraki, Makoto; Shimizu, Masahito
2017-09-22
Sarcopenia impairs survival in patients with hepatocellular carcinoma (HCC). This study aimed to clarify the factors that contribute to decreased skeletal muscle volume in patients with HCC. The third lumbar vertebra skeletal muscle index (L3 SMI) in 351 consecutive patients with HCC was calculated to identify sarcopenia. Sarcopenia was defined as an L3 SMI value ≤ 29.0 cm²/m² for women and ≤ 36.0 cm²/m² for men. The factors affecting L3 SMI were analyzed by multiple linear regression analysis and tree-based models. Of the 351 HCC patients, 33 were diagnosed as having sarcopenia and showed poor prognosis compared with non-sarcopenia patients ( p = 0.007). However, this significant difference disappeared after the adjustments for age, sex, Child-Pugh score, maximum tumor size, tumor number, and the degree of portal vein invasion by propensity score matching analysis. Multiple linear regression analysis showed that age ( p = 0.015) and sex ( p < 0.0001) were significantly correlated with a decrease in L3 SMI. Tree-based models revealed that sex (female) is the most significant factor that affects L3 SMI. In male patients, L3 SMI was decreased by aging, increased Child-Pugh score (≥56 years), and enlarged tumor size (<56 years). Maintaining liver functional reserve and early diagnosis and therapy for HCC are vital to prevent skeletal muscle depletion and improve the prognosis of patients with HCC.
Vegetation Continuous Fields--Transitioning from MODIS to VIIRS
NASA Astrophysics Data System (ADS)
DiMiceli, C.; Townshend, J. R.; Sohlberg, R. A.; Kim, D. H.; Kelly, M.
2015-12-01
Measurements of fractional vegetation cover are critical for accurate and consistent monitoring of global deforestation rates. They also provide important parameters for land surface, climate and carbon models and vital background data for research into fire, hydrological and ecosystem processes. MODIS Vegetation Continuous Fields (VCF) products provide four complementary layers of fractional cover: tree cover, non-tree vegetation, bare ground, and surface water. MODIS VCF products are currently produced globally and annually at 250m resolution for 2000 to the present. Additionally, annual VCF products at 1/20° resolution derived from AVHRR and MODIS Long-Term Data Records are in development to provide Earth System Data Records of fractional vegetation cover for 1982 to the present. In order to provide continuity of these valuable products, we are extending the VCF algorithms to create Suomi NPP/VIIRS VCF products. This presentation will highlight the first VIIRS fractional cover product: global percent tree cover at 1 km resolution. To create this product, phenological and physiological metrics were derived from each complete year of VIIRS 8-day surface reflectance products. A supervised regression tree method was applied to the metrics, using training derived from Landsat data supplemented by high-resolution data from Ikonos, RapidEye and QuickBird. The regression tree model was then applied globally to produce fractional tree cover. In our presentation we will detail our methods for creating the VIIRS VCF product. We will compare the new VIIRS VCF product to our current MODIS VCF products and demonstrate continuity between instruments. Finally, we will outline future VIIRS VCF development plans.
Van Boeckel, Thomas P; Thanapongtharm, Weerapong; Robinson, Timothy; Biradar, Chandrashekhar M; Xiao, Xiangming; Gilbert, Marius
2012-01-01
Since 1996 when Highly Pathogenic Avian Influenza type H5N1 first emerged in southern China, numerous studies sought risk factors and produced risk maps based on environmental and anthropogenic predictors. However little attention has been paid to the link between the level of intensification of poultry production and the risk of outbreak. This study revised H5N1 risk mapping in Central and Western Thailand during the second wave of the 2004 epidemic. Production structure was quantified using a disaggregation methodology based on the number of poultry per holding. Population densities of extensively- and intensively-raised ducks and chickens were derived both at the sub-district and at the village levels. LandSat images were used to derive another previously neglected potential predictor of HPAI H5N1 risk: the proportion of water in the landscape resulting from floods. We used Monte Carlo simulation of Boosted Regression Trees models of predictor variables to characterize the risk of HPAI H5N1. Maps of mean risk and uncertainty were derived both at the sub-district and the village levels. The overall accuracy of Boosted Regression Trees models was comparable to that of logistic regression approaches. The proportion of area flooded made the highest contribution to predicting the risk of outbreak, followed by the densities of intensively-raised ducks, extensively-raised ducks and human population. Our results showed that as little as 15% of flooded land in villages is sufficient to reach the maximum level of risk associated with this variable. The spatial pattern of predicted risk is similar to previous work: areas at risk are mainly located along the flood plain of the Chao Phraya river and to the south-east of Bangkok. Using high-resolution village-level poultry census data, rather than sub-district data, the spatial accuracy of predictions was enhanced to highlight local variations in risk. Such maps provide useful information to guide intervention.
Van Boeckel, Thomas P.; Thanapongtharm, Weerapong; Robinson, Timothy; Biradar, Chandrashekhar M.; Xiao, Xiangming; Gilbert, Marius
2012-01-01
Since 1996 when Highly Pathogenic Avian Influenza type H5N1 first emerged in southern China, numerous studies sought risk factors and produced risk maps based on environmental and anthropogenic predictors. However little attention has been paid to the link between the level of intensification of poultry production and the risk of outbreak. This study revised H5N1 risk mapping in Central and Western Thailand during the second wave of the 2004 epidemic. Production structure was quantified using a disaggregation methodology based on the number of poultry per holding. Population densities of extensively- and intensively-raised ducks and chickens were derived both at the sub-district and at the village levels. LandSat images were used to derive another previously neglected potential predictor of HPAI H5N1 risk: the proportion of water in the landscape resulting from floods. We used Monte Carlo simulation of Boosted Regression Trees models of predictor variables to characterize the risk of HPAI H5N1. Maps of mean risk and uncertainty were derived both at the sub-district and the village levels. The overall accuracy of Boosted Regression Trees models was comparable to that of logistic regression approaches. The proportion of area flooded made the highest contribution to predicting the risk of outbreak, followed by the densities of intensively-raised ducks, extensively-raised ducks and human population. Our results showed that as little as 15% of flooded land in villages is sufficient to reach the maximum level of risk associated with this variable. The spatial pattern of predicted risk is similar to previous work: areas at risk are mainly located along the flood plain of the Chao Phraya river and to the south-east of Bangkok. Using high-resolution village-level poultry census data, rather than sub-district data, the spatial accuracy of predictions was enhanced to highlight local variations in risk. Such maps provide useful information to guide intervention. PMID:23185352
B. Desta Fekedulegn; J.J. Colbert; R.R., Jr. Hicks; Michael E. Schuckers
2002-01-01
The theory and application of principal components regression, a method for coping with multicollinearity among independent variables in analyzing ecological data, is exhibited in detail. A concrete example of the complex procedures that must be carried out in developing a diagnostic growth-climate model is provided. We use tree radial increment data taken from breast...
Chiriac, Anca Mirela; Wang, Youna; Schrijvers, Rik; Bousquet, Philippe Jean; Mura, Thibault; Molinari, Nicolas; Demoly, Pascal
Beta-lactam antibiotics represent the main cause of allergic reactions to drugs, inducing both immediate and nonimmediate allergies. The diagnosis is well established, usually based on skin tests and drug provocation tests, but cumbersome. To design predictive models for the diagnosis of beta-lactam allergy, based on the clinical history of patients with suspicions of allergic reactions to beta-lactams. The study included a retrospective phase, in which records of patients explored for a suspicion of beta-lactam allergy (in the Allergy Unit of the University Hospital of Montpellier between September 1996 and September 2012) were used to construct predictive models based on a logistic regression and decision tree method; a prospective phase, in which we performed an external validation of the chosen models in patients with suspicion of beta-lactam allergy recruited from 3 allergy centers (Montpellier, Nîmes, Narbonne) between March and November 2013. Data related to clinical history and allergy evaluation results were retrieved and analyzed. The retrospective and prospective phases included 1991 and 200 patients, respectively, with a different prevalence of confirmed beta-lactam allergy (23.6% vs 31%, P = .02). For the logistic regression method, performances of the models were similar in both samples: sensitivity was 51% (vs 60%), specificity 75% (vs 80%), positive predictive value 40% (vs 57%), and negative predictive value 83% (vs 82%). The decision tree method reached a sensitivity of 29.5% (vs 43.5%), specificity of 96.4% (vs 94.9%), positive predictive value of 71.6% (vs 79.4%), and negative predictive value of 81.6% (vs 81.3%). Two different independent methods using clinical history predictors were unable to accurately predict beta-lactam allergy and replace a conventional allergy evaluation for suspected beta-lactam allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Using data mining to predict success in a weight loss trial.
Batterham, M; Tapsell, L; Charlton, K; O'Shea, J; Thorne, R
2017-08-01
Traditional methods for predicting weight loss success use regression approaches, which make the assumption that the relationships between the independent and dependent (or logit of the dependent) variable are linear. The aim of the present study was to investigate the relationship between common demographic and early weight loss variables to predict weight loss success at 12 months without making this assumption. Data mining methods (decision trees, generalised additive models and multivariate adaptive regression splines), in addition to logistic regression, were employed to predict: (i) weight loss success (defined as ≥5%) at the end of a 12-month dietary intervention using demographic variables [body mass index (BMI), sex and age]; percentage weight loss at 1 month; and (iii) the difference between actual and predicted weight loss using an energy balance model. The methods were compared by assessing model parsimony and the area under the curve (AUC). The decision tree provided the most clinically useful model and had a good accuracy (AUC 0.720 95% confidence interval = 0.600-0.840). Percentage weight loss at 1 month (≥0.75%) was the strongest predictor for successful weight loss. Within those individuals losing ≥0.75%, individuals with a BMI (≥27 kg m -2 ) were more likely to be successful than those with a BMI between 25 and 27 kg m -2 . Data mining methods can provide a more accurate way of assessing relationships when conventional assumptions are not met. In the present study, a decision tree provided the most parsimonious model. Given that early weight loss cannot be predicted before randomisation, incorporating this information into a post randomisation trial design may give better weight loss results. © 2017 The British Dietetic Association Ltd.
Seeing the forest and the trees: multilevel models reveal both species and community patterns
Michelle M. Jackson; Monica G. Turner; Scott M. Pearson; Anthony R. Ives
2012-01-01
Studies designed to understand species distributions and community assemblages typically use separate analytical approaches (e.g., logistic regression and ordination) to model the distribution of individual species and to relate community composition to environmental variation. Multilevel models (MLMs) offer a promising strategy for integrating species and community-...
Post-Modeling Histogram Matching of Maps Produced Using Regression Trees
Andrew J. Lister; Tonya W. Lister
2006-01-01
Spatial predictive models often use statistical techniques that in some way rely on averaging of values. Estimates from linear modeling are known to be susceptible to truncation of variance when the independent (predictor) variables are measured with error. A straightforward post-processing technique (histogram matching) for attempting to mitigate this effect is...
USDA-ARS?s Scientific Manuscript database
In order to control algal blooms, stressor-response relationships between water quality metrics, environmental variables, and algal growth should be understood and modeled. Machine-learning methods were suggested to express stressor-response relationships found by application of mechanistic water qu...
A Survival Model for Shortleaf Pine Tress Growing in Uneven-Aged Stands
Thomas B. Lynch; Lawrence R. Gering; Michael M. Huebschmann; Paul A. Murphy
1999-01-01
A survival model for shortleaf pine (Pinus echinata Mill.) trees growing in uneven-aged stands was developed using data from permanently established plots maintained by an industrial forestry company in western Arkansas. Parameters were fitted to a logistic regression model with a Bernoulli dependent variable in which "0" represented...
Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi
2013-02-01
The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela
2018-01-19
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
Koch, George W; Sillett, Stephen C; Jennings, Gregory M; Davis, Stephen D
2004-04-22
Trees grow tall where resources are abundant, stresses are minor, and competition for light places a premium on height growth. The height to which trees can grow and the biophysical determinants of maximum height are poorly understood. Some models predict heights of up to 120 m in the absence of mechanical damage, but there are historical accounts of taller trees. Current hypotheses of height limitation focus on increasing water transport constraints in taller trees and the resulting reductions in leaf photosynthesis. We studied redwoods (Sequoia sempervirens), including the tallest known tree on Earth (112.7 m), in wet temperate forests of northern California. Our regression analyses of height gradients in leaf functional characteristics estimate a maximum tree height of 122-130 m barring mechanical damage, similar to the tallest recorded trees of the past. As trees grow taller, increasing leaf water stress due to gravity and path length resistance may ultimately limit leaf expansion and photosynthesis for further height growth, even with ample soil moisture.
Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien
2015-09-01
According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
Machine learning models for lipophilicity and their domain of applicability.
Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Laak, Antonius Ter; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-01-01
Unfavorable lipophilicity and water solubility cause many drug failures; therefore these properties have to be taken into account early on in lead discovery. Commercial tools for predicting lipophilicity usually have been trained on small and neutral molecules, and are thus often unable to accurately predict in-house data. Using a modern Bayesian machine learning algorithm--a Gaussian process model--this study constructs a log D7 model based on 14,556 drug discovery compounds of Bayer Schering Pharma. Performance is compared with support vector machines, decision trees, ridge regression, and four commercial tools. In a blind test on 7013 new measurements from the last months (including compounds from new projects) 81% were predicted correctly within 1 log unit, compared to only 44% achieved by commercial software. Additional evaluations using public data are presented. We consider error bars for each method (model based error bars, ensemble based, and distance based approaches), and investigate how well they quantify the domain of applicability of each model.
Louis R. Iverson; Frank R. Thompson; Stephen Matthews; Matthew Peters; Anantha Prasad; William D. Dijak; Jacob Fraser; Wen J. Wang; Brice Hanberry; Hong He; Maria Janowiak; Patricia Butler; Leslie Brandt; Chris Swanston
2016-01-01
Context. Species distribution models (SDM) establish statistical relationships between the current distribution of species and key attributes whereas process-based models simulate ecosystem and tree species dynamics based on representations of physical and biological processes. TreeAtlas, which uses DISTRIB SDM, and Linkages and LANDIS PRO, process...
Ramezankhani, Roghieh; Sajjadi, Nooshin; Nezakati Esmaeilzadeh, Roya; Jozi, Seyed Ali; Shirzadi, Mohammad Reza
2018-05-08
Cutaneous Leishmaniasis (CL) is a neglected tropical disease that continues to be a health problem in Iran. Nearly 350 million people are thought to be at risk. We investigated the impact of the environmental factors on CL incidence during the period 2007- 2015 in a known endemic area for this disease in Isfahan Province, Iran. After collecting data with regard to the climatic, topographic, vegetation coverage and CL cases in the study area, a decision tree model was built using the classification and regression tree algorithm. CL data for the years 2007 until 2012 were used for model construction and the data for the years 2013 until 2015 were used for testing the model. The Root Mean Square error and the correlation factor were used to evaluate the predictive performance of the decision tree model. We found that wind speeds less than 14 m/s, altitudes between 1234 and 1810 m above the mean sea level, vegetation coverage according to the normalized difference vegetation index (NDVI) less than 0.12, rainfall less than 1.6 mm and air temperatures higher than 30°C would correspond to a seasonal incidence of 163.28 per 100,000 persons, while if wind speed is less than 14 m/s, altitude less than 1,810 m and NDVI higher than 0.12, then the mean seasonal incidence of the disease would be 2.27 per 100,000 persons. Environmental factors were found to be important predictive variables for CL incidence and should be considered in surveillance and prevention programmes for CL control.
Bakal, Gokhan; Talari, Preetham; Kakani, Elijah V; Kavuluru, Ramakanth
2018-06-01
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is also critical to understand biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. To build high accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs. We used 7000 treats and 2918 causes hand-curated relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80-20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios. Our models predict treats and causes relations with high F-scores of 99% and 90% respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. We are also able to predict some new plausible relations based on false positives that our models scored highly based on our collaborations with two physician co-authors. Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset. We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical pattern based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction. Copyright © 2018 Elsevier Inc. All rights reserved.
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
2017-03-01
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm 3 , there was 80% probability of a cyst. If volume was <247 mm 3 and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electro conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (<0.4 R squared). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crops types at different growing stages, coupled with their individual tolerances to saline conditions.
Fang, H; Lu, B; Wang, X; Zheng, L; Sun, K; Cai, W
2017-08-17
This study proposed a decision tree model to screen upper urinary tract damage (UUTD) for patients with neurogenic bladder (NGB). Thirty-four NGB patients with UUTD were recruited in the case group, while 78 without UUTD were included in the control group. A decision tree method, classification and regression tree (CART), was then applied to develop the model in which UUTD was used as a dependent variable and history of urinary tract infections, bladder management, conservative treatment, and urodynamic findings were used as independent variables. The urethra function factor was found to be the primary screening information of patients and treated as the root node of the tree; Pabd max (maximum abdominal pressure, >14 cmH2O), Pves max (maximum intravesical pressure, ≤89 cmH2O), and gender (female) were also variables associated with UUTD. The accuracy of the proposed model was 84.8%, and the area under curve was 0.901 (95%CI=0.844-0.958), suggesting that the decision tree model might provide a new and convenient way to screen UUTD for NGB patients in both undeveloped and developing areas.
Weichenthal, Scott; Van Ryswyk, Keith; Goldstein, Alon; Shekarrizfard, Maryam; Hatzopoulou, Marianne
2016-01-01
Exposure models are needed to evaluate the chronic health effects of ambient ultrafine particles (<0.1 μm) (UFPs). We developed a land use regression model for ambient UFPs in Toronto, Canada using mobile monitoring data collected during summer/winter 2010-2011. In total, 405 road segments were included in the analysis. The final model explained 67% of the spatial variation in mean UFPs and included terms for the logarithm of distances to highways, major roads, the central business district, Pearson airport, and bus routes as well as variables for the number of on-street trees, parks, open space, and the length of bus routes within a 100 m buffer. There was no systematic difference between measured and predicted values when the model was evaluated in an external dataset, although the R(2) value decreased (R(2) = 50%). This model will be used to evaluate the chronic health effects of UFPs using population-based cohorts in the Toronto area. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Togashi, Henrique; Prentice, Colin; Evans, Bradley; Forrester, David; Drake, Paul; Feikema, Paul; Brooksbank, Kim; Eamus, Derek; Taylor, Daniel
2014-05-01
The leaf area to sapwood area ratio (LA:SA) is a key plant trait that links photosynthesis to transpiration. Pipe model theory states that the sapwood cross-sectional area of a stem or branch at any point should scale isometrically with the area of leaves distal to that point. Optimization theory further suggests that LA:SA should decrease towards drier climates. Although acclimation of LA:SA to climate has been reported within species, much less is known about the scaling of this trait with climate among species. We compiled LA:SA measurements from 184 species of Australian evergreen angiosperm trees. The pipe model was broadly confirmed, based on measurements on branches and trunks of trees from one to 27 years old. We found considerable scatter in LA:SA among species. However quantile regression showed strong (0.2
Togashi, Henrique Furstenau; Prentice, Iain Colin; Evans, Bradley John; Forrester, David Ian; Drake, Paul; Feikema, Paul; Brooksbank, Kim; Eamus, Derek; Taylor, Daniel
2015-03-01
The leaf area-to-sapwood area ratio (LA:SA) is a key plant trait that links photosynthesis to transpiration. The pipe model theory states that the sapwood cross-sectional area of a stem or branch at any point should scale isometrically with the area of leaves distal to that point. Optimization theory further suggests that LA:SA should decrease toward drier climates. Although acclimation of LA:SA to climate has been reported within species, much less is known about the scaling of this trait with climate among species. We compiled LA:SA measurements from 184 species of Australian evergreen angiosperm trees. The pipe model was broadly confirmed, based on measurements on branches and trunks of trees from one to 27 years old. Despite considerable scatter in LA:SA among species, quantile regression showed strong (0.2 < R1 < 0.65) positive relationships between two climatic moisture indices and the lowermost (5%) and uppermost (5-15%) quantiles of log LA:SA, suggesting that moisture availability constrains the envelope of minimum and maximum values of LA:SA typical for any given climate. Interspecific differences in plant hydraulic conductivity are probably responsible for the large scatter of values in the mid-quantile range and may be an important determinant of tree morphology.
Togashi, Henrique Furstenau; Prentice, Iain Colin; Evans, Bradley John; Forrester, David Ian; Drake, Paul; Feikema, Paul; Brooksbank, Kim; Eamus, Derek; Taylor, Daniel
2015-01-01
The leaf area-to-sapwood area ratio (LA:SA) is a key plant trait that links photosynthesis to transpiration. The pipe model theory states that the sapwood cross-sectional area of a stem or branch at any point should scale isometrically with the area of leaves distal to that point. Optimization theory further suggests that LA:SA should decrease toward drier climates. Although acclimation of LA:SA to climate has been reported within species, much less is known about the scaling of this trait with climate among species. We compiled LA:SA measurements from 184 species of Australian evergreen angiosperm trees. The pipe model was broadly confirmed, based on measurements on branches and trunks of trees from one to 27 years old. Despite considerable scatter in LA:SA among species, quantile regression showed strong (0.2 < R1 < 0.65) positive relationships between two climatic moisture indices and the lowermost (5%) and uppermost (5–15%) quantiles of log LA:SA, suggesting that moisture availability constrains the envelope of minimum and maximum values of LA:SA typical for any given climate. Interspecific differences in plant hydraulic conductivity are probably responsible for the large scatter of values in the mid-quantile range and may be an important determinant of tree morphology. PMID:25859331
Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin
2018-09-01
The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.
Development of Interpretable Predictive Models for BPH and Prostate Cancer.
Bermejo, Pablo; Vivo, Alicia; Tárraga, Pedro J; Rodríguez-Montes, J A
2015-01-01
Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. However, in the last decade we have seen the increasing use of predictive models that combine, in a non-linear manner, several predictives that are better able to predict prostate cancer (PC), but these fail to help the clinician to distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. Statistical dependence with PC and BPH was found for prostate volume (P-value < 0.001), PSA (P-value < 0.001), international prostate symptom score (IPSS; P-value < 0.001), digital rectal examination (DRE; P-value < 0.001), age (P-value < 0.002), antecedents (P-value < 0.006), and meat consumption (P-value < 0.08). The two predictive models that were constructed selected a subset of these, namely, volume, PSA, DRE, and IPSS, obtaining an area under the ROC curve (AUC) between 72% and 80% for both PC and BPH prediction. PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced.
Individual tree growth models for natural even-aged shortleaf pine
Chakra B. Budhathoki; Thomas B. Lynch; James M. Guldin
2006-01-01
Shortleaf pine (Pinus echinata Mill.) measurements were available from permanent plots established in even-aged stands of the Ouachita Mountains for studying growth. Annual basal area growth was modeled with a least-squares nonlinear regression method utilizing three measurements. The analysis showed that the parameter estimates were in agreement...
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis
Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288
Silvis, Alexander; Ford, W. Mark; Eric R. Britzke,; Nathan R. Beane,; Joshua B. Johnson,
2012-01-01
Conservation of summer maternity roosts is considered critical for bat management in North America, yet many aspects of the physical and environmental factors that drive roost selection are poorly understood. We tracked 58 female northern bats (Myotis septentrionalis) to 105 roost trees of 21 species on the Fort Knox military reservation in north-central Kentucky during the summer of 2011. Sassafras (Sassafras albidum) was used as a day roost more than expected based on forest stand-level availability and accounted for 48.6% of all observed day roosts. Using logistic regression and an information theoretic approach, we were unable to reliably differentiate between sassafras and other roost species or between day roosts used during different maternity periods using models representative of individual tree metrics, site metrics, topographic location, or combinations of these factors. For northern bats, we suggest that day-roost selection is not a function of differences between individual tree species per se, but rather of forest successional patterns, stand and tree structure. Present successional trajectories may not provide this particular selected structure again without management intervention, thereby suggesting that resource managers take a relatively long retrospective view to manage current and future forest conditions for bats.
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Patient casemix classification for medicare psychiatric prospective payment.
Drozd, Edward M; Cromwell, Jerry; Gage, Barbara; Maier, Jan; Greenwald, Leslie M; Goldman, Howard H
2006-04-01
For a proposed Medicare prospective payment system for inpatient psychiatric facility treatment, the authors developed a casemix classification to capture differences in patients' real daily resource use. Primary data on patient characteristics and daily time spent in various activities were collected in a survey of 696 patients from 40 inpatient psychiatric facilities. Survey data were combined with Medicare claims data to estimate intensity-adjusted daily cost. Classification and Regression Trees (CART) analysis of average daily routine and ancillary costs yielded several hierarchical classification groupings. Regression analysis was used to control for facility and day-of-stay effects in order to compare hierarchical models with models based on the recently proposed payment system of the Centers for Medicare & Medicaid Services. CART analysis identified a small set of patient characteristics strongly associated with higher daily costs, including age, psychiatric diagnosis, deficits in daily living activities, and detox or ECT use. A parsimonious, 16-group, fully interactive model that used five major DSM-IV categories and stratified by age, illness severity, deficits in daily living activities, dangerousness, and use of ECT explained 40% (out of a possible 76%) of daily cost variation not attributable to idiosyncratic daily changes within patients. A noninteractive model based on diagnosis-related groups, age, and medical comorbidity had explanatory power of only 32%. A regression model with 16 casemix groups restricted to using "appropriate" payment variables (i.e., those with clinical face validity and low administrative burden that are easily validated and provide proper care incentives) produced more efficient and equitable payments than did a noninteractive system based on diagnosis-related groups.
Finch, Holmes W; Davis, Andrew; Dean, Raymond S
2015-03-01
The accurate and early identification of individuals with pervasive conditions such as attention deficit hyperactivity disorder (ADHD) is crucial to ensuring that they receive appropriate and timely assistance and treatment. Heretofore, identification of such individuals has proven somewhat difficult, typically involving clinical decision making based on descriptions and observations of behavior, in conjunction with the administration of cognitive assessments. The present study reports on the use of a sensory motor battery in conjunction with a recursive partitioning computer algorithm, boosted trees, to develop a prediction heuristic for identifying individuals with ADHD. Results of the study demonstrate that this method is able to do so with accuracy rates of over 95 %, much higher than the popular logistic regression model against which it was compared. Implications of these results for practice are provided.
Landscape Variation in Tree Species Richness in Northern Iran Forests
Bourque, Charles P.-A.; Bayat, Mahmoud
2015-01-01
Mapping landscape variation in tree species richness (SR) is essential to the long term management and conservation of forest ecosystems. The current study examines the prospect of mapping field assessments of SR in a high-elevation, deciduous forest in northern Iran as a function of 16 biophysical variables representative of the area’s unique physiography, including topography and coastal placement, biophysical environment, and forests. Basic to this study is the development of moderate-resolution biophysical surfaces and associated plot-estimates for 202 permanent sampling plots. The biophysical variables include: (i) three topographic variables generated directly from the area’s digital terrain model; (ii) four ecophysiologically-relevant variables derived from process models or from first principles; and (iii) seven variables of Landsat-8-acquired surface reflectance and two, of surface radiance. With symbolic regression, it was shown that only four of the 16 variables were needed to explain 85% of observed plot-level variation in SR (i.e., wind velocity, surface reflectance of blue light, and topographic wetness indices representative of soil water content), yielding mean-absolute and root-mean-squared error of 0.50 and 0.78, respectively. Overall, localised calculations of wind velocity and surface reflectance of blue light explained about 63% of observed variation in SR, with wind velocity accounting for 51% of that variation. The remaining 22% was explained by linear combinations of soil-water-related topographic indices and associated thresholds. In general, SR and diversity tended to be greatest for plots dominated by Carpinus betulus (involving ≥ 33% of all trees in a plot), than by Fagus orientalis (median difference of one species). This study provides a significant step towards describing landscape variation in SR as a function of modelled and satellite-based information and symbolic regression. Methods in this study are sufficiently general to be applicable to the characterisation of SR in other forested regions of the world, providing plot-scale data are available for model generation. PMID:25849029
Landscape variation in tree species richness in northern Iran forests.
Bourque, Charles P-A; Bayat, Mahmoud
2015-01-01
Mapping landscape variation in tree species richness (SR) is essential to the long term management and conservation of forest ecosystems. The current study examines the prospect of mapping field assessments of SR in a high-elevation, deciduous forest in northern Iran as a function of 16 biophysical variables representative of the area's unique physiography, including topography and coastal placement, biophysical environment, and forests. Basic to this study is the development of moderate-resolution biophysical surfaces and associated plot-estimates for 202 permanent sampling plots. The biophysical variables include: (i) three topographic variables generated directly from the area's digital terrain model; (ii) four ecophysiologically-relevant variables derived from process models or from first principles; and (iii) seven variables of Landsat-8-acquired surface reflectance and two, of surface radiance. With symbolic regression, it was shown that only four of the 16 variables were needed to explain 85% of observed plot-level variation in SR (i.e., wind velocity, surface reflectance of blue light, and topographic wetness indices representative of soil water content), yielding mean-absolute and root-mean-squared error of 0.50 and 0.78, respectively. Overall, localised calculations of wind velocity and surface reflectance of blue light explained about 63% of observed variation in SR, with wind velocity accounting for 51% of that variation. The remaining 22% was explained by linear combinations of soil-water-related topographic indices and associated thresholds. In general, SR and diversity tended to be greatest for plots dominated by Carpinus betulus (involving ≥ 33% of all trees in a plot), than by Fagus orientalis (median difference of one species). This study provides a significant step towards describing landscape variation in SR as a function of modelled and satellite-based information and symbolic regression. Methods in this study are sufficiently general to be applicable to the characterisation of SR in other forested regions of the world, providing plot-scale data are available for model generation.
Classification and regression trees
G. G. Moisen
2008-01-01
Frequently, ecologists are interested in exploring ecological relationships, describing patterns and processes, or making spatial or temporal predictions. These purposes often can be addressed by modeling the relationship between some outcome or response and a set of features or explanatory variables.
NASA Astrophysics Data System (ADS)
Webb, Mathew A.; Hall, Andrew; Kidd, Darren; Minansy, Budiman
2016-05-01
Assessment of local spatial climatic variability is important in the planning of planting locations for horticultural crops. This study investigated three regression-based calibration methods (i.e. traditional versus two optimized methods) to relate short-term 12-month data series from 170 temperature loggers and 4 weather station sites with data series from nearby long-term Australian Bureau of Meteorology climate stations. The techniques trialled to interpolate climatic temperature variables, such as frost risk, growing degree days (GDDs) and chill hours, were regression kriging (RK), regression trees (RTs) and random forests (RFs). All three calibration methods produced accurate results, with the RK-based calibration method delivering the most accurate validation measures: coefficients of determination ( R 2) of 0.92, 0.97 and 0.95 and root-mean-square errors of 1.30, 0.80 and 1.31 °C, for daily minimum, daily maximum and hourly temperatures, respectively. Compared with the traditional method of calibration using direct linear regression between short-term and long-term stations, the RK-based calibration method improved R 2 and reduced root-mean-square error (RMSE) by at least 5 % and 0.47 °C for daily minimum temperature, 1 % and 0.23 °C for daily maximum temperature and 3 % and 0.33 °C for hourly temperature. Spatial modelling indicated insignificant differences between the interpolation methods, with the RK technique tending to be the slightly better method due to the high degree of spatial autocorrelation between logger sites.
Hierarchical Matching and Regression with Application to Photometric Redshift Estimation
NASA Astrophysics Data System (ADS)
Murtagh, Fionn
2017-06-01
This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma
2017-01-01
This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849
Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha
2016-01-01
Background/Aim . Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods . Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results . The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion . Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.
Eric H. Wharton; Tiberius Cunia
1987-01-01
Proceedings of a workshop co-sponsored by the USDA Forest Service, the State University of New York, and the Society of American Foresters. Presented were papers on the methodology of sample tree selection, tree biomass measurement, construction of biomass tables and estimation of their error, and combining the error of biomass tables with that of the sample plots or...
Hauglin, Marius; Bollandsås, Ole Martin; Gobakken, Terje; Næsset, Erik
2017-12-08
Monitoring of forest resources through national forest inventory programmes is carried out in many countries. The expected climate changes will affect trees and forests and might cause an expansion of trees into presently treeless areas, such as above the current alpine tree line. It is therefore a need to develop methods that enable the inclusion of also these areas into monitoring programmes. Airborne laser scanning (ALS) is an established tool in operational forest inventories, and could be a viable option for monitoring tasks. In the present study, we used multi-temporal ALS data with point density of 8-15 points per m 2 , together with field measurements from single trees in the forest-tundra ecotone along a 1500-km-long transect in Norway. The material comprised 262 small trees with an average height of 1.78 m. The field-measured height growth was derived from height measurements at two points in time. The elapsed time between the two measurements was 4 years. Regression models were then used to model the relationship between ALS-derived variables and tree heights as well as the height growth. Strong relationships between ALS-derived variables and tree heights were found, with R 2 values of 0.93 and 0.97 for the two points in time. The relationship between the ALS data and the field-derived height growth was weaker, with R 2 values of 0.36-0.42. A cross-validation gave corresponding results, with root mean square errors of 19 and 11% for the ALS height models and 60% for the model relating ALS data to single-tree height growth.
Jiao, Y; Chen, R; Ke, X; Cheng, L; Chu, K; Lu, Z; Herskovits, E H
2011-01-01
Autism spectrum disorder (ASD) is a neurodevelopmental disorder, of which Asperger syndrome and high-functioning autism are subtypes. Our goal is: 1) to determine whether a diagnostic model based on single-nucleotide polymorphisms (SNPs), brain regional thickness measurements, or brain regional volume measurements can distinguish Asperger syndrome from high-functioning autism; and 2) to compare the SNP, thickness, and volume-based diagnostic models. Our study included 18 children with ASD: 13 subjects with high-functioning autism and 5 subjects with Asperger syndrome. For each child, we obtained 25 SNPs for 8 ASD-related genes; we also computed regional cortical thicknesses and volumes for 66 brain structures, based on structural magnetic resonance (MR) examination. To generate diagnostic models, we employed five machine-learning techniques: decision stump, alternating decision trees, multi-class alternating decision trees, logistic model trees, and support vector machines. For SNP-based classification, three decision-tree-based models performed better than the other two machine-learning models. The performance metrics for three decision-tree-based models were similar: decision stump was modestly better than the other two methods, with accuracy = 90%, sensitivity = 0.95 and specificity = 0.75. All thickness and volume-based diagnostic models performed poorly. The SNP-based diagnostic models were superior to those based on thickness and volume. For SNP-based classification, rs878960 in GABRB3 (gamma-aminobutyric acid A receptor, beta 3) was selected by all tree-based models. Our analysis demonstrated that SNP-based classification was more accurate than morphometry-based classification in ASD subtype classification. Also, we found that one SNP--rs878960 in GABRB3--distinguishes Asperger syndrome from high-functioning autism.
NASA Astrophysics Data System (ADS)
Alexander, Cici; Korstjens, Amanda H.; Hill, Ross A.
2018-03-01
Tree or canopy height is an important attribute for carbon stock estimation, forest management and habitat quality assessment. Airborne Laser Scanning (ALS) based on Light Detection and Ranging (LiDAR) has advantages over other remote sensing techniques for describing the structure of forests. However, sloped terrain can be challenging for accurate estimation of tree locations and heights based on a Canopy Height Model (CHM) generated from ALS data; a CHM is a height-normalised Digital Surface Model (DSM) obtained by subtracting a Digital Terrain Model (DTM) from a DSM. On sloped terrain, points at the same elevation on a tree crown appear to increase in height in the downhill direction, based on the ground elevations at these points. A point will be incorrectly identified as the treetop by individual tree crown (ITC) recognition algorithms if its height is greater than that of the actual treetop in the CHM, which will be recorded as the tree height. In this study, the influence of terrain slope and crown characteristics on the detection of treetops and estimation of tree heights is assessed using ALS data in a tropical forest with complex terrain (i.e. micro-topography) and tree crown characteristics. Locations and heights of 11,442 trees based on a DSM are compared with those based on a CHM. The horizontal (DH) and vertical displacements (DV) increase with terrain slope (r = 0.47 and r = 0.54 respectively, p < 0.001). The overestimations in tree height are up to 16.6 m on slopes greater than 50° in our study area in Sumatra. The errors in locations (DH) and tree heights (DV) are modelled for trees with conical and spherical tree crowns. For a spherical tree crown, DH can be modelled as R sin θ, and DV as R (sec θ - 1). In this study, a model is developed for an idealised conical tree crown, DV = R (tan θ - tan ψ), where R is the crown radius, and θ and ψ are terrain and crown angles respectively. It is shown that errors occur only when terrain angle exceeds the crown angle, with the horizontal displacement equal to the crown radius. Errors in location are seen to be greater for spherical than conical trees on slopes where crown angles of conical trees are less than the terrain angle. The results are especially relevant for biomass and carbon stock estimations in tropical forests where there are trees with large crown radii on slopes.
NASA Astrophysics Data System (ADS)
Doucet, A.; Savard, M. M.; Bégin, C.; Ouarda, T. B.; Marion, J.
2010-12-01
The combined analyses of tree-ring δ13C, δ18O, δ15N, 206Pb/207Pb, 206Pb/204Pb and 206Pb/208Pb isotope ratios of three red spruce specimens from the Tantaré ecological reserve located 40 km northwest of Québec City (Canada) were studied with the aim of reconstructing environmental conditions and unravel past air-quality changes of the 1880-2007 period. To separate the tree-ring δ18O and δ13C patterns induced by natural conditions from those generated by anthropogenic perturbations, a linear regression was applied between the most explicative meteorological parameters and the isotopic series for the period of low pollution (1880 to 1909). The model equations were then applied to the most recent part of the series (1910-2007) to verify if climatic conditions have remained the main driver of the tree-ring isotopic variations. The good fit between the modeled and measured δ18O series for the entire studied period suggests that the assimilation of oxygen by red spruce trees is not significantly affected by pollution stress near Québec City. However, the deviation between the measured and modeled δ13C values for the 1944-2007 period indicates that diffuse pollution affected carbon assimilation by the investigated trees. To independently validate if atmospheric pollution could have generated the deviation between the measured and the estimated δ13C values, a linear regression was applied between the portion of the residual δ13C values and atmospheric pollution (Canadian fossil fuel proxy from 1958 to 2000). The nice fit between the modeled δ13C values from the combination of the two regression analyses based on climate and emission proxy strongly supports the hypothesis that there is a natural and an anthropogenic portion in the δ13C variations of the studied specimens. The short-term variations of the red spruce δ15N series are correlated with the instrumentally measured amounts of provincial N emissions for the 1990 to 2006 period (longest measurements available). Additionally, the long-term decrease of the δ15N series after 1956 is linked to the low isotopic values of NOx emitted by car exhausts, as expressed by the provincial number of cars which reflect the amount of transport-related N deposition at the provincial scale. The 208Pb/206Pb and 204Pb/206Pb ratios as a function of 206Pb/207Pb of the 1880-1919 period reflect a mixture of natural lead from the mineral soil horizon and mainly anthropogenic lead from north-eastern American coal combustion. The lower Pb ratios of the 1920-1989 period correlate well with the introduction of leaded additives to gasoline characterized by lower ratios relative to coal combustion. Inferring the lead sources of the 1990-2008 period is not as straightforward because lead can potentially derive from three main sources: coal combustion, burnt recycled material and natural lead present in soils. Our results show the great potential of tree-ring stable isotopes to record pollution events in the context of peri-urban diffuse pollution, and to prolong the pollution history in regions where direct measurements of pollutants only covers a relatively short period.
Anatomical modeling of the bronchial tree
NASA Astrophysics Data System (ADS)
Hentschel, Gerrit; Klinder, Tobias; Blaffert, Thomas; Bülow, Thomas; Wiemker, Rafael; Lorenz, Cristian
2010-02-01
The bronchial tree is of direct clinical importance in the context of respective diseases, such as chronic obstructive pulmonary disease (COPD). It furthermore constitutes a reference structure for object localization in the lungs and it finally provides access to lung tissue in, e.g., bronchoscope based procedures for diagnosis and therapy. This paper presents a comprehensive anatomical model for the bronchial tree, including statistics of position, relative and absolute orientation, length, and radius of 34 bronchial segments, going beyond previously published results. The model has been built from 16 manually annotated CT scans, covering several branching variants. The model is represented as a centerline/tree structure but can also be converted in a surface representation. Possible model applications are either to anatomically label extracted bronchial trees or to improve the tree extraction itself by identifying missing segments or sub-trees, e.g., if located beyond a bronchial stenosis. Bronchial tree labeling is achieved using a naïve Bayesian classifier based on the segment properties contained in the model in combination with tree matching. The tree matching step makes use of branching variations covered by the model. An evaluation of the model has been performed in a leaveone- out manner. In total, 87% of the branches resulting from preceding airway tree segmentation could be correctly labeled. The individualized model enables the detection of missing branches, allowing a targeted search, e.g., a local rerun of the tree-segmentation segmentation.
Lofaro, Danilo; Jager, Kitty J; Abu-Hanna, Ameen; Groothoff, Jaap W; Arikoski, Pekka; Hoecker, Britta; Roussey-Kesler, Gwenaelle; Spasojević, Brankica; Verrina, Enrico; Schaefer, Franz; van Stralen, Karlijn J
2016-02-01
Identification of patient groups by risk of renal graft loss might be helpful for accurate patient counselling and clinical decision-making. Survival tree models are an alternative statistical approach to identify subgroups, offering cut-off points for covariates and an easy-to-interpret representation. Within the European Society of Pediatric Nephrology/European Renal Association-European Dialysis and Transplant Association (ESPN/ERA-EDTA) Registry data we identified paediatric patient groups with specific profiles for 5-year renal graft survival. Two analyses were performed, including (i) parameters known at time of transplantation and (ii) additional clinical measurements obtained early after transplantation. The identified subgroups were added as covariates in two survival models. The prognostic performance of the models was tested and compared with conventional Cox regression analyses. The first analysis included 5275 paediatric renal transplants. The best 5-year graft survival (90.4%) was found among patients who received a renal graft as a pre-emptive transplantation or after short-term dialysis (<45 days), whereas graft survival was poorest (51.7%) in adolescents transplanted after long-term dialysis (>2.2 years). The Cox model including both pre-transplant factors and tree subgroups had a significantly better predictive performance than conventional Cox regression (P < 0.001). In the analysis including clinical factors, graft survival ranged from 97.3% [younger patients with estimated glomerular filtration rate (eGFR) >30 mL/min/1.73 m(2) and dialysis <20 months] to 34.7% (adolescents with eGFR <60 mL/min/1.73 m(2) and dialysis >20 months). Also in this case combining tree findings and clinical factors improved the predictive performance as compared with conventional Cox model models (P < 0.0001). In conclusion, we demonstrated the tree model to be an accurate and attractive tool to predict graft failure for patients with specific characteristics. This may aid the evaluation of individual graft prognosis and thereby the design of measures to improve graft survival in the poor prognosis groups. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
Kumar, S.; Spaulding, S.A.; Stohlgren, T.J.; Hermann, K.A.; Schmidt, T.S.; Bahls, L.L.
2009-01-01
The diatom Didymosphenia geminata is a single-celled alga found in lakes, streams, and rivers. Nuisance blooms of D geminata affect the diversity, abundance, and productivity of other aquatic organisms. Because D geminata can be transported by humans on waders and other gear, accurate spatial prediction of habitat suitability is urgently needed for early detection and rapid response, as well as for evaluation of monitoring and control programs. We compared four modeling methods to predict D geminata's habitat distribution; two methods use presence-absence data (logistic regression and classification and regression tree [CART]), and two involve presence data (maximum entropy model [Maxent] and genetic algorithm for rule-set production [GARP]). Using these methods, we evaluated spatially explicit, bioclimatic and environmental variables as predictors of diatom distribution. The Maxent model provided the most accurate predictions, followed by logistic regression, CART, and GARP. The most suitable habitats were predicted to occur in the western US, in relatively cool sites, and at high elevations with a high base-flow index. The results provide insights into the factors that affect the distribution of D geminata and a spatial basis for the prediction of nuisance blooms. ?? The Ecological Society of America.
Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.
2008-01-01
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
Custer, Christine M.; Custer, Thomas W.; Dummer, Paul; Etterson, Matthew A.; Thogmartin, Wayne E.; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C.
2013-01-01
The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified as were reproductive success end points. The sample egg method was used wherein an egg sample is collected, and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150–200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy was likely because behavioral effects and other extrinsic factors are not accounted for in these laboratory studies and the possibility that tree swallows are unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better method to examine possible effects of additional covariates and assessment of models using Akaike information criteria analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed.
Bayesian Models for Streamflow and River Network Reconstruction using Tree Rings
NASA Astrophysics Data System (ADS)
Ravindranath, A.; Devineni, N.
2016-12-01
Water systems face non-stationary, dynamically shifting risks due to shifting societal conditions and systematic long-term variations in climate manifesting as quasi-periodic behavior on multi-decadal time scales. Water systems are thus vulnerable to long periods of wet or dry hydroclimatic conditions. Streamflow is a major component of water systems and a primary means by which water is transported to serve ecosystems' and human needs. Thus, our concern is in understanding streamflow variability. Climate variability and impacts on water resources are crucial factors affecting streamflow, and multi-scale variability increases risk to water sustainability and systems. Dam operations are necessary for collecting water brought by streamflow while maintaining downstream ecological health. Rules governing dam operations are based on streamflow records that are woefully short compared to periods of systematic variation present in the climatic factors driving streamflow variability and non-stationarity. We use hierarchical Bayesian regression methods in order to reconstruct paleo-streamflow records for dams within a basin using paleoclimate proxies (e.g. tree rings) to guide the reconstructions. The riverine flow network for the entire basin is subsequently modeled hierarchically using feeder stream and tributary flows. This is a starting point in analyzing streamflow variability and risks to water systems, and developing a scientifically-informed dynamic risk management framework for formulating dam operations and water policies to best hedge such risks. We will apply this work to the Missouri and Delaware River Basins (DRB). Preliminary results of streamflow reconstructions for eight dams in the upper DRB using standard Gaussian regression with regional tree ring chronologies give streamflow records that now span two to two and a half centuries, and modestly smoothed versions of these reconstructed flows indicate physically-justifiable trends in the time series.
Fraver, Shawn; D'Amato, Anthony W.; Bradford, John B.; Jonsson, Bengt Gunnar; Jönsson, Mari; Esseen, Per-Anders
2013-01-01
Question: What factors best characterize tree competitive environments in this structurally diverse old-growth forest, and do these factors vary spatially within and among stands? Location: Old-growth Picea abies forest of boreal Sweden. Methods: Using long-term, mapped permanent plot data augmented with dendrochronological analyses, we evaluated the effect of neighbourhood competition on focal tree growth by means of standard competition indices, each modified to include various metrics of trees size, neighbour mortality weighting (for neighbours that died during the inventory period), and within-neighbourhood tree clustering. Candidate models were evaluated using mixed-model linear regression analyses, with mean basal area increment as the response variable. We then analysed stand-level spatial patterns of competition indices and growth rates (via kriging) to determine if the relationship between these patterns could further elucidate factors influencing tree growth. Results: Inter-tree competition clearly affected growth rates, with crown volume being the size metric most strongly influencing the neighbourhood competitive environment. Including neighbour tree mortality weightings in models only slightly improved descriptions of competitive interactions. Although the within-neighbourhood clustering index did not improve model predictions, competition intensity was influenced by the underlying stand-level tree spatial arrangement: stand-level clustering locally intensified competition and reduced tree growth, whereas in the absence of such clustering, inter-tree competition played a lesser role in constraining tree growth. Conclusions: Our findings demonstrate that competition continues to influence forest processes and structures in an old-growth system that has not experienced major disturbances for at least two centuries. The finding that the underlying tree spatial pattern influenced the competitive environment suggests caution in interpreting traditional tree competition studies, in which tree spatial patterning is typically not taken into account. Our findings highlight the importance of forest structure – particularly the spatial arrangement of trees – in regulating inter-tree competition and growth in structurally diverse forests, and they provide insight into the causes and consequences of heterogeneity in this old-growth system.
NASA Technical Reports Server (NTRS)
Garner, Gregory G.; Thompson, Anne M.
2013-01-01
An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for
Xia, Jiangzhou; Liang, Shunlin; Chen, Jiquan; Yuan, Wenping; Liu, Shuguang; Li, Linghao; Cai, Wenwen; Zhang, Li; Fu, Yang; Zhao, Tianbao; Feng, Jinming; Ma, Zhuguo; Ma, Mingguo; Liu, Shaomin; Zhou, Guangsheng; Asanuma, Jun; Chen, Shiping; Du, Mingyuan; Davaa, Gombo; Kato, Tomomichi; Liu, Qiang; Liu, Suhong; Li, Shenggong; Shao, Changliang; Tang, Yanhong; Zhao, Xiang
2014-01-01
The regression tree method is used to upscale evapotranspiration (ET) measurements at eddy-covariance (EC) towers to the grassland ecosystems over the Dryland East Asia (DEA). The regression tree model was driven by satellite and meteorology datasets, and explained 82% and 76% of the variations of ET observations in the calibration and validation datasets, respectively. The annual ET estimates ranged from 222.6 to 269.1 mm yr−1 over the DEA region with an average of 245.8 mm yr−1 from 1982 through 2009. Ecosystem ET showed decreased trends over 61% of the DEA region during this period, especially in most regions of Mongolia and eastern Inner Mongolia due to decreased precipitation. The increased ET occurred primarily in the western and southern DEA region. Over the entire study area, water balance (the difference between precipitation and ecosystem ET) decreased substantially during the summer and growing season. Precipitation reduction was an important cause for the severe water deficits. The drying trend occurring in the grassland ecosystems of the DEA region can exert profound impacts on a variety of terrestrial ecosystem processes and functions. PMID:24845063
Xia, Jiangzhou; Liang, Shunlin; Chen, Jiquan; Yuan, Wenping; Liu, Shuguang; Li, Linghao; Cai, Wenwen; Zhang, Li; Fu, Yang; Zhao, Tianbao; Feng, Jinming; Ma, Zhuguo; Ma, Mingguo; Liu, Shaomin; Zhou, Guangsheng; Asanuma, Jun; Chen, Shiping; Du, Mingyuan; Davaa, Gombo; Kato, Tomomichi; Liu, Qiang; Liu, Suhong; Li, Shenggong; Shao, Changliang; Tang, Yanhong; Zhao, Xiang
2014-01-01
The regression tree method is used to upscale evapotranspiration (ET) measurements at eddy-covariance (EC) towers to the grassland ecosystems over the Dryland East Asia (DEA). The regression tree model was driven by satellite and meteorology datasets, and explained 82% and 76% of the variations of ET observations in the calibration and validation datasets, respectively. The annual ET estimates ranged from 222.6 to 269.1 mm yr(-1) over the DEA region with an average of 245.8 mm yr(-1) from 1982 through 2009. Ecosystem ET showed decreased trends over 61% of the DEA region during this period, especially in most regions of Mongolia and eastern Inner Mongolia due to decreased precipitation. The increased ET occurred primarily in the western and southern DEA region. Over the entire study area, water balance (the difference between precipitation and ecosystem ET) decreased substantially during the summer and growing season. Precipitation reduction was an important cause for the severe water deficits. The drying trend occurring in the grassland ecosystems of the DEA region can exert profound impacts on a variety of terrestrial ecosystem processes and functions.
Comparing field- and model-based standing dead tree carbon stock estimates across forests of the US
Chistopher W. Woodall; Grant M. Domke; David W. MacFarlane; Christopher M. Oswalt
2012-01-01
As signatories to the United Nation Framework Convention on Climate Change, the US has been estimating standing dead tree (SDT) carbon (C) stocks using a model based on live tree attributes. The USDA Forest Service began sampling SDTs nationwide in 1999. With comprehensive field data now available, the objective of this study was to compare field- and model-based...
Voxel-Based 3-D Tree Modeling from Lidar Images for Extracting Tree Structual Information
NASA Astrophysics Data System (ADS)
Hosoi, F.
2014-12-01
Recently, lidar (light detection and ranging) has been used to extracting tree structural information. Portable scanning lidar systems can capture the complex shape of individual trees as a 3-D point-cloud image. 3-D tree models reproduced from the lidar-derived 3-D image can be used to estimate tree structural parameters. We have proposed the voxel-based 3-D modeling for extracting tree structural parameters. One of the tree parameters derived from the voxel modeling is leaf area density (LAD). We refer to the method as the voxel-based canopy profiling (VCP) method. In this method, several measurement points surrounding the canopy and optimally inclined laser beams are adopted for full laser beam illumination of whole canopy up to the internal. From obtained lidar image, the 3-D information is reproduced as the voxel attributes in the 3-D voxel array. Based on the voxel attributes, contact frequency of laser beams on leaves is computed and LAD in each horizontal layer is obtained. This method offered accurate LAD estimation for individual trees and woody canopy trees. For more accurate LAD estimation, the voxel model was constructed by combining airborne and portable ground-based lidar data. The profiles obtained by the two types of lidar complemented each other, thus eliminating blind regions and yielding more accurate LAD profiles than could be obtained by using each type of lidar alone. Based on the estimation results, we proposed an index named laser beam coverage index, Ω, which relates to the lidar's laser beam settings and a laser beam attenuation factor. It was shown that this index can be used for adjusting measurement set-up of lidar systems and also used for explaining the LAD estimation error using different types of lidar systems. Moreover, we proposed a method to estimate woody material volume as another application of the voxel tree modeling. In this method, voxel solid model of a target tree was produced from the lidar image, which is composed of consecutive voxels that filled the outer surface and the interior of the stem and large branches. From the model, the woody material volume of any part of the target tree can be directly calculated easily by counting the number of corresponding voxels and multiplying the result by the per-voxel volume.
Dynamic travel time estimation using regression trees.
DOT National Transportation Integrated Search
2008-10-01
This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...
Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.
Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid
2016-10-01
Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process for detection of hidden patterns. In the case of ESD, data mining help us to predict the diseases. Different algorithms were developed for this purpose. we aimed to use the Classification and Regression Tree (CART) to predict differential diagnosis of ESD. we used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set from machine learning repository, UCI was obtained. The Clementine 12.0 software from IBM Company was used for modelling. In order to evaluation of the model we calculate the accuracy, sensitivity and specificity of the model. The proposed model had an accuracy of 94.84% (. 24.42) in order to correct prediction of the ESD disease. Results indicated that using of this classifier could be useful. But, it would be strongly recommended that the combination of machine learning methods could be more useful in terms of prediction of ESD.
NASA Astrophysics Data System (ADS)
Rogers, Jeffrey N.; Parrish, Christopher E.; Ward, Larry G.; Burdick, David M.
2018-03-01
Salt marsh vegetation tends to increase vertical uncertainty in light detection and ranging (lidar) derived elevation data, often causing the data to become ineffective for analysis of topographic features governing tidal inundation or vegetation zonation. Previous attempts at improving lidar data collected in salt marsh environments range from simply computing and subtracting the global elevation bias to more complex methods such as computing vegetation-specific, constant correction factors. The vegetation specific corrections can be used along with an existing habitat map to apply separate corrections to different areas within a study site. It is hypothesized here that correcting salt marsh lidar data by applying location-specific, point-by-point corrections, which are computed from lidar waveform-derived features, tidal-datum based elevation, distance from shoreline and other lidar digital elevation model based variables, using nonparametric regression will produce better results. The methods were developed and tested using full-waveform lidar and ground truth for three marshes in Cape Cod, Massachusetts, U.S.A. Five different model algorithms for nonparametric regression were evaluated, with TreeNet's stochastic gradient boosting algorithm consistently producing better regression and classification results. Additionally, models were constructed to predict the vegetative zone (high marsh and low marsh). The predictive modeling methods used in this study estimated ground elevation with a mean bias of 0.00 m and a standard deviation of 0.07 m (0.07 m root mean square error). These methods appear very promising for correction of salt marsh lidar data and, importantly, do not require an existing habitat map, biomass measurements, or image based remote sensing data such as multi/hyperspectral imagery.
Don C. Bragg
2002-01-01
This article is an introduction to the computer software used by the Potential Relative Increment (PRI) approach to optimal tree diameter growth modeling. These DOS programs extract qualified tree and plot data from the Eastwide Forest Inventory Data Base (EFIDB), calculate relative tree increment, sort for the highest relative increments by diameter class, and...
Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah
2016-01-01
Breast cancer survival has been analyzed by many standard data mining algorithms. A group of these algorithms belonged to the decision tree category. Ability of the decision tree algorithms in terms of visualizing and formulating of hidden patterns among study variables were main reasons to apply an algorithm from the decision tree category in the current study that has not studied already. The classification and regression trees (CART) was applied to a breast cancer database contained information on 569 patients in 2007-2010. The measurement of Gini impurity used for categorical target variables was utilized. The classification error that is a function of tree size was measured by 10-fold cross-validation experiments. The performance of created model was evaluated by the criteria as accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were meaningful clinically. They showed in the if-then format that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were: 80.3%, 93.5% and 53%, respectively. The current study model as the first one created by the CART was able to extract useful hidden rules from a relatively small size dataset.
Using nonlinear quantile regression to estimate the self-thinning boundary curve
Quang V. Cao; Thomas J. Dean
2015-01-01
The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
Altamirano, J; Augustin, S; Muntaner, L; Zapata, L; González-Angulo, A; Martínez, B; Flores-Arroyo, A; Camargo, L; Genescá, J
2010-01-01
Variceal bleeding (VB) is the main cause of death among cirrhotic patients. About 30-50% of early rebleeding is encountered few days after the acute episode of VB. It is necessary to stratify patients with high risk of very early rebleeding (VER) for more aggressive therapies. However, there are few and incompletely understood prognostic models for this purpose. To determine the risk factors associated with VER after an acute VB. Assessment and comparison of a novel prognostic model generated by Classification and Regression Tree Analysis (CART) with classic-used models (MELD and Child-Pugh [CP]). Sixty consecutive cirrhotic patients with acute variceal bleeding. CART analysis, MELD and Child-Pugh scores were performed at admission. Receiver operating characteristic (ROC) curves were constructed to evaluate the predictive performance of the models. Very early rebleeding rate was 13%. Variables associated with VER were: serum albumin (p = 0.027), creatinine (p = 0.021) and transfused blood units in the first 24 hrs (p = 0.05). The area under the ROC for MELD, CHILD-Pugh and CART were 0.46, 0.50 and 0.82, respectively. The value of cut analyzed by CART for the significant variables were: 1) Albumin 2.85 mg/dL, 2) Packed red cells 2 units and 3) Creatinine 1.65 mg/dL the ABC-ROC. Serum albumin, creatinine and number of transfused blood units were associated with VER. A simple CART algorithm combining these variables allows an accurate predictive assessment of VER after acute variceal bleeding. Key words: cirrhosis, variceal bleeding, esophageal varices, prognosis, portal hypertension.
On Tree-Based Phylogenetic Networks.
Zhang, Louxin
2016-07-01
A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posted by Francis and Steel recently. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.
Yamashita, Takashi; Kart, Cary S; Noe, Douglas A
2012-12-01
Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.
Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O
2014-06-01
Regression tree (RT) analyses are particularly adapted to explore the risk of recurrent falling according to various combinations of fall risk factors compared to logistic regression models. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 nodes groups and 8 end nodes with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p < 0.001). Among participants without FOF, those who were male and not sad had the lowest OR for recurrent falls (OR = 0.25 with p < 0.001). The RT correctly classified 1,356 from 1,414 non-recurrent fallers (specificity = 95.6 %), and 65 from 346 recurrent fallers (sensitivity = 18.8 %). The overall classification accuracy was 81.0 %. The multiple logistic regression correctly classified 1,372 from 1,414 non-recurrent fallers (specificity = 97.0 %), and 61 from 346 recurrent fallers (sensitivity = 17.6 %). The overall classification accuracy was 81.4 %. Our results show that RT may identify specific combinations of risk factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. The FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
Prognoses of diameter and height of trees of eucalyptus using artificial intelligence.
Vieira, Giovanni Correia; de Mendonça, Adriano Ribeiro; da Silva, Gilson Fernandes; Zanetti, Sidney Sára; da Silva, Mayra Marques; Dos Santos, Alexandre Rosa
2018-04-01
Models of individual trees are composed of sub-models that generally estimate competition, mortality, and growth in height and diameter of each tree. They are usually adopted when we want more detailed information to estimate forest multiproduct. In these models, estimates of growth in diameter at 1.30m above the ground (DBH) and total height (H) are obtained by regression analysis. Recently, artificial intelligence techniques (AIT) have been used with satisfactory performance in forest measurement. Therefore, the objective of this study was to evaluate the performance of two AIT, artificial neural networks and adaptive neuro-fuzzy inference system, to estimate the growth in DBH and H of eucalyptus trees. We used data of continuous forest inventories of eucalyptus, with annual measurements of DBH, H, and the dominant height of trees of 398 plots, plus two qualitative variables: genetic material and site index. It was observed that the two AIT showed accuracy in growth estimation of DBH and H. Therefore, the two techniques discussed can be used for the prognosis of DBH and H in even-aged eucalyptus stands. The techniques used could also be adapted to other areas and forest species. Copyright © 2017 Elsevier B.V. All rights reserved.
Dong, Ling-Bo; Liu, Zhao-Gang; Li, Feng-Ri; Jiang, Li-Chun
2013-09-01
By using the branch analysis data of 955 standard branches from 60 sampled trees in 12 sampling plots of Pinus koraiensis plantation in Mengjiagang Forest Farm in Heilongjiang Province of Northeast China, and based on the linear mixed-effect model theory and methods, the models for predicting branch variables, including primary branch diameter, length, and angle, were developed. Considering tree effect, the MIXED module of SAS software was used to fit the prediction models. The results indicated that the fitting precision of the models could be improved by choosing appropriate random-effect parameters and variance-covariance structure. Then, the correlation structures including complex symmetry structure (CS), first-order autoregressive structure [AR(1)], and first-order autoregressive and moving average structure [ARMA(1,1)] were added to the optimal branch size mixed-effect model. The AR(1) improved the fitting precision of branch diameter and length mixed-effect model significantly, but all the three structures didn't improve the precision of branch angle mixed-effect model. In order to describe the heteroscedasticity during building mixed-effect model, the CF1 and CF2 functions were added to the branch mixed-effect model. CF1 function improved the fitting effect of branch angle mixed model significantly, whereas CF2 function improved the fitting effect of branch diameter and length mixed model significantly. Model validation confirmed that the mixed-effect model could improve the precision of prediction, as compare to the traditional regression model for the branch size prediction of Pinus koraiensis plantation.
NASA Astrophysics Data System (ADS)
Sayegh, Arwa; Tate, James E.; Ropkins, Karl
2016-02-01
Oxides of Nitrogen (NOx) is a major component of photochemical smog and its constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. if it were based on monitored data or modelled prediction. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C with high concentrations at low ambient air temperatures which could be associated to restricted atmospheric dispersion and/or to changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper highlights the importance of either setting up local background continuous monitors or improving the quality and resolution of modelled UK background maps and the need to further investigate the influence of ambient air temperature on NOx emissions and roadside NOx concentrations.
Analyses of rear-end crashes based on classification tree models.
Yan, Xuedong; Radwan, Essam
2006-09-01
Signalized intersections are accident-prone areas especially for rear-end crashes due to the fact that the diversity of the braking behaviors of drivers increases during the signal change. The objective of this article is to improve knowledge of the relationship between rear-end crashes occurring at signalized intersections and a series of potential traffic risk factors classified by driver characteristics, environments, and vehicle types. Based on the 2001 Florida crash database, the classification tree method and Quasi-induced exposure concept were used to perform the statistical analysis. Two binary classification tree models were developed in this study. One was used for the crash comparison between rear-end and non-rear-end to identify those specific trends of the rear-end crashes. The other was constructed for the comparison between striking vehicles/drivers (at-fault) and struck vehicles/drivers (not-at-fault) to find more complex crash pattern associated with the traffic attributes of driver, vehicle, and environment. The modeling results showed that the rear-end crashes are over-presented in the higher speed limits (45-55 mph); the rear-end crash propensity for daytime is apparently larger than nighttime; and the reduction of braking capacity due to wet and slippery road surface conditions would definitely contribute to rear-end crashes, especially at intersections with higher speed limits. The tree model segmented drivers into four homogeneous age groups: < 21 years, 21-31 years, 32-75 years, and > 75 years. The youngest driver group shows the largest crash propensity; in the 21-31 age group, the male drivers are over-involved in rear-end crashes under adverse weather conditions and the 32-75 years drivers driving large size vehicles have a larger crash propensity compared to those driving passenger vehicles. Combined with the quasi-induced exposure concept, the classification tree method is a proper statistical tool for traffic-safety analysis to investigate crash propensity. Compared to the logistic regression models, tree models have advantages for handling continuous independent variables and easily explaining the complex interaction effect with more than two independent variables. This research recommended that at signalized intersections with higher speed limits, reducing the speed limit to 40 mph efficiently contribute to a lower accident rate. Drivers involved in alcohol use may increase not only rear-end crash risk but also the driver injury severity. Education and enforcement countermeasures should focus on the driver group younger than 21 years. Further studies are suggested to compare crash risk distributions of the driver age for other main crash types to seek corresponding traffic countermeasures.
Macalady, Alison K; Bugmann, Harald
2014-01-01
The processes leading to drought-associated tree mortality are poorly understood, particularly long-term predisposing factors, memory effects, and variability in mortality processes and thresholds in space and time. We use tree rings from four sites to investigate Pinus edulis mortality during two drought periods in the southwestern USA. We draw on recent sampling and archived collections to (1) analyze P. edulis growth patterns and mortality during the 1950s and 2000s droughts; (2) determine the influence of climate and competition on growth in trees that died and survived; and (3) derive regression models of growth-mortality risk and evaluate their performance across space and time. Recent growth was 53% higher in surviving vs. dying trees, with some sites exhibiting decades-long growth divergences associated with previous drought. Differential growth response to climate partly explained growth differences between live and dead trees, with responses wet/cool conditions most influencing eventual tree status. Competition constrained tree growth, and reduced trees' ability to respond to favorable climate. The best predictors in growth-mortality models included long-term (15-30 year) average growth rate combined with a metric of growth variability and the number of abrupt growth increases over 15 and 10 years, respectively. The most parsimonious models had high discriminatory power (ROC>0.84) and correctly classified ∼ 70% of trees, suggesting that aspects of tree growth, especially over decades, can be powerful predictors of widespread drought-associated die-off. However, model discrimination varied across sites and drought events. Weaker growth-mortality relationships and higher growth at lower survival probabilities for some sites during the 2000s event suggest a shift in mortality processes from longer-term growth-related constraints to shorter-term processes, such as rapid metabolic decline even in vigorous trees due to acute drought stress, and/or increases in the attack rate of both chronically stressed and more vigorous trees by bark beetles.
Tree-, stand- and site-specific controls on landscape-scale patterns of transpiration
NASA Astrophysics Data System (ADS)
Hassler, Sibylle; Markus, Weiler; Theresa, Blume
2017-04-01
Transpiration is a key process in the hydrological cycle and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions as well as for improving the parameterisation of hydrological and soil-vegetation-atmosphere transfer models. For individual trees, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status control sap flow amounts of individual trees. Within forest stands, properties such as species composition, basal area or stand density additionally affect sap flow, for example via competition mechanisms. Finally, sap flow patterns might also be influenced by landscape-scale characteristics such as geology, slope position or aspect because they affect water and energy availability; however, little is known about the dynamic interplay of these controls. We studied the relative importance of various tree-, stand- and site-specific characteristics with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites spread over a 290 km2-catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we modelled the daily sap velocities of these 61 trees and determined the importance of the different predictors. Results indicate that a combination of tree-, stand- and site-specific factors controls sap velocity patterns in the landscape, namely tree species, tree diameter, the stand density, geology and aspect. Compared to these predictors, spatial variability of atmospheric demand and soil moisture explains only a small fraction of the variability in the daily datasets. However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, are correlated to the temporal dynamics of potential evaporation. Thus, transpiration estimates at the landscape scale would benefit from not only considering hydro-meteorological drivers, but also including tree, stand and site characteristics in order to improve the spatial representation of transpiration for hydrological and soil-vegetation-atmosphere transfer models.
Remote Sensing Protocols for Parameterizing an Individual, Tree-Based, Forest Growth and Yield Model
2014-09-01
Leaf-Off Tree Crowns in Small Footprint, High Sampling Density LIDAR Data from Eastern Deciduous Forests in North America.” Remote Sensing of...William A. 2003. “Crown-Diameter Prediction Models for 87 Species of Stand- Grown Trees in the Eastern United States.” Southern Journal of Applied...ER D C/ CE RL T R- 14 -1 8 Base Facilities Environmental Quality Remote Sensing Protocols for Parameterizing an Individual, Tree -Based
Chin, Weng-Yee; Wan, Eric Yuk Fai; Dowrick, Christopher; Arroll, Bruce; Lam, Cindy Lo Kuen
2018-04-26
The aim of this study was to explore the relationship between patient self-reported Patient Health Questionnaire-9 (PHQ-9) symptoms and doctor diagnosis of depression using a tree analysis approach. This was a secondary analysis on a dataset obtained from 10 179 adult primary care patients and 59 primary care physicians (PCPs) across Hong Kong. Patients completed a waiting room survey collecting data on socio-demographics and the PHQ-9. Blinded doctors documented whether they thought the patient had depression. Data were analyzed using multiple logistic regression and conditional inference decision tree modeling. PCPs diagnosed 594 patients with depression. Logistic regression identified gender, age, employment status, past history of depression, family history of mental illness and recent doctor visit as factors associated with a depression diagnosis. Tree analyses revealed different pathways of association between PHQ-9 symptoms and depression diagnosis for patients with and without past depression. The PHQ-9 symptom model revealed low mood, sense of worthlessness, fatigue, sleep disturbance and functional impairment as early classifiers. The PHQ-9 total score model revealed cut-off scores of >12 and >15 were most frequently associated with depression diagnoses in patients with and without past depression. A past history of depression is the most significant factor associated with the diagnosis of depression. PCPs appear to utilize a hypothetical-deductive problem-solving approach incorporating pre-test probability, with different associated factors for patients with and without past depression. Diagnostic thresholds may be too low for patients with past depression and too high for those without, potentially leading to over and under diagnosis of depression.
Predicting the limits to tree height using statistical regressions of leaf traits.
Burgess, Stephen S O; Dawson, Todd E
2007-01-01
Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed end-points and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the second case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO(2) must first be characterized.
A quantitative analysis to objectively appraise drought indicators and model drought impacts
NASA Astrophysics Data System (ADS)
Bachmair, S.; Svensson, C.; Hannaford, J.; Barker, L. J.; Stahl, K.
2016-07-01
Drought monitoring and early warning is an important measure to enhance resilience towards drought. While there are numerous operational systems using different drought indicators, there is no consensus on which indicator best represents drought impact occurrence for any given sector. Furthermore, thresholds are widely applied in these indicators but, to date, little empirical evidence exists as to which indicator thresholds trigger impacts on society, the economy, and ecosystems. The main obstacle for evaluating commonly used drought indicators is a lack of information on drought impacts. Our aim was therefore to exploit text-based data from the European Drought Impact report Inventory (EDII) to identify indicators that are meaningful for region-, sector-, and season-specific impact occurrence, and to empirically determine indicator thresholds. In addition, we tested the predictability of impact occurrence based on the best-performing indicators. To achieve these aims we applied a correlation analysis and an ensemble regression tree approach, using Germany and the UK (the most data-rich countries in the EDII) as test beds. As candidate indicators we chose two meteorological indicators (Standardized Precipitation Index, SPI, and Standardized Precipitation Evaporation Index, SPEI) and two hydrological indicators (streamflow and groundwater level percentiles). The analysis revealed that accumulation periods of SPI and SPEI best linked to impact occurrence are longer for the UK compared with Germany, but there is variability within each country, among impact categories and, to some degree, seasons. The median of regression tree splitting values, which we regard as estimates of thresholds of impact occurrence, was around -1 for SPI and SPEI in the UK; distinct differences between northern/northeastern vs. southern/central regions were found for Germany. Predictions with the ensemble regression tree approach yielded reasonable results for regions with good impact data coverage. The predictions also provided insights into the EDII, in particular highlighting drought events where missing impact reports may reflect a lack of recording rather than true absence of impacts. Overall, the presented quantitative framework proved to be a useful tool for evaluating drought indicators, and to model impact occurrence. In summary, this study demonstrates the information gain for drought monitoring and early warning through impact data collection and analysis. It highlights the important role that quantitative analysis with impact data can have in providing "ground truth" for drought indicators, alongside more traditional stakeholder-led approaches.
Nam, Vu Thanh; van Kuijk, Marijke; Anten, Niels P R
2016-01-01
Allometric regression models are widely used to estimate tropical forest biomass, but balancing model accuracy with efficiency of implementation remains a major challenge. In addition, while numerous models exist for aboveground mass, very few exist for roots. We developed allometric equations for aboveground biomass (AGB) and root biomass (RB) based on 300 (of 45 species) and 40 (of 25 species) sample trees respectively, in an evergreen forest in Vietnam. The biomass estimations from these local models were compared to regional and pan-tropical models. For AGB we also compared local models that distinguish functional types to an aggregated model, to assess the degree of specificity needed in local models. Besides diameter at breast height (DBH) and tree height (H), wood density (WD) was found to be an important parameter in AGB models. Existing pan-tropical models resulted in up to 27% higher estimates of AGB, and overestimated RB by nearly 150%, indicating the greater accuracy of local models at the plot level. Our functional group aggregated local model which combined data for all species, was as accurate in estimating AGB as functional type specific models, indicating that a local aggregated model is the best choice for predicting plot level AGB in tropical forests. Finally our study presents the first allometric biomass models for aboveground and root biomass in forests in Vietnam.
Nam, Vu Thanh; van Kuijk, Marijke; Anten, Niels P. R.
2016-01-01
Allometric regression models are widely used to estimate tropical forest biomass, but balancing model accuracy with efficiency of implementation remains a major challenge. In addition, while numerous models exist for aboveground mass, very few exist for roots. We developed allometric equations for aboveground biomass (AGB) and root biomass (RB) based on 300 (of 45 species) and 40 (of 25 species) sample trees respectively, in an evergreen forest in Vietnam. The biomass estimations from these local models were compared to regional and pan-tropical models. For AGB we also compared local models that distinguish functional types to an aggregated model, to assess the degree of specificity needed in local models. Besides diameter at breast height (DBH) and tree height (H), wood density (WD) was found to be an important parameter in AGB models. Existing pan-tropical models resulted in up to 27% higher estimates of AGB, and overestimated RB by nearly 150%, indicating the greater accuracy of local models at the plot level. Our functional group aggregated local model which combined data for all species, was as accurate in estimating AGB as functional type specific models, indicating that a local aggregated model is the best choice for predicting plot level AGB in tropical forests. Finally our study presents the first allometric biomass models for aboveground and root biomass in forests in Vietnam. PMID:27309718
Estimating Mixed Broadleaves Forest Stand Volume Using Dsm Extracted from Digital Aerial Images
NASA Astrophysics Data System (ADS)
Sohrabi, H.
2012-07-01
In mixed old growth broadleaves of Hyrcanian forests, it is difficult to estimate stand volume at plot level by remotely sensed data while LiDar data is absent. In this paper, a new approach has been proposed and tested for estimating stand forest volume. The approach is based on this idea that forest volume can be estimated by variation of trees height at plots. In the other word, the more the height variation in plot, the more the stand volume would be expected. For testing this idea, 120 circular 0.1 ha sample plots with systematic random design has been collected in Tonekaon forest located in Hyrcanian zone. Digital surface model (DSM) measure the height values of the first surface on the ground including terrain features, trees, building etc, which provides a topographic model of the earth's surface. The DSMs have been extracted automatically from aerial UltraCamD images so that ground pixel size for extracted DSM varied from 1 to 10 m size by 1m span. DSMs were checked manually for probable errors. Corresponded to ground samples, standard deviation and range of DSM pixels have been calculated. For modeling, non-linear regression method was used. The results showed that standard deviation of plot pixels with 5 m resolution was the most appropriate data for modeling. Relative bias and RMSE of estimation was 5.8 and 49.8 percent, respectively. Comparing to other approaches for estimating stand volume based on passive remote sensing data in mixed broadleaves forests, these results are more encouraging. One big problem in this method occurs when trees canopy cover is totally closed. In this situation, the standard deviation of height is low while stand volume is high. In future studies, applying forest stratification could be studied.
Shuguang Liua; Pamela Anderson; Guoyi Zhoud; Boone Kauffman; Flint Hughes; David Schimel; Vicente Watson; Joseph Tosi
2008-01-01
Objectively assessing the performance of a model and deriving model parameter values from observations are critical and challenging in landscape to regional modeling. In this paper, we applied a nonlinear inversion technique to calibrate the ecosystem model CENTURY against carbon (C) and nitrogen (N) stock measurements collected from 39 mature tropical forest sites in...
Henry W. Mcnab; Thomas F. Lloyd
1999-01-01
Models of forest vegetation dynamics based on characteristics of individual trees are more suitable to predicting growth of multiple species and age classes than those based on stands. The objective of this study was to assess age- and site index-independent relationships between periodic diameter increment and tree and site effects for 11 major hardwood tree species....
NASA Technical Reports Server (NTRS)
Cohen, Warren B.; Spies, Thomas A.
1992-01-01
Relationships between spectral and texture variables derived from SPOT HRV 10 m panchromatic and Landsat TM 30 m multispectral data and 16 forest stand structural attributes is evaluated to determine the utility of satellite data for analysis of hemlock forests west of the Cascade Mountains crest in Oregon and Washington, USA. Texture of the HRV data was found to be strongly related to many of the stand attributes evaluated, whereas TM texture was weakly related to all attributes. Data analysis based on regression models indicates that both TM and HRV imagery should yield equally accurate estimates of forest age class and stand structure. It is concluded that the satellite data are a valuable source for estimation of the standard deviation of tree sizes, mean size and density of trees in the upper canopy layers, a structural complexity index, and stand age.
Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii
Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie
1988-01-01
Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...
Predicting heavy metal concentrations in soils and plants using field spectrophotometry
NASA Astrophysics Data System (ADS)
Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.
2017-09-01
Aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were an industrial town of Kajaran and city of Yerevan. The research also included sampling of soils and leaves of two tree species exposed to different pollution levels and determination of contents of HM in lab conditions. The obtained spectral values were then collated with contents of HM in Kajaran soils and the tree leaves sampled in Yerevan, and statistical analysis was done. Consequently, Zn and Pb have a negative correlation coefficient (p <0.01) in a 2498 nm spectral range for soils. Pb has a significantly higher correlation at red edge for plants. A regression models and artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress sensitive spectral band ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. Multiple Linear Regression (MLR) model was not applicable to predict Pb and Zn concentrations in soils in this research. Almost all full spectrum PLS models provide good calibration and validation results (RPD>1.4). Full spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9; 0.1 and >2.5 respectively). For prediction of Pb and Ni contents in plants SLR and PLS models were used. The latter provide almost the same results. Our findings indicate that it is possible to make coarse direct estimation of HM content in soils and plants using rapid and economic reflectance spectroscopy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha
Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data,more » optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R{sup 2}) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R{sup 2} and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest for wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.« less
NASA Astrophysics Data System (ADS)
Rao, M.; George, L. A.
2012-12-01
Nitrogen dioxide (NO2), an atmospheric pollutant generated primarily by anthropogenic combustion processes, is typically found at higher concentrations in urban areas compared to non-urbanized environments. Elevated NO2 levels have multiple ecosystem effects at different spatial scales. At the local scale, elevated levels affect human health directly and through the formation of secondary pollutants such as ozone and aerosols; at the regional scale secondary pollutants such as nitric acid and organic nitrates have deleterious effects on non-urbanized areas; and, at the global scale, nitrogen oxide emissions significantly alter the natural biogeochemical nitrogen cycle. As cities globally become larger and larger sources of nitrogen oxide emissions, it is important to assess possible mitigation strategies to reduce the impact of emissions locally, regionally and globally. In this study, we build a national land-use regression (LUR) model to compare the impacts of deciduous and evergreen trees on urban NO2 levels in the United States. We use the EPA monitoring network values of NO2 levels for 2006, the 2006 NLCD tree canopy data for deciduous and evergreen canopies, and the US Census Bureau's TIGER shapefiles for roads, railroads, impervious area & population density as proxies for NO2 sources on-road traffic, railroad traffic, off-road and area sources respectively. Our preliminary LUR model corroborates previous LUR studies showing that the presence of trees is associated with reduced urban NO2 levels. Additionally, our model indicates that deciduous and evergreen trees reduce NO2 to different extents, and that the amount of NO2 reduced varies seasonally. The model indicates that every square kilometer of deciduous canopy within a 2km buffer is associated with a reduction in ambient NO2 levels of 0.64 ppb in summer and 0.46ppb in winter. Similarly, every square kilometer of evergreen tree canopy within a 2 km buffer is associated with a reduction in ambient NO2 by 0.53 ppb in summer and 0.84 ppb in winter. Thus, the model indicates that deciduous trees are associated with a 30% smaller reduction in NO2 in winter as compared to summer, while evergreens are associated with a 60% increase in the reduction of NO2 in winter, for every square kilometer of deciduous or evergreen canopy within a 2 km buffer. Leaf- and local canopy-level studies have shown that trees are a sink for urban NO2 through deposition as well as stomatal and cuticular uptake. The winter time versus summer time effects suggest that leaf-level deposition may not be the only uptake mechanism and points to the need for a more holistic analysis of tree and canopy-level deposition for urban air pollution models. Since deposition velocities for NO2 vary by tree species, the reduction may also vary by species. These findings have implications for designing cities to reduce the impact of air pollution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brudvig, Lars A.; Orrock, John L.; Damschen, Ellen I.
Ecological restoration is frequently guided by reference conditions describing a successfully restored ecosystem; however, the causes and magnitude of ecosystem degradation vary, making simple knowledge of reference conditions insufficient for prioritizing and guiding restoration. Ecological reference models provide further guidance by quantifying reference conditions, as well as conditions at degraded states that deviate from reference conditions. Many reference models remain qualitative, however, limiting their utility. We quantified and evaluated a reference model for southeastern U.S. longleaf pine woodland understory plant communities. We used regression trees to classify 232 longleaf pine woodland sites at three locations along the Atlantic coastal plainmore » based on relationships between understory plant community composition, soils lol(which broadly structure these communities), and factors associated with understory degradation, including fire frequency, agricultural history, and tree basal area. To understand the spatial generality of this model, we classified all sites together. and for each of three study locations separately. Both the regional and location-specific models produced quantifiable degradation gradients–i.e., progressive deviation from conditions at 38 reference sites, based on understory species composition, diversity and total cover, litter depth, and other attributes. Regionally, fire suppression was the most important degrading factor, followed by agricultural history, but at individual locations, agricultural history or tree basal area was most important. At one location, the influence of a degrading factor depended on soil attributes. We suggest that our regional model can help prioritize longleaf pine woodland restoration across our study region; however, due to substantial landscape-to-landscape variation, local management decisions should take into account additional factors (e.g., soil attributes). Our study demonstrates the utility of quantifying degraded states and provides a series of hypotheses for future experimental restoration work. More broadly, our work provides a framework for developing and evaluating reference models that incorporate multiple, interactive anthropogenic drivers of ecosystem degradation.« less
A generalized system of models forecasting Central States tree growth.
Stephen R. Shifley
1987-01-01
Describes the development and testing of a system of individual tree-based growth projection models applicable to species in Indiana, Missouri, and Ohio. Annual tree basal area growth is estimated as a function of tree size, crown ratio, stand density, and site index. Models are compatible with the STEMS and TWIGS Projection System.
Ricker, Martin; Peña Ramírez, Víctor M.; von Rosen, Dietrich
2014-01-01
Growth curves are monotonically increasing functions that measure repeatedly the same subjects over time. The classical growth curve model in the statistical literature is the Generalized Multivariate Analysis of Variance (GMANOVA) model. In order to model the tree trunk radius (r) over time (t) of trees on different sites, GMANOVA is combined here with the adapted PL regression model Q = A·T+E, where for and for , A = initial relative growth to be estimated, , and E is an error term for each tree and time point. Furthermore, Ei[–b·r] = , , with TPR being the turning point radius in a sigmoid curve, and at is an estimated calibrating time-radius point. Advantages of the approach are that growth rates can be compared among growth curves with different turning point radiuses and different starting points, hidden outliers are easily detectable, the method is statistically robust, and heteroscedasticity of the residuals among time points is allowed. The model was implemented with dendrochronological data of 235 Pinus montezumae trees on ten Mexican volcano sites to calculate comparison intervals for the estimated initial relative growth . One site (at the Popocatépetl volcano) stood out, with being 3.9 times the value of the site with the slowest-growing trees. Calculating variance components for the initial relative growth, 34% of the growth variation was found among sites, 31% among trees, and 35% over time. Without the Popocatépetl site, the numbers changed to 7%, 42%, and 51%. Further explanation of differences in growth would need to focus on factors that vary within sites and over time. PMID:25402427
Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A
2017-10-01
Tail biting in pigs has been an identified behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce the Classification and Regression Tree (CRT) methodologies to develop a tool for prevention of acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs were involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for the CRT analysis in managing the risk factors behind tail biting on a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting obtained with the Regression Tree. CRT analysis showed five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness in feed supply) as critical predictors of acute tail biting lesions, which demonstrate different importance in different farms subgroups. The model might have reliable and practical applications for the support and implementation of tail biting prevention interventions, especially in case of subgroups of pigs with higher risk, helping farmers and veterinarians to assess the risk in their own farm and to manage their predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.
Maciukiewicz, Malgorzata; Marshe, Victoria S; Hauschild, Anne-Christin; Foster, Jane A; Rotzinger, Susan; Kennedy, James L; Kennedy, Sidney H; Müller, Daniel J; Geraci, Joseph
2018-04-01
Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients received treatment with duloxetine for up to 8 weeks were categorized as "responders" based on a MADRS change >50% from baseline; or "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regards to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, and 0.51 for all coefficients for CRT. The best performing SVM fold was characterized by an accuracy = 0.66 (p = .071), sensitivity = 0.70 and a sensitivity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction. Copyright © 2017. Published by Elsevier Ltd.
MINER: exploratory analysis of gene interaction networks by machine learning from expression data.
Kadupitige, Sidath Randeni; Leung, Kin Chun; Sellmeier, Julia; Sivieng, Jane; Catchpoole, Daniel R; Bain, Michael E; Gaëta, Bruno A
2009-12-03
The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J
2015-12-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved).
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.
2016-01-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
DBH Prediction Using Allometry Described by Bivariate Copula Distribution
NASA Astrophysics Data System (ADS)
Xu, Q.; Hou, Z.; Li, B.; Greenberg, J. A.
2017-12-01
Forest biomass mapping based on single tree detection from the airborne laser scanning (ALS) usually depends on an allometric equation that relates diameter at breast height (DBH) with per-tree aboveground biomass. The incapability of the ALS technology in directly measuring DBH leads to the need to predict DBH with other ALS-measured tree-level structural parameters. A copula-based method is proposed in the study to predict DBH with the ALS-measured tree height and crown diameter using a dataset measured in the Lassen National Forest in California. Instead of exploring an explicit mathematical equation that explains the underlying relationship between DBH and other structural parameters, the copula-based prediction method utilizes the dependency between cumulative distributions of these variables, and solves the DBH based on an assumption that for a single tree, the cumulative probability of each structural parameter is identical. Results show that compared with the bench-marking least-square linear regression and the k-MSN imputation, the copula-based method obtains better accuracy in the DBH for the Lassen National Forest. To assess the generalization of the proposed method, prediction uncertainty is quantified using bootstrapping techniques that examine the variability of the RMSE of the predicted DBH. We find that the copula distribution is reliable in describing the allometric relationship between tree-level structural parameters, and it contributes to the reduction of prediction uncertainty.
ERIC Educational Resources Information Center
Raju, Dheeraj; Schumacker, Randall
2015-01-01
The study used earliest available student data from a flagship university in the southeast United States to build data mining models like logistic regression with different variable selection methods, decision trees, and neural networks to explore important student characteristics associated with retention leading to graduation. The decision tree…
Comparison of Field Methods and Models to Estimate Mean Crown Diameter
William A. Bechtold; Manfred E. Mielke; Stanley J. Zarnoch
2002-01-01
The direct measurement of crown diameters with logger's tapes adds significantly to the cost of extensive forest inventories. We undertook a study of 100 trees to compare this measurement method to four alternatives-two field instruments, ocular estimates, and regression models. Using the taping method as the standard of comparison, accuracy of the tested...
Macalady, Alison K.; Bugmann, Harald
2014-01-01
The processes leading to drought-associated tree mortality are poorly understood, particularly long-term predisposing factors, memory effects, and variability in mortality processes and thresholds in space and time. We use tree rings from four sites to investigate Pinus edulis mortality during two drought periods in the southwestern USA. We draw on recent sampling and archived collections to (1) analyze P. edulis growth patterns and mortality during the 1950s and 2000s droughts; (2) determine the influence of climate and competition on growth in trees that died and survived; and (3) derive regression models of growth-mortality risk and evaluate their performance across space and time. Recent growth was 53% higher in surviving vs. dying trees, with some sites exhibiting decades-long growth divergences associated with previous drought. Differential growth response to climate partly explained growth differences between live and dead trees, with responses wet/cool conditions most influencing eventual tree status. Competition constrained tree growth, and reduced trees’ ability to respond to favorable climate. The best predictors in growth-mortality models included long-term (15–30 year) average growth rate combined with a metric of growth variability and the number of abrupt growth increases over 15 and 10 years, respectively. The most parsimonious models had high discriminatory power (ROC>0.84) and correctly classified ∼70% of trees, suggesting that aspects of tree growth, especially over decades, can be powerful predictors of widespread drought-associated die-off. However, model discrimination varied across sites and drought events. Weaker growth-mortality relationships and higher growth at lower survival probabilities for some sites during the 2000s event suggest a shift in mortality processes from longer-term growth-related constraints to shorter-term processes, such as rapid metabolic decline even in vigorous trees due to acute drought stress, and/or increases in the attack rate of both chronically stressed and more vigorous trees by bark beetles. PMID:24786646
Using decision tree analysis to identify risk factors for relapse to smoking
Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.
2010-01-01
This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871
Climate change at upper treeline: How do trees on the edge react to increasing temperatures?
NASA Astrophysics Data System (ADS)
Jochner, Matthias; Bugmann, Harald; Nötzli, Magdalena; Bigler, Christof
2017-04-01
Treeline ecotones are thought to be particularly sensitive to climate warming, and an alteration of their growth conditions may have important implications for the ecosystem services they supply in mountain regions. We use a novel approach to quantify effects of a changing climate on tree growth, using case studies in the European Alps. We compiled tree-ring data from almost 600 trees of four species at treeline in three climate regions of Switzerland. Temperature loggers installed along transects provided data for a precise interpolation of temperatures experienced by the sampled trees. To assess the influence of temperature on annual growth, we used linear mixed-effects models, allowing us to quantify effect sizes and to account for between-tree growth variability. After removing biological growth trends, we isolated temporal trends of ring-width indices. Furthermore, we fitted non-linear regression models to radial growth rates of individual years with temperature and tree age as predicting covariates for a fine-scale investigation of the temperature dependency of tree growth. For all species, climate-growth linear mixed-effects models indicated strong positive responses of ring-width indices to temperature in early summer and previous year's autumn, featuring considerable between-tree variability. All species showed positive ring-width index trends at treeline but different interactions with elevation: Larix decidua exhibited a declining ring-width index trend with decreasing elevation, whereas Picea abies, Pinus cembra and Pinus mugo showed increasing and/or stable trends. Not only reflected our findings the effects of ameliorated growth conditions, they might have also revealed suspected negative and positive feedbacks of climate change on growth, and increased the knowledge about the functional form and parameterization of the temperature dependency of tree growth.
Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid
2018-05-12
Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
NASA Astrophysics Data System (ADS)
Żukowicz, Marek; Markiewicz, Michał
2016-09-01
The aim of the article is to present a mathematical definition of the object model, that is known in computer science as TreeList and to show application of this model for design evolutionary algorithm, that purpose is to generate structures based on this object. The first chapter introduces the reader to the problem of presenting data using the TreeList object. The second chapter describes the problem of testing data structures based on TreeList. The third one shows a mathematical model of the object TreeList and the parameters, used in determining the utility of structures created through this model and in evolutionary strategy, that generates these structures for testing purposes. The last chapter provides a brief summary and plans for future research related to the algorithm presented in the article.
Selective Tree-ring Models: A Novel Method for Reconstructing Streamflow Using Tree Rings
NASA Astrophysics Data System (ADS)
Foard, M. B.; Nelson, A. S.; Harley, G. L.
2017-12-01
Surface water is among the most instrumental and vulnerable resources in the Northwest United States (NW). Recent observations show that overall water quantity is declining in streams across the region, while extreme flooding events occur more frequently. Historical streamflow models inform probabilities of extreme flow events (flood or drought) by describing frequency and duration of past events. There are numerous examples of tree-rings being utilized to reconstruct streamflow in the NW. These models confirm that tree-rings are highly accurate at predicting streamflow, however there are many nuances that limit their applicability through time and space. For example, most models predict streamflow from hydrologically altered rivers (e.g. dammed, channelized) which may hinder our ability to predict natural prehistoric flow. They also have a tendency to over/under-predict extreme flow events. Moreover, they often neglect to capture the changing relationships between tree-growth and streamflow over time and space. To address these limitations, we utilized national tree-ring and streamflow archives to investigate the relationships between the growth of multiple coniferous species and free-flowing streams across the NW using novel species-and site-specific streamflow models - a term we coined"selective tree-ring models." Correlation function analysis and regression modeling were used to evaluate the strengths and directions of the flow-growth relationships. Species with significant relationships in the same direction were identified as strong candidates for selective models. Temporal and spatial patterns of these relationships were examined using running correlations and inverse distance weighting interpolation, respectively. Our early results indicate that (1) species adapted to extreme climates (e.g. hot-dry, cold-wet) exhibit the most consistent relationships across space, (2) these relationships weaken in locations with mild climatic variability, and (3) some species appear to be strong candidates for predicting high flow events, while others may be better at pridicting drought. These findings indicate that selective models may outperform traditional models when reconstructing distinctive aspects of streamflow.
Differences in Risk Factors for Rotator Cuff Tears between Elderly Patients and Young Patients.
Watanabe, Akihisa; Ono, Qana; Nishigami, Tomohiko; Hirooka, Takahiko; Machida, Hirohisa
2018-02-01
It has been unclear whether the risk factors for rotator cuff tears are the same at all ages or differ between young and older populations. In this study, we examined the risk factors for rotator cuff tears using classification and regression tree analysis as methods of nonlinear regression analysis. There were 65 patients in the rotator cuff tears group and 45 patients in the intact rotator cuff group. Classification and regression tree analysis was performed to predict rotator cuff tears. The target factor was rotator cuff tears; explanatory variables were age, sex, trauma, and critical shoulder angle≥35°. In the results of classification and regression tree analysis, the tree was divided at age 64. For patients aged≥64, the tree was divided at trauma. For patients aged<64, the tree was divided at critical shoulder angle≥35°. The odds ratio for critical shoulder angle≥35° was significant for all ages (5.89), and for patients aged<64 (10.3) while trauma was only a significant factor for patients aged≥64 (5.13). Age, trauma, and critical shoulder angle≥35° were related to rotator cuff tears in this study. However, these risk factors showed different trends according to age group, not a linear relationship.
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNBassociated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures though minimizing the error rate. After cross-validation logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients ( approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters
2010-01-01
Climate change will likely cause impacts that are species specific and significant; modeling is critical to better understand potential changes in suitable habitat. We use empirical, abundance-based habitat models utilizing decision tree-based ensemble methods to explore potential changes of 134 tree species habitats in the eastern United States (http://www.nrs.fs.fed....
A nonparametric analysis of plot basal area growth using tree based models
G. L. Gadbury; H. K. lyer; H. T. Schreuder; C. Y. Ueng
1997-01-01
Tree based statistical models can be used to investigate data structure and predict future observations. We used nonparametric and nonlinear models to reexamine the data sets on tree growth used by Bechtold et al. (1991) and Ruark et al. (1991). The growth data were collected by Forest Inventory and Analysis (FIA) teams from 1962 to 1972 (4th cycle) and 1972 to 1982 (...
NASA Astrophysics Data System (ADS)
Fang, F. J.
2017-12-01
Reconciling observations at fundamentally different scales is central in understanding the global carbon cycle. This study investigates a model-based melding of forest inventory data, remote-sensing data and micrometeorological-station data ("flux towers" estimating forest heat, CO2 and H2O fluxes). The individual tree-based model FORCCHN was used to evaluate the tree DBH increment and forest carbon fluxes. These are the first simultaneous simulations of the forest carbon budgets from flux towers and individual-tree growth estimates of forest carbon budgets using the continuous forest inventory data — under circumstances in which both predictions can be tested. Along with the global implications of such findings, this also improves the capacity for forest sustainable management and the comprehensive understanding of forest ecosystems. In forest ecology, diameter at breast height (DBH) of a tree significantly determines an individual tree's cross-sectional sapwood area, its biomass and carbon storage. Evaluation the annual DBH increment (ΔDBH) of an individual tree is central to understanding tree growth and forest ecology. Ecosystem Carbon flux is a consequence of key ecosystem processes in the forest-ecosystem carbon cycle, Gross and Net Primary Production (GPP and NPP, respectively) and Net Ecosystem Respiration (NEP). All of these closely relate with tree DBH changes and tree death. Despite advances in evaluating forest carbon fluxes with flux towers and forest inventories for individual tree ΔDBH, few current ecological models can simultaneously quantify and predict the tree ΔDBH and forest carbon flux.
Bayesian and Phylogenic Approaches for Studying Relationships among Table Olive Cultivars.
Ben Ayed, Rayda; Ennouri, Karim; Ben Amar, Fathi; Moreau, Fabienne; Triki, Mohamed Ali; Rebai, Ahmed
2017-08-01
To enhance table olive tree authentication, relationship, and productivity, we consider the analysis of 18 worldwide table olive cultivars (Olea europaea L.) based on morphological, biological, and physicochemical markers analyzed by bioinformatic and biostatistic tools. Accordingly, we assess the relationships between the studied varieties, on the one hand, and the potential productivity-quantitative parameter links on the other hand. The bioinformatic analysis based on the graphical representation of the matrix of Euclidean distances, the principal components analysis, unweighted pair group method with arithmetic mean, and principal coordinate analysis (PCoA) revealed three major clusters which were not correlated with the geographic origin. The statistical analysis based on Kendall's and Spearman correlation coefficients suggests two highly significant associations with both fruit color and pollinization and the productivity character. These results are confirmed by the multiple linear regression prediction models. In fact, based on the coefficient of determination (R 2 ) value, the best model demonstrated the power of the pollinization on the tree productivity (R 2 = 0.846). Moreover, the derived directed acyclic graph showed that only two direct influences are detected: effect of tolerance on fruit and stone symmetry on side and effect of tolerance on stone form and oil content on the other side. This work provides better understanding of the diversity available in worldwide table olive cultivars and supplies an important contribution for olive breeding and authenticity.
Villamor, Grace B.; Nyarko, Benjamin Kofi; Wala, Kperkouma; Akpagana, Koffi
2018-01-01
Vitellaria paradoxa (Gaertn C. F.), or shea tree, remains one of the most valuable trees for farmers in the Atacora district of northern Benin, where rural communities depend on shea products for both food and income. To optimize productivity and management of shea agroforestry systems, or "parklands," accurate and up-to-date data are needed. For this purpose, we monitored120 fruiting shea trees for two years under three land-use scenarios and different soil groups in Atacora, coupled with a farm household survey to elicit information on decision making and management practices. To examine the local pattern of shea tree productivity and relationships between morphological factors and yields, we used a randomized branch sampling method and applied a regression analysis to build a shea yield model based on dendrometric, soil and land-use variables. We also compared potential shea yields based on farm household socio-economic characteristics and management practices derived from the survey data. Soil and land-use variables were the most important determinants of shea fruit yield. In terms of land use, shea trees growing on farmland plots exhibited the highest yields (i.e., fruit quantity and mass) while trees growing on Lixisols performed better than those of the other soil group. Contrary to our expectations, dendrometric parameters had weak relationships with fruit yield regardless of land-use and soil group. There is an inter-annual variability in fruit yield in both soil groups and land-use type. In addition to observed inter-annual yield variability, there was a high degree of variability in production among individual shea trees. Furthermore, household socioeconomic characteristics such as road accessibility, landholding size, and gross annual income influence shea fruit yield. The use of fallow areas is an important land management practice in the study area that influences both conservation and shea yield. PMID:29346406
Aleza, Koutchoukalo; Villamor, Grace B; Nyarko, Benjamin Kofi; Wala, Kperkouma; Akpagana, Koffi
2018-01-01
Vitellaria paradoxa (Gaertn C. F.), or shea tree, remains one of the most valuable trees for farmers in the Atacora district of northern Benin, where rural communities depend on shea products for both food and income. To optimize productivity and management of shea agroforestry systems, or "parklands," accurate and up-to-date data are needed. For this purpose, we monitored120 fruiting shea trees for two years under three land-use scenarios and different soil groups in Atacora, coupled with a farm household survey to elicit information on decision making and management practices. To examine the local pattern of shea tree productivity and relationships between morphological factors and yields, we used a randomized branch sampling method and applied a regression analysis to build a shea yield model based on dendrometric, soil and land-use variables. We also compared potential shea yields based on farm household socio-economic characteristics and management practices derived from the survey data. Soil and land-use variables were the most important determinants of shea fruit yield. In terms of land use, shea trees growing on farmland plots exhibited the highest yields (i.e., fruit quantity and mass) while trees growing on Lixisols performed better than those of the other soil group. Contrary to our expectations, dendrometric parameters had weak relationships with fruit yield regardless of land-use and soil group. There is an inter-annual variability in fruit yield in both soil groups and land-use type. In addition to observed inter-annual yield variability, there was a high degree of variability in production among individual shea trees. Furthermore, household socioeconomic characteristics such as road accessibility, landholding size, and gross annual income influence shea fruit yield. The use of fallow areas is an important land management practice in the study area that influences both conservation and shea yield.
Computational intelligence models to predict porosity of tablets using minimum features
Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander
2017-01-01
The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] =1%) and symbolic regression (NRMSE =4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE =3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hints at the most important variables within this factor space. PMID:28138223
Computational intelligence models to predict porosity of tablets using minimum features.
Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander
2017-01-01
The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] =1%) and symbolic regression (NRMSE =4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE =3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hints at the most important variables within this factor space.
Nieto-Lugilde, Diego; Lenoir, Jonathan; Abdulhak, Sylvain; Aeschimann, David; Dullinger, Stefan; Gégout, Jean-Claude; Guisan, Antoine; Pauli, Harald; Renaud, Julien; Theurillat, Jean-Paul; Thuiller, Wilfried; Van Es, Jérémie; Vittoz, Pascal; Willner, Wolfgang; Wohlgemuth, Thomas; Zimmermann, Niklaus E.; Svenning, Jens-Christian
2015-01-01
The role of competition for light among plants has long been recognised at local scales, but its importance for plant species distributions at larger spatial scales has generally been ignored. Tree cover modifies the local abiotic conditions below the canopy, notably by reducing light availability, and thus, also the performance of species that are not adapted to low-light conditions. However, this local effect may propagate to coarser spatial grains, by affecting colonisation probabilities and local extinction risks of herbs and shrubs. To assess the effect of tree cover at both the plot- and landscape-grain sizes (approximately 10-m and 1-km), we fit Generalised Linear Models (GLMs) for the plot-level distributions of 960 species of herbs and shrubs using 6,935 vegetation plots across the European Alps. We ran four models with different combinations of variables (climate, soil and tree cover) at both spatial grains for each species. We used partial regressions to evaluate the independent effects of plot- and landscape-grain tree cover on plot-level plant communities. Finally, the effects on species-specific elevational range limits were assessed by simulating a removal experiment comparing the species distributions under high and low tree cover. Accounting for tree cover improved the model performance, with the probability of the presence of shade-tolerant species increasing with increasing tree cover, whereas shade-intolerant species showed the opposite pattern. The tree cover effect occurred consistently at both the plot and landscape spatial grains, albeit most strongly at the former. Importantly, tree cover at the two grain sizes had partially independent effects on plot-level plant communities. With high tree cover, shade-intolerant species exhibited narrower elevational ranges than with low tree cover whereas shade-tolerant species showed wider elevational ranges at both limits. These findings suggest that forecasts of climate-related range shifts for herb and shrub species may be modified by tree cover dynamics. PMID:26290621
Tree-Structured Infinite Sparse Factor Model
Zhang, XianXing; Dunson, David B.; Carin, Lawrence
2013-01-01
A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structure) of the tree. In addition to developing the model, theoretical properties of the TMGP are addressed, and a novel MCMC sampler is developed. The structure of the inferred tree is used to learn relationships between high-dimensional data, and the model is also applied to compressive sensing and interpolation of incomplete images. PMID:25279389
Learning-based Wind Estimation using Distant Soundings for Unguided Aerial Delivery
NASA Astrophysics Data System (ADS)
Plyler, M.; Cahoy, K.; Angermueller, K.; Chen, D.; Markuzon, N.
2016-12-01
Delivering unguided, parachuted payloads from aircraft requires accurate knowledge of the wind field inside an operational zone. Usually, a dropsonde released from the aircraft over the drop zone gives a more accurate wind estimate than a forecast. Mission objectives occasionally demand releasing the dropsonde away from the drop zone, but still require accuracy and precision. Barnes interpolation and many other assimilation methods do poorly when the forecast error is inconsistent in a forecast grid. A machine learning approach can better leverage non-linear relations between different weather patterns and thus provide a better wind estimate at the target drop zone when using data collected up to 100 km away. This study uses the 13 km resolution Rapid Refresh (RAP) dataset available through NOAA and subsamples to an area around Yuma, AZ and up to approximately 10km AMSL. RAP forecast grids are updated with simulated dropsondes taken from analysis (historical weather maps). We train models using different data mining and machine learning techniques, most notably boosted regression trees, that can accurately assimilate the distant dropsonde. The model takes a forecast grid and simulated remote dropsonde data as input and produces an estimate of the wind stick over the drop zone. Using ballistic winds as a defining metric, we show our data driven approach does better than Barnes interpolation under some conditions, most notably when the forecast error is different between the two locations, on test data previously unseen by the model. We study and evaluate the model's performance depending on the size, the time lag, the drop altitude, and the geographic location of the training set, and identify parameters most contributing to the accuracy of the wind estimation. This study demonstrates a new approach for assimilating remotely released dropsondes, based on boosted regression trees, and shows improvement in wind estimation over currently used methods.
Section-Based Tree Species Identification Using Airborne LIDAR Point Cloud
NASA Astrophysics Data System (ADS)
Yao, C.; Zhang, X.; Liu, H.
2017-09-01
The application of LiDAR data in forestry initially focused on mapping forest community, particularly and primarily intended for largescale forest management and planning. Then with the smaller footprint and higher sampling density LiDAR data available, detecting individual tree overstory, estimating crowns parameters and identifying tree species are demonstrated practicable. This paper proposes a section-based protocol of tree species identification taking palm tree as an example. Section-based method is to detect objects through certain profile among different direction, basically along X-axis or Y-axis. And this method improve the utilization of spatial information to generate accurate results. Firstly, separate the tree points from manmade-object points by decision-tree-based rules, and create Crown Height Mode (CHM) by subtracting the Digital Terrain Model (DTM) from the digital surface model (DSM). Then calculate and extract key points to locate individual trees, thus estimate specific tree parameters related to species information, such as crown height, crown radius, and cross point etc. Finally, with parameters we are able to identify certain tree species. Comparing to species information measured on ground, the portion correctly identified trees on all plots could reach up to 90.65 %. The identification result in this research demonstrate the ability to distinguish palm tree using LiDAR point cloud. Furthermore, with more prior knowledge, section-based method enable the process to classify trees into different classes.
NASA Astrophysics Data System (ADS)
Fuller, D. O.
2016-12-01
Tree cover is a key parameter in climate modeling. It strongly influences CO2 exchanges between the land surface and atmosphere and surface energy balance. We measured percent woody canopy cover (PWCC) in the savanna woodlands of eastern Zambia over a 10-day period in May 2016 using a new iPhone App (CanopyApp) and related these field measurements to Landsat 8 (L8) Band 4 (red) imagery acquired approximately the same time. We then used parameters from the band 4 digital numbers (DNs)-PWCC linear regression to derive a new map of PWCC for the entire L8 scene. Consistent with theory and previous empirical studies, we found that the relationship between L8 band 4 DNs- PWCC was negative and linear (r2 = 0.61, p < 0.05). Interestingly, the relationship between PWCC and L8 band 4 surface reflectance was weaker (r2 = 0.46, p < 0.05) than that for DNs. This suggests that the scene model used in L8 atmospheric correction may not account well for within-pixel shadowing effects and other spatial inhomogeneities from variable soil and background reflectance. Our PWCC map agreed qualitatively with similar percent tree-cover maps based on Landsat level 1 products and past field studies in the area conducted using a hemispherical lens. Our results also compared favorably with other remote sensing studies that have used complex multivariate approaches to estimate tree cover, which suggests that use of a single L8 band 4 is sufficient to estimate PWCC when spectral contrast exists between the grass, soil and tree layers during the austral fall period in southern African savannas.
Shrinking windows of opportunity for oak seedling establishment in southern California mountains
Davis, Frank W.; Sweet, Lynn C.; Serra-Diaz, Josep M.; Franklin, Janet; McCullough, Ian M.; Flint, Alan L.; Flint, Lorraine E.; Dingman, John; Regan, Helen M.; Syphard, Alexandra D.; Hannah, Lee; Redmond, Kelly; Moritz, Max A.
2016-01-01
Seedling establishment is a critical step that may ultimately govern tree species’ distribution shifts under environmental change. Annual variation in the location of seed rain and microclimates results in transient “windows of opportunity” for tree seedling establishment across the landscape. These establishment windows vary at fine spatiotemporal scales that are not considered in most assessments of climate change impacts on tree species range dynamics and habitat displacement. We integrate field seedling establishment trials conducted in the southern Sierra Nevada and western Tehachapi Mountains of southern California with spatially downscaled grids of modeled water-year climatic water deficit (CWDwy) and mean August maximum daily temperature (Tmax) to map historical and projected future microclimates suitable for establishment windows of opportunity for Quercus douglasii, a dominant tree species of warm, dry foothill woodlands, and Q. kelloggii, a dominant of cooler, more mesic montane woodlands and forests. Based on quasi-binomial regression models, Q. douglasii seedling establishment is significantly associated with modeled CWDwy and to a lesser degree with modeled Tmax. Q. kelloggii seedling establishment is most strongly associated with Tmax and best predicted by a two-factor model including CWDwy and Tmax. Establishment niche models are applied to explore recruitment window dynamics in the western Tehachapi Mountains, where these species are currently widespread canopy dominants. Establishment windows are projected to decrease by 50–95%, shrinking locally to higher elevations and north-facing slopes by the end of this century depending on the species and climate scenario. These decreases in establishment windows suggest the potential for longer-term regional population declines of the species. While many additional processes regulate seedling establishment and growth, this study highlights the need to account for topoclimatic controls and interannual climatic variation when assessing how seedling establishment and colonization processes could be affected by climate change.
NASA Astrophysics Data System (ADS)
Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.; Guo, L.
2017-12-01
Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. One of the main challenges of setting up such a system is finding appropriate model parameter values for ungauged catchments. Previous studies have shown that the transfer of parameter sets from hydrologically similar gauged catchments is one of the best performing regionalization methods. However, a remaining key issue is the identification of suitable descriptors of similarity. In this study, we use decision tree learning to explore parameter set transferability in the full space of catchment descriptors. For this purpose, a semi-distributed rainfall-runoff model is set up for 35 catchments in ten Chinese provinces. Hourly runoff data from in total 858 storm events are used to calibrate the model and to evaluate the performance of parameter set transfers between catchments. We then present a novel technique that uses the splitting rules of classification and regression trees (CART) for finding suitable donor catchments for ungauged target catchments. The ability of the model to detect flood events in assumed ungauged catchments is evaluated in series of leave-one-out tests. We show that CART analysis increases the probability of detection of 10-year flood events in comparison to a conventional measure of physiographic-climatic similarity by up to 20%. Decision tree learning can outperform other regionalization approaches because it generates rules that optimally consider spatial proximity and physical similarity. Spatial proximity can be used as a selection criteria but is skipped in the case where no similar gauged catchments are in the vicinity. We conclude that the CART regionalization concept is particularly suitable for implementation in sparsely gauged and topographically complex environments where a proximity-based regionalization concept is not applicable.
Efficient Exploration of the Space of Reconciled Gene Trees
Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent
2013-01-01
Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with respectively. 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family. The open source implementation of ALE is available from https://github.com/ssolo/ALE.git. [amalgamation; gene tree reconciliation; gene tree reconstruction; lateral gene transfer; phylogeny.] PMID:23925510
Kumar, Rajesh; Dogra, Vishal; Rani, Khushbu; Sahu, Kanti
2017-01-01
District level determinants of total fertility rate in Empowered Action Group states of India can help in ongoing population stabilization programs in India. Present study intends to assess the role of district level determinants in predicting total fertility rate among districts of the Empowered Action Group states of India. Data from Annual Health Survey (2011-12) was analysed using STATA and R software packages. Multiple linear regression models were built and evaluated using Akaike Information Criterion. For further understanding, recursive partitioning was used to prepare a regression tree. Female married illiteracy positively associated with total fertility rate and explained more than half (53%) of variance. Under multiple linear regression model, married illiteracy, infant mortality rate, Ante natal care registration, household size, median age of live birth and sex ratio explained 70% of total variance in total fertility rate. In regression tree, female married illiteracy was the root node and splits at 42% determined TFR <= 2.7. The next left side branch was again married illiteracy with splits at 23% to determine TFR <= 2.1. We conclude that female married illiteracy is one of the most important determinants explaining total fertility rate among the districts of an Empowered Action Group states. Focus on female literacy is required to stabilize the population growth in long run.
Evaluating multimedia chemical persistence: Classification and regression tree analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.
2000-04-01
For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Widely distributed persistent chemicals are impossible to remove from the environment and remediation by natural processes may take decades, which is problematic if adverse health or ecological effects are discovered after prolonged release into the environment. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as persistent or nonpersistent based on the chemical properties. In thismore » approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robust for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and can be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.« less
Weight management behaviors in a sample of Iranian adolescent girls.
Garousi, S; Garrusi, B; Baneshi, Mohammad Reza; Sharifi, Z
2016-09-01
Attempts to obtain the ideal body shape portrayed in advertising can result in behaviors that lead to an unhealthy reduction in weight. This study was designed to identify contributing factors that may be effective in changing the behavior of a sample of Iranian adolescents. Three hundred fifty adolescent girls from high schools in Kerman, Iran participated in a cross-sectional study based on a self-administered questionnaire. Multifactorial logistic regression modeling was used to identify the factors influencing each of the contributing factors for body management methods, and a decision tree model was constructed to identify individuals who were more or less likely to change their body shape. Approximately one-third of the adolescent girls had attempted dieting, and 37 % of them had exercised to lose weight. The logistic regression model showed that pressure from their mother and the media; father's education level; and body mass index (BMI) were important factors in dieting. BMI and perceived pressure from the media were risk factors for attempting exercise. BMI and perceived pressure from relatives, particularly mothers, and the media were important factors in attempts by adolescent girls to lose weight.
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale. PMID:26964095
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0-20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale.
Connecting clinical and actuarial prediction with rule-based methods.
Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H
2015-06-01
Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved).
Can Sap Flow Help Us to Better Understand Transpiration Patterns in Landscapes?
NASA Astrophysics Data System (ADS)
Hassler, S. K.; Weiler, M.; Blume, T.
2017-12-01
Transpiration is a key process in the hydrological cycle and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions and for improving the parameterisation of hydrological and soil-vegetation-atmosphere transfer models. At the tree scale, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status, stand-specific characteristics such as basal area or stand density and site-specific characteristics such as geology, slope position or aspect control sap flow of individual trees. However, little is known about the relative importance or the dynamic interplay of these controls. We studied these influences with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites spread over a 290 km²-catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we applied linear models to the daily spatial pattern of sap velocity and determined the importance of the different predictors. By upscaling sap velocities to the tree level with the help of species-dependent empirical estimates for sapwood area we also examined patterns of sap flow as a more direct representation of transpiration. Results indicate that a combination of mainly tree- and site-specific factors controls sap velocity patterns in this landscape, namely tree species, tree diameter, geology and aspect. For sap flow, the site-specific predictors provided the largest contribution to the explained variance, however, in contrast to the sap velocity analysis, geology was more important than aspect. Spatial variability of atmospheric demand and soil moisture explained only a small fraction of the variance. However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, were correlated to the temporal dynamics of potential evaporation. We conclude that spatial representation of transpiration in models could benefit from including patterns according to tree and site characteristics.
Sah, Jay P.; Ross, Michael S.; Snyder, James R.; Ogurcak, Danielle E.
2010-01-01
In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.
You, Ming P.; Rensing, Kelly; Renton, Michael; Barbetti, Martin J.
2017-01-01
Subterranean clover (Trifolium subterraneum) is a critical pasture legume in Mediterranean regions of southern Australia and elsewhere, including Mediterranean-type climatic regions in Africa, Asia, Australia, Europe, North America, and South America. Pythium damping-off and root disease caused by Pythium irregulare is a significant threat to subterranean clover in Australia and a study was conducted to define how environmental factors (viz. temperature, soil type, moisture and nutrition) as well as variety, influence the extent of damping-off and root disease as well as subterranean clover productivity under challenge by this pathogen. Relationships were statistically modeled using linear and generalized linear models and boosted regression trees. Modeling found complex relationships between explanatory variables and the extent of Pythium damping-off and root rot. Linear modeling identified high-level (4 or 5-way) significant interactions for each dependent variable (dry shoot and root weight, emergence, tap and lateral root disease index). Furthermore, all explanatory variables (temperature, soil, moisture, nutrition, variety) were found significant as part of some interaction within these models. A significant five-way interaction between all explanatory variables was found for both dry shoot and root dry weights, and a four way interaction between temperature, soil, moisture, and nutrition was found for both tap and lateral root disease index. A second approach to modeling using boosted regression trees provided support for and helped clarify the complex nature of the relationships found in linear models. All explanatory variables showed at least 5% relative influence on each of the five dependent variables. All models indicated differences due to soil type, with the sand-based soil having either higher weights, greater emergence, or lower disease indices; while lowest weights and less emergence, as well as higher disease indices, were found for loam soil and low temperature. There was more severe tap and lateral root rot disease in higher moisture situations. PMID:29184544
Oztekin, Asil; Delen, Dursun; Kong, Zhenyu James
2009-12-01
Predicting the survival of heart-lung transplant patients has the potential to play a critical role in understanding and improving the matching procedure between the recipient and graft. Although voluminous data related to the transplantation procedures is being collected and stored, only a small subset of the predictive factors has been used in modeling heart-lung transplantation outcomes. The previous studies have mainly focused on applying statistical techniques to a small set of factors selected by the domain-experts in order to reveal the simple linear relationships between the factors and survival. The collection of methods known as 'data mining' offers significant advantages over conventional statistical techniques in dealing with the latter's limitations such as normality assumption of observations, independence of observations from each other, and linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations. Yet, they are computationally more expensive and do not provide fast and flexible solutions as do data mining techniques in large datasets. The main objective of this study is to improve the prediction of outcomes following combined heart-lung transplantation by proposing an integrated data-mining methodology. A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning based predictive models and (2) extract the most important predictive factors. Then, using three different variable selection methods, namely, (i) machine learning methods driven variables-using decision trees, neural networks, logistic regression, (ii) the literature review-based expert-defined variables, and (iii) common sense-based interaction variables, a consolidated set of factors is generated and used to develop Cox regression models for heart-lung graft survival. The predictive models' performance in terms of 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models better predicted the graft survival with different variables than the conventional approaches commonly used in the literature. This result is validated by the comparison of the corresponding Gains charts for our proposed methodology and the literature review based Cox results, and by the comparison of Akaike information criteria (AIC) values received from each. Data mining-based methodology proposed in this study reveals that there are undiscovered relationships (i.e. interactions of the existing variables) among the survival-related variables, which helps better predict the survival of the heart-lung transplants. It also brings a different set of variables into the scene to be evaluated by the domain-experts and be considered prior to the organ transplantation.
The influence of tree morphology on stemflow generation in a tropical lowland rainforest
NASA Astrophysics Data System (ADS)
Uber, Magdalena; Levia, Delphis F.; Zimmermann, Beate; Zimmermann, Alexander
2014-05-01
Even though stemflow usually accounts for only a small proportion of rainfall, it is an important point source of water and ion input to forest floors and may, for instance, influence soil moisture patterns and groundwater recharge. Previous studies showed that the generation of stemflow depends on a multitude of meteorological and biological factors. Interestingly, despite the tremendous progress in stemflow research during the last decades it is still largely unknown which combination of tree characteristics determines stemflow volumes in species-rich tropical forests. This knowledge gap motivated us to analyse the influence of tree characteristics on stemflow volumes in a 1 hectare plot located in a Panamanian lowland rainforest. Our study comprised stemflow measurements in six randomly selected 10 m by 10 m subplots. In each subplot we measured stemflow of all trees with a diameter at breast height (DBH) > 5 cm on an event-basis for a period of six weeks. Additionally, we identified all tree species and determined a set of tree characteristics including DBH, crown diameter, bark roughness, bark furrowing, epiphyte coverage, tree architecture, stem inclination, and crown position. During the sampling period, we collected 985 L of stemflow (0.98 % of total rainfall). Based on regression analyses and comparisons among plant functional groups we show that palms were most efficient in yielding stemflow due to their large inclined fronds. Trees with large emergent crowns also produced relatively large amounts of stemflow. Due to their abundance, understory trees contribute much to stemflow yield not on individual but on the plot scale. Even though parameters such as crown diameter, branch inclination and position of the crown influence stemflow generation to some extent, these parameters explain less than 30 % of the variation in stemflow volumes. In contrast to published results from temperate forests, we did not detect a negative correlation between bark roughness and stemflow volume. This is because other parameters such as crown diameter obscured this relationship. Due to multicollinearity and poor correlations between single tree characteristics with stemflow volume, an assessment of stemflow volumes based on forest characteristics remains cumbersome in highly diverse ecosystems. Instead of relying on regression relationships, we therefore advocate a total sampling of trees in several plots to determine stand-scale stemflow yield in tropical forests.
Chen, Yikai; Wang, Kai; Xu, Chengcheng; Shi, Qin; He, Jie; Li, Peiqing; Shi, Ting
2018-05-19
To overcome the limitations of previous highway alignment safety evaluation methods, this article presents a highway alignment safety evaluation method based on fault tree analysis (FTA) and the characteristics of vehicle safety boundaries, within the framework of dynamic modeling of the driver-vehicle-road system. Approaches for categorizing the vehicle failure modes while driving on highways and the corresponding safety boundaries were comprehensively investigated based on vehicle system dynamics theory. Then, an overall crash probability model was formulated based on FTA considering the risks of 3 failure modes: losing steering capability, losing track-holding capability, and rear-end collision. The proposed method was implemented on a highway segment between Bengbu and Nanjing in China. A driver-vehicle-road multibody dynamics model was developed based on the 3D alignments of the Bengbu to Nanjing section of Ning-Luo expressway using Carsim, and the dynamics indices, such as sideslip angle and, yaw rate were obtained. Then, the average crash probability of each road section was calculated with a fixed-length method. Finally, the average crash probability was validated against the crash frequency per kilometer to demonstrate the accuracy of the proposed method. The results of the regression analysis and correlation analysis indicated good consistency between the results of the safety evaluation and the crash data and that it outperformed the safety evaluation methods used in previous studies. The proposed method has the potential to be used in practical engineering applications to identify crash-prone locations and alignment deficiencies on highways in the planning and design phases, as well as those in service.
Ratliff, John K; Balise, Ray; Veeravagu, Anand; Cole, Tyler S; Cheng, Ivan; Olshen, Richard A; Tian, Lu
2016-05-18
Postoperative metrics are increasingly important in determining standards of quality for physicians and hospitals. Although complications following spinal surgery have been described, procedural and patient variables have yet to be incorporated into a predictive model of adverse-event occurrence. We sought to develop a predictive model of complication occurrence after spine surgery. We used longitudinal prospective data from a national claims database and developed a predictive model incorporating complication type and frequency of occurrence following spine surgery procedures. We structured our model to assess the impact of features such as preoperative diagnosis, patient comorbidities, location in the spine, anterior versus posterior approach, whether fusion had been performed, whether instrumentation had been used, number of levels, and use of bone morphogenetic protein (BMP). We assessed a variety of adverse events. Prediction models were built using logistic regression with additive main effects and logistic regression with main effects as well as all 2 and 3-factor interactions. Least absolute shrinkage and selection operator (LASSO) regularization was used to select features. Competing approaches included boosted additive trees and the classification and regression trees (CART) algorithm. The final prediction performance was evaluated by estimating the area under a receiver operating characteristic curve (AUC) as predictions were applied to independent validation data and compared with the Charlson comorbidity score. The model was developed from 279,135 records of patients with a minimum duration of follow-up of 30 days. Preliminary assessment showed an adverse-event rate of 13.95%, well within norms reported in the literature. We used the first 80% of the records for training (to predict adverse events) and the remaining 20% of the records for validation. There was remarkable similarity among methods, with an AUC of 0.70 for predicting the occurrence of adverse events. The AUC using the Charlson comorbidity score was 0.61. The described model was more accurate than Charlson scoring (p < 0.01). We present a modeling effort based on administrative claims data that predicts the occurrence of complications after spine surgery. We believe that the development of a predictive modeling tool illustrating the risk of complication occurrence after spine surgery will aid in patient counseling and improve the accuracy of risk modeling strategies. Copyright © 2016 by The Journal of Bone and Joint Surgery, Incorporated.
Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised
USDA-ARS?s Scientific Manuscript database
In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by the post processing the rules with ...