Sample records for additive regression trees

  1. Additivity of nonlinear biomass equations

    Treesearch

    Bernard R. Parresol

    2001-01-01

    Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
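
    As a rough illustration of procedure 1, the sketch below assumes that each biomass component follows a power-law allometry, mass = a·D^b in stem diameter D, fit separately by least squares on the log scale. Total biomass is then taken as the sum of the component predictions, so additivity holds by construction. The model form, the component names and the noise-free data are assumptions for illustration, not Parresol's equations, and the log-scale back-transformation bias is ignored.

```python
import math

def fit_power_law(diameters, masses):
    """Fit mass = a * D**b by ordinary least squares on log(mass) versus log(D)."""
    xs = [math.log(d) for d in diameters]
    ys = [math.log(m) for m in masses]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)  # back-transformed intercept; log-bias correction ignored
    return a, b

def predict_components(params, diameter):
    """Predict each biomass component from its own fitted power law."""
    return {name: a * diameter ** b for name, (a, b) in params.items()}

def predict_total(params, diameter):
    """Total biomass is defined as the sum of the component predictions,
    so the components always add up to the total."""
    return sum(predict_components(params, diameter).values())

# Hypothetical noise-free data: stem = 0.1 * D**2.4, branches = 0.05 * D**2.0
D = [10.0, 20.0, 30.0, 40.0]
params = {
    "stem": fit_power_law(D, [0.1 * d ** 2.4 for d in D]),
    "branches": fit_power_law(D, [0.05 * d ** 2.0 for d in D]),
}
parts = predict_components(params, 25.0)
print(parts, predict_total(params, 25.0))
```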

  2. Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

    NASA Astrophysics Data System (ADS)

    Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

    2017-02-01

    Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited; 80% were randomly selected as the “derivation cohort” to develop the dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” used to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared. Among all the machine learning models, RT performed best in both the derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
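
    The derivation/validation design above can be sketched as a seeded random 80/20 split. This is a minimal stdlib-only sketch: the study's exact randomization procedure is not described in the abstract, and the function name and seed are hypothetical.

```python
import random

def split_cohort(patient_ids, derivation_fraction=0.8, seed=0):
    """Randomly split patients into a derivation cohort and a validation cohort."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    n_derivation = round(derivation_fraction * len(ids))
    return ids[:n_derivation], ids[n_derivation:]

# 1,045 hypothetical patient IDs, matching the cohort size reported above
derivation, validation = split_cohort(range(1, 1046))
print(len(derivation), len(validation))  # 836 209
```

    Every model is then fit only on the derivation cohort and scored once on the held-out validation cohort.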

  3. Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

    PubMed

    Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

    2017-10-11

    Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup in which the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method in a real clinical trial.
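
    The model structure described above (additive prognostic trees plus one treatment-interaction tree) can be illustrated with a minimal sketch of how fitted trees combine into a prediction. It assumes depth-1 trees, a 0/1 treatment indicator and made-up biomarker names and leaf values; it does not implement the Bayesian MCMC fitting or the Gibbs sampler.

```python
from typing import Callable, Dict, Sequence

Tree = Callable[[Dict[str, float]], float]

def stump(feature: str, threshold: float, left: float, right: float) -> Tree:
    """Depth-1 regression tree: one split rule and two leaf values."""
    def predict(x: Dict[str, float]) -> float:
        return left if x[feature] <= threshold else right
    return predict

def predict_outcome(x: Dict[str, float], treated: int,
                    prognostic: Sequence[Tree], interaction: Tree) -> float:
    """Sum of prognostic trees plus treated * interaction tree (treated coded 0/1)."""
    prognostic_part = sum(tree(x) for tree in prognostic)
    return prognostic_part + treated * interaction(x)

# Hypothetical trees and biomarkers, chosen only to illustrate the structure
prognostic = [stump("age", 60.0, 0.5, -0.5), stump("marker", 1.0, 0.0, 1.0)]
interaction = stump("marker", 1.0, 0.0, 2.0)

patient = {"age": 50.0, "marker": 2.0}
effect = (predict_outcome(patient, 1, prognostic, interaction)
          - predict_outcome(patient, 0, prognostic, interaction))
print(effect)  # 2.0: the treatment effect equals the interaction tree's output
```

    Because the prognostic trees cancel in the treated-minus-control difference, the subgroup with a treatment benefit is read directly off the interaction tree (here, marker > 1.0).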

  4. Methods for estimating population density in data-limited areas: evaluating regression and tree-based models in Peru.

    PubMed

    Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

    2014-01-01

    Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.

  5. Methods for Estimating Population Density in Data-Limited Areas: Evaluating Regression and Tree-Based Models in Peru

    PubMed Central

    Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

    2014-01-01

    Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657

  6. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

    PubMed

    Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

    2016-01-01

    Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
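
    The recursive partitioning step can be illustrated with a minimal sketch of a single CART split search on one numeric predictor, using Gini impurity as the splitting criterion. Gini is one common choice for classification trees and is an assumption here, as are the data. A full tree repeats this search over every predictor and then recurses into each child node until a stopping rule is met.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Search thresholds on one numeric predictor for the split x <= t that
    minimizes the size-weighted Gini impurity of the two child nodes.
    Returns (threshold, impurity); threshold is None if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical data: a numeric grade and a dichotomized outcome (1 = poor)
grade = [1, 2, 2, 3, 4, 5]
poor_outcome = [0, 0, 0, 1, 1, 1]
print(best_split(grade, poor_outcome))  # (2.5, 0.0)
```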

  7. Comparison of modeling methods to predict the spatial distribution of deep-sea coral and sponge in the Gulf of Alaska

    NASA Astrophysics Data System (ADS)

    Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.

    2017-08-01

    Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with random forest models having the best overall performance as measured by the area under the receiver operating characteristic curve (AUC). The models also performed well on the test data for presence and absence, with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models) and the tree-based models. The boosted regression tree and random forest models outperformed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance ( 50%) when moving from the training data to the testing data. Ensemble model performance was between that of the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data.
We conclude that where data conform well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated and non-normal distributions, such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types performed better than the GLM or GAM models but slightly worse than the tree-based methods, suggesting that ensemble models might be more robust to overfitting than tree methods while mitigating some of the disadvantages in predictive performance of regression methods.
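
    The two evaluation ideas above, AUC for presence/absence predictions and an ensemble that averages predictions across model types, can be sketched as follows. The AUC is computed through its rank (Mann-Whitney) interpretation; the unweighted average and the example probabilities are assumptions, not the study's exact ensemble or data.

```python
def auc(scores, labels):
    """Area under the ROC curve computed as the probability that a randomly
    chosen presence site scores higher than a randomly chosen absence site
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def ensemble_mean(*model_predictions):
    """Unweighted, site-by-site average of several models' predictions."""
    return [sum(preds) / len(preds) for preds in zip(*model_predictions)]

# Hypothetical presence (1) / absence (0) data and two models' predicted probabilities
observed = [1, 1, 0, 0]
glm_pred = [0.9, 0.4, 0.6, 0.1]
brt_pred = [0.8, 0.7, 0.3, 0.2]
print(auc(glm_pred, observed), auc(brt_pred, observed))  # 0.75 1.0
print(auc(ensemble_mean(glm_pred, brt_pred), observed))  # 1.0
```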

  8. Analysis of occlusal variables, dental attrition, and age for distinguishing healthy controls from female patients with intracapsular temporomandibular disorders.

    PubMed

    Seligman, D A; Pullinger, A G

    2000-01-01

    Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations with 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R² = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox-Snell R² = 30.3%; mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.
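
    Sensitivity and specificity as used above can be computed from paired actual and predicted labels. In this minimal sketch the positive class is the asymptomatic "normal" group, matching the abstract's framing of predicting normals; the labels and counts are hypothetical.

```python
def sensitivity_specificity(actual, predicted, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    with `positive` naming the class treated as a positive result."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: 4 normals followed by 5 patients
actual = ["normal"] * 4 + ["tmd"] * 5
predicted = ["normal", "normal", "normal", "tmd", "tmd", "tmd", "tmd", "tmd", "normal"]
print(sensitivity_specificity(actual, predicted, positive="normal"))  # (0.75, 0.8)
```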

  9. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    Treesearch

    Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards

    2006-01-01

    Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...

  10. Spatial Assessment of Model Errors from Four Regression Techniques

    Treesearch

    Lianjun Zhang; Jeffrey H. Gove

    2005-01-01

    Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...

  11. Individualized Prediction of Heat Stress in Firefighters: A Data-Driven Approach Using Classification and Regression Trees.

    PubMed

    Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit

    2015-01-01

    The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy-to-use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: whether a firefighter would cross the upper threshold of hyperthermia (yes/no). Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. The first tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
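
    The step that turns a regression-tree temperature prediction into a yes/no hyperthermia call, and the success rate used to score it, can be sketched as below. Whether a prediction of exactly 38°C counts as crossing the threshold is an assumption, as are the example temperatures; the success rate is taken here to mean the fraction of correct calls.

```python
HYPERTHERMIA_C = 38.0  # upper threshold of hyperthermia reported in the study

def predicts_hyperthermia(predicted_core_temp_c):
    """Convert a regression-tree temperature prediction into a yes/no call.
    Treating a prediction exactly at 38.0 as 'crossed' is an assumption."""
    return predicted_core_temp_c >= HYPERTHERMIA_C

def success_rate(predicted_temps_c, crossed):
    """Fraction of firefighters whose yes/no call matches what was observed."""
    calls = [predicts_hyperthermia(t) for t in predicted_temps_c]
    return sum(c == y for c, y in zip(calls, crossed)) / len(crossed)

# Hypothetical predicted end-of-scenario core temperatures and observed outcomes
predicted = [37.5, 38.2, 38.0, 37.9]
observed = [False, True, True, True]
print(success_rate(predicted, observed))  # 0.75
```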

  12. Is Susceptibility to Prenatal Methylmercury Exposure from Fish Consumption Non-Homogeneous? Tree-Structured Analysis for the Seychelles Child Development Study

    PubMed Central

    Huang, Li-Shan; Myers, Gary J.; Davidson, Philip W.; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W.; Cernichiari, Elsa; Shamlaye, Conrad F.; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W.

    2007-01-01

    Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age nine years. The analyses for the most recent nine-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated nonlinearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of twenty-one endpoints available at age nine years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition, we reanalyzed five other nine-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158

  13. Is susceptibility to prenatal methylmercury exposure from fish consumption non-homogeneous? Tree-structured analysis for the Seychelles Child Development Study.

    PubMed

    Huang, Li-Shan; Myers, Gary J; Davidson, Philip W; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W; Cernichiari, Elsa; Shamlaye, Conrad F; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W

    2007-11-01

    Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age 9 years. The analyses for the most recent 9-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated non-linearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of 21 endpoints available at age 9 years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition, we reanalyzed five other 9-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels.
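
    The idea of assessing the exposure effect linearly within each tree-defined cluster can be sketched as a separate least-squares slope per cluster. This is a simplification under stated assumptions: the cluster labels are given rather than grown by a regression tree, the data are hypothetical, and the covariate adjustment used in the study is omitted.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (model with an intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def slopes_by_cluster(cluster_ids, exposure, outcome):
    """Fit a separate exposure slope within each tree-defined cluster."""
    groups = defaultdict(lambda: ([], []))
    for c, x, y in zip(cluster_ids, exposure, outcome):
        groups[c][0].append(x)
        groups[c][1].append(y)
    return {c: ols_slope(xs, ys) for c, (xs, ys) in groups.items()}

# Hypothetical leaves "A" and "B" with opposite exposure-outcome trends
clusters = ["A", "A", "A", "B", "B", "B"]
exposure = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
score = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0]
print(slopes_by_cluster(clusters, exposure, score))  # {'A': 2.0, 'B': -1.0}
```

    Opposite-signed slopes across clusters are what "non-homogeneous susceptibility" looks like in this framing; a single pooled slope would average them away.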

  14. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    PubMed

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). The pros and cons of several different types of decision tree-based regression methods are discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically based pharmacokinetics. This work therefore assesses whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. The decision tree-based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and from flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
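
    The abstract reports mean fold errors without defining them. A definition commonly used for Vss prediction is the geometric mean of the absolute fold error, 10^mean(|log10(predicted/observed)|); the sketch below assumes that definition, which may differ from the one used in the paper.

```python
import math

def mean_fold_error(predicted, observed):
    """Geometric mean fold error, 10 ** mean(|log10(predicted / observed)|).
    A value of 1.0 means perfect prediction; 2.0 means predictions are off
    by a factor of two on average (in either direction)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))

# Hypothetical Vss values (L/kg): one prediction 2x too high, one 2x too low
print(mean_fold_error([2.0, 5.0], [1.0, 10.0]))  # approximately 2.0
```

    Taking the absolute value of the log ratio keeps over- and under-predictions from cancelling, which is why the two opposite errors above still give a fold error of about 2 rather than 1.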

  15. Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.

    PubMed

    Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan

    2015-03-01

    A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble-of-trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have recently been shown to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble-of-trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.
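
    One common way to turn a risk model into a hospital comparison is the ratio of observed to expected events, where the expected count is the sum of each patient's model-predicted probability. The sketch below assumes this O/E summary and uses hypothetical data; the paper's exact performance measure may differ, and the predicted risks could equally come from logistic regression, random forests or BART.

```python
from collections import defaultdict

def observed_to_expected(hospital_ids, outcomes, predicted_risk):
    """Per-hospital ratio of observed events to the sum of model-predicted
    probabilities (the expected count given the hospital's case mix)."""
    observed = defaultdict(float)
    expected = defaultdict(float)
    for h, y, p in zip(hospital_ids, outcomes, predicted_risk):
        observed[h] += y
        expected[h] += p
    return {h: observed[h] / expected[h] for h in observed}

# Hypothetical NICU data: outcome 1 = death; risks from some fitted risk model
hospitals = ["A", "A", "A", "A", "B", "B"]
deaths = [1, 1, 1, 0, 0, 1]
risk = [0.5, 0.5, 0.5, 0.5, 0.25, 0.25]
print(observed_to_expected(hospitals, deaths, risk))  # {'A': 1.5, 'B': 2.0}
```

    A ratio above 1 means more deaths than the case mix predicts. A better-calibrated risk model changes the denominators, which is why the choice between logistic regression and tree ensembles matters for the comparison.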

  16. A Pilot Test of Indicator Species to Assess Uniqueness of Oak-Dominated Ecoregions in Central Tennessee

    Treesearch

    W. Henry McNab; David L. Loftis; Callie J. Schweitzer; Raymond Sheffield

    2004-01-01

    We used tree indicator species occurring on 438 plots in the Plateau counties of Tennessee to test the uniqueness of four conterminous ecoregions. Multinomial logistic regression indicated that the presence of 14 tree species allowed classification of sample plots according to ecoregion with an average overall accuracy of 75 percent (range 45 to 94 percent). Additional...

  17. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    PubMed Central

    2015-01-01

    Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, although it has been actively used in other domains, the boosting approach had not yet been applied to regression problems within the construction domain, including cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to perform well in cost estimation. Using 234 actual cost datasets of a building construction project, the BRT model showed results similar to those of the NN model. In addition, the BRT model can provide additional information, such as the importance plot and structure model, which can help estimators understand the decision-making process. Consequently, the boosting approach has potential applicability in preliminary cost estimation in building construction projects. PMID:26339227

  18. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects.

    PubMed

    Shin, Yoonseok

    2015-01-01

    Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, although it has been actively used in other domains, the boosting approach had not yet been applied to regression problems within the construction domain, including cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to perform well in cost estimation. Using 234 actual cost datasets of a building construction project, the BRT model showed results similar to those of the NN model. In addition, the BRT model can provide additional information, such as the importance plot and structure model, which can help estimators understand the decision-making process. Consequently, the boosting approach has potential applicability in preliminary cost estimation in building construction projects.
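
    The boosting idea behind a BRT can be illustrated with least-squares gradient boosting, in which each new tree is fit to the current residuals and added with a small learning rate. This is a minimal stdlib sketch under assumptions: one numeric predictor, depth-1 trees, squared-error loss and illustrative hyperparameters. The study's BRT configuration is not given in the abstract.

```python
def fit_stump(xs, residuals):
    """Best single split x <= t for one numeric feature under squared error.
    Returns (threshold, left_value, right_value). Needs >= 2 distinct x values."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fit to the current
    residuals (the negative gradient of squared error) and added with shrinkage."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    """Base value plus the shrunken contributions of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)

# Hypothetical data: a single cost driver (e.g. floor area) and a cost response
area = [1, 2, 3, 4, 5, 6]
cost = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(area, cost)
print(round(predict(model, 2), 2), round(predict(model, 5), 2))  # 1.01 4.99
```

    With a learning rate of 0.1 each round removes about 10% of the remaining residual, so predictions approach the observed values gradually; that shrinkage is the main lever against overfitting in boosted trees.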

  19. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
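
    Bagging and random forests both average many trees grown on bootstrap samples; random forests additionally restrict each split to a random subset of predictors. The sketch below uses depth-1 trees for brevity (real implementations grow much deeper trees), so sampling predictors once per tree coincides with per-split sampling here. The data, function names and settings are hypothetical.

```python
import random

def fit_stump(rows, ys, features):
    """Best single split (feature j, threshold t) over the allowed features,
    minimizing squared error. Returns (j, t, left_mean, right_mean)."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, ys) if row[j] <= t]
            right = [y for row, y in zip(rows, ys) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no allowed feature varies: predict the sample mean
        mean = sum(ys) / len(ys)
        return features[0], float("inf"), mean, mean
    return best[1:]

def fit_ensemble(rows, ys, n_trees=25, max_features=None, seed=0):
    """Bagging when max_features is None; random-forest-style predictor
    sampling when max_features is smaller than the number of predictors."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    k = p if max_features is None else max_features
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        features = rng.sample(range(p), k)          # candidate predictors for this split
        trees.append(fit_stump([rows[i] for i in idx], [ys[i] for i in idx], features))
    return trees

def predict(trees, row):
    """Average the individual trees' predictions."""
    outputs = [lm if row[j] <= t else rm for j, t, lm, rm in trees]
    return sum(outputs) / len(outputs)

# Hypothetical data: predictor 0 is informative, predictor 1 is constant
rows = [[float(x), 0.0] for x in range(10)]
ys = [0.0] * 5 + [10.0] * 5
bagged = fit_ensemble(rows, ys)
forest = fit_ensemble(rows, ys, max_features=1)
print(predict(bagged, [0.0, 0.0]), predict(bagged, [9.0, 0.0]))
```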

  20. Towards lidar-based mapping of tree age at the Arctic forest tundra ecotone.

    NASA Astrophysics Data System (ADS)

    Jensen, J.; Maguire, A.; Oelkers, R.; Andreu-Hayles, L.; Boelman, N.; D'Arrigo, R.; Griffin, K. L.; Jennewein, J. S.; Hiers, E.; Meddens, A. J.; Russell, M.; Vierling, L. A.; Eitel, J.

    2017-12-01

    Climate change may cause spatial shifts in the forest-tundra ecotone (FTE). To improve our ability to study these spatial shifts, information on tree demography along the FTE is needed. The objective of this study was to assess the suitability of lidar-derived tree heights as a surrogate for tree age. We calculated individual tree age from 48 tree cores collected at basal height from white spruce (Picea glauca) within the FTE in northern Alaska. Tree height was obtained from terrestrial lidar scans (<1 cm spatial resolution). The relationship between age and height was examined using a linear regression model forced through the origin. We found a very strong predictive relationship between tree height and age (R² = 0.90, RMSE = 19.34 years) for trees ranging from 14 to 230 years old. Separate regression models were also developed for small (height < 3 m) and large trees (height >= 3 m), yielding strong predictive relationships between height and age (R² = 0.86, RMSE = 12.21 years, and R² = 0.93, RMSE = 25.16 years, respectively). The slope coefficients for the small and large tree models (16.83 and 12.98 years/m, respectively) differ by a factor of about 1.3; because the slope is expressed in years per meter, small trees took about 1.3 times as many years to gain each meter of height as large trees at these FTE study sites. Although a strong, predictive relationship between age and height is uncommon in light-limited forest environments, our findings suggest that the sparseness of trees within the FTE may explain the strong tree height-age relationships found herein. Further analysis of 36 additional tree cores recently collected within the FTE near Inuvik, Canada will be performed. Our preliminary analysis suggests that lidar-derived tree height could be a reliable proxy for tree age at the FTE, thereby establishing a new technique for scaling tree structure and demographics across larger portions of this sensitive ecotone.
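
    A linear regression forced through the origin has the closed-form slope β = Σ(h·a)/Σh², where h is height and a is age; with age regressed on height, β is in years per meter. The minimal sketch below computes that slope and an RMSE on hypothetical data, and checks the ratio of the two reported slopes. Dividing the RMSE by n is an assumption; the study may use a degrees-of-freedom correction.

```python
import math

def fit_through_origin(heights_m, ages_yr):
    """Least-squares slope for age = beta * height with no intercept:
    beta = sum(h * a) / sum(h * h), in years per meter."""
    return sum(h * a for h, a in zip(heights_m, ages_yr)) / sum(h * h for h in heights_m)

def rmse(heights_m, ages_yr, beta):
    """Root-mean-square error of the fitted ages, in years (divides by n)."""
    sq = [(a - beta * h) ** 2 for h, a in zip(heights_m, ages_yr)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical trees (heights in m, ages in years)
heights = [1.0, 2.0]
ages = [1.0, 3.0]
beta = fit_through_origin(heights, ages)
print(beta, rmse(heights, ages, beta))

# Ratio of the reported slopes (small vs. large trees, years per meter);
# a larger slope means more years per meter, i.e. slower height growth
print(round(16.83 / 12.98, 2))  # 1.3
```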

  1. The process and utility of classification and regression tree methodology in nursing research

    PubMed Central

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-01-01

    Aim: This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background: Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design: Discussion paper. Data sources: English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion: Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research: Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion: Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048

  2. The process and utility of classification and regression tree methodology in nursing research.

    PubMed

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-06-01

    This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.

  3. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from the California State Inpatient Databases of the Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data-driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression and yielded models that are easier to interpret through fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the Tree-Lasso model was as competitive in terms of accuracy (measured by the area under the receiver operating characteristic curve, AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, the interpretations of the models are in accordance with existing medical understanding of pediatric readmission.
Best-performing models have similar performance, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of Lasso models is 0.35 bits higher than that of the Tree-Lasso model. We propose a method for building predictive models applicable to the detection of readmission risk based on electronic health records. Integration of domain knowledge (in the form of the ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) increased the interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as in several important subpopulations, and the interpretations comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    PubMed Central

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999

  5. Comparison and validation of statistical methods for predicting power outage durations in the event of hurricanes.

    PubMed

    Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M

    2011-12-01

    This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazards (Cox PH) models) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate adaptive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.

  6. Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

    PubMed

    Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

    2013-02-01

    The present study developed a novel approach to modeling the indoor air quality (IAQ) of a public transportation bus by developing hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized using regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3–0.4 µm sized particle numbers, 0.4–0.5 µm sized particle numbers, particulate matter (PM) concentrations less than 1.0 µm (PM1.0), and PM concentrations less than 2.5 µm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models: the regression tree-based back-propagation network (BPN-RT), the regression tree-based radial basis function network (RBFN-RT), and the GART model. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with those obtained from a theoretical approach and a generalized practicable approach to modeling IAQ that included additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants.
The genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of back-propagation and radial basis function networks. The novelty of this research lies in modeling vehicular indoor air quality by integrating genetic algorithms, regression trees, and analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and in comparing the results of this approach with those of the conventional artificial intelligence techniques of back-propagation networks and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation methods. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment. This methodology can easily be extended to other fields of study as well.

  7. Boosted regression tree, table, and figure data

    EPA Pesticide Factsheets

    Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition." This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).

  8. A review of logistic regression models used to predict post-fire tree mortality of western North American conifers

    Treesearch

    Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald

    2012-01-01

    Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...

  9. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have moved to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account.
Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.

  10. A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods.

    PubMed

    Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R

    2014-04-01

    The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals, and computed Spearman correlations, non-adjusted and adjusted for spatial autocorrelation, between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P = 0.001), for tree density (Global Moran's I = 0.452, P = 0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P < 0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS = −0.19; conventional P-value = 0.016; spatially adjusted P-value = 0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS = −0.18; conventional P-value = 0.019; spatially adjusted P-value = 0.180).
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
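    The Global Moran's I values reported above follow a standard definition, I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where N is the number of areal units and W is the sum of all spatial weights wᵢⱼ. The following is a minimal Python sketch of that formula, assuming a dense weight matrix with zeros on the diagonal (for example, binary contiguity between census tracts). The function name morans_i and the toy data are hypothetical, the sketch omits the significance test behind the reported P-values, and it is not the authors' implementation.

```python
def morans_i(values, weights):
    """Global Moran's I for observations `values` and a dense spatial weight matrix.

    Assumes weights[i][j] >= 0, weights[i][i] == 0, and that values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]                # deviations from the mean
    w_total = sum(sum(row) for row in weights)      # W, the sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]     # spatially weighted cross-products
                for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_term)


# Hypothetical toy data: four tracts in a row, each adjacent to its neighbors.
line_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], line_adjacency))  # smooth gradient: positive I (about 0.333)
print(morans_i([1, 4, 1, 4], line_adjacency))  # alternating pattern: I = -1
```

    A positive value, as reported for tree density in Boston, indicates that similar values cluster among neighboring units; under no spatial autocorrelation the expected value is −1/(N − 1) rather than exactly zero.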

  11. Predicting Potential Changes in Suitable Habitat and Distribution by 2100 for Tree Species of the Eastern United States

    Treesearch

    Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz

    2005-01-01

    We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2×CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...

  12. Blood oxygen level dependent magnetic resonance imaging for detecting pathological patterns in lupus nephritis patients: a preliminary study using a decision tree model.

    PubMed

    Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng

    2018-02-09

    Precise renal histopathological diagnosis guides therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been an applicable noninvasive technique in renal disease. The current study was performed to explore whether BOLD MRI could contribute to diagnosing renal pathological patterns. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the ISN/RPS 2003 classification of lupus nephritis. BOLD MRI was used to obtain a functional magnetic resonance parameter, the R2* value. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns, and the algorithmic models were compared as to their diagnostic capability. A total of twelve patients were examined with both histopathology and BOLD MRI. Renal pathological patterns included five class III (including 3 class III + V) and seven class IV (including 4 class IV + V). Three algorithmic models, a decision tree, a linear discriminant, and logistic regression, were constructed to distinguish renal pathological class III from class IV. The sensitivity of the decision tree model was better than that of the linear discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the linear discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the linear discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001).
BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.

  13. Estimating parameters for tree basal area growth with a system of equations and seemingly unrelated regressions

    Treesearch

    Charles E. Rose; Thomas B. Lynch

    2001-01-01

    A method was developed for estimating parameters in an individual tree basal area growth model using a system of equations based on dbh rank classes. The estimation method developed is a compromise between an individual tree and a stand level basal area growth model that accounts for the correlation between trees within a plot by using seemingly unrelated regression (...

  14. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Treesearch

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
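    Each choice of threshold yields one (false positive rate, true positive rate) pair, and sweeping the threshold across the classifier's output traces the ROC curve used to compare classifiers. Below is a minimal Python sketch of that sweep, assuming labels coded 1 for the outcome of interest (for example, a tree that died) and 0 otherwise, and assuming a tree is predicted positive when its score is at or above the threshold. The function names, scores, and labels are hypothetical, and the area under the curve is computed with the trapezoidal rule rather than any method taken from the study.

```python
def roc_points(scores, labels):
    """Sweep the classification threshold and return (FPR, TPR) pairs.

    Assumes labels are 1 (positive) or 0 (negative), that both classes occur,
    and that an observation is predicted positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # a threshold above every score predicts nothing positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical predicted mortality probabilities from two classifiers for six trees.
observed = [1, 1, 0, 1, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
model_b = [0.6, 0.4, 0.7, 0.5, 0.8, 0.2]
for name, scores in [("model A", model_a), ("model B", model_b)]:
    print(name, round(auc_trapezoid(roc_points(scores, observed)), 3))
```

    The trapezoidal area equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half), so it summarizes a classifier across all thresholds rather than at a single cut-off.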

  15. A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods

    PubMed Central

    Duncan, Dustin T.; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A.; Arbia, Giuseppe; Castro, Marcia C.; White, Kellee; Williams, David R.

    2017-01-01

    The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran’s I for all study variables and in the ordinary least squares (OLS) regression residuals, and computed Spearman correlations, non-adjusted and adjusted for spatial autocorrelation, between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran’s I range from 0.24 to 0.86, all P=0.001), for tree density (Global Moran’s I=0.452, P=0.001), and in the OLS regression residuals (Global Moran’s I range from 0.32 to 0.38, all P<0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS=−0.19; conventional P-value=0.016; spatially adjusted P-value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS=−0.18; conventional P-value=0.019; spatially adjusted P-value=0.180).
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed. PMID:29354668

  16. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

    PubMed

    Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

    2018-07-01

    In this research, eight individual machine learning and statistical models are implemented and compared, and, based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province, Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, boosted regression trees had the highest area under the receiver operating characteristic curve (AUROC, 0.975) and the generalized linear model the lowest (0.642). The proposed EMmedian, however, achieved the highest accuracy (0.976) among all models. Despite the outstanding performance of some models, variability among the predictions of individual models was considerable. Therefore, to reduce uncertainty and to create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

    USGS Publications Warehouse

    Huang, C.; Townshend, J.R.G.

    2003-01-01

    A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm improves on SLR in that it can approximate nonlinear relationships and on BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest cover was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than, or at least as well as, BRT in all 10 equal forest-proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.

  18. Trees grow on money: urban tree canopy cover and environmental justice.

    PubMed

    Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G; Zhou, Weiqi; McHale, Melissa; Grove, J Morgan; O'Neil-Dunne, Jarlath; McFadden, Joseph P; Buckley, Geoffrey L; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L

    2015-01-01

    This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group level using Spearman's correlation, ordinary least squares (OLS) regression, and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher R-squared values than the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns.

  19. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  20. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression.

    PubMed

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-08-01

    Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  1. Estimating tree biomass regressions and their error, proceedings of the workshop on tree biomass regression functions and their contribution to the error

    Treesearch

    Eric H. Wharton; Tiberius Cunia

    1987-01-01

    Proceedings of a workshop co-sponsored by the USDA Forest Service, the State University of New York, and the Society of American Foresters. Presented were papers on the methodology of sample tree selection, tree biomass measurement, construction of biomass tables and estimation of their error, and combining the error of biomass tables with that of the sample plots or...

  2. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    PubMed

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods for assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out using 2 independent test sets of 777 and 27 chemicals. The functional inner regression tree exhibited the best predictability, with predictive accuracies of 81.5% and 81.0% on the training set (825 chemicals) and test set I (777 chemicals), respectively. Performance of the developed models on the 2 test sets was then compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, and this comparison also showed better predictability for the functional inner regression tree model. The model built in the present study exhibits reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.

  3. Using data mining to predict success in a weight loss trial.

    PubMed

    Batterham, M; Tapsell, L; Charlton, K; O'Shea, J; Thorne, R

    2017-08-01

    Traditional methods for predicting weight loss success use regression approaches, which assume that the relationships between the independent and dependent (or logit of the dependent) variables are linear. The aim of the present study was to investigate the relationship between common demographic and early weight loss variables to predict weight loss success at 12 months without making this assumption. Data mining methods (decision trees, generalised additive models and multivariate adaptive regression splines), in addition to logistic regression, were employed to predict: (i) weight loss success (defined as ≥5%) at the end of a 12-month dietary intervention using demographic variables [body mass index (BMI), sex and age]; (ii) percentage weight loss at 1 month; and (iii) the difference between actual and predicted weight loss using an energy balance model. The methods were compared by assessing model parsimony and the area under the curve (AUC). The decision tree provided the most clinically useful model and had good accuracy (AUC 0.720; 95% confidence interval 0.600-0.840). Percentage weight loss at 1 month (≥0.75%) was the strongest predictor of successful weight loss. Among individuals losing ≥0.75%, those with a BMI ≥27 kg m⁻² were more likely to be successful than those with a BMI between 25 and 27 kg m⁻². Data mining methods can provide a more accurate way of assessing relationships when conventional assumptions are not met. In the present study, a decision tree provided the most parsimonious model. Given that early weight loss cannot be predicted before randomisation, incorporating this information into a post-randomisation trial design may give better weight loss results. © 2017 The British Dietetic Association Ltd.

  4. Vegetation Continuous Fields--Transitioning from MODIS to VIIRS

    NASA Astrophysics Data System (ADS)

    DiMiceli, C.; Townshend, J. R.; Sohlberg, R. A.; Kim, D. H.; Kelly, M.

    2015-12-01

    Measurements of fractional vegetation cover are critical for accurate and consistent monitoring of global deforestation rates. They also provide important parameters for land surface, climate and carbon models and vital background data for research into fire, hydrological and ecosystem processes. MODIS Vegetation Continuous Fields (VCF) products provide four complementary layers of fractional cover: tree cover, non-tree vegetation, bare ground, and surface water. MODIS VCF products are currently produced globally and annually at 250m resolution for 2000 to the present. Additionally, annual VCF products at 1/20° resolution derived from AVHRR and MODIS Long-Term Data Records are in development to provide Earth System Data Records of fractional vegetation cover for 1982 to the present. In order to provide continuity of these valuable products, we are extending the VCF algorithms to create Suomi NPP/VIIRS VCF products. This presentation will highlight the first VIIRS fractional cover product: global percent tree cover at 1 km resolution. To create this product, phenological and physiological metrics were derived from each complete year of VIIRS 8-day surface reflectance products. A supervised regression tree method was applied to the metrics, using training derived from Landsat data supplemented by high-resolution data from Ikonos, RapidEye and QuickBird. The regression tree model was then applied globally to produce fractional tree cover. In our presentation we will detail our methods for creating the VIIRS VCF product. We will compare the new VIIRS VCF product to our current MODIS VCF products and demonstrate continuity between instruments. Finally, we will outline future VIIRS VCF development plans.

  5. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    PubMed

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current risk prediction models based on logistic regression have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective of this study was to explore the use of conditional logistic regression to increase prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset from California, which includes patient demographics, clinical data and care utilization data. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross-validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve more than 10% better sensitivity than the standard classification models, which translates to correctly labeling an additional 400-500 readmissions of heart failure patients in the state of California over a year. In addition, the key predictors identified from the HCUP data include the disposition location at discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad hoc manner to guide the cohort stratification. It could also be beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise awareness of the need to collect data on additional markers and to develop the necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.

  6. Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

    PubMed

    Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

    2017-11-15

    A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits, using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period 2004-2008. Predictor variables used to fit the regression tree are selected using random forests and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and are then applied assuming more stringent discharge limits, which are under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in the future. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.

    PubMed

    Chung, Yi-Shih

    2013-12-01

    Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also fitted to compare estimation results and external validity. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables, including geographical, time, and sociodemographic factors, explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.

  8. Dynamic travel time estimation using regression trees.

    DOT National Transportation Integrated Search

    2008-10-01

    This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...

  9. Probability of infestation and extent of mortality associated with the Douglas-fir beetle in the Colorado Front Range

    Treesearch

    Jose F. Negron

    1998-01-01

    Infested and uninfested areas within Douglas-fir, Pseudotsuga menziesii (Mirb.) Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk., were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...

  10. Using nonlinear quantile regression to estimate the self-thinning boundary curve

    Treesearch

    Quang V. Cao; Thomas J. Dean

    2015-01-01

    The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...

  11. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-02-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.

  12. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies, Marina catchment (Singapore) and Canning River (Western Australia), representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.

  13. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey.

    PubMed

    Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T

    2006-08-01

    The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire about sociodemographic variables and smoking behaviour, a representative sample of 3,304 students in the preparatory, 9th, 10th, and 11th grades from 22 randomly selected schools in Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with a male predominance and increasing prevalence with age. Second-hand smoking was reported at a frequency of 74.3%, with father predominance (56.6%). Current smoking in these age groups increased significantly with household size, late birth rank, certain school types, low academic performance, increased second-hand smoking, and stress (especially reported as separation from a close friend or violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors, with good classification capacity. It was concluded that smoking, which is closely related to sociocultural factors, is a common problem in this young population, generating an important academic and social burden in youth life, and that with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be developed.

  14. Predicting the limits to tree height using statistical regressions of leaf traits.

    PubMed

    Burgess, Stephen S O; Dawson, Todd E

    2007-01-01

    Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed end-points and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the second case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO₂ must first be characterized.

  15. Ensemble Statistical Post-Processing of the National Air Quality Forecast Capability: Enhancing Ozone Forecasts in Baltimore, Maryland

    NASA Technical Reports Server (NTRS)

    Garner, Gregory G.; Thompson, Anne M.

    2013-01-01

    An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for...

  16. An introduction to tree-structured modeling with application to quality of life data.

    PubMed

    Su, Xiaogang; Azuero, Andres; Cho, June; Kvale, Elizabeth; Meneses, Karen M; McNees, M Patrick

    2011-01-01

    Investigators addressing nursing research are increasingly faced with the need to analyze data that involve variables of mixed types and are characterized by complex nonlinearity and interactions. Tree-based methods, also called recursive partitioning, are gaining popularity in various fields. In addition to efficiency and flexibility in handling multifaceted data, tree-based methods offer ease of interpretation. The aims of this study were to introduce tree-based methods, discuss their advantages and pitfalls in application, and describe their potential use in nursing research. In this article, (a) an introduction to tree-structured methods is presented, (b) the technique is illustrated via quality of life (QOL) data collected in the Breast Cancer Education Intervention study, and (c) implications for their potential use in nursing research are discussed. As illustrated by the QOL analysis example, tree methods generate interesting and easily understood findings that cannot be uncovered via traditional linear regression analysis. The expanding breadth and complexity of nursing research may entail the use of new tools to improve efficiency and gain new insights. In certain situations, tree-based methods offer an attractive approach that helps address such needs.

  17. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Treesearch

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  18. Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii

    Treesearch

    Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie

    1988-01-01

    Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...

  19. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Treesearch

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  20. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    PubMed

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.

  1. Spatial properties of snow cover in the Upper Merced River Basin: implications for a distributed snow measurement network

    NASA Astrophysics Data System (ADS)

    Bouffon, T.; Rice, R.; Bales, R.

    2006-12-01

    The spatial distributions of snow water equivalent (SWE) and snow depth within 1, 4, and 16 km² grid elements around two automated snow pillows in a forested and an open-forested region of the Upper Merced River Basin (2,800 km²) of Yosemite National Park were characterized using field observations and analyzed using binary regression trees. Snow surveys occurred at the forested site during the accumulation and ablation seasons, while at the open-forest site a survey was performed only during the accumulation season. An average of 130 snow depth and 7 snow density measurements were made on each survey within the 4 km² grid. Snow depth was distributed using binary regression trees and geostatistical methods based on physiographic parameters (e.g. elevation, slope, vegetation, aspect). Results in the forest region indicate that the snow pillow overestimated average SWE within the 1, 4, and 16 km² areas by 34 percent during ablation; during accumulation, the snow pillow provided a good estimate of the modeled mean SWE grid value, although it is suspected that the snow pillow was underestimating SWE. At the open-forest site, however, the snow pillow value during accumulation was 28 percent greater than the mean of the modeled grid element. In addition, the binary regression trees indicate that vegetation, slope, and aspect are the most influential independent variables for snow depth distribution. The binary regression tree and multivariate linear regression models explain about 60 percent of the initial variance for snow depth and 80 percent for density, respectively. This short-term study provides motivation and direction for the installation of a distributed snow measurement network to fill the information gap in basin-wide SWE and snow depth measurements. Guided by these results, a distributed snow measurement network was installed in fall 2006 at Gin Flat in the Upper Merced River Basin, with the specific objective of measuring accumulation and ablation across topographic variables and providing guidance for future, larger-scale observation network designs.

  2. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    EPA Science Inventory

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  3. Trees Grow on Money: Urban Tree Canopy Cover and Environmental Justice

    PubMed Central

    Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G.; Zhou, Weiqi; McHale, Melissa; Grove, J. Morgan; O’Neil-Dunne, Jarlath; McFadden, Joseph P.; Buckley, Geoffrey L.; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L.

    2015-01-01

    This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group level using Spearman’s correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher R² values than the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns. PMID:25830303

  4. Assessing visual green effects of individual urban trees using airborne Lidar data.

    PubMed

    Chen, Ziyue; Xu, Bing; Gao, Bingbo

    2015-12-01

    Urban trees benefit people's daily life in terms of air quality, local climate, recreation and aesthetics. Among these functions, a growing number of studies have been conducted to understand the relationship between residents' preference towards local environments and visual green effects of urban greenery. However, except for on-site photography, there are few quantitative methods to calculate green visibility, especially tree green visibility, from viewers' perspectives. To fill this research gap, a case study was conducted in the city of Cambridge, which has a diversity of tree species, sizes and shapes. Firstly, a photograph-based survey was conducted to approximate the actual value of visual green effects of individual urban trees. In addition, small footprint airborne Lidar (Light detection and ranging) data was employed to measure the size and shape of individual trees. Next, correlations between visual tree green effects and tree structural parameters were examined. Through experiments and gradual refinement, a regression model with satisfactory R2 and limited large errors is proposed. Considering the diversity of sample trees and the result of cross-validation, this model has the potential to be applied to other study sites. This research provides urban planners and decision makers with an innovative method to analyse and evaluate landscape patterns in terms of tree greenness. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China.

    PubMed

    Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

    2016-01-01

    Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia.

  6. Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China

    PubMed Central

    Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

    2016-01-01

    Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia. PMID:27002822

  7. Differences in Risk Factors for Rotator Cuff Tears between Elderly Patients and Young Patients.

    PubMed

    Watanabe, Akihisa; Ono, Qana; Nishigami, Tomohiko; Hirooka, Takahiko; Machida, Hirohisa

    2018-02-01

    It has been unclear whether the risk factors for rotator cuff tears are the same at all ages or differ between young and older populations. In this study, we examined the risk factors for rotator cuff tears using classification and regression tree analysis as a method of nonlinear regression analysis. There were 65 patients in the rotator cuff tear group and 45 patients in the intact rotator cuff group. Classification and regression tree analysis was performed to predict rotator cuff tears. The target variable was rotator cuff tear; explanatory variables were age, sex, trauma, and critical shoulder angle ≥35°. In the classification and regression tree analysis, the tree was divided at age 64. For patients aged ≥64, the tree was divided at trauma; for patients aged <64, it was divided at critical shoulder angle ≥35°. The odds ratio for critical shoulder angle ≥35° was significant for all ages (5.89) and for patients aged <64 (10.3), whereas trauma was a significant factor only for patients aged ≥64 (5.13). Age, trauma, and critical shoulder angle ≥35° were related to rotator cuff tears in this study. However, these risk factors showed different trends according to age group rather than a linear relationship.
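
    A classification tree chooses each split by scanning candidate thresholds and keeping the one that most reduces node impurity. The sketch below assumes the Gini impurity criterion, a binary outcome coded 1 (tear) or 0 (intact) and a single numeric predictor. The ages and labels are invented for illustration and do not reproduce the study's data or its split at age 64.

```python
def gini(labels):
    """Binary Gini impurity 1 - p^2 - (1 - p)^2, which equals 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(x, y):
    """Return (threshold, weighted Gini impurity) of the best binary split on x."""
    best_t, best_score = None, float("inf")
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        # impurity of the two child nodes, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


ages = [45, 50, 55, 60, 66, 70, 75, 80]  # hypothetical patients
tear = [0, 0, 1, 0, 1, 1, 1, 1]
```

    With these toy data, `best_split(ages, tear)` returns `(63.0, 0.1875)`: the split separates a mixed younger group from an older group in which every patient has a tear. A full tree repeats the search within each child node, which is how different variables (trauma, shoulder angle) can become relevant in different age branches.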

  8. Fertilizer Response Curves for Commercial Southern Forest Species Defined with an Un-Replicated Experimental Design.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coleman, Mark; Aubrey, Doug; Coyle, David, R.

    2005-11-01

    There has been recent interest in use of non-replicated regression experimental designs in forestry, as the need for replication in experimental design is burdensome on limited research budgets. We wanted to determine the interacting effects of soil moisture and nutrient availability on the production of various southeastern forest trees (two clones of Populus deltoides, open pollinated Platanus occidentalis, Liquidambar styraciflua and Pinus taeda). Additionally, we required an understanding of the fertilizer response curve. To accomplish both objectives we developed a composite design that includes a core ANOVA approach to consider treatment interactions, with the addition of non-replicated regression plots receiving a range of fertilizer levels for the primary irrigation treatment.

  9. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criteria and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  10. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and setting: We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results: We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion: The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  11. A self-trained classification technique for producing 30 m percent-water maps from Landsat data

    USGS Publications Warehouse

    Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei

    2010-01-01

    Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.

  12. Scalable Regression Tree Learning on Hadoop using OpenPlanet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor

    As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool for building prediction models. Regression trees are popular learning models that combine decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited to such data-intensive learning applications, and a proprietary regression tree algorithm using MapReduce, PLANET, was proposed earlier. In this paper, we describe an open-source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies for Hadoop parameters that improve on the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.

  13. Optimizing a basal bark spray of dinotefuran to manage armored scales (Hemiptera: Diaspididae) in Christmas tree plantations.

    PubMed

    Cowles, Richard S

    2010-10-01

    The armored scales Fiorinia externa Ferris and Aspidiotus cryptomeriae Kuwana (Hemiptera: Diaspididae) are increasingly damaging to Christmas tree plantings in southern New England. The systemic insecticide dinotefuran was investigated for selectively suppressing armored scale populations relative to their natural enemies in cooperating growers' fields in 2008 and 2009. Banded soil application of dinotefuran resulted in poor control. However, a dinotefuran spray applied to the basal 25 cm of trunk resulted in its absorption through the bark, translocation to the foliage, and good efficacy. The basal bark spray did not significantly impact the activity of predators Chilocorus stigma (Say) or Cybocephalus nipponicus Enrody-Younga and in 2009 showed a dosage-dependent improvement in the percentage of scales parasitized by Encarsia citrina Craw. A field dosage-response factorial experiment revealed that a 0.25% (vol:vol) addition of a surfactant with dinotefuran did not enhance insecticidal effect. Probit-transformed scale population reduction relative to the untreated check was subjected to linear regression analysis; reduction of scale populations was proportional to the log of insecticide dosage, whereas basal bark spray efficacy declined in proportion to the cube of tree height. The regression equation can be used to optimize dosage relative to tree height. Excellent efficacy resulted from basal bark spray application dates of 28 April (prebud break) to mid-June, but earlier spray timing within that treatment window had fewer crawlers discoloring new growth with their short-lived feeding. A basal bark spray of dinotefuran is well suited for integration with natural enemies to manage armored scales in Christmas tree plantations.
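
    The probit regression step can be sketched as follows: each proportional population reduction is mapped to its standard normal quantile and regressed by ordinary least squares on log10 dosage. This is an illustrative sketch under stated assumptions (proportions strictly between 0 and 1, base-10 logarithms, no weighting). The function name and any dosages or proportions are hypothetical, and the tree-height term reported in the study is omitted.

```python
import math
from statistics import NormalDist, fmean


def probit_fit(doses, reductions):
    """OLS fit of probit(reduction) = intercept + slope * log10(dose).

    reductions must lie strictly between 0 and 1 (the probit is infinite at 0 and 1).
    """
    x = [math.log10(d) for d in doses]
    y = [NormalDist().inv_cdf(p) for p in reductions]  # probit transform
    mx, my = fmean(x), fmean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    return slope, intercept
```

    Inverting the fitted line gives the log dosage expected to produce a target reduction, which is the kind of use the abstract describes when it says the regression equation can be used to optimize dosage.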

  14. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

    PubMed

    Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

    2017-01-01

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who were free of any known coronary artery disease or heart failure, who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high predictive accuracy (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
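
    SMOTE creates synthetic minority-class records by interpolating between a minority record and one of its nearest minority-class neighbours. A minimal sketch follows, assuming numeric features, Euclidean distance and a uniform interpolation factor. The function name, `k`, the seed and any feature values are illustrative; this is not the implementation used in the FIT study.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points on segments between minority samples and their neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

    Oversampling of this kind is usually applied to the training portion only, so that model evaluation is carried out on original records.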

  15. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey

    PubMed Central

    Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T

    2006-01-01

    Background: The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students using a broad randomised epidemiological survey. Methods: Using an in‐class, self-administered questionnaire about sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades from 22 randomly selected schools in Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results: The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with a male predominance and increasing prevalence with age. Second-hand smoking was reported at a 74.3% frequency, with father predominance (56.6%). Factors significantly associated with increased current smoking in these age groups were larger household size, late birth rank, certain school types, low academic performance, increased second-hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. Conclusions: It was concluded that smoking, which is closely related to sociocultural factors, was a common problem in this young population, generating an important academic and social burden in youth life, and that with more data about this behaviour and the use of new statistical methods, effective coping strategies could be developed. PMID:16891446

  16. Method for estimating potential tree-grade distributions for northeastern forest species

    Treesearch

    Daniel A. Yaussy

    1993-01-01

    Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...

  17. A retrospective analysis to identify the factors affecting infection in patients undergoing chemotherapy.

    PubMed

    Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung

    2015-12-01

    This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloids and underlying diabetes mellitus. Logistic regression achieved a sensitivity of 66.7% and a specificity of 88.9%, whereas decision tree analysis achieved a sensitivity of 55.0% and a specificity of 89.0%. Overall classification accuracy was 88.0% for logistic regression and 87.2% for decision tree analysis. Logistic regression analysis therefore showed higher sensitivity and classification accuracy, and it is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
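
    The three figures compared above are ratios from the 2×2 confusion matrix: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and accuracy = (TP+TN)/N. A short sketch, assuming labels coded 1 (infection) and 0 (no infection); the function name and any example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels coded 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # share of true infections detected
        "specificity": tn / (tn + fp),  # share of non-infections correctly cleared
        "accuracy": (tp + tn) / len(pairs),
    }
```

    Because infections are the minority class, accuracy is dominated by specificity, which is why two models with similar accuracy (88.0% vs 87.2%) can differ markedly in sensitivity.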

  18. An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data

    USGS Publications Warehouse

    Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis

    2016-01-01

    Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
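
    The accuracy and stability criteria described above can be sketched as the mean absolute difference (MAD) per replication, summarized by its mean and standard deviation across randomized replications. This is a minimal sketch assuming each replication supplies paired predicted and actual scaled-NDVI values; the function names and any values are hypothetical, not the study's Python procedure.

```python
from statistics import fmean, stdev


def mad(predicted, actual):
    """Mean absolute difference between predicted and actual values."""
    return fmean(abs(p - a) for p, a in zip(predicted, actual))


def summarize_replications(replications):
    """replications: list of (predicted, actual) pairs, one per randomized run.

    Returns (mean MAD, standard deviation of MAD); needs at least two runs.
    """
    scores = [mad(p, a) for p, a in replications]
    return fmean(scores), stdev(scores)
```

    A low mean MAD indicates accuracy, while a low spread across replications indicates stability; comparing both across training-data fractions and rule-number limits is the kind of selection the abstract reports.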

  19. Comparison of Nine Statistical Model Based Warfarin Pharmacogenetic Dosing Algorithms Using the Racially Diverse International Warfarin Pharmacogenetic Consortium Cohort Database

    PubMed Central

    Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao

    2015-01-01

    Objective: Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, the performance of these algorithms in racially diverse groups has never been objectively evaluated and compared. In this literature-based study, we compared the performance of eight machine learning techniques with that of MLR in a large, racially diverse cohort. Methods: MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performance of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performance of these techniques in different races, as well as in different dose ranges of therapeutic warfarin, was compared. Robust results were obtained after 100 rounds of resampling. Results: BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best in the Black population. When patients were grouped by warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20% and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges. Conclusion: Overall, the machine learning-based techniques BART, MARS and SVR outperformed MLR in warfarin pharmacogenetic dosing. Algorithm performance differed among races. Moreover, machine learning-based algorithms tended to perform better than MLR in the low- and high-dose ranges. PMID:26305568
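
    Both evaluation criteria can be computed directly from predicted and actual doses. The sketch below assumes that "within 20%" means |predicted − actual| ≤ 0.2 × actual (boundary inclusive) and that doses are positive values in mg/week. The function name and any example doses are made up for illustration.

```python
def dosing_metrics(predicted, actual, tolerance=0.20):
    """Return (percentage of patients within tolerance of actual dose, mean absolute error)."""
    n = len(actual)
    within = sum(
        1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a
    )
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    return 100 * within / n, mae
```

    For predicted doses `[30, 35, 50, 20]` against actual doses `[28, 28, 40, 20]`, two of four predictions fall within 20%, giving `(50.0, 4.75)`. The two criteria are complementary: MAE is in dose units, whereas the within-20% rate is relative, which matters when comparing low- and high-dose ranges.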

  20. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    USDA-ARS's Scientific Manuscript database

    Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....

  1. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    USDA-ARS's Scientific Manuscript database

    Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...

  2. The influence of tree morphology on stemflow generation in a tropical lowland rainforest

    NASA Astrophysics Data System (ADS)

    Uber, Magdalena; Levia, Delphis F.; Zimmermann, Beate; Zimmermann, Alexander

    2014-05-01

    Even though stemflow usually accounts for only a small proportion of rainfall, it is an important point source of water and ion input to forest floors and may, for instance, influence soil moisture patterns and groundwater recharge. Previous studies showed that the generation of stemflow depends on a multitude of meteorological and biological factors. Interestingly, despite the tremendous progress in stemflow research during the last decades, it is still largely unknown which combination of tree characteristics determines stemflow volumes in species-rich tropical forests. This knowledge gap motivated us to analyse the influence of tree characteristics on stemflow volumes in a 1 hectare plot located in a Panamanian lowland rainforest. Our study comprised stemflow measurements in six randomly selected 10 m by 10 m subplots. In each subplot we measured stemflow of all trees with a diameter at breast height (DBH) > 5 cm on an event basis for a period of six weeks. Additionally, we identified all tree species and determined a set of tree characteristics including DBH, crown diameter, bark roughness, bark furrowing, epiphyte coverage, tree architecture, stem inclination, and crown position. During the sampling period, we collected 985 L of stemflow (0.98 % of total rainfall). Based on regression analyses and comparisons among plant functional groups, we show that palms were most efficient in yielding stemflow due to their large inclined fronds. Trees with large emergent crowns also produced relatively large amounts of stemflow. Due to their abundance, understory trees contribute substantially to stemflow yield, not at the individual-tree scale but at the plot scale. Even though parameters such as crown diameter, branch inclination and position of the crown influence stemflow generation to some extent, these parameters explain less than 30 % of the variation in stemflow volumes. In contrast to published results from temperate forests, we did not detect a negative correlation between bark roughness and stemflow volume. This is because other parameters such as crown diameter obscured this relationship. Due to multicollinearity and poor correlations between single tree characteristics and stemflow volume, an assessment of stemflow volumes based on forest characteristics remains cumbersome in highly diverse ecosystems. Instead of relying on regression relationships, we therefore advocate a total sampling of trees in several plots to determine stand-scale stemflow yield in tropical forests.

  3. Tree STEM and Canopy Biomass Estimates from Terrestrial Laser Scanning Data

    NASA Astrophysics Data System (ADS)

    Olofsson, K.; Holmgren, J.

    2017-10-01

    In this study, an automatic method for estimating both tree stem and tree canopy biomass is presented. The point cloud tree extraction techniques operate on terrestrial laser scanning (TLS) data and model the biomass using the estimated stem and canopy volumes as independent variables. The regression model fit error is less than 5 kg, which gives a relative model error of about 5 % for the stem estimate and 10-15 % for the spruce and pine canopy biomass estimates. The canopy biomass estimate was improved by fitting separate models for each tree species, which indicates that the method is allometry dependent and that the regression models need to be recomputed for areas with different climate and different vegetation.

  4. A Millennial-length Reconstruction of the Western Pacific Pattern with Associated Paleoclimate

    NASA Astrophysics Data System (ADS)

    Wright, W. E.; Guan, B. T.; Wei, K.

    2010-12-01

    The Western Pacific Pattern (WP) is a lesser-known 500 hPa pressure pattern similar to the NAO or PNA. As defined, the poles of the WP index are centered on 60°N over the Kamchatka peninsula and the neighboring Pacific and on 32.5°N over the western North Pacific. However, the area of influence for the southern half of the dipole includes a wide swath from East Asia, across Taiwan, through the Philippine Sea, to the western North Pacific. Tree rings of Taiwanese Chamaecyparis obtusa var. formosana in this extended region show significant correlation with the WP and with local temperature. The WP is also significantly correlated with atmospheric temperatures over Taiwan, especially at 850 hPa and 700 hPa, pressure levels that bracket the tree site. Spectral analysis indicates that variations in the WP occur at relatively high frequency, with most power at periods of less than 5 years. Simple linear regression against high-frequency variants of the tree-ring chronology yielded the most significant correlation coefficients. Two reconstructions are presented. The first uses a tree-ring time series produced as the first intrinsic mode function (IMF) from an Ensemble Empirical Mode Decomposition (EEMD), based on the Hilbert-Huang Transform. The regression using the EEMD-derived time series was much more significant than regressions using time series produced by traditional high-pass filtering. The second also uses the first IMF of a tree-ring time series, but the dataset was first sorted and partitioned at a specified quantile prior to EEMD decomposition, with the mean of the partitioned data forming the input to the EEMD. The partitioning was done to filter out the less climatically sensitive tree rings, a common problem with shade-tolerant trees. Time series statistics indicate that the first reconstruction is reliable back to 1241 CE. Reliability of the second reconstruction depends on the development of statistics related to the quantile partitioning and the consequent reduction in sample depth. However, the correlation coefficients from regressions over the instrumental period greatly exceed those from any other method of chronology generation, so the technique holds promise. Additional atmospheric parameters that correlate significantly with the WPO and the tree-ring time series, with similar spatial patterns, are also presented. These include vertical wind shear (850 hPa-700 hPa) over the northern Philippines and the Philippine Sea, and surface Omega and 850 hPa v-winds over the East China Sea, Japan and Taiwan. Possible links to changes in the subtropical jet stream will also be discussed.

  5. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    PubMed

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

    According to estimates, around 230 people in Switzerland die as a result of radon exposure. This public health concern makes reliable indoor radon prediction and mapping methods necessary to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pairwise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics; in particular, it reveals the IRC differences in metamorphic rocks such as gneiss. The maps produced by random forests soundly represent the regional differences of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variation in IRC data with random forests. Additionally, the variable importance evaluated by random forests shows that building characteristics are less important predictors of IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. 
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
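
    The classification step described above, k-medoids clustering on pairwise Kolmogorov distances between IRC distributions, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the unit names and measurement values are hypothetical, the initial medoids are simply the first k units, and the update rule is the simple alternating (Voronoi-iteration) variant of k-medoids rather than full PAM.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov distance: max |F_a(x) - F_b(x)| over the pooled values."""
    a, b = sorted(a), sorted(b)
    # The supremum of the ECDF difference is attained at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix; returns (medoids, labels)."""
    n = len(dist)
    medoids = list(range(k))  # assumption: naive initialisation with the first k units
    for _ in range(max_iter):
        # Assignment step: attach every unit to its closest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: in each cluster, pick the member with the smallest total distance.
        new_medoids = [
            min(members or [m], key=lambda c: sum(dist[c][j] for j in members))
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


# Hypothetical IRC samples (Bq/m3) for four hypothetical lithological units.
units = {
    "A1": [60, 80, 100],
    "A2": [60, 80, 100, 120],
    "B1": [300, 350, 400],
    "B2": [300, 350, 450],
}
names = list(units)
dist = [[ks_distance(units[u], units[v]) for v in names] for u in names]
medoids, labels = k_medoids(dist, k=2)
print({name: names[label] for name, label in zip(names, labels)})
```

    Working from a precomputed distance matrix is what lets k-medoids cluster whole distributions: any dissimilarity between units can be plugged in without needing a mean of distributions.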

  6. Generalized and synthetic regression estimators for randomized branch sampling

    Treesearch

    David L. R. Affleck; Timothy G. Gregoire

    2015-01-01

    In felled-tree studies, ratio and regression estimators are commonly used to convert more readily measured branch characteristics to dry crown mass estimates. In some cases, data from multiple trees are pooled to form these estimates. This research evaluates the utility of both tactics in the estimation of crown biomass following randomized branch sampling (...
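
    The ratio and regression estimators mentioned above convert a sample of measured branches into a crown total using an easily measured auxiliary variable. The sketch below shows only the textbook forms of both estimators; the generalized and synthetic estimators evaluated in the paper are not reproduced (the abstract is truncated), and the choice of squared basal diameter as the auxiliary variable and all numbers are hypothetical.

```python
def ratio_estimate(y, x, x_total):
    """Ratio estimator of a population total: (sum y / sum x) * X."""
    if not y or len(y) != len(x):
        raise ValueError("y and x must be non-empty and of equal length")
    return sum(y) / sum(x) * x_total


def regression_estimate(y, x, x_total, n_units):
    """Regression estimator of a total: N * (ybar + b * (Xbar - xbar)), b = OLS slope."""
    n = len(y)
    if n < 2 or len(x) != n:
        raise ValueError("need at least two paired observations")
    ybar, xbar = sum(y) / n, sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("auxiliary variable has no spread; slope undefined")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return n_units * (ybar + b * (x_total / n_units - xbar))


# Hypothetical sample: branch dry mass (kg) and squared basal diameter (cm^2).
mass = [2.0, 3.0, 5.0]
diam_sq = [4.0, 6.0, 10.0]
# Hypothetical crown: 15 branches whose squared diameters sum to 100 cm^2.
print(ratio_estimate(mass, diam_sq, 100.0))
print(regression_estimate(mass, diam_sq, 100.0, 15))
```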

  7. Cloud-Free Satellite Image Mosaics with Regression Trees and Histogram Matching.

    Treesearch

    E.H. Helmer; B. Ruefenacht

    2005-01-01

    Cloud-free optical satellite imagery simplifies remote sensing, but land-cover phenology limits existing solutions for persistent cloudiness to compositing spatially coarser imagery with high temporal resolution. Here, a new strategy for developing cloud-free imagery at finer resolution permits simple automatic change detection. The strategy uses regression trees to predict...
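
    The title names histogram matching as one component of the mosaicking strategy. A generic way to match histograms is quantile mapping: each pixel in the fill image is replaced by the reference value at the same empirical quantile. The sketch below is that generic single-band technique, not necessarily Helmer and Ruefenacht's exact procedure (the abstract is truncated), and the pixel values are hypothetical.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Quantile-map `source` values so their empirical distribution follows `reference`."""
    src_sorted, ref_sorted = sorted(source), sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        count = bisect_right(src_sorted, value)  # number of source pixels <= value
        # Smallest reference rank whose ECDF reaches count / n_src (integer ceiling avoids float error).
        rank = (count * n_ref + n_src - 1) // n_src - 1
        matched.append(ref_sorted[rank])
    return matched


# Hypothetical single-band values for a cloud-fill image and the reference scene.
fill_band = [12, 15, 15, 20, 31]
reference_band = [40, 42, 47, 55, 60]
print(match_histogram(fill_band, reference_band))  # → [40, 47, 47, 55, 60]
```

    Tied source values map to the same reference value, so the mapping is monotone and preserves the rank order of the fill pixels.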

  8. Regression estimators for late-instar gypsy moth larvae at low population densities

    Treesearch

    W.E. Wallner; A.S. Devito; Stanley J. Zarnoch

    1989-01-01

    Two regression estimators were developed for determining densities of late-instar gypsy moth, Lymantria dispar (Lepidoptera: Lymantriidae), larvae from burlap band and pyrethrin spray counts on oak trees in Vermont, Massachusetts, Connecticut, and New York. Studies were conducted by marking larvae on individual burlap banded trees within 15...

  9. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  10. Tree-Ring Based May-July Temperature Reconstruction Since AD 1630 on the Western Loess Plateau, China

    PubMed Central

    Song, Huiming; Liu, Yu; Li, Qiang; Gao, Na; Ma, Yongyong; Zhang, Yanhua

    2014-01-01

    Tree-ring samples from Chinese Pine (Pinus tabulaeformis Carr.) collected at Mt. Shimen on the western Loess Plateau, China, were used to reconstruct the mean May–July temperature during AD 1630–2011. The regression model explained 48% of the adjusted variance in the instrumentally observed mean May–July temperature. The reconstruction revealed significant temperature variations at interannual to decadal scales. Cool periods observed in the reconstruction coincided with periods of reduced solar activity. The reconstructed temperature matched well with two other tree-ring based temperature reconstructions conducted on the northern slope of the Qinling Mountains (on the southern margin of the Loess Plateau of China) at both annual and decadal scales. In addition, this study agreed well with several series derived from different proxies. This reconstruction improves upon the sparse network of high-resolution paleoclimatic records for the western Loess Plateau, China. PMID:24690885

  11. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods

    PubMed Central

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-01-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, for whom lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, and both outperformed the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540
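
    The abstract contrasts k-nearest-neighbor (kNN) imputation with imputing the mean. A minimal sketch of both on a 0/1/2-coded genotype matrix follows; the distance measure, the value of k, the toy data, and leaving a call missing when no donor exists are all assumptions, and the SVD-based method is not shown. Neither function relies on marker order or map positions, consistent with the "unordered" setting named in the title.

```python
def mean_impute(genotypes):
    """Replace missing calls (None) with the column (SNP) mean of observed calls."""
    means = []
    for col in zip(*genotypes):
        observed = [g for g in col if g is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if g is None else g for j, g in enumerate(row)] for row in genotypes]


def knn_impute(genotypes, k=2):
    """Replace missing calls with the mean call of the k most similar individuals."""

    def distance(a, b):
        # Mean absolute genotype difference over SNPs observed in both individuals.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        return sum(abs(x - y) for x, y in shared) / len(shared) if shared else float("inf")

    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # Donors: other individuals with an observed call at SNP j, nearest first.
            donors = sorted(
                (distance(row, other), d)
                for d, other in enumerate(genotypes)
                if d != i and other[j] is not None
            )[:k]
            if donors:
                imputed[i][j] = sum(genotypes[d][j] for _, d in donors) / len(donors)
    return imputed


# Hypothetical 0/1/2-coded genotypes: 4 trees x 4 SNPs, one missing call.
g = [
    [0, 0, 2, None],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
print(mean_impute(g)[0][3], knn_impute(g, k=1)[0][3])
```

    In this toy matrix the first tree closely resembles the second, so kNN borrows that tree's call (2.0), whereas the column mean (about 0.67) is pulled toward the two dissimilar trees.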

  12. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    PubMed

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, for whom lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, and both outperformed the generalized ridge regression heteroscedastic effect model for the traits evaluated.

  13. Regression analysis using dependent Polya trees.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  14. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for dichotomously scored items. Classification trees are shown to be an alternative to the traditional Mantel-Haenszel and logistic regression procedures for detecting differential item functioning. A nonparametric…

  15. Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

    PubMed Central

    Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Random forests consisting of an ensemble of equally weighted regression trees are frequently used to design predictive models. In this article, we consider an extension of the methodology that represents the regression trees as probabilistic trees and analyzes the nature of heteroscedasticity. The probabilistic tree representation allows analytical computation of confidence intervals (CIs), and optimizing the tree weights is expected to yield tighter CIs with comparable mean error. We approached the prediction of the ensemble of probabilistic trees both as a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to drug sensitivity prediction on synthetic data and the Cancer Cell Line Encyclopedia dataset and showed that tree weights can be selected to reduce the average CI length without increasing the mean error. PMID:27081304
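
    Treating the ensemble prediction as a weighted sum of correlated random variables gives Var(Σ w_i Y_i) = Σ_i Σ_j w_i w_j Cov(Y_i, Y_j). The sketch below turns that variance into a confidence interval, assuming the per-tree means and covariance are already known and that the weighted sum is approximately Gaussian; the Gaussian step is an assumption made here, not necessarily the authors' choice. The numbers are hypothetical, and neither the mixture-distribution view nor the weight optimization is shown.

```python
import math
from statistics import NormalDist


def weighted_ensemble_ci(means, cov, weights, level=0.95):
    """CI for S = sum_i w_i * Y_i, treating S as Gaussian with Var(S) = w' Cov w."""
    n = len(means)
    mu = sum(w * m for w, m in zip(weights, means))
    var = sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided standard-normal quantile
    half_width = z * math.sqrt(var)
    return mu - half_width, mu + half_width


# Hypothetical two-tree ensemble: per-tree predictive means and covariance.
means = [1.0, 3.0]
correlated = weighted_ensemble_ci(means, [[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5])
uncorrelated = weighted_ensemble_ci(means, [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
print(correlated, uncorrelated)
```

    Positive covariance between trees widens the interval (variance 0.75 versus 0.5 here), which is why weights that down-weight strongly correlated trees can shorten the CI.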

  16. Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

    PubMed

    Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

    2017-01-01

    Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula improves the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample-level performance measures used, omitting a considerable number of morbidity interactions causes no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
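
    The first step of this two-step approach, reading interaction candidates off the terminal nodes of a fitted regression tree, can be sketched as below. This is an illustration under assumptions: the tree is a hypothetical hand-built dictionary rather than the output of any particular library, morbidity-group variables are recognised by a hypothetical `MG_` prefix, and each detected interaction is encoded as the product of 0/1 indicators. The weighted least squares refit of the second step is not shown.

```python
def leaf_paths(node, path=()):
    """Yield the split variables met on the way from the root to each terminal node."""
    if "leaf" in node:
        yield path
        return
    for child in ("left", "right"):
        yield from leaf_paths(node[child], path + (node["split"],))


def detect_interactions(tree, prefix="MG_"):
    """Terminal nodes with splits on two or more morbidity groups define interaction terms."""
    terms = set()
    for path in leaf_paths(tree):
        groups = sorted({var for var in path if var.startswith(prefix)})
        if len(groups) >= 2:
            terms.add(tuple(groups))
    return sorted(terms)


def add_interaction_columns(record, terms):
    """Add product-of-indicator columns for the second-step regression."""
    extended = dict(record)
    for term in terms:
        product = 1
        for var in term:
            product *= record[var]
        extended["*".join(term)] = product
    return extended


# Hypothetical fitted tree: internal nodes split on 0/1 indicators, leaves hold mean cost.
tree = {
    "split": "MG_diabetes",
    "left": {"leaf": 1200.0},
    "right": {
        "split": "age_80plus",
        "left": {"leaf": 2500.0},
        "right": {
            "split": "MG_heart_failure",
            "left": {"leaf": 4100.0},
            "right": {"leaf": 9800.0},
        },
    },
}
terms = detect_interactions(tree)
print(terms)  # → [('MG_diabetes', 'MG_heart_failure')]
print(add_interaction_columns({"MG_diabetes": 1, "MG_heart_failure": 1, "age_80plus": 1}, terms))
```

    Splits on non-morbidity variables (here the hypothetical `age_80plus`) are ignored when forming terms, matching the abstract's focus on interactions between morbidity groups.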

  17. Foot and hip contributions to high frontal plane knee projection angle in athletes: a classification and regression tree approach.

    PubMed

    Bittencourt, Natalia F N; Ocarino, Juliana M; Mendonça, Luciana D M; Hewett, Timothy E; Fonseca, Sergio T

    2012-12-01

    Cross-sectional. To investigate predictors of increased frontal plane knee projection angle (FPKPA) in athletes. The underlying mechanisms that lead to increased FPKPA are likely multifactorial and depend on how the musculoskeletal system adapts to the possible interactions between its distal and proximal segments. Bivariate and linear analyses traditionally employed to analyze the occurrence of increased FPKPA are not sufficiently robust to capture complex relationships among predictors. The investigation of nonlinear interactions among biomechanical factors is necessary to further our understanding of the interdependence of lower-limb segments and resultant dynamic knee alignment. The FPKPA was assessed in 101 athletes during a single-leg squat and in 72 athletes at the moment of landing from a jump. The investigated predictors were sex, hip abductor isometric torque, passive range of motion (ROM) of hip internal rotation (IR), and shank-forefoot alignment. Classification and regression trees were used to investigate nonlinear interactions among predictors and their influence on the occurrence of increased FPKPA. During single-leg squatting, the occurrence of high FPKPA was predicted by the interaction between hip abductor isometric torque and passive hip IR ROM. At the moment of landing, the shank-forefoot alignment, abductor isometric torque, and passive hip IR ROM were predictors of high FPKPA. In addition, the classification and regression trees established cutoff points that could be used in clinical practice to identify athletes who are at potential risk for excessive FPKPA. The models captured nonlinear interactions between hip abductor isometric torque, passive hip IR ROM, and shank-forefoot alignment.

  18. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial reporting by enterprises grows more serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables with stepwise regression, the study applies logistic regression, support vector machine, and decision tree models to construct classification models for comparison. The study uses financial and nonfinancial variables to help establish the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to identify fraudulent financial statements, and that the C5.0 decision tree has the best classification performance (85.71%). PMID:25302338

  19. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial reporting by enterprises grows more serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables with stepwise regression, the study applies logistic regression, support vector machine, and decision tree models to construct classification models for comparison. The study uses financial and nonfinancial variables to help establish the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to identify fraudulent financial statements, and that the C5.0 decision tree has the best classification performance (85.71%).

  20. Mortality predictions of fire-injured large Douglas-fir and ponderosa pine in Oregon and Washington, USA

    Treesearch

    Lisa M. Ganio; Robert A. Progar

    2017-01-01

    Wild and prescribed fire-induced injury to forest trees can produce immediate or delayed tree mortality but fire-injured trees can also survive. Land managers use logistic regression models that incorporate tree-injury variables to discriminate between fatally injured trees and those that will survive. We used data from 4024 ponderosa pine (Pinus ponderosa...

  1. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the logistic regression model, which is more difficult to interpret. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  2. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    ERIC Educational Resources Information Center

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female teenage survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  3. Forest type mapping of the Interior West

    Treesearch

    Bonnie Ruefenacht; Gretchen G. Moisen; Jock A. Blackard

    2004-01-01

    This paper develops techniques for the mapping of forest types in Arizona, New Mexico, and Wyoming. The methods involve regression-tree modeling using a variety of remote sensing and GIS layers along with Forest Inventory and Analysis (FIA) point data. Regression-tree modeling is a fast and efficient technique for estimating variables for large data sets with high accuracy...

  4. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  5. Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis

    ERIC Educational Resources Information Center

    Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.

    2016-01-01

    In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…

  6. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few studies using this methodology specifically in haemophilia have been published to date. Two examples using CART analysis and previously published in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
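
    The repeated-partitioning idea behind CART can be sketched for a regression tree with one continuous predictor: at each node, choose the cutoff that most reduces the within-node sum of squared errors (SSE), then recurse into both subgroups. This is a simplified illustration, not Breiman's full algorithm: it omits multiple predictors, cost-complexity pruning and cross-validation, and the stopping rules (`max_depth`, `min_leaf`) and toy data are assumptions. A classification tree would replace the SSE criterion with an impurity measure such as the Gini index.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression trees)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, sse) of the split x <= t minimizing child SSE; threshold is None if none helps."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if i < min_leaf or len(pairs) - i < min_leaf:
            continue  # a child would be too small
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        total = sse([v for _, v in pairs[:i]]) + sse([v for _, v in pairs[i:]])
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


def grow(x, y, depth=0, max_depth=2, min_leaf=2):
    """Recursively partition the sample; each leaf predicts the mean response."""
    threshold, _ = best_split(x, y, min_leaf)
    if threshold is None or depth == max_depth:
        return {"leaf": sum(y) / len(y)}
    left = [(a, b) for a, b in zip(x, y) if a <= threshold]
    right = [(a, b) for a, b in zip(x, y) if a > threshold]
    return {
        "split": threshold,
        "left": grow([a for a, _ in left], [b for _, b in left], depth + 1, max_depth, min_leaf),
        "right": grow([a for a, _ in right], [b for _, b in right], depth + 1, max_depth, min_leaf),
    }


def predict(tree, value):
    """Follow the cutoffs from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if value <= tree["split"] else tree["right"]
    return tree["leaf"]


# Toy data: one predictor and a continuous outcome.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = grow(x, y)
print(tree)  # → {'split': 6.5, 'left': {'leaf': 1.0}, 'right': {'leaf': 5.0}}
print(predict(tree, 4), predict(tree, 11.5))
```

    The fitted thresholds are the kind of cutoff points that make tree models easy to read in clinical settings: each path from root to leaf is a short rule.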

  7. Hyperspectral Analysis of Soil Nitrogen, Carbon, Carbonate, and Organic Matter Using Regression Trees

    PubMed Central

    Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L. Monika

    2012-01-01

    The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R² 0.91 (p < 0.01) at the 403, 470, 687, and 846 nm bands, carbonate R² 0.95 (p < 0.01) at the 531 and 898 nm bands, total carbon R² 0.93 (p < 0.01) at the 400, 409, 441 and 907 nm bands, and organic matter R² 0.98 (p < 0.01) at the 300, 400, 441, 832 and 907 nm bands. Use of the 400 to 1,000 nm electromagnetic range with regression trees provided a powerful, rapid and inexpensive method for nondestructively assessing nitrogen, carbon, carbonate and organic matter in upper soil horizons. PMID:23112620

  8. Black bear (Ursus americanus Pallas) feeding damage across timber harvest edges in northern California coast redwood (Sequoia sempervirens[D. Don] Endl.) forests, USA

    USGS Publications Warehouse

    Russell, W.H.; Carnell, K.; McBride, J.R.

    2001-01-01

    Feeding damage to trees by black bears (Ursus americanus Pallas) was recorded in proximity to timber harvest edges in harvested and old-growth stands of coast redwood (Sequoia sempervirens [D. Don] Endl.) in northern California, USA. Bears exhibited distinct preference in their feeding patterns related to stand structure and composition and to distance from the timber-harvest edge. Most damage was recorded within regenerating stands. Regression analysis indicated that density of damaged trees was negatively correlated with distance from timber harvest edges within old-growth stands. A significant negative correlation was also found between the density of trees damaged by bears and habitat diversity (H') as measured by the Shannon diversity index. In addition, bears exhibited preference for pole-size trees (dbh = 10-50 cm) over all other size classes, and coast redwood over other species. In general, damage by bears appeared to act as a natural thinning agent in even-aged stands. No damage was recorded in old-growth stands except in close proximity to the timber-harvest edge where subcanopy recruitment was high.
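
    The habitat diversity measure H' used above is the Shannon index, H' = -Σ p_i ln p_i, where p_i is the proportion of observations in category i. A minimal sketch follows; the natural logarithm is assumed (some studies use base 2 or 10), and the habitat tallies are hypothetical, not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over categories with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical habitat-class tallies for two stands.
print(round(shannon_index([25, 25, 25, 25]), 4))  # four equally common classes → 1.3863 (ln 4)
print(round(shannon_index([97, 1, 1, 1]), 4))  # one dominant class gives a much lower H'
```

    H' is zero when a single category holds every observation and reaches its maximum, ln k, when all k categories are equally common.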

  9. Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

    ERIC Educational Resources Information Center

    Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

    2012-01-01

    Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…

  10. Analytical framework for reconstructing heterogeneous environmental variables from mammal community structure.

    PubMed

    Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C

    2015-01-01

    We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Modeling vertebrate diversity in Oregon using satellite imagery

    NASA Astrophysics Data System (ADS)

    Cablk, Mary Elizabeth

    Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates of amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the resulting predictions from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated that the models fit the data well.

  12. Environmental factors and flow paths related to Escherichia coli concentrations at two beaches on Lake St. Clair, Michigan, 2002–2005

    USGS Publications Warehouse

    Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.

    2008-01-01

    Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
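    Choosing a ROC cut point that trades true positives against false positives can be made concrete with a single criterion. The report does not say which criterion was used; the sketch below assumes Youden's J statistic (TPR - FPR), one common choice, applied to predicted exceedance probabilities from a logistic model. The function name and the example probabilities and labels are hypothetical.

```python
def youden_cut_point(scores, labels):
    """Return (cut, tpr, fpr) for the cut point that maximizes Youden's J = TPR - FPR.

    scores: predicted exceedance probabilities; labels: 1 if the observed
    concentration exceeded the standard, else 0. Observations with
    score >= cut are classified as predicted exceedances.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 0)
        tpr, fpr = tp / pos, fp / neg
        if best is None or tpr - fpr > best[1] - best[2]:
            best = (cut, tpr, fpr)
    return best


cut, tpr, fpr = youden_cut_point([0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9],
                                 [0, 0, 1, 0, 1, 1, 1])
print(cut, tpr, fpr)
```

    Other criteria (for example, a fixed maximum false positive rate) would pick a different cut point from the same ROC curve.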

  13. A regression tree for identifying combinations of fall risk factors associated to recurrent falling: a cross-sectional elderly population-based study.

    PubMed

    Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O

    2014-06-01

    Compared with logistic regression models, regression tree (RT) analyses are particularly well suited to exploring the risk of recurrent falling according to various combinations of fall risk factors. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and a multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among the 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 node groups and 8 end nodes, with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p < 0.001). Among participants without FOF, those who were male and not sad had the lowest OR for recurrent falls (OR = 0.25 with p < 0.001). The RT correctly classified 1,356 of 1,414 non-recurrent fallers (specificity = 95.6 %) and 65 of 346 recurrent fallers (sensitivity = 18.8 %). The overall classification accuracy was 81.0 %. The multiple logistic regression correctly classified 1,372 of 1,414 non-recurrent fallers (specificity = 97.0 %) and 61 of 346 recurrent fallers (sensitivity = 17.6 %). The overall classification accuracy was 81.4 %. Our results show that RT may identify specific combinations of risk factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy.
FOF emerged as a risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
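    The specificity, sensitivity and overall accuracy quoted above are simple ratios of confusion-matrix counts. A minimal sketch, using the logistic-regression counts given in the abstract (1,372 of 1,414 non-recurrent fallers and 61 of 346 recurrent fallers); the function name `classification_rates` is hypothetical.

```python
def classification_rates(tn, fp, tp, fn):
    """Return (specificity, sensitivity, accuracy) from confusion-matrix counts."""
    specificity = tn / (tn + fp)               # non-recurrent fallers correctly classified
    sensitivity = tp / (tp + fn)               # recurrent fallers correctly classified
    accuracy = (tn + tp) / (tn + fp + tp + fn)  # all participants correctly classified
    return specificity, sensitivity, accuracy


# Logistic-regression counts from the abstract.
spec, sens, acc = classification_rates(tn=1372, fp=1414 - 1372, tp=61, fn=346 - 61)
print(f"specificity={spec:.1%} sensitivity={sens:.1%} accuracy={acc:.1%}")
```

    Because recurrent fallers are only about a fifth of the sample, accuracy is dominated by specificity, which is why both models reach roughly 81 % accuracy despite sensitivities below 20 %.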

  14. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  15. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson III; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  16. A hierarchical linear model for tree height prediction.

    Treesearch

    Vicente J. Monleon

    2003-01-01

    Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...

  17. Modeling individual tree survival

    Treesearch

    Quang V. Cao

    2016-01-01

    Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...

  18. Using CART to Identify Thresholds and Hierarchies in the Determinants of Funding Decisions.

    PubMed

    Schilling, Chris; Mortimer, Duncan; Dalziel, Kim

    2017-02-01

    There is much interest in understanding decision-making processes that determine funding outcomes for health interventions. We use classification and regression trees (CART) to identify cost-effectiveness thresholds and hierarchies in the determinants of funding decisions. The hierarchical structure of CART is suited to analyzing complex conditional and nonlinear relationships. Our analysis uncovered hierarchies where interventions were grouped according to their type and objective. Cost-effectiveness thresholds varied markedly depending on which group the intervention belonged to: lifestyle-type interventions with a prevention objective had an incremental cost-effectiveness threshold of $2356, suggesting that such interventions need to be close to cost saving or dominant to be funded. For lifestyle-type interventions with a treatment objective, the threshold was much higher at $37,024. Lower down the tree, intervention attributes such as the level of patient contribution and the eligibility for government reimbursement influenced the likelihood of funding within groups of similar interventions. Comparison between our CART models and previously published results demonstrated concurrence with standard regression techniques while providing additional insights regarding the role of the funding environment and the structure of decision-maker preferences.

  19. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    PubMed

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, providing a reference for monitoring tree growth and estimating yield. The study objects were Red Fuji apple trees at the full fruit-bearing stage. The canopy spectral reflectance and LAI of ninety apple trees were measured with an ASD Fieldspec3 spectrometer and an LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models predicting LAI were built with two multivariate regression methods, support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517, together with two previously used main vegetation indices, NDVI670 and NDVI705, were in accordance with LAI. In the RF regression model, the calibration-set coefficient of determination (C-R2) of 0.920 and the validation-set coefficient of determination (V-R2) of 0.889 were higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration-set root mean square error (C-RMSE) of 0.249 and the validation-set root mean square error (V-RMSE) of 0.236 were lower than those of the SVM regression model by 0.054 and 0.058, respectively. The calibration-set and validation-set RPD values (C-RPD and V-RPD) reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines of the measured-versus-predicted scatterplots for the calibration and validation sets (C-S and V-S) were close to 1. The RF regression model estimated LAI better than the SVM model and can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
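    The model comparison above rests on three fit statistics computed separately for the calibration and validation sets. A minimal sketch under stated assumptions: R² is computed as 1 - SS_res/SS_tot (the study may instead have used the squared correlation), and RPD is taken as the sample standard deviation of the observed LAI divided by RMSE, a common convention in spectroscopy that the abstract does not spell out. The function name `fit_metrics` and the example values are hypothetical.

```python
import math
import statistics


def fit_metrics(observed, predicted):
    """Return (r2, rmse, rpd) for one data set, e.g. calibration or validation."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)               # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Assumed RPD definition: SD of the observations divided by RMSE.
    rpd = statistics.stdev(observed) / rmse
    return r2, rmse, rpd


r2, rmse, rpd = fit_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(round(r2, 3), round(rmse, 3), round(rpd, 2))
```

    Under this definition a higher RPD means the prediction error is small relative to the natural spread of LAI, which is why RPD moves in the opposite direction to RMSE.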

  20. Log and tree sawing times for hardwood mills

    Treesearch

    Everette D. Rast

    1974-01-01

    Data on 6,850 logs and 1,181 trees were analyzed to predict sawing times. For both logs and trees, regression equations were derived that express (in minutes) sawing time per log or tree and per Mbf. For trees, merchantable height is expressed in number of logs as well as in feet. One of the major uses for the tables of average sawing times is as a bench mark against...

  1. Logistic regression trees for initial selection of interesting loci in case-control studies

    PubMed Central

    Nickolov, Radoslav Z; Milanov, Valentin B

    2007-01-01

    Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557

  2. Exposure and effects of perfluoroalkyl substances in tree swallows nesting in Minnesota and Wisconsin, USA

    USGS Publications Warehouse

    Custer, Christine M.; Custer, Thomas W.; Dummer, Paul; Etterson, Matthew A.; Thogmartin, Wayne E.; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C.

    2013-01-01

    The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified, as were reproductive success end points. The sample egg method was used, wherein an egg sample is collected and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150–200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy was likely because behavioral effects and other extrinsic factors are not accounted for in these laboratory studies, and possibly because tree swallows are unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better way to examine possible effects of additional covariates and to assess models using Akaike information criterion analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed.

  3. Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral data

    Treesearch

    Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan

    2006-01-01

    We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...

  4. Assessing College Student Interest in Math and/or Computer Science in a Cross-National Sample Using Classification and Regression Trees

    ERIC Educational Resources Information Center

    Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas

    2012-01-01

    The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…

  5. Identification of extremely premature infants at high risk of rehospitalization.

    PubMed

    Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D

    2011-11-01

    Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.

  6. Identification of Extremely Premature Infants at High Risk of Rehospitalization

    PubMed Central

    Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.

    2011-01-01

    OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. 
CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016

  7. Modeling Tree Mortality Following Wildfire in Pinus ponderosa Forests in the Central Sierra Nevada of California

    Treesearch

    Jon C. Regelbrugge

    1993-01-01

    We modeled tree mortality occurring two years following wildfire in Pinus ponderosa forests using data from 1275 trees in 25 stands burned during the 1987 Stanislaus Complex fires. We used logistic regression analysis to develop models relating the probability of wildfire-induced mortality with tree size and fire severity for Pinus ponderosa, Calocedrus...

  8. Finding structure in data using multivariate tree boosting

    PubMed Central

    Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.

    2016-01-01

    Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
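    The multivariate method above extends gradient boosted regression trees (Friedman, 2001). As a hedged sketch of only the univariate core, for squared-error loss: each shallow tree (here a one-split stump on a single predictor) is fitted to the current residuals, which are the negative gradient of the loss, and added to the model scaled by a shrinkage factor. This does not use or reproduce the `mvtboost` or `gbm` interfaces; `fit_stump`, `boost`, `n_trees` and `shrinkage` are hypothetical names.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r on one predictor x.

    Assumes x has at least two distinct values.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm


def boost(x, y, n_trees=100, shrinkage=0.1):
    """Gradient boosting for squared-error loss with stumps as base learners."""
    f0 = sum(y) / len(y)  # initial constant prediction
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # negative gradient of 0.5 * (y - f)^2 with respect to f is the residual y - f
        resid = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, resid)
        stumps.append(h)
        pred = [pi + shrinkage * h(xi) for pi, xi in zip(pred, x)]
    return lambda v: f0 + shrinkage * sum(s(v) for s in stumps)


model = boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 10, 10, 10])
print(round(model(1), 3), round(model(6), 3))
```

    The small shrinkage factor makes each tree a cautious correction, so many trees are needed; in practice the number of trees is tuned (for example by cross-validation), which is part of what the R packages mentioned above automate.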

  9. Modeling time-to-event (survival) data using classification tree analysis.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.

  10. [Application of regression tree in analyzing the effects of climate factors on NDVI in loess hilly area of Shaanxi Province].

    PubMed

    Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding

    2010-05-01

    Based on 10-day SPOT VEGETATION NDVI data and daily meteorological data from 1998 to 2007 in Yan'an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined using regression trees. The effects of the tested meteorological variables on the variability of NDVI differed with season and time lag. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the interannual variation of NDVI. Regression tree analysis was very powerful in determining the key meteorological variables affecting NDVI variation, but it could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.

  11. Partitioning sources of variation in vertebrate species richness

    USGS Publications Warehouse

    Boone, R.B.; Krohn, W.B.

    2000-01-01

    Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km²) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (82–84% of deviance explained each). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%).
Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.

  12. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background: Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods: A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results: About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions: Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  13. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  14. Integration of vessel traits, wood density, and height in angiosperm shrubs and trees.

    PubMed

    Martínez-Cabrera, Hugo I; Schenk, H Jochen; Cevallos-Ferriz, Sergio R S; Jones, Cynthia S

    2011-05-01

    Trees and shrubs tend to occupy different niches within and across ecosystems; therefore, traits related to their resource use and life history are expected to differ. Here we analyzed how growth form is related to variation in integration among vessel traits, wood density, and height. We also considered the ecological and evolutionary consequences of such differences. In a sample of 200 woody plant species (65 shrubs and 135 trees) from Argentina, Mexico, and the United States, standardized major axis (SMA) regression, correlation analyses, and ANOVA were used to determine whether relationships among traits differed between growth forms. The influence of phylogenetic relationships was examined with a phylogenetic ANOVA and phylogenetically independent contrasts (PICs). A principal component analysis was conducted to determine whether trees and shrubs occupy different portions of multivariate trait space. Wood density did not differ between shrubs and trees, but there were significant differences in vessel diameter, vessel density, theoretical conductivity, and as expected, height. In addition, relationships between vessel traits and wood density differed between growth forms. Trees showed coordination among vessel traits, wood density, and height, but in shrubs, wood density and vessel traits were independent. These results held when phylogenetic relationships were considered. In the multivariate analyses, these differences translated into significantly different positions in multivariate trait space occupied by shrubs and trees. Differences in trait integration between growth forms suggest that evolution of growth form in some lineages might be associated with the degree of trait interrelation.
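Standardized major axis (SMA) regression treats the two traits symmetrically, which suits trait-trait relationships where neither variable is a response. Its slope is sign(r) · s_y / s_x, that is, the ordinary least-squares slope divided by |r|. A minimal sketch in plain Python, with a hypothetical function name and toy data; testing for slope differences between shrubs and trees, and the log transformation often applied to traits, are not shown.

```python
import math


def sma_fit(x, y):
    """Standardized major axis fit: slope = sign(r) * sd(y) / sd(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = my - slope * mx
    return slope, intercept, r


# Toy data: the OLS slope would be sxy/sxx = 0.8; the SMA slope is 0.8 / |r| = 1.0.
slope, intercept, r = sma_fit([1, 2, 3, 4], [1, 3, 2, 4])
print(slope, intercept, r)  # 1.0 0.0 0.8
```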

  15. Modeling Caribbean tree stem diameters from tree height and crown width measurements

    Treesearch

    Thomas Brandeis; KaDonna Randolph; Mike Strub

    2009-01-01

    Regression models to predict diameter at breast height (DBH) as a function of tree height and maximum crown radius were developed for Caribbean forests based on data collected by the U.S. Forest Service in the Commonwealth of Puerto Rico and Territory of the U.S. Virgin Islands. The model predicting DBH from tree height fit reasonably well (R² = 0.7110), with...

  16. Tree thinning as an option to increase herbaceous yield of an encroached semi-arid savanna in South Africa

    PubMed Central

    Smit, Gert N

    2005-01-01

    Background The investigation was conducted in a savanna area covered by what was considered an undesirably dense stand of Colophospermum mopane trees, mainly because such a dense stand of trees often suppresses herbaceous plants. The objectives of this study were to determine the influence of tree-thinning intensity on the dry matter yield of herbaceous plants (notably grasses) and to investigate differences in herbaceous species composition between defined subhabitats (under tree canopies, between tree canopies and where trees have been removed). Seven plots (65 × 180 m) were subjected to different intensities of tree thinning, ranging from a totally cleared plot (0%) to plots thinned to the equivalent of 10%, 20%, 35%, 50% and 75% of the leaf biomass of a control plot (100%) with a tree density of 2711 plants ha⁻¹. The establishment of herbaceous plants (grasses and forbs) in response to reduced competition from the woody plants was measured during three full growing seasons following the thinning treatments. Results The grass component reacted positively to the tree thinning in terms of total dry matter (DM) yield, but forbs were negatively influenced. Rainfall interacted with tree density: during years of below-average rainfall, grass DM yields in thinned plots were substantially higher than those of the control. At high tree densities, yields differed little between seasons of varying rainfall. The relation between grass DM yield and tree biomass was curvilinear and best described by an exponential regression equation. Subhabitat differentiation by C. mopane trees did provide some qualitative benefits, with certain desirable grass species showing a preference for the subhabitat under tree canopies. Conclusion While it can be concluded from this study that high tree densities suppress herbaceous production, the decision to clear or thin the C. mopane trees should include additional considerations.
Thinning of C. mopane with the exclusive objective of increasing productivity of the grass layer would thus invariably involve a compromise situation where some trees should be left for the sake of the qualitative benefits on the herbaceous layer, soil enrichment, provision of browse and stability of the ecosystem. PMID:15921528
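An exponential relation y = a · exp(b · x) between grass DM yield and tree biomass can be fitted in several ways, and the abstract does not say which was used. A minimal sketch of one common option, ordinary least squares on ln y, which requires strictly positive yields (nonlinear least squares on the original scale would give somewhat different estimates on real, noisy data). The function name and the numbers in the usage lines are synthetic and hypothetical, not taken from the study.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y). Requires every y > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("log-linear fit needs strictly positive y")
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, ml = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxl = sum((xi - mx) * (li - ml) for xi, li in zip(x, ly))
    b = sxl / sxx
    a = math.exp(ml - b * mx)
    return a, b


# Synthetic, noise-free data: tree leaf biomass as % of the control plot.
biomass = [0, 10, 20, 35, 50, 75, 100]
yield_dm = [2000 * math.exp(-0.02 * t) for t in biomass]
a, b = fit_exponential(biomass, yield_dm)
print(round(a, 6), round(b, 6))  # 2000.0 -0.02
```

A negative b describes yield falling ever more slowly as tree biomass increases, which is the curvilinear shape the record reports.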

  17. Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

    PubMed Central

    Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

    2016-01-01

    When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and to consider only a limited number of variables at a time. A number of techniques may help address this problem. For example, many R packages provide easy-to-use methods for complex analyses such as classification and regression trees (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in predicting perceived organizational support in a sample of more than 6000 Italian bankers. Using a tree function from the party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics, including job demand, lack of job control, lack of supervisor support, and training. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
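A minimal sketch of recursive partitioning for a numeric outcome, assuming numeric predictors and a CART-style squared-error split criterion. The party package's tree functions (for example `ctree`) choose splits by conditional inference tests instead, so this illustrates recursive partitioning in general rather than the authors' software; the function names, `max_depth` and `min_leaf` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, min_leaf=2):
    """Return (feature, threshold, total_sse) of the best binary split, or None."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best


@dataclass
class Node:
    value: float
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def grow(X, y, depth=0, max_depth=3, min_leaf=2):
    """Split recursively while a split reduces the squared error."""
    node = Node(value=sum(y) / len(y))
    if depth >= max_depth:
        return node
    split = best_split(X, y, min_leaf)
    if split is None or split[2] >= sse(y):
        return node
    j, t, _ = split
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    node.feature, node.threshold = j, t
    node.left = grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf)
    node.right = grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf)
    return node


def predict(node, row):
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.value


# Hypothetical toy data: the outcome jumps when predictor 0 exceeds 3.
X = [[0, 5], [1, 3], [2, 8], [3, 1], [10, 2], [11, 7], [12, 4], [13, 6]]
y = [1, 1, 1, 1, 5, 5, 5, 5]
tree = grow(X, y)
print(tree.feature, tree.threshold)  # 0 3
print(predict(tree, [2, 9]), predict(tree, [20, 0]))  # 1.0 5.0
```

Interactions of the kind the record mentions appear when a split inside one branch uses a different predictor than the split inside its sibling branch.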

  18. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    NASA Astrophysics Data System (ADS)

    Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar

    2018-01-01

    The subject of investigation is a new high-powered strontium bromide (SrBr₂) vapor laser emitting in a multiline wavelength region. The laser is an alternative to atomic strontium lasers and free-electron lasers, especially at the 6.45 μm line, which is used in surgery to process biological tissue and bone with minimal damage. In this paper, experimental measurements of the laser's operational and output characteristics are processed statistically by means of cluster analysis and tree-based regression techniques. The aim is to extract the most important relationships and dependences in the available data that influence the increase in overall laser efficiency. A set of cluster models is constructed and analyzed. Using different cluster methods, it is shown that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined into 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.

  19. Digression and Value Concatenation to Enable Privacy-Preserving Regression.

    PubMed

    Li, Xiao-Bai; Sarkar, Sumit

    2014-09-01

    Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not suited to coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.

  20. Three-Dimensional Mapping of Soil Chemical Characteristics at Micrometric Scale by Combining 2D SEM-EDX Data and 3D X-Ray CT Images.

    PubMed

    Hapca, Simona; Baveye, Philippe C; Wilson, Clare; Lark, Richard Murray; Otten, Wilfred

    2015-01-01

    There is currently a significant need to improve our understanding of the factors that control a number of critical soil processes by integrating physical, chemical and biological measurements on soils at microscopic scales to help produce 3D maps of the related properties. Because of technological limitations, most chemical and biological measurements can be carried out only on exposed soil surfaces or 2-dimensional cuts through soil samples. Methods need to be developed to produce 3D maps of soil properties based on spatial sequences of 2D maps. In this general context, the objective of the research described here was to develop a method to generate 3D maps of soil chemical properties at the microscale by combining 2D SEM-EDX data with 3D X-ray computed tomography images. A statistical approach using the regression tree method and ordinary kriging applied to the residuals was developed and applied to predict the 3D spatial distribution of carbon, silicon, iron, and oxygen at the microscale. The spatial correlation between the X-ray grayscale intensities and the chemical maps made it possible to use a regression-tree model as an initial step to predict the 3D chemical composition. For chemical elements, e.g., iron, that are sparsely distributed in a soil sample, the regression-tree model provides a good prediction, explaining as much as 90% of the variability in some of the data. However, for chemical elements that are more homogeneously distributed, such as carbon, silicon, or oxygen, the additional kriging of the regression tree residuals significantly improved the prediction, with an increase in the R² value from 0.221 to 0.324 for carbon, 0.312 to 0.423 for silicon, and 0.218 to 0.374 for oxygen, respectively. The present research develops for the first time an integrated experimental and theoretical framework, which combines geostatistical methods with imaging techniques to unveil the 3-D chemical structure of soil at very fine scales.
The methodology presented in this study can be easily adapted and applied to other types of data such as bacterial or fungal population densities for the 3D characterization of microbial distribution.
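The approach combines two steps: a regression tree predicts the trend from X-ray grayscale intensity, and ordinary kriging interpolates the tree's residuals in space; the final prediction is their sum. The sketch below shows only the kriging step, assuming an exponential semivariogram without a nugget and hypothetical `sill` and `range_` values (in practice these are fitted to the empirical variogram of the residuals, and the abstract does not state which model was used). The tree prediction is passed in as a plain number, and all names and coordinates are hypothetical.

```python
import numpy as np


def exp_semivariogram(h, sill=1.0, range_=5.0):
    """Exponential semivariogram with no nugget; sill and range_ are assumed values."""
    return sill * (1.0 - np.exp(-np.asarray(h, float) / range_))


def ordinary_kriging(coords, values, target, sill=1.0, range_=5.0):
    """Ordinary kriging estimate at `target`; the weights are constrained to sum to 1."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    target = np.asarray(target, float)
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kriging system [Gamma 1; 1' 0] [w; mu] = [gamma0; 1], mu being a Lagrange multiplier.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(dists, sill, range_)
    A[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_semivariogram(np.linalg.norm(coords - target, axis=1), sill, range_)
    weights = np.linalg.solve(A, rhs)[:n]
    return float(weights @ values), weights


def regression_kriging(trend_at_target, coords, residuals, target, **variogram):
    """Final prediction = regression-tree trend + kriged tree residual."""
    kriged, _ = ordinary_kriging(coords, residuals, target, **variogram)
    return trend_at_target + kriged


# Hypothetical voxel coordinates (x, y, z) and regression-tree residuals.
coords = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [3, 1, 1]]
residuals = [0.5, -0.2, 0.1, 0.3]
estimate, w = ordinary_kriging(coords, residuals, [0.5, 0.5, 0.0])
print(round(float(w.sum()), 6))  # 1.0
```

Without a nugget the kriging estimate reproduces each observed residual exactly at its own location, so the combined prediction there equals tree trend plus observed residual.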

  1. Three-Dimensional Mapping of Soil Chemical Characteristics at Micrometric Scale by Combining 2D SEM-EDX Data and 3D X-Ray CT Images

    PubMed Central

    Hapca, Simona; Baveye, Philippe C.; Wilson, Clare; Lark, Richard Murray; Otten, Wilfred

    2015-01-01

    There is currently a significant need to improve our understanding of the factors that control a number of critical soil processes by integrating physical, chemical and biological measurements on soils at microscopic scales to help produce 3D maps of the related properties. Because of technological limitations, most chemical and biological measurements can be carried out only on exposed soil surfaces or 2-dimensional cuts through soil samples. Methods need to be developed to produce 3D maps of soil properties based on spatial sequences of 2D maps. In this general context, the objective of the research described here was to develop a method to generate 3D maps of soil chemical properties at the microscale by combining 2D SEM-EDX data with 3D X-ray computed tomography images. A statistical approach using the regression tree method and ordinary kriging applied to the residuals was developed and applied to predict the 3D spatial distribution of carbon, silicon, iron, and oxygen at the microscale. The spatial correlation between the X-ray grayscale intensities and the chemical maps made it possible to use a regression-tree model as an initial step to predict the 3D chemical composition. For chemical elements, e.g., iron, that are sparsely distributed in a soil sample, the regression-tree model provides a good prediction, explaining as much as 90% of the variability in some of the data. However, for chemical elements that are more homogeneously distributed, such as carbon, silicon, or oxygen, the additional kriging of the regression tree residuals significantly improved the prediction, with an increase in the R² value from 0.221 to 0.324 for carbon, 0.312 to 0.423 for silicon, and 0.218 to 0.374 for oxygen, respectively. The present research develops for the first time an integrated experimental and theoretical framework, which combines geostatistical methods with imaging techniques to unveil the 3-D chemical structure of soil at very fine scales.
The methodology presented in this study can be easily adapted and applied to other types of data such as bacterial or fungal population densities for the 3D characterization of microbial distribution. PMID:26372473

  2. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    NASA Astrophysics Data System (ADS)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-breast-height (DBH) estimation is a prerequisite for various allometric equations that estimate important forestry indices such as stem volume, basal area, biomass and carbon stock. LiDAR technology can directly obtain various forest parameters, except DBH, from the behavior and characteristics of the point cloud, which are unique to different forest classes. An extensive tree inventory was conducted on an established two-hectare sample plot of natural growth forest in Mt. Makiling, Laguna. Coordinates, height, and canopy cover were measured and species were identified for comparison with LiDAR derivatives. Multiple linear regression was used to obtain LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated and evaluated automatically using Python scripts and regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination yielding the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was taken as the best equation. The best equation uses 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04 and a BIC of 175.08. The best combination of parameters may differ among forest classes in further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be used to help assess the correlation among parameters.
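With 27 candidate LiDAR parameters there are 2^27 - 1 (about 134 million) non-empty subsets, so exhaustive search is expensive but mechanical. The sketch below shows the procedure on a handful of synthetic predictors, using NumPy least squares and the common Gaussian forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n, where k counts the intercept and slopes. Software packages differ in additive constants (for example, whether σ² is counted in k), so scores are only comparable within one convention; the function and predictor names are hypothetical. R² alone never decreases as predictors are added, which is why AIC and BIC are needed to trade fit against model size.

```python
import itertools
import math

import numpy as np


def rss_ols(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def all_subsets(X, y, names):
    """Score every non-empty subset of columns by R^2, AIC and BIC (Gaussian forms)."""
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    results = []
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            rss = rss_ols(X[:, list(cols)], y)
            k = size + 1  # slopes + intercept
            results.append({
                "vars": [names[c] for c in cols],
                "r2": 1.0 - rss / tss,
                "aic": n * math.log(rss / n) + 2 * k,
                "bic": n * math.log(rss / n) + k * math.log(n),
            })
    return results


# Synthetic data: only x0 and x2 matter (stand-ins for LiDAR-derived metrics).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)
results = all_subsets(X, y, ["x0", "x1", "x2", "x3"])
best_bic = min(results, key=lambda r: r["bic"])
best_aic = min(results, key=lambda r: r["aic"])
print(len(results), best_bic["vars"], best_aic["vars"])
```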

  3. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases.

    PubMed

    Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B

    2018-06-03

    There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.
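The estimand can be made concrete. Writing A for the binary treatment, the conditional risk difference at covariate pattern x is τ(x) = P(Y = 1 | A = 1, X = x) - P(Y = 1 | A = 0, X = x), which equals the causal conditional risk difference only under no unmeasured confounding and positivity. Methods such as Bayesian additive regression trees estimate these two outcome surfaces flexibly; the sketch below swaps in a deliberately simple stratified estimator for a few discrete covariate patterns, to show what is being estimated rather than how BART estimates it. The record layout, names and counts are hypothetical.

```python
from collections import defaultdict


def conditional_risk_difference(records):
    """Stratified estimate of P(Y=1 | A=1, X=x) - P(Y=1 | A=0, X=x).

    records: iterable of (x, a, y) with a discrete covariate pattern x,
    treatment a in {0, 1} and binary outcome y in {0, 1}.
    Strata lacking either arm are skipped (positivity fails there).
    """
    counts = defaultdict(lambda: [[0, 0], [0, 0]])  # counts[x][a] = [events, total]
    for x, a, y in records:
        counts[x][a][0] += y
        counts[x][a][1] += 1
    crd = {}
    for x, ((e0, n0), (e1, n1)) in counts.items():
        if n0 > 0 and n1 > 0:
            crd[x] = e1 / n1 - e0 / n0
    return crd


# Hypothetical data: "young" has 2/10 treated vs 1/10 untreated events,
# "old" has 2/4 vs 3/4, and "rare" is observed only under treatment.
records = (
    [("young", 1, 1)] * 2 + [("young", 1, 0)] * 8
    + [("young", 0, 1)] * 1 + [("young", 0, 0)] * 9
    + [("old", 1, 1)] * 2 + [("old", 1, 0)] * 2
    + [("old", 0, 1)] * 3 + [("old", 0, 0)] * 1
    + [("rare", 1, 0)] * 3
)
print(conditional_risk_difference(records))  # {'young': 0.1, 'old': -0.25}
```

With binary, rare outcomes the stratum-level proportions become noisy quickly, which is the setting where the record compares smoother estimators such as BART and causal boosting.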

  4. Dendroclimatic estimates of a drought index for northern Virginia

    USGS Publications Warehouse

    Puckett, Larry J.

    1981-01-01

    A 230-year record of the Palmer drought-severity index (PDSI) was estimated for northern Virginia from variations in widths of tree rings. Increment cores were extracted from eastern hemlock, Tsuga canadensis (L.) Carr., at three locations in northern Virginia. Measurements of annual growth increments were made and converted to standardized indices of growth. A response function was derived for hemlock to determine the growth-climate relationship. Growth was positively correlated with precipitation and negatively correlated with temperature during the May-July growing season. Combined standardized indices of growth were calibrated with the July PDSI. Growth accounted for 20-30 percent of the PDSI variance. Further regressions using factor scores of combined tree growth indices resulted in a small but significant improvement. Greatest improvement was made by using factor scores of growth indices of individual trees, thereby accounting for 64 percent of the July PDSI variance in the regression. Comparison of the results with a 241-year reconstruction from New York showed good agreement between low-frequency climatic trends. Analysis of the estimated Central Mountain climatic division of Virginia PDSI record indicated that, relative to the long-term record (1746-1975), dry years have occurred in disproportionately large numbers during the last half of the 19th century and the mid-20th century. This trend appears reversed for the last half of the 18th century and the first half of the 19th century. Although these results are considered first-generation products, they are encouraging, suggesting that once additional tree-ring chronologies are constructed and techniques are refined, it will be possible to obtain more accurate estimates of prior climatic conditions in the mid-Atlantic region.

  5. Decision trees in epidemiological research.

    PubMed

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups that share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression Tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
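The key difference from CART, a formal test before each split, can be sketched as follows. This is a simplified illustration, not the CTree algorithm as implemented in R: it measures association with the absolute Pearson correlation, obtains Monte Carlo permutation p-values, applies a Bonferroni correction across covariates, and stops when no adjusted p-value is at or below `alpha`. CTree itself uses a more general family of linear test statistics; the function names and defaults here are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (0 if either variable is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for the association between x and y."""
    rnd = random.Random(seed)
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rnd.shuffle(y_perm)
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(columns, y, alpha=0.05, n_perm=999):
    """Return (index, Bonferroni-adjusted p) of the covariate to split on, or None to stop."""
    m = len(columns)
    adjusted = [min(1.0, m * permutation_pvalue(col, y, n_perm)) for col in columns]
    best = min(range(m), key=lambda j: adjusted[j])
    return (best, adjusted[best]) if adjusted[best] <= alpha else None


# Hypothetical covariates: the first tracks the outcome, the second is constant.
y = [float(i) for i in range(30)]
informative = [2 * i + (i % 3) for i in range(30)]
constant = [1.0] * 30
print(select_split_variable([informative, constant], y))  # (0, 0.002)
```

Because the stopping decision is a significance test, no separate pruning step is needed, which is part of the simplicity the record attributes to CTree.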

  6. Indicators of Terrorism Vulnerability in Africa

    DTIC Science & Technology

    2015-03-26

    …the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree

  7. Bayesian models for comparative analysis integrating phylogenetic uncertainty.

    PubMed

    de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P

    2012-06-28

    Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. 
We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.

  8. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    PubMed Central

    2012-01-01

    Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. 
We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language. PMID:22741602

  9. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been studied extensively, but only a small amount of work has addressed outlier detection in circular data. In this study, we propose detecting multiple outliers in circular regression models with a clustering-based algorithm. Clustering techniques use a distance measure to quantify the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single-linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify any cluster group that exceeds the stopping rule as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting outliers. The proposed methods are found to perform well and to be applicable to circular regression models.
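The ingredients of the clustering step can be sketched for data given as (x, y) angle pairs in radians. The abstract does not give the exact similarity distance or stopping rule, so both are assumptions here: each angular difference is measured as 1 - cos(Δθ), the two are combined Euclidean-style, and the tree is cut at a hypothetical height `cut` instead of the paper's rule. Single-linkage merge heights equal the edge weights of a minimum spanning tree, which the sketch builds with Prim's algorithm. The standard circular mean direction and circular standard deviation, sqrt(-2 ln R̄), are included because the proposed stopping rule is built from them. All function names are hypothetical.

```python
import math


def circ_dist(a, b):
    """Assumed per-angle distance: 1 - cos(a - b), which lies in [0, 2]."""
    return 1.0 - math.cos(a - b)


def pair_distance(p, q):
    """Assumed distance between (x, y) angle pairs: Euclidean combination of circular distances."""
    return math.hypot(circ_dist(p[0], q[0]), circ_dist(p[1], q[1]))


def mst_edges(points):
    """Prim's algorithm; single-linkage merge heights equal these MST edge weights."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[i] = True
        if parent[i] >= 0:
            edges.append((parent[i], i, best[i]))
        for k in range(n):
            if not in_tree[k]:
                d = pair_distance(points[i], points[k])
                if d < best[k]:
                    best[k], parent[k] = d, i
    return edges


def clusters_at(points, cut):
    """Cut the single-linkage tree at height `cut` (hypothetical threshold) via union-find."""
    root = list(range(len(points)))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    for i, j, w in mst_edges(points):
        if w <= cut:
            root[find(i)] = find(j)
    groups = {}
    for k in range(len(points)):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values(), key=len, reverse=True)


def circular_mean_and_sd(angles):
    """Mean direction atan2(S, C) and circular SD sqrt(-2 ln Rbar)."""
    c = sum(math.cos(t) for t in angles)
    s = sum(math.sin(t) for t in angles)
    rbar = min(1.0, math.hypot(c, s) / len(angles))
    if rbar == 0.0:
        raise ValueError("mean direction is undefined when the resultant length is zero")
    return math.atan2(s, c), math.sqrt(-2.0 * math.log(rbar))


points = [(0.10, 0.20), (0.12, 0.22), (0.08, 0.18), (0.11, 0.19), (3.0, 3.0)]
print(clusters_at(points, cut=0.5))  # [[0, 1, 2, 3], [4]]
print(circular_mean_and_sd([6.2, 0.1]))  # mean direction near 0, not near pi
```

Small groups left over after the cut (here the singleton) are the candidates that a stopping rule would flag as potential outliers.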

  10. Developing Models to Forecast Sales of Natural Christmas Trees

    Treesearch

    Lawrence D. Garrett; Thomas H. Pendleton

    1977-01-01

    A study of practices for marketing Christmas trees in Winston-Salem, North Carolina, and Denver, Colorado, revealed that such factors as retail lot competition, tree price, consumer traffic, and consumer income were very important in determining a particular retailer's sales. Analyses of 4 years of market data were used in developing regression models for...

  11. Comprehensive database of diameter-based biomass regressions for North American tree species

    Treesearch

    Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey

    2004-01-01

    A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...

  12. Tree-, stand- and site-specific controls on landscape-scale patterns of transpiration

    NASA Astrophysics Data System (ADS)

    Hassler, Sibylle; Weiler, Markus; Blume, Theresa

    2017-04-01

    Transpiration is a key process in the hydrological cycle and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions as well as for improving the parameterisation of hydrological and soil-vegetation-atmosphere transfer models. For individual trees, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status control sap flow amounts of individual trees. Within forest stands, properties such as species composition, basal area or stand density additionally affect sap flow, for example via competition mechanisms. Finally, sap flow patterns might also be influenced by landscape-scale characteristics such as geology, slope position or aspect because they affect water and energy availability; however, little is known about the dynamic interplay of these controls. We studied the relative importance of various tree-, stand- and site-specific characteristics with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites spread over a 290 km² catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we modelled the daily sap velocities of these 61 trees and determined the importance of the different predictors. Results indicate that a combination of tree-, stand- and site-specific factors controls sap velocity patterns in the landscape, namely tree species, tree diameter, the stand density, geology and aspect. Compared to these predictors, spatial variability of atmospheric demand and soil moisture explains only a small fraction of the variability in the daily datasets. However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, are correlated to the temporal dynamics of potential evaporation.
Thus, transpiration estimates at the landscape scale would benefit from not only considering hydro-meteorological drivers, but also including tree, stand and site characteristics in order to improve the spatial representation of transpiration for hydrological and soil-vegetation-atmosphere transfer models.

  13. Landscape-scale consequences of differential tree mortality from catastrophic wind disturbance in the Amazon.

    PubMed

    Rifai, Sami W; Urquiza Muñoz, José D; Negrón-Juárez, Robinson I; Ramírez Arévalo, Fredy R; Tello-Espinoza, Rodil; Vanderwel, Mark C; Lichstein, Jeremy W; Chambers, Jeffrey Q; Bohlman, Stephanie A

    2016-10-01

    Wind disturbance can create large forest blowdowns, which greatly reduce live biomass and add uncertainty to the strength of the Amazon carbon sink. Observational studies from within the central Amazon have quantified blowdown size and estimated total mortality but have not determined which trees are most likely to die from a catastrophic wind disturbance. Also, the impact of spatial dependence upon tree mortality from wind disturbance has seldom been quantified, which is important because wind disturbance often kills clusters of trees due to large treefalls killing surrounding neighbors. We examine (1) the causes of differential mortality between adult trees from a 300-ha blowdown event in the Peruvian region of the northwestern Amazon, (2) how accounting for spatial dependence affects mortality predictions, and (3) how incorporating both differential mortality and spatial dependence affects the landscape-level estimation of necromass produced from the blowdown. Standard regression and spatial regression models were used to estimate how stem diameter, wood density, elevation, and a satellite-derived disturbance metric influenced the probability of tree death from the blowdown event. The model parameters regarding tree characteristics, topography, and spatial autocorrelation of the field data were then used to determine the consequences of non-random mortality for landscape production of necromass through a simulation model. Tree mortality was highly non-random within the blowdown, where tree mortality rates were highest for trees that were large, had low wood density, and were located at high elevation. Of the differential mortality models, the non-spatial models overpredicted necromass, whereas the spatial model slightly underpredicted necromass. When parameterized from the same field data, the spatial regression model with differential mortality estimated only 7.5% more dead trees across the entire blowdown than the random mortality model, yet it estimated 51% greater necromass. We suggest that predictions of forest carbon loss from wind disturbance are sensitive not only to the underlying spatial dependence of observations, but also to the biological differences between individuals that promote differential levels of mortality. © 2016 by the Ecological Society of America.
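
    The necromass result above hinges on which trees die, not just how many. A toy calculation makes this concrete: with independent deaths, the expected number of dead trees is the sum of the per-tree death probabilities, and the expected necromass is the probability-weighted sum of biomass. The five trees, their biomass and their probabilities below are invented for illustration (not taken from the study), and the spatial dependence the authors model is ignored.

```python
def expected_totals(death_probs, biomass):
    """Expected number of dead trees and expected necromass, assuming independent deaths."""
    dead = sum(death_probs)
    necromass = sum(p * b for p, b in zip(death_probs, biomass))
    return dead, necromass

# Hypothetical stand of five trees, ordered from small to large (biomass in Mg).
biomass = [0.2, 0.5, 1.0, 2.0, 4.0]

# Differential mortality: death probability rises with tree size (illustrative values).
differential = [0.1, 0.2, 0.3, 0.5, 0.8]

# Random mortality: every tree gets the mean probability, so the
# expected number of dead trees is the same as above.
mean_p = sum(differential) / len(differential)
random_mortality = [mean_p] * len(biomass)

d_dead, d_mass = expected_totals(differential, biomass)
r_dead, r_mass = expected_totals(random_mortality, biomass)
print(f"differential: {d_dead:.2f} dead, {d_mass:.2f} Mg")  # differential: 1.90 dead, 4.62 Mg
print(f"random:       {r_dead:.2f} dead, {r_mass:.2f} Mg")  # random:       1.90 dead, 2.93 Mg
```

    Holding the expected number of dead trees fixed, shifting mortality toward large trees raises expected necromass by roughly 58% in this toy case; the size of the effect depends entirely on the invented inputs.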

  14. Tree Morphologic Plasticity Explains Deviation from Metabolic Scaling Theory in Semi-Arid Conifer Forests, Southwestern USA

    PubMed Central

    O’Connor, Christopher D.; Lynch, Ann M.

    2016-01-01

    A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition was no different from the MST prediction (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned, had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). The exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increased, trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest that height-to-diameter ratios below MST predictions relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents was consistent with the predictions of MST. PMID:27391084

  15. Tree Morphologic Plasticity Explains Deviation from Metabolic Scaling Theory in Semi-Arid Conifer Forests, Southwestern USA.

    PubMed

    Swetnam, Tyson L; O'Connor, Christopher D; Lynch, Ann M

    2016-01-01

    A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition was no different from the MST prediction (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned, had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). The exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increased, trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest that height-to-diameter ratios below MST predictions relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents was consistent with the predictions of MST.
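
    A power law H = a·D^b becomes linear after taking logarithms, ln H = ln a + b·ln D, so a common quick way to estimate a scaling exponent is ordinary least squares on log-transformed data. The abstract reports non-linear least-squares fits, so the log-log regression below is a simpler, swapped-in technique, and the two can give different exponents on noisy data. The synthetic trees follow H = 2·D^(2/3) exactly, using the exponent of 2/3 that MST predicts for height versus diameter (stated from general knowledge of the theory, not from the abstract); the function name is hypothetical.

```python
import math

def fit_power_law(diameters, heights):
    """Fit H = a * D**b by ordinary least squares on ln(H) = ln(a) + b * ln(D)."""
    xs = [math.log(d) for d in diameters]
    ys = [math.log(h) for h in heights]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx                  # slope on the log-log scale = scaling exponent
    a = math.exp(my - b * mx)      # back-transformed intercept = scaling coefficient
    return a, b

# Synthetic trees that follow the MST idealization H = 2 * D**(2/3) exactly.
diameters = [5.0, 10.0, 20.0, 40.0, 80.0]
heights = [2.0 * d ** (2 / 3) for d in diameters]
a, b = fit_power_law(diameters, heights)
print(round(a, 3), round(b, 3))  # 2.0 0.667
```

    With real, scattered data the exponent from this fit would be compared against 2/3 with a confidence interval, as the authors do at the 95% level.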

  16. Lidar-based biomass assessment for the Yukon River Basin

    NASA Astrophysics Data System (ADS)

    Peterson, B.; Wylie, B. K.; Stoker, J.; Nossov, D.

    2010-12-01

    Climate change is expected to have a significant impact on high-latitude forests in terms of their ability to sequester carbon, expressed as pools of standing total biomass and soil organic matter. Above-ground biomass is an important driver in ecosystem process models used to assess, predict, and understand climate change impacts. Therefore, it is of compelling interest to acquire accurate assessments of current biomass levels for these high-latitude forests, a particular challenge because of their vastness and remoteness. At this time, remote sensing is the only feasible method through which to acquire such assessments. In this study, the use of lidar data for estimating shrub and tree biomass for the Yukon Flats region of Alaska’s Yukon River Basin (YRB) is demonstrated. The lidar data were acquired in the late summer and fall of 2009, as was an initial set of field sampling data collected for training and validation purposes. The 2009 field campaigns were located near Canvasback Lake and Boot Lake in the YRB. Various tallies of biomass were calculated from the field data using allometric equations (Bond-Lamberty et al. 2002, Yarie et al. 2007, Mack et al. 2008). Additional field data were also collected during two 2010 field campaigns at different locations in the Yukon Flats. Linear regressions have been developed based on field-based shrub and tree biomass and various lidar metrics of canopy height calculated for the plots (900 m²). A multiple linear regression performed at the plot level resulted in a strong relationship (R² = 0.88) between observed and predicted biomass. The coefficients for this regression were used to generate a shrub and tree biomass map for the entire Yukon Flats study area covered by lidar. This biomass map will be evaluated using additional field data collected in 2010 as well as other remote sensing data sources. Furthermore, additional lidar metrics (e.g. height of median energy) are being derived from the raw lidar data set and are expected to result in improved biomass products for the YRB, as they have been shown to be highly predictive of biomass in other biomes. The results of this project represent the first step in a larger effort to collect lidar and field data for various study sites across the YRB for biomass estimations to train large-scale mapping efforts using Landsat imagery and radar data.

    Bond-Lamberty, B., C. Wang, and S.T. Gower. 2002. Aboveground and belowground biomass and sapwood area allometric equations for six boreal tree species of northern Manitoba. Canadian Journal of Forest Research 32: 1441-1450.

    Mack, M., K. Treseder, K. Manies, J. Harden, E. Schuur, J. Vogel, J. Randerson, and F.S. Chapin III. 2008. Recovery of Aboveground Plant Biomass and Productivity After Fire in Mesic and Dry Black Spruce Forests of Interior Alaska. Ecosystems 11: 209-225.

    Yarie, J., E. Kane, and M. Mack. 2007. Aboveground Biomass Equations for the Trees of Interior Alaska. AFES Bulletin 115.
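
    The plot-level workflow described above (fit a multiple linear regression of biomass on lidar height metrics, report R² between observed and predicted biomass, then apply the coefficients to every mapped cell) can be sketched in plain Python. The two metrics, all numbers and the function names are hypothetical, biomass is in arbitrary units, and the normal equations are solved directly only to keep the sketch dependency-free; a statistics library is usually preferable in practice.

```python
def fit_mlr(X, y):
    """Ordinary least squares with an intercept, solved from the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]  # [intercept, coef_1, coef_2, ...]

def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def r_squared(observed, predicted):
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical plots: (mean canopy height, 90th-percentile height) in m, and biomass.
plot_metrics = [(2, 5), (4, 7), (6, 10), (8, 12), (10, 15), (3, 9)]
plot_biomass = [7.8, 12.3, 18.1, 22.6, 28.7, 11.5]

beta = fit_mlr(plot_metrics, plot_biomass)
r2 = r_squared(plot_biomass, predict(beta, plot_metrics))

# Apply the fitted coefficients to every lidar-covered map cell (hypothetical values).
grid_metrics = [(1, 3), (5, 8), (9, 14)]
biomass_map = predict(beta, grid_metrics)
```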

  17. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

    PubMed

    Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

    2012-12-01

    A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere polybutadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work generated QSRR models for these drugs using Classification And Regression Trees (CART). In this work, Ant Colony Optimization (ACO) is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods, such as Genetic Algorithms, Stepwise Regression and the Relief method, have been applied, not only to evaluate ACO as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental and predicted logarithms of the retention factors of the drugs (log k(w)), where the experimental values were extrapolated to a mobile phase consisting of pure water. The overall best model was the SVM built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Three-dimensional mapping of soil chemical characteristics at micrometric scale: Statistical prediction by combining 2D SEM-EDX data and 3D X-ray computed micro-tomographic images

    NASA Astrophysics Data System (ADS)

    Hapca, Simona

    2015-04-01

    Many soil properties and functions emerge from interactions of physical, chemical and biological processes at microscopic scales, which can be understood only by integrating techniques that are traditionally developed within separate disciplines. While recent advances in imaging techniques, such as X-ray computed tomography (X-ray CT), make it possible to reconstruct the 3D physical structure of soil at fine resolutions, existing methods for mapping the distribution of chemicals in soil, based on scanning electron microscopy (SEM) and energy dispersive X-ray detection (EDX), can characterize chemical composition only on 2D surfaces. Direct 3D measurement techniques are still lacking; an alternative, explored in this study, is sequential sectioning of soils followed by 2D mapping of chemical elements and interpolation to 3D. Specifically, we develop an integrated experimental and theoretical framework which combines 3D X-ray CT imaging with 2D SEM-EDX and uses spatial statistics methods to map the chemical composition of soil in 3D. The procedure involves three stages: 1) scanning a resin-impregnated soil cube by X-ray CT, followed by precision cutting to produce parallel thin slices, the surfaces of which are scanned by SEM-EDX; 2) alignment of the 2D chemical maps within the internal 3D structure of the soil cube; and 3) development of spatial statistics methods to predict the chemical composition of 3D soil based on the observed 2D chemical and 3D physical data. Three statistical models, namely a regression tree, a regression tree kriging model and a cokriging model, were used to predict the 3D spatial distribution of carbon, silicon, iron and oxygen in soil, these chemical elements showing good spatial agreement between the X-ray grayscale intensities and the corresponding 2D SEM-EDX data. Due to the spatial correlation between the physical and chemical data, the regression-tree model showed great potential for predicting chemical composition, in particular for iron, which is generally sparsely distributed in soil. For carbon, silicon and oxygen, which are more densely distributed, additional kriging of the regression tree residuals significantly improved the prediction, whereas prediction based on cokriging was less consistent across replicates and underperformed regression-tree kriging. The present study shows great potential for integrating geostatistical methods with imaging techniques to unveil the 3D chemical structure of soil at very fine scales, and the framework is suitable for further application to other types of imaging data, such as images of biological thin sections for characterization of microbial distribution. Key words: X-ray CT, SEM-EDX, segmentation techniques, spatial correlation, 3D soil images, 2D chemical maps.

  19. Unravelling the limits to tree height: a major role for water and nutrient trade-offs.

    PubMed

    Cramer, Michael D

    2012-05-01

    Competition for light has driven forest trees to grow exceedingly tall, but the lack of a single universal limit to tree height indicates multiple interacting environmental limitations. Because soil nutrient availability is determined by both nutrient concentrations and soil water, water and nutrient availabilities may interact in determining realised nutrient availability and consequently tree height. In SW Australia, which is characterised by nutrient impoverished soils that support some of the world's tallest forests, total [P] and water availability were independently correlated with tree height (r = 0.42 and 0.39, respectively). However, interactions between water availability and each of total [P], pH and [Mg] contributed to a multiple linear regression model of tree height (r = 0.72). A boosted regression tree model showed that maximum tree height was correlated with water availability (24%), followed by soil properties including total P (11%), Mg (10%) and total N (9%), amongst others, and that there was an interaction between water availability and total [P] in determining maximum tree height. These interactions indicated a trade-off between water and P availability in determining maximum tree height in SW Australia. This is enabled by a species assemblage capable of growing tall and surviving (some) disturbances. The mechanism for this trade-off is suggested to be through water enabling mass-flow and diffusive mobility of P, particularly of relatively mobile organic P, although water interactions with microbial activity could also play a role.
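
    A boosted regression tree model is an additive model of many small regression trees: each new tree is fit to the residuals of the current model and added with a shrinkage factor. The sketch below assumes squared-error loss, depth-one trees (stumps), a learning rate of 0.1 and synthetic data in which height responds to a hypothetical water index and total P. Relative influence is computed here as each predictor's summed reduction in squared error across its splits, scaled to 100%. This is not the author's model, and the predictor names are placeholders.

```python
def fit_stump(X, r):
    """Best single split (gain, feature, threshold, left mean, right mean) for residuals r."""
    n = len(r)
    sse_parent = sum(v * v for v in r) - sum(r) ** 2 / n  # SSE around the mean
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # keep both sides non-empty
            left = [v for row, v in zip(X, r) if row[j] <= t]
            right = [v for row, v in zip(X, r) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse_parent - sse > best[0]:
                best = (sse_parent - sse, j, t, ml, mr)
    return best

def boost(X, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting with squared-error loss and depth-one trees (stumps)."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        gain, j, t, ml, mr = fit_stump(X, resid)
        stumps.append((j, t, ml, mr))
        gains[j] += gain  # accumulate split improvement per predictor
        pred = [p + learning_rate * (ml if row[j] <= t else mr) for p, row in zip(pred, X)]
    total = sum(gains)
    rel_influence = [100.0 * g / total for g in gains]
    return (f0, learning_rate, stumps), rel_influence

def predict_boost(model, row):
    f0, lr, stumps = model
    return f0 + lr * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)

# Hypothetical plots: (water availability index, total P) with a step-like height response.
X = [(w, p) for w in (0.1, 0.3, 0.6, 0.9) for p in (0.1, 0.2, 0.4, 0.5)]
y = [10 + 20 * (w > 0.5) + 5 * (p > 0.3) for w, p in X]
model, rel = boost(X, y)
print([round(v, 1) for v in rel])  # water receives most of the relative influence
```

    Because each stump splits on a single predictor, this ensemble is purely additive and cannot represent the water × P interaction reported in the abstract; trees with at least two levels of splits would be needed for that.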

  20. Multivariate regression model for partitioning tree volume of white oak into round-product classes

    Treesearch

    Daniel A. Yaussy; David L. Sonderman

    1984-01-01

    Describes the development of multivariate equations that predict the expected cubic volume of four round-product classes from independent variables composed of individual tree-quality characteristics. Although the model has limited application at this time, it does demonstrate the feasibility of partitioning total tree cubic volume into round-product classes based on...

  1. A way forward for fire-caused tree mortality prediction: Modeling a physiological consequence of fire

    Treesearch

    Kathleen L. Kavanaugh; Matthew B. Dickinson; Anthony S. Bova

    2010-01-01

    Current operational methods for predicting tree mortality from fire injury are regression-based models that only indirectly consider underlying causes and, thus, have limited generality. A better understanding of the physiological consequences of tree heating and injury is needed to develop biophysical process models that can make predictions under changing or novel...

  2. Height-age relationships for regeneration-size trees in the northern Rocky Mountains, USA

    Treesearch

    Dennis E. Ferguson; Clinton E. Carlson

    2010-01-01

    Regression equations were developed to predict heights of 10 conifer species in regenerating stands in central and northern Idaho, western Montana, and eastern Washington. Most sample trees were natural regeneration that became established after conventional harvest and site preparation methods. Heights are predicted as a function of tree age, residual overstory density...

  3. Potential redistribution of tree species habitat under five climate change scenarios in the eastern US

    Treesearch

    Louis R. Iverson; Anantha M. Prasad

    2002-01-01

    Global climate change could have profound effects on the Earth's biota, including large redistributions of tree species and forest types. We used DISTRIB, a deterministic regression tree analysis model, to examine environmental drivers related to current forest-species distributions and then model potential suitable habitat under five climate change scenarios...

  4. Multiple tree-ring isotopes as environmental indicators of diffuse atmospheric pollution in a peri-urban area

    NASA Astrophysics Data System (ADS)

    Doucet, A.; Savard, M. M.; Bégin, C.; Ouarda, T. B.; Marion, J.

    2010-12-01

    The combined analyses of tree-ring δ13C, δ18O, δ15N, 206Pb/207Pb, 206Pb/204Pb and 206Pb/208Pb isotope ratios of three red spruce specimens from the Tantaré ecological reserve, located 40 km northwest of Québec City (Canada), were studied with the aim of reconstructing environmental conditions and unravelling past air-quality changes over the 1880-2007 period. To separate the tree-ring δ18O and δ13C patterns induced by natural conditions from those generated by anthropogenic perturbations, a linear regression was applied between the most explicative meteorological parameters and the isotopic series for the period of low pollution (1880 to 1909). The model equations were then applied to the most recent part of the series (1910-2007) to verify whether climatic conditions have remained the main driver of the tree-ring isotopic variations. The good fit between the modeled and measured δ18O series for the entire studied period suggests that the assimilation of oxygen by red spruce trees is not significantly affected by pollution stress near Québec City. However, the deviation between the measured and modeled δ13C values for the 1944-2007 period indicates that diffuse pollution affected carbon assimilation by the investigated trees. To independently validate whether atmospheric pollution could have generated the deviation between the measured and the estimated δ13C values, a linear regression was applied between the portion of the residual δ13C values and atmospheric pollution (Canadian fossil fuel proxy from 1958 to 2000). The close fit of the modeled δ13C values from the combination of the two regression analyses, based on climate and an emission proxy, strongly supports the hypothesis that there is a natural and an anthropogenic portion in the δ13C variations of the studied specimens. The short-term variations of the red spruce δ15N series are correlated with the instrumentally measured amounts of provincial N emissions for the 1990 to 2006 period (longest measurements available). Additionally, the long-term decrease of the δ15N series after 1956 is linked to the low isotopic values of NOx emitted by car exhausts, as expressed by the provincial number of cars, which reflects the amount of transport-related N deposition at the provincial scale. The 208Pb/206Pb and 204Pb/206Pb ratios as a function of 206Pb/207Pb for the 1880-1919 period reflect a mixture of natural lead from the mineral soil horizon and mainly anthropogenic lead from north-eastern American coal combustion. The lower Pb ratios of the 1920-1989 period correlate well with the introduction of leaded additives to gasoline characterized by lower ratios relative to coal combustion. Inferring the lead sources of the 1990-2008 period is not as straightforward because lead can potentially derive from three main sources: coal combustion, burnt recycled material and natural lead present in soils. Our results show the great potential of tree-ring stable isotopes to record pollution events in the context of peri-urban diffuse pollution, and to extend the pollution history in regions where direct measurements of pollutants cover only a relatively short period.
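
    The calibrate-then-project design described above can be sketched with a simple linear regression: fit the isotope series to a climate variable over the low-pollution years only, apply the fitted line to the whole record, and inspect the residuals (measured minus modeled). The climate variable, its values and the post-1910 drift are synthetic, and a single predictor stands in for the abstract's unspecified "most explicative meteorological parameters".

```python
def linear_fit(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

years = list(range(1880, 1940))
# Hypothetical standardized climate variable (e.g. a summer temperature anomaly).
climate = [((yr * 7) % 11 - 5) / 5 for yr in years]
# Synthetic delta13C (per mil): climate signal plus a drift that starts after 1910.
d13c = [-24.0 + 0.8 * c - (0.02 * (yr - 1910) if yr > 1910 else 0.0)
        for yr, c in zip(years, climate)]

# 1) Calibrate on the low-pollution period only.
calib = [i for i, yr in enumerate(years) if yr <= 1909]
a, b = linear_fit([climate[i] for i in calib], [d13c[i] for i in calib])

# 2) Apply the climate-only model to the whole series.
modeled = [a + b * c for c in climate]

# 3) Residuals = measured - modeled; a persistent departure after calibration
#    is the part the climate model cannot explain.
residuals = [m - p for m, p in zip(d13c, modeled)]
```

    The second step described in the abstract, regressing the post-calibration residuals on an emissions proxy, could reuse linear_fit on those residuals.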

  5. Vegetation placement for summer built surface temperature moderation in an urban microclimate.

    PubMed

    Millward, Andrew A; Torchia, Melissa; Laursen, Andrew E; Rothman, Lorne D

    2014-06-01

    Urban vegetation can mitigate increases in summer air temperature by reducing the solar gain received by buildings. To quantify the temperature-moderating influence of city trees and vine-covered buildings, a total of 13 pairs of temperature loggers were installed on the surfaces of eight buildings in downtown Toronto, Canada, for 6 months during the summer of 2008. One logger in each pair was shaded by vegetation while the other measured built surface temperature in full sunlight. We investigated the temperature-moderating benefits of solitary mature trees, clusters of trees, and perennial vines using a linear-mixed model and a multiple regression analysis of degree hour difference. We then assessed the temperature-moderating effect of leaf area, plant size and proximity to building, and plant location relative to solar path. During a period of high solar intensity, we measured an average temperature differential of 11.7 °C, with as many as 10-12 h of sustained cooler built surface temperatures. Vegetation on the west-facing aspect of built structures provided the greatest temperature moderation, with maximum benefit (peak temperature difference) occurring late in the afternoon. Large mature trees growing within 5 m of buildings showed the greatest ability to moderate built surface temperature, with those growing in clusters delivering limited additional benefit compared with isolated trees. Perennial vines proved as effective as trees at moderating rise in built surface temperature to the south and west sides of buildings, providing an attractive alternative to shade trees where soil volume and space are limited.

  6. Vegetation Placement for Summer Built Surface Temperature Moderation in an Urban Microclimate

    NASA Astrophysics Data System (ADS)

    Millward, Andrew A.; Torchia, Melissa; Laursen, Andrew E.; Rothman, Lorne D.

    2014-06-01

    Urban vegetation can mitigate increases in summer air temperature by reducing the solar gain received by buildings. To quantify the temperature-moderating influence of city trees and vine-covered buildings, a total of 13 pairs of temperature loggers were installed on the surfaces of eight buildings in downtown Toronto, Canada, for 6 months during the summer of 2008. One logger in each pair was shaded by vegetation while the other measured built surface temperature in full sunlight. We investigated the temperature-moderating benefits of solitary mature trees, clusters of trees, and perennial vines using a linear-mixed model and a multiple regression analysis of degree hour difference. We then assessed the temperature-moderating effect of leaf area, plant size and proximity to building, and plant location relative to solar path. During a period of high solar intensity, we measured an average temperature differential of 11.7 °C, with as many as 10-12 h of sustained cooler built surface temperatures. Vegetation on the west-facing aspect of built structures provided the greatest temperature moderation, with maximum benefit (peak temperature difference) occurring late in the afternoon. Large mature trees growing within 5 m of buildings showed the greatest ability to moderate built surface temperature, with those growing in clusters delivering limited additional benefit compared with isolated trees. Perennial vines proved as effective as trees at moderating rise in built surface temperature to the south and west sides of buildings, providing an attractive alternative to shade trees where soil volume and space are limited.

  7. Regionalization of meso-scale physically based nitrogen modeling outputs to the macro-scale by the use of regression trees

    NASA Astrophysics Data System (ADS)

    Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.

    2012-06-01

    In this paper, a method is presented to estimate excess nitrogen on large scales considering single field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia, a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used as predictors of excess nitrogen in the regression tree model, so they had to be calculated and regionalized for the state of Thuringia as well. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees, the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia with low computing time, without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen showed 94% agreement. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).

  8. Predicting Diameter at Breast Height from Stump Diameters for Northeastern Tree Species

    Treesearch

    Eric H. Wharton

    1984-01-01

    Presents equations to predict diameter at breast height from stump diameter measurements for 17 northeastern tree species. Simple linear regression was used to develop the equations. Application of the equations is discussed.

  9. Combining logistic regression with classification and regression tree to predict quality of care in a home health nursing data set.

    PubMed

    Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun

    2006-01-01

    In this article, the authors provide an overview of a research method to predict quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. The analysis was more effective, and the results more informative, because the home health nursing data set was analyzed with a combination of logistic regression and CART; these two techniques complement each other. The results were also more informative in that more patient characteristics were found to be related to quality of care in home care. The results can help home health nurses predict patient outcomes in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.

  10. Informing tree-ring reconstructions with automated dendrometer data: the case of single-leaf pinyon (Pinus monophylla) from Great Basin National Park, Nevada, USA

    NASA Astrophysics Data System (ADS)

    Biondi, F.

    2012-12-01

    One of the most pressing issues in modern tree-ring science is to reduce the uncertainty of reconstructions while emphasizing that the composition and dynamics of modern ecosystems cannot be understood from the present alone. I present here the latest results from research on the environmental factors that control radial growth of single-leaf pinyon (Pinus monophylla) in the Great Basin of North America using dendrometer data collected at half-hour intervals during two full growing seasons, 2010 and 2011. Automated (solar-powered) sensors at the site consisted of 8 point dendrometers installed on 7 trees to measure stem size, together with environmental probes that recorded air temperature, soil temperature and soil moisture. Additional meteorological variables at hourly timesteps were available from the EPA-CASTNET station located within 100 m of the dendrometer site. Daily cycles of stem expansion and contraction were quantified using the approach of Deslauriers et al. 2011, and the amount of daily radial stem increment was regressed against environmental variables. Graphical and numerical results showed that tree growth is relatively insensitive to surface soil moisture during the growing season. This finding corroborates empirical dendroclimatic results that showed how tree-ring chronologies of single-leaf pinyon are mostly a proxy for the balance between winter-spring precipitation supply and growing season evapotranspiration demand, thereby making it an ideal species for drought reconstructions.

  11. Estimation of carbon storage based on individual tree detection in Pinus densiflora stands using a fusion of aerial photography and LiDAR data.

    PubMed

    Kim, So-Ra; Kwak, Doo-Ahn; Lee, Woo-Kyun; Son, Yowhan; Bae, Sang-Won; Kim, Choonsig; Yoo, Seongjin

    2010-07-01

    The objective of this study was to estimate the carbon storage capacity of Pinus densiflora stands using remotely sensed data by combining digital aerial photography with light detection and ranging (LiDAR) data. A digital canopy model (DCM), generated from the LiDAR data, was combined with aerial photography for segmenting crowns of individual trees. To eliminate over- and under-segmentation errors, the combined image was smoothed using a Gaussian filter. The processed image was then segmented into individual trees using a marker-controlled watershed segmentation method. After measuring the crown area of the segmented individual trees, the diameter at breast height (DBH) of each tree was estimated using a regression function developed from the relationship observed between field-measured DBH and crown area. The aboveground biomass of individual trees was then calculated from the image-derived DBH using a regression function developed by the Korea Forest Research Institute. The carbon storage of individual trees was estimated by simple multiplication using the carbon conversion index (0.5) suggested in guidelines from the Intergovernmental Panel on Climate Change. The mean carbon storage per individual tree was estimated and then compared with the field-measured value. This study suggested that the biomass and carbon storage in a large forest area can be effectively estimated using aerial photographs and LiDAR data.
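
    The per-tree chain described above (crown area to DBH, DBH to aboveground biomass, biomass to carbon) can be written as composed functions. Only the carbon conversion index of 0.5 comes from the abstract; the linear crown-area-to-DBH relation, the power-law biomass function and all their coefficients are hypothetical placeholders standing in for the unreported regression functions.

```python
# All coefficients below are hypothetical placeholders; the abstract does not report them.
DBH_INTERCEPT, DBH_SLOPE = 5.0, 1.2   # DBH (cm) = 5.0 + 1.2 * crown area (m^2)
BIOMASS_A, BIOMASS_B = 0.1, 2.4       # aboveground biomass (kg) = 0.1 * DBH ** 2.4
CARBON_FRACTION = 0.5                 # carbon conversion index quoted in the abstract

def dbh_from_crown(crown_area_m2):
    return DBH_INTERCEPT + DBH_SLOPE * crown_area_m2

def biomass_from_dbh(dbh_cm):
    return BIOMASS_A * dbh_cm ** BIOMASS_B

def tree_carbon(crown_area_m2):
    """Crown area -> DBH -> aboveground biomass -> carbon, for one segmented crown."""
    return CARBON_FRACTION * biomass_from_dbh(dbh_from_crown(crown_area_m2))

# Crown areas (m^2) of three hypothetical segmented crowns.
crown_areas = [10.0, 15.0, 20.0]
per_tree = [tree_carbon(a) for a in crown_areas]
mean_per_tree = sum(per_tree) / len(per_tree)  # compared against field values in the study
```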

  12. Modeling potential future individual tree-species distributions in the eastern United States under a climate change scenario: a case study with Pinus virginiana

    Treesearch

    Louis R. Iverson; Anantha Prasad; Mark W. Schwartz

    1999-01-01

    We are using a deterministic regression tree analysis model (DISTRIB) and a stochastic migration model (SHIFT) to examine potential distributions of ~66 individual species of eastern US trees under a 2 x CO2 climate change scenario. This process is demonstrated for Virginia pine (Pinus virginiana).

  13. Potential Changes in Tree Species Richness and Forest Community Types following Climate Change

    Treesearch

    Louis R. Iverson; Anantha M. Prasad

    2001-01-01

    Potential changes in tree species richness and forest community types were evaluated for the eastern United States according to five scenarios of future climate change resulting from a doubling of atmospheric carbon dioxide (CO2). DISTRIB, an empirical model that uses a regression tree analysis approach, was used to generate suitable habitat, or potential future...

  14. Estimating tree crown widths for the primary Acadian species in Maine

    Treesearch

    Matthew B. Russell; Aaron R. Weiskittel

    2012-01-01

    In this analysis, data for seven conifer and eight hardwood species were gathered from across the state of Maine for estimating tree crown widths. Maximum and largest crown width equations were developed using tree diameter at breast height as the primary predicting variable. Quantile regression techniques were used to estimate the maximum crown width and a constrained...

  15. [RS estimation of inventory parameters and carbon storage of moso bamboo forest based on synergistic use of object-based image analysis and decision tree].

    PubMed

    Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie

    2017-10-01

    By combining object-based image analysis (OBIA) with the classification and regression tree (CART) method, the distribution, inventory parameters (diameter at breast height, tree height, and crown closure), and aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that moso bamboo forest could be accurately delineated by integrating multi-scale image segmentation in OBIA, which linked image objects across scales, with CART, giving a producer's accuracy of 89.1%. The inventory parameters estimated by regression tree models built on features extracted from the image objects reached fair or better accuracy; the crown closure model achieved the best estimation accuracy, 67.9%. The estimation accuracy for diameter at breast height and tree height was relatively low, consistent with the conclusion that optical remote sensing cannot estimate these two parameters satisfactorily. AGC was estimated with relatively high accuracy, exceeding 80% in high-value regions.
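
For readers unfamiliar with how a regression tree chooses its splits, the core CART least-squares step can be sketched as below: for one predictor, try every midpoint between sorted distinct values and keep the threshold that minimises the summed squared error of the two child nodes. This is a generic illustration, not the study's model. The feature and crown-closure values are hypothetical, and a full tree would repeat this search over all features and recursively within each child node.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean (the least-squares node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total_sse) of the best binary split x <= threshold.

    Candidate thresholds are midpoints between consecutive distinct sorted x values.
    Returns None if x has fewer than two distinct values.
    """
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


x = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]          # hypothetical object-level feature
y = [30.0, 32.0, 31.0, 70.0, 72.0, 71.0]    # hypothetical crown closure (%)
threshold, total_sse = best_split(x, y)
print(f"split at {threshold:.2f}, SSE = {total_sse:.1f}")  # split at 0.55, SSE = 4.0
```

Applied recursively with a stopping rule such as a minimum node size, this search grows the full tree that CART subsequently prunes.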

  16. Tree growth response to ENSO in Durango, Mexico

    NASA Astrophysics Data System (ADS)

    Pompa-García, Marin; Miranda-Aragón, Liliana; Aguirre-Salado, Carlos Arturo

    2015-01-01

    The dynamics of forest ecosystems worldwide have been driven largely by climatic teleconnections. El Niño-Southern Oscillation (ENSO) is the strongest interannual variation in the Earth's climate, affecting regional climatic regimes. These teleconnections may affect plant phenology, growth rate, forest extent, and other gradual changes in forest ecosystems. The objective of this study was to investigate how Pinus cooperi populations respond to the influence of ENSO and regional microclimates in five ecozones in northwestern Mexico. Tree-ring chronologies (TRI) were developed using standard dendrochronological techniques. Relationships among TRI, ENSO, and climate were examined by correlation analysis for 1950-2010. In addition, multiple regressions were conducted to identify the ENSO months directly related to TRI (p < 0.1). The five chronologies showed similar trends during their period of overlap, indicating that the P. cooperi populations shared interannual growth variation. In general, the ENSO index corresponded with tree-ring growth in synchronous periods. We concluded that ENSO is connected with regional climate in northern Mexico and that radial growth of P. cooperi populations has been driven largely by positive ENSO values (El Niño episodes).

  17. Tree growth response to ENSO in Durango, Mexico.

    PubMed

    Pompa-García, Marin; Miranda-Aragón, Liliana; Aguirre-Salado, Carlos Arturo

    2015-01-01

    The dynamics of forest ecosystems worldwide have been driven largely by climatic teleconnections. El Niño-Southern Oscillation (ENSO) is the strongest interannual variation in the Earth's climate, affecting regional climatic regimes. These teleconnections may affect plant phenology, growth rate, forest extent, and other gradual changes in forest ecosystems. The objective of this study was to investigate how Pinus cooperi populations respond to the influence of ENSO and regional microclimates in five ecozones in northwestern Mexico. Tree-ring chronologies (TRI) were developed using standard dendrochronological techniques. Relationships among TRI, ENSO, and climate were examined by correlation analysis for 1950-2010. In addition, multiple regressions were conducted to identify the ENSO months directly related to TRI (p < 0.1). The five chronologies showed similar trends during their period of overlap, indicating that the P. cooperi populations shared interannual growth variation. In general, the ENSO index corresponded with tree-ring growth in synchronous periods. We concluded that ENSO is connected with regional climate in northern Mexico and that radial growth of P. cooperi populations has been driven largely by positive ENSO values (El Niño episodes).

  18. Disentangling Environmental and Anthropogenic Impacts on the Distribution of Unintentionally Introduced Invasive Alien Insects in Mainland China

    PubMed Central

    Zhao, Cai-Yun; Xu, Jing; Liu, Xiao-Yan

    2017-01-01

    Globalization increases the opportunities for unintentional introductions of invasive alien species, especially insects, and most of these species could damage ecosystems and cause economic losses in China. In this study, we analyzed the drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of such insects and their presence/absence records in each province of mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors in shaping both the number of species per province and the similarity of species composition among provinces. Classification and regression trees indicated that climatic suitability (mean January temperature) and human economic activity (total freight volume) are the primary drivers of the pattern in species numbers at the provincial scale, whereas multivariate regression trees showed that only environmental factors (mean January temperature, annual precipitation, and province area) significantly affect the similarity of species composition. PMID:28973576

  19. Generalized linear and generalized additive models in studies of species distributions: Setting the scene

    USGS Publications Warehouse

    Guisan, Antoine; Edwards, T.C.; Hastie, T.

    2002-01-01

    An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.

  20. Industrial and occupational ergonomics in the petrochemical process industry: a regression trees approach.

    PubMed

    Bevilacqua, M; Ciarapica, F E; Giacchetta, G

    2008-07-01

    This work is an attempt to apply classification tree methods to data on accidents in a medium-sized refinery, in order to identify important relationships between variables that can serve as decision-making rules when adopting improvement measures. The results obtained with the CART (Classification And Regression Trees) method proved to be the most precise and, in general, are encouraging regarding the use of tree diagrams as preliminary exploratory techniques for assessing the ergonomic, management, and operational parameters that influence high accident-risk situations. The occupational injury analysis carried out in this paper was planned as a dynamic process and can be repeated systematically. The CART technique, which considers a very wide set of objective and predictive variables, reveals new cause-effect correlations in occupational safety that had not previously been described, highlighting possible injury risk groups and supporting decision-making in these areas. The use of classification trees should not, however, be seen as an attempt to supplant other techniques, but as a complementary method that can be integrated into traditional types of analysis.

  1. Acid rain, air pollution, and tree growth in southeastern New York

    USGS Publications Warehouse

    Puckett, L.J.

    1982-01-01

    This study determined whether dendroecological analyses could be used to detect changes in the relationship of tree growth to climate that might have resulted from chronic exposure to components of the acid rain-air pollution complex. Tree-ring indices of white pine (Pinus strobus L.), eastern hemlock (Tsuga canadensis (L.) Cart.), pitch pine (Pinus rigida Mill.), and chestnut oak (Quercus prinus L.) were regressed against orthogonally transformed values of temperature and precipitation to derive a response-function relationship. Results of the regression analyses for three time periods, 1901–1920, 1926–1945, and 1954–1973, suggest that the relationship of tree growth to climate has been altered. Statistical tests of the temperature and precipitation data suggest that this change was nonclimatic. Temporally, the shift in growth response appears to coincide with the suspected increase in acid rain and air pollution in the Shawangunk Mountain area of southeastern New York in the early 1950s. This change could be the result of physiological stress induced by components of the acid rain-air pollution complex, causing climatic conditions to be more limiting to tree growth.

  2. Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

    PubMed

    Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

    2007-10-01

    A community-based study of aboriginal residents was conducted and analysed to explore the application of classification tree and logistic regression analyses. A total of 1066 aboriginal residents of Yilan County were screened during 2003-2004. The independent variables included demographic characteristics, physical examination results, geographic location, health behaviours, dietary habits, and family history of hereditary diseases. Risk factors for cardiovascular disease were selected as the dependent variables for further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg m⁻² and age is above 51 years, the predicted probability of having ≥3 cardiovascular risk factors is 73.6% (n = 322). If body mass index is higher than 26.35 kg m⁻² and the geographical latitude of the village is lower than 24°22.8′, the predicted probability of having ≥4 cardiovascular risk factors is 60.8% (n = 74). The logistic regression results indicate that body mass index, drinking habit, and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups, whereas the logistic regression model identifies the statistically independent cardiovascular risk factors. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after a community-based study.

  3. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable freshwater resources. The main objective of this study was to produce groundwater spring potential maps for the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence spring locations were considered: slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Groundwater spring potential was then modeled and mapped using the CART, RF, and BRT algorithms, and the predictions of the three models were validated using the receiver operating characteristic (ROC) curve. Of the 864 springs identified, 605 (≈70%) were used for spring potential mapping and the remaining 259 (≈30%) for model validation. The area under the curve (AUC) was 0.8103 for BRT, 0.7870 for CART, and 0.7119 for RF. The BRT model therefore produced the best predictions of spring locations, followed by CART and RF. Geospatially integrated BRT, CART, and RF methods proved useful in generating the spring potential map (SPM) with reasonable accuracy.
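
The AUC values reported above have a simple interpretation: the probability that a randomly chosen spring location receives a higher predicted potential than a randomly chosen non-spring location. A minimal sketch of that rank-based (Mann-Whitney) computation follows. The scores and labels are hypothetical, and treating sampled non-spring locations as the negative class is an assumption, since the abstract does not say how absence points were chosen.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 = spring present, 0 = (assumed) non-spring location.
    Ties count as half a win. Runs in O(n_pos * n_neg) time.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]   # hypothetical model outputs
labels = [1, 1, 1, 0, 0, 0]                # hypothetical validation labels
print(f"AUC = {roc_auc(scores, labels):.3f}")  # AUC = 0.889
```

The pairwise loop is quadratic, which is fine for a few hundred validation points; for larger sets a rank-sum formulation or an established library routine would be preferable.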

  4. Propensity score estimation: machine learning and classification methods as alternatives to logistic regression

    PubMed Central

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-01-01

    Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression that may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion: While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332

  5. What contributes to perceived stress in later life? A recursive partitioning approach.

    PubMed

    Scott, Stacey B; Jackson, Brenda R; Bergeman, C S

    2011-12-01

    One possible explanation for the individual differences in outcomes of stress is the diversity of inputs that produce perceptions of being stressed. The current study examines how combinations of contextual features (e.g., social isolation, neighborhood quality, health problems, age discrimination, financial concerns, and recent life events) of later life contribute to overall feelings of stress. Recursive partitioning techniques (regression trees and random forests) were used to examine unique interrelations between predictors of perceived stress in a sample of 282 community-dwelling adults. Trees provided possible examples of equifinality (i.e., subsets of people with similar levels of perceived stress but different predictors) and identified contextual combinations that separated participants with very high from those with very low perceived stress. Random forest analyses aggregated across many trees based on permuted versions of the data and predictors; loneliness, financial strain, neighborhood strain, ageism, and to some extent life events emerged as important predictors. Interviews with a subsample of participants provided both a thick description of the complex relationships identified in the trees and additional risks not appearing in the survey results. Together, the analyses highlight what may be missed when stress is used as a simple unidimensional construct and can guide differential intervention efforts.

  6. What contributes to perceived stress in later life? A recursive partitioning approach

    PubMed Central

    Scott, Stacey B.; Jackson, Brenda R.; Bergeman, C. S.

    2011-01-01

    One possible explanation for the individual differences in outcomes of stress is the diversity of inputs that produce perceptions of being stressed. The current study examines how combinations of contextual features (e.g., social isolation, neighborhood quality, health problems, age discrimination, financial concerns, and recent life events) of later life contribute to overall feelings of stress. Recursive partitioning techniques (regression trees and random forests) were used to examine unique interrelations between predictors of perceived stress in a sample of 282 community-dwelling adults. Trees provided possible examples of equifinality (i.e., subsets of people with similar levels of perceived stress but different predictors) and identified contextual combinations that separated participants with very high from those with very low perceived stress. Random forest analyses aggregated across many trees based on permuted versions of the data and predictors; loneliness, financial strain, neighborhood strain, ageism, and to some extent life events emerged as important predictors. Interviews with a subsample of participants provided both a thick description of the complex relationships identified in the trees and additional risks not appearing in the survey results. Together, the analyses highlight what may be missed when stress is used as a simple unidimensional construct and can guide differential intervention efforts. PMID:21604885

  7. Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

    Treesearch

    E. Freeman; G. Moisen; J. Coulston; B. Wilson

    2014-01-01

    Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...

  8. Estimating probabilities of infestation and extent of damage by the roundheaded pine beetle in ponderosa pine in the Sacramento Mountains, New Mexico

    Treesearch

    Jose Negron

    1997-01-01

    Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...

  9. An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.

    ERIC Educational Resources Information Center

    Kim, Sung-Ho

    Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
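
The abstract breaks off before describing the extension itself, so the sketch below illustrates only the standard CART cost-complexity (weakest-link) pruning that it builds on, not the extension. For each internal node t it computes g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), where R(t) is the node's error if it were made a leaf and R(T_t) is the summed error of its subtree's leaves, then collapses the node with the smallest g(t) and repeats. The `Node` class and the error values in the example tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    error: float                          # R(t): resubstitution error if t were a leaf
    children: List["Node"] = field(default_factory=list)


def subtree_stats(node: Node) -> Tuple[float, int]:
    """Return (R(T_t), number of leaves) for the subtree rooted at node."""
    if not node.children:
        return node.error, 1
    err, leaves = 0.0, 0
    for child in node.children:
        e, n = subtree_stats(child)
        err += e
        leaves += n
    return err, leaves


def weakest_link(root: Node) -> Optional[Tuple[Node, float]]:
    """Internal node minimising g(t); assumes binary splits (>= 2 leaves per subtree)."""
    best: Optional[Tuple[Node, float]] = None
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        err, leaves = subtree_stats(node)
        g = (node.error - err) / (leaves - 1)
        if best is None or g < best[1]:
            best = (node, g)
        stack.extend(node.children)
    return best


def pruning_sequence(root: Node) -> List[Tuple[str, float]]:
    """Collapse weakest links one at a time; return (node name, alpha) pairs."""
    sequence = []
    while root.children:
        found = weakest_link(root)
        assert found is not None  # root is internal, so a candidate always exists
        node, alpha = found
        node.children = []        # turn the subtree into a single leaf
        sequence.append((node.name, alpha))
    return sequence


tree = Node("root", 10.0, [
    Node("A", 4.0, [Node("A1", 1.0), Node("A2", 2.0)]),
    Node("B", 3.0, [Node("B1", 0.5), Node("B2", 0.5)]),
])
print(pruning_sequence(tree))  # [('A', 1.0), ('B', 2.0), ('root', 3.0)]
```

Each alpha is the complexity penalty at which collapsing that node becomes worthwhile. CART then chooses among the resulting nested subtrees, typically by cross-validation. This version prunes one node per step, so tied g values show up as repeated alphas rather than a simultaneous cut.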

  10. STX--Fortran-4 program for estimates of tree populations from 3P sample-tree-measurements

    Treesearch

    L. R. Grosenbaugh

    1967-01-01

    Describes how to use an improved and greatly expanded version of an earlier computer program (1964) that converts dendrometer measurements of 3P-sample trees to population values in whatever units the user desires. Many new options are available, including that of obtaining a product-yield and appraisal report based on regression coefficients supplied by the user....

  11. Portable Language-Independent Adaptive Translation from OCR. Phase 1

    DTIC Science & Technology

    2009-04-01

    including brute-force k-Nearest Neighbors (kNN), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-d trees allow...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation

  12. Additivity in tree biomass components of Pyrenean oak (Quercus pyrenaica Willd.)

    Treesearch

    Joao P. Carvalho; Bernard R. Parresol

    2003-01-01

    In tree biomass estimations, it is important to consider the property of additivity, i.e., the total tree biomass should equal the sum of the components. This work presents functions that allow estimation of the stem and crown dry weight components of Pyrenean oak (Quercus pyrenaica Willd.) trees. A procedure that considers additivity of tree biomass...

  13. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    regression trees; Daisy: DISsimilAritY; PAM: partitioning around medoids; PMA: penalized multivariate analysis; SPC: sparse principal components; UPGMA: unweighted...unweighted pair-group average method (UPGMA). This method measures dissimilarities between all objects in two clusters and takes the average value

  14. Predicting U.S. Army Reserve Unit Manning Using Market Demographics

    DTIC Science & Technology

    2015-06-01

    develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.

  15. Simulation of land use change in the three gorges reservoir area based on CART-CA

    NASA Astrophysics Data System (ADS)

    Yuan, Min

    2018-05-01

    This study proposes a new method for simulating complex spatiotemporal changes among multiple land uses using a cellular automata (CA) model based on the classification and regression tree (CART) algorithm. In this model, the CART algorithm is used to calculate land-class conversion probabilities, which are combined with a neighborhood factor and a random factor to extract the cell transition rules. In the simulation of land-use dynamics in the Three Gorges Reservoir area from 2000 to 2010, the overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821, and the simulation results are satisfactory.
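
Both summary statistics reported above come from a confusion matrix of simulated versus observed land-use classes. The sketch below computes overall accuracy and Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the row and column totals. The two-class matrix is hypothetical and not taken from the study.

```python
from typing import List, Tuple


def accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows = observed (reference) classes, columns = simulated classes.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total      # p_o
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2  # p_e
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 2-class matrix (e.g. "converted" vs "unchanged" cells).
acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")  # accuracy = 0.85, kappa = 0.70
```

Kappa is lower than raw accuracy because it discounts the agreement a random allocation with the same class totals would already achieve.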

  16. CADDIS Volume 4. Data Analysis: Basic Analyses

    EPA Pesticide Factsheets

    Use of statistical tests to determine if an observation is outside the normal range of expected values. Details of CART, regression analysis, use of quantile regression analysis, CART in causal analysis, simplifying or pruning resulting trees.

  17. Nitrogen deposition outweighs climatic variability in driving annual growth rate of canopy beech trees: Evidence from long-term growth reconstruction across a geographic gradient.

    PubMed

    Gentilesca, Tiziana; Rita, Angelo; Brunetti, Michele; Giammarchi, Francesco; Leonardi, Stefano; Magnani, Federico; van Noije, Twan; Tonon, Giustino; Borghetti, Marco

    2018-07-01

    In this study, we investigated the role of climatic variability and atmospheric nitrogen deposition in driving long-term growth of canopy beech trees along a geographic gradient in the montane belt of the Italian peninsula, from the Alps to the southern Apennines. We sampled dominant trees at different developmental stages (from young to mature tree cohorts, with tree ages ranging from 35 to 160 years) and used stem analysis to reconstruct the history of tree volume and dominant height. Annual variability in volume growth (G_V) and height growth (G_H) was related to annual variability in model-simulated atmospheric nitrogen deposition, site-specific climatic variables (i.e. mean annual temperature, total annual precipitation, mean growing-period temperature, total growing-period precipitation, and the standardized precipitation evapotranspiration index), and atmospheric CO2 concentration, with tree cambial age included among the growth predictors. Generalized additive models (GAM), linear mixed-effects models (LMM), and Bayesian regression models (BRM) were independently employed to assess the explanatory variables. The main results of our study were as follows: (i) tree age was the main explanatory variable for long-term growth variability; (ii) GAM, LMM, and BRM results consistently indicated that the effects of climatic variables and CO2 on G_V and G_H were weak, so evidence that recent climatic variability influenced beech annual growth rates in the montane belt of the Italian peninsula was limited; (iii) instead, significant positive effects of nitrogen deposition (N_dep) on G_V and G_H were repeatedly observed; the positive effects of N_dep on canopy height growth rates, which tended to level off at N_dep values greater than approximately 1.0 g m⁻² y⁻¹, were interpreted as positive impacts on forest stand above-ground net productivity at the selected study sites. © 2018 John Wiley & Sons Ltd.

  18. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach.

    PubMed

    Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J

    2009-11-22

    Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of reducing these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 people aged 20-24 years, 2,323 aged 40-44 years, and 2,177 aged 60-64 years, sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up, and the decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were the strongest predictor of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.

  19. A study of Solar-Enso correlation with southern Brazil tree ring index (1955- 1991)

    NASA Astrophysics Data System (ADS)

    Rigozo, N.; Nordemann, D.; Vieira, L.; Echer, E.

    The effects of solar activity and the El Niño-Southern Oscillation (ENSO) on tree growth in southern Brazil were studied by correlation analysis. Trees for this study were native Araucaria (Araucaria angustifolia) from four locations in Rio Grande do Sul State, southern Brazil: Canela (29°18′S, 50°51′W, 790 m asl), Nova Petropolis (29°2′S, 51°10′W, 579 m asl), Sao Francisco de Paula (29°25′S, 50°24′W, 930 m asl), and Sao Martinho da Serra (29°30′S, 53°53′W, 484 m asl). From these four sites, an average tree-ring index for the region was derived for the period 1955-1991. Linear correlations were computed on annual and 10-year running averages of this tree-ring index, the sunspot number Rz, and the Southern Oscillation Index (SOI). For annual averages, the correlation coefficients were low, and the multiple regression of the tree-ring index on SOI and Rz indicated that 20% of the variance in tree rings was explained by solar activity and ENSO variability. For the 10-year running averages, however, the correlation coefficients were much higher: a clear anticorrelation was observed between SOI and the index (r = -0.81), whereas Rz and the index showed a positive correlation (r = 0.67). The multiple regression of the 10-year running averages indicated that 76% of the variance in the tree-ring index was explained by solar activity and ENSO. These results indicate that the effects of solar activity and ENSO on tree rings are more evident on long timescales.
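
The contrast between the annual and 10-year results rests on two simple operations: a running (moving) average and Pearson's correlation coefficient. The sketch below shows both in plain Python. It assumes an equal-weight window over consecutive full windows (the abstract does not specify the window alignment), and the short series used are illustrative placeholders, not the study's data.

```python
from math import sqrt
from typing import List, Sequence


def running_mean(series: Sequence[float], window: int = 10) -> List[float]:
    """Equal-weight averages over each full window (output length n - window + 1)."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical annual values, for illustration only.
ring_index = [1.02, 0.95, 1.10, 0.98, 1.05, 0.92, 1.08, 1.01, 0.97, 1.03, 1.06, 0.94]
soi = [0.3, -0.5, 1.1, -0.2, 0.4, -0.9, 0.8, 0.1, -0.3, 0.2, 0.6, -0.7]
r_annual = pearson_r(ring_index, soi)
r_smoothed = pearson_r(running_mean(ring_index, 5), running_mean(soi, 5))
```

Smoothing removes high-frequency variance, which is why correlations between running means tend to be larger; the smoothed points are also strongly autocorrelated, so far fewer of them are effectively independent when judging significance.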

  20. Forest inventory predictions from individual tree crowns: regression modeling within a sample framework

    Treesearch

    James W. Flewelling

    2009-01-01

    Remotely sensed data can be used to make digital maps showing individual tree crowns (ITC) for entire forests. Attributes of the ITCs may include area, shape, height, and color. The crown map is sampled in a way that provides an unbiased linkage between ITCs and identifiable trees measured on the ground. Methods of avoiding edge bias are given. In an example from a...

  1. The relationship between tree canopy and crime rates across an urban-rural gradient in the greater Baltimore region

    Treesearch

    Austin Troy; J. Morgan Grove; Jarlath O'Neill-Dunne

    2012-01-01

    The extent to which urban tree cover influences crime is in debate in the literature. This research took advantage of geocoded crime point data and high resolution tree canopy data to address this question in Baltimore City and County, MD, an area that includes a significant urban-rural gradient. Using ordinary least squares and spatially adjusted regression and...

  2. Identification of subgroups by risk of graft failure after paediatric renal transplantation: application of survival tree models on the ESPN/ERA-EDTA Registry.

    PubMed

    Lofaro, Danilo; Jager, Kitty J; Abu-Hanna, Ameen; Groothoff, Jaap W; Arikoski, Pekka; Hoecker, Britta; Roussey-Kesler, Gwenaelle; Spasojević, Brankica; Verrina, Enrico; Schaefer, Franz; van Stralen, Karlijn J

    2016-02-01

    Identification of patient groups by risk of renal graft loss might be helpful for accurate patient counselling and clinical decision-making. Survival tree models are an alternative statistical approach to identify subgroups, offering cut-off points for covariates and an easy-to-interpret representation. Within the European Society of Pediatric Nephrology/European Renal Association-European Dialysis and Transplant Association (ESPN/ERA-EDTA) Registry data, we identified paediatric patient groups with specific profiles for 5-year renal graft survival. Two analyses were performed, including (i) parameters known at the time of transplantation and (ii) additional clinical measurements obtained early after transplantation. The identified subgroups were added as covariates in two survival models. The prognostic performance of the models was tested and compared with conventional Cox regression analyses. The first analysis included 5275 paediatric renal transplants. The best 5-year graft survival (90.4%) was found among patients who received a renal graft as a pre-emptive transplantation or after short-term dialysis (<45 days), whereas graft survival was poorest (51.7%) in adolescents transplanted after long-term dialysis (>2.2 years). The Cox model including both pre-transplant factors and tree subgroups had significantly better predictive performance than conventional Cox regression (P < 0.001). In the analysis including clinical factors, graft survival ranged from 97.3% [younger patients with estimated glomerular filtration rate (eGFR) >30 mL/min/1.73 m² and dialysis <20 months] to 34.7% (adolescents with eGFR <60 mL/min/1.73 m² and dialysis >20 months). In this case too, combining tree findings and clinical factors improved predictive performance compared with conventional Cox models (P < 0.0001). In conclusion, we demonstrated the tree model to be an accurate and attractive tool to predict graft failure for patients with specific characteristics. This may aid the evaluation of individual graft prognosis and thereby the design of measures to improve graft survival in the poor prognosis groups. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  3. Modeling brook trout presence and absence from landscape variables using four different analytical methods

    USGS Publications Warehouse

    Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.

    2006-01-01

    As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout (Salvelinus fontinalis) presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then neural networks. These models will provide natural resource managers with a way to identify habitats requiring protection for the conservation of fish species.

  4. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of developing new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  5. [Prediction and spatial distribution of recruitment trees of natural secondary forest based on geographically weighted Poisson model].

    PubMed

    Zhang, Ling Yu; Liu, Zhao Gang

    2017-12-01

    Based on data collected from 108 permanent plots of the forest resources survey in Maoershan Experimental Forest Farm during 2004-2016, this study investigated the spatial distribution of recruitment trees in natural secondary forest using global Poisson regression and geographically weighted Poisson regression (GWPR) with four bandwidths of 2.5, 5, 10 and 15 km. The simulation effects of the five regressions and the factors influencing recruitment trees in stands were analyzed, and the spatial autocorrelation of the regression residuals was described at global and local levels using Moran's I. The results showed that the spatial distribution of the number of recruitment trees in natural secondary forest was significantly influenced by stand and topographic factors, especially average DBH. The GWPR model at a small scale (2.5 km) had high model-fitting accuracy, generated a wide range of model parameter estimates, and captured the localized spatial distribution effect of the model parameters. The GWPR models at small scales (2.5 and 5 km) produced a narrow range of model residuals, which improved model stability. The global spatial autocorrelation of the GWPR model residuals at the small scale (2.5 km) was the lowest, and the local spatial autocorrelation was significantly reduced, forming an ideal spatial distribution pattern of small clusters with different observations. The local model at a small scale (2.5 km) was much better than the global model at simulating the spatial distribution of recruitment tree numbers.
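    Global Moran's I, used above to check residual autocorrelation, compares each plot's residual with its neighbours' residuals through a spatial weight matrix W: I = (n / S0) · Σi Σj wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², where S0 is the sum of all weights. The minimal Python sketch below assumes a hypothetical row of four plots with binary adjacency weights; the study's actual weighting scheme is not given in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights is an n x n matrix (list of lists) with a zero diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / s0) * cross / variance_term


# Hypothetical: four plots in a row; neighbours share an edge (binary weights).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], adjacency)    # like residuals adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], adjacency)  # unlike residuals adjacent
print(clustered, alternating)
```

    Under no spatial autocorrelation the expected value is E[I] = −1/(n − 1), which is −1/3 here, so the clustered residuals sit above it and the alternating residuals below it.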

  6. International consensus on preliminary definitions of improvement in adult and juvenile myositis.

    PubMed

    Rider, Lisa G; Giannini, Edward H; Brunner, Hermine I; Ruperto, Nicola; James-Newton, Laura; Reed, Ann M; Lachenbruch, Peter A; Miller, Frederick W

    2004-07-01

    The aim was to use a core set of outcome measures to develop preliminary definitions of improvement for adult and juvenile myositis as composite end points for therapeutic trials. Twenty-nine experts in the assessment of myositis achieved consensus on 102 adult and 102 juvenile paper patient profiles as clinically improved or not improved. Two hundred twenty-seven candidate definitions of improvement were developed using the experts' consensus ratings as a gold standard and their judgment of clinically meaningful change in the core set of measures. Seventeen additional candidate definitions of improvement were developed from classification and regression tree analysis, a data-mining decision tree tool. Six candidate definitions specifying percentage change or raw change in the core set of measures were developed using logistic regression analysis. Adult and pediatric working groups ranked the 13 top-performing candidate definitions, whose sensitivity and specificity were ≥75% in the adult, pediatric, and combined data sets, for face validity, clinical sensibility, and ease of use. Nominal group technique was used to facilitate consensus formation. The definition of improvement (common to the adult and pediatric working groups) that ranked highest was 3 of any 6 of the core set measures improved by ≥20%, with no more than 2 worse by ≥25% (which could not include manual muscle testing to assess strength). Five and 4 additional preliminary definitions of improvement for adult and juvenile myositis, respectively, were also developed, with several definitions common to both groups. Participants also agreed to prospectively test 6 logistic regression definitions of improvement in clinical trials. Consensus preliminary definitions of improvement were developed for adult and juvenile myositis, and these incorporate clinically meaningful change in all myositis core set measures in a composite end point. These definitions require prospective validation, but they are now proposed for use as end points in all myositis trials.

  7. A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

    PubMed

    Lin, Lei; Wang, Qian; Sadek, Adel W

    2016-06-01

    The duration of freeway traffic accidents is an important factor that affects traffic congestion, environmental pollution, and secondary accidents. In previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that linear regression assumes that the conditional distribution of accident durations is normal, whereas the distribution of a "time-to-an-event" is almost certainly asymmetric. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and HBDMs have accordingly been applied to analyze and predict traffic accident duration. Previous research, however, has not yet applied HBDMs to accident duration prediction in combination with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach to accident duration prediction that improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and it avoids the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM; the remaining records were used for testing. The results show that the proposed M5P-HBDM identified more significant and meaningful variables than either M5P or HBDMs. Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Observed Methods for Felling Hardwood Trees with Chain Saws

    Treesearch

    Jerry L. Koger

    1983-01-01

    The angles and lengths of the cutting surfaces made by chain saw operators on hardwood tree stumps are described by means, standard deviations, ranges, and regression equations. Recommended felling guidelines are compared with observed felling methods used by experienced timber cutters in the southern Appalachian Mountains.

  9. The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rupšys, P.

    A system of stochastic differential equations (SDE) with mixed-effects parameters and a multivariate normal copula density function was used to develop a tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used, and computational guidelines are given. After fitting the conditional probability density functions to outside-bark diameter at breast height and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.

  10. Grassland and cropland net ecosystem production of the U.S. Great Plains: Regression tree model development and comparative analysis

    USGS Publications Warehouse

    Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy

    2016-01-01

    This paper presents the methodology and results of two ecologically based net ecosystem production (NEP) regression tree models capable of upscaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing and other biogeophysical data, along with data from 15 flux towers for the grassland model and 15 flux towers for the cropland model. The models yielded weekly maps of mean daily grassland and cropland NEP for the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m⁻²·year⁻¹) than non-irrigated croplands. A closer look at the weekly time series reveals the C fluctuations through time and space for each land cover type.

  11. Disentangling Environmental and Anthropogenic Impacts on the Distribution of Unintentionally Introduced Invasive Alien Insects in Mainland China.

    PubMed

    Zhao, Cai-Yun; Li, Jun-Sheng; Xu, Jing; Liu, Xiao-Yan

    2017-05-01

    Globalization increases the opportunities for the unintentional introduction of invasive alien species, especially insects, and most of these species could damage ecosystems and cause economic losses in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province of mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors in the number distribution and the similarity of species composition of these insects. Classification and regression trees indicated that climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are the primary drivers of the number distribution pattern of unintentionally introduced invasive alien insects at the provincial scale, while the multivariate regression trees showed that only environmental factors (the mean January temperature, the annual precipitation and the area of the province) significantly affect the similarity of their species composition. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
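    At each node, a regression tree such as CART chooses the split that most reduces the within-node sum of squared errors. The minimal Python sketch below searches a single predictor for that split; the predictor values (mean January temperature, °C) and responses (number of species per province) are hypothetical, and a full tree would repeat the search over all predictors and recursively within each child node, followed by pruning.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best split x <= threshold.

    threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_sse = None, sse(ys)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_sse = total
    return best_threshold, best_sse


# Hypothetical provinces: January mean temperature vs. number of species.
jan_temp = [-20, -15, -10, -5, 0, 5, 10]
n_species = [2, 3, 2, 10, 12, 11, 13]
print(best_split(jan_temp, n_species))
```

    Here the split falls midway between −10 and −5 °C, separating the low-count provinces from the high-count ones.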

  12. Analysis of the effect of evergreen and deciduous trees on urban nitrogen dioxide levels in the U.S. using land-use regression

    NASA Astrophysics Data System (ADS)

    Rao, M.; George, L. A.

    2012-12-01

    Nitrogen dioxide (NO2), an atmospheric pollutant generated primarily by anthropogenic combustion processes, is typically found at higher concentrations in urban areas than in non-urbanized environments. Elevated NO2 levels have multiple ecosystem effects at different spatial scales. At the local scale, elevated levels affect human health directly and through the formation of secondary pollutants such as ozone and aerosols; at the regional scale, secondary pollutants such as nitric acid and organic nitrates have deleterious effects on non-urbanized areas; and at the global scale, nitrogen oxide emissions significantly alter the natural biogeochemical nitrogen cycle. As cities globally become ever larger sources of nitrogen oxide emissions, it is important to assess possible mitigation strategies to reduce the impact of emissions locally, regionally and globally. In this study, we build a national land-use regression (LUR) model to compare the impacts of deciduous and evergreen trees on urban NO2 levels in the United States. We use the EPA monitoring network values of NO2 levels for 2006, the 2006 NLCD tree canopy data for deciduous and evergreen canopies, and the US Census Bureau's TIGER shapefiles for roads, railroads, impervious area and population density as proxies for the NO2 sources of on-road traffic, railroad traffic, off-road sources and area sources, respectively. Our preliminary LUR model corroborates previous LUR studies showing that the presence of trees is associated with reduced urban NO2 levels. Additionally, our model indicates that deciduous and evergreen trees reduce NO2 to different extents, and that the amount of NO2 reduced varies seasonally. The model indicates that every square kilometer of deciduous canopy within a 2 km buffer is associated with a reduction in ambient NO2 levels of 0.64 ppb in summer and 0.46 ppb in winter. Similarly, every square kilometer of evergreen tree canopy within a 2 km buffer is associated with a reduction in ambient NO2 of 0.53 ppb in summer and 0.84 ppb in winter. Thus, for every square kilometer of canopy within a 2 km buffer, deciduous trees are associated with a roughly 30% smaller NO2 reduction in winter than in summer, while evergreens are associated with a roughly 60% larger reduction in winter. Leaf- and local canopy-level studies have shown that trees are a sink for urban NO2 through deposition as well as stomatal and cuticular uptake. The wintertime versus summertime effects suggest that leaf-level deposition may not be the only uptake mechanism and point to the need for a more holistic analysis of tree and canopy-level deposition in urban air pollution models. Since deposition velocities for NO2 vary by tree species, the reduction may also vary by species. These findings have implications for designing cities to reduce the impact of air pollution.
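    The seasonal percentages follow from the per-square-kilometer reductions quoted above. A short Python check of the relative change from summer to winter gives about −28% for deciduous and +58% for evergreen canopy, which the abstract rounds to 30% and 60%.

```python
def seasonal_change_pct(summer_ppb, winter_ppb):
    """Relative change (%) in the NO2 reduction from summer to winter."""
    return 100.0 * (winter_ppb - summer_ppb) / summer_ppb


# Reductions per km^2 of canopy within a 2 km buffer, taken from the abstract.
deciduous = seasonal_change_pct(0.64, 0.46)
evergreen = seasonal_change_pct(0.53, 0.84)
print(f"deciduous: {deciduous:.1f}%, evergreen: {evergreen:+.1f}%")
```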

  13. Influence of meteorological variables on rainfall partitioning for deciduous and coniferous tree species in urban area

    NASA Astrophysics Data System (ADS)

    Zabret, Katarina; Rakovec, Jože; Šraj, Mojca

    2018-03-01

    Rainfall partitioning is an important part of the ecohydrological cycle and is influenced by numerous variables. Rainfall partitioning for pine (Pinus nigra Arnold) and birch (Betula pendula Roth.) trees was measured from January 2014 to June 2017 in an urban area of Ljubljana, Slovenia. A total of 180 events from more than three years of observations were analyzed, focusing on 13 meteorological variables, including the number of raindrops and their diameter and velocity. Regression tree and boosted regression tree analyses were performed to evaluate the influence of the variables on rainfall interception loss, throughfall, and stemflow in different phenoseasons. The amount of rainfall was identified as the most influential variable, followed by rainfall intensity and the number of raindrops. Higher rainfall amount, intensity, and number of drops decreased the percentage of rainfall interception loss. Rainfall amount and intensity were the most influential variables for interception loss by birch and pine trees during the leafed and leafless periods, respectively. Lower wind speed was found to increase throughfall, whereas wind direction had no significant influence. Consideration of drop size spectrum properties proved to be important, since the number of drops, drop diameter, and median volume diameter were often identified as important influential variables.

  14. Regression models for estimating leaf area of seedlings and adult individuals of Neotropical rainforest tree species.

    PubMed

    Brito-Rocha, E; Schilling, A C; Dos Anjos, L; Piotto, D; Dalmolin, A C; Mielke, M S

    2016-01-01

    Individual leaf area (LA) is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length, L, and width, W) of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx) with the objective of testing the feasibility of single regression models to estimate the LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. Of the seven species analyzed, only two (P. cattleyanum and T. stenocalyx) had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014 in order to validate the single models encompassing adults and seedlings. Our results show that it is possible to develop models for predicting individual leaf area that encompass different ontogenetic stages of tropical tree species. The development of these models depended more on the species than on the differences in leaf size between seedlings and adults.

  15. Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh

    NASA Astrophysics Data System (ADS)

    Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.

    2017-12-01

    Elevated concentrations of geogenic arsenic in groundwater exceeding 10 μg/L, the WHO guideline value for drinking water, have been found in many countries. A common yet unexplained characteristic of the spatial distribution of groundwater arsenic is its extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh in order to improve the accuracy of models predicting the arsenic exceedance rate spatially. A novel boosted regression tree method is used to build an ensemble model of weak learners, which is compared to a linear model fitted by conventional stepwise logistic regression. Compared with logistic regression, the boosted regression tree models offer the advantage of capturing parameter interactions when big datasets are analyzed. The point dataset (n = 3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial datasets of geological parameters (n = 13) were from the Consortium for Spatial Information, the Technical University of Denmark, the University of East Anglia and the FAO, while the soil parameters (n = 42) were from the Harmonized World Soil Database. These parameters were regressed against categorical groundwater arsenic concentrations below or above three thresholds (5 μg/L, 10 μg/L and 50 μg/L) to identify the respective controlling factors. The boosted regression tree method outperformed logistic regression at all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improved spatial distribution map of the probability of groundwater arsenic exceeding each threshold compared with a disjunctive-kriging interpolated arsenic map based on the same groundwater arsenic dataset. The boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds. The probability that a well with iron content higher than 5 mg/L contains more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability that a well deeper than 160 m contains more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.
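    Boosted regression trees build an ensemble by repeatedly fitting a small tree to the current residuals and adding a shrunken copy of it to the model. The minimal Python sketch below uses one predictor, depth-1 trees (stumps) and squared-error loss; this is a simplification, since the study's exceedance models are binary classifiers that would typically use a logistic (Bernoulli) loss and many predictors. The learning rate, number of rounds and data are hypothetical, and the sketch assumes at least two distinct predictor values.

```python
def fit_stump(xs, residuals):
    """Best single split (x <= threshold) minimizing squared error of residuals."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        left = [r for _, r in pairs[:k]]
        right = [r for _, r in pairs[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (err, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting with stumps; returns a predict function."""
    base = sum(ys) / len(ys)  # start from the mean response
    stumps = []
    pred = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        pred = [p + learning_rate * (left_value if x <= threshold else right_value)
                for x, p in zip(xs, pred)]

    def predict(x):
        return base + learning_rate * sum(
            lv if x <= t else rv for t, lv, rv in stumps)

    return predict


# Hypothetical data: a response that steps up between x = 3 and x = 4.
model = boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(round(model(1), 3), round(model(6), 3))
```

    With a learning rate of 0.1, each round in this example removes 10% of the remaining residual, so the ensemble approaches the step gradually rather than in a single fit.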

  16. The photosynthesis - leaf nitrogen relationship at ambient and elevated atmospheric carbon dioxide: a meta-analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrew G. Peterson; J. Timothy Ball; Yiqi Luo

    1998-09-25

    Estimation of leaf photosynthetic rate (A) from leaf nitrogen content (N) is both conceptually and numerically important in models of plant, ecosystem and biosphere responses to global change. The relationship between A and N has been studied extensively at ambient CO₂ but much less at elevated CO₂. This study was designed to (1) assess whether the A-N relationship was more similar for species within than between community and vegetation types, and (2) examine how growth at elevated CO₂ affects the A-N relationship. Data were obtained for 39 C₃ species grown at ambient CO₂ and 10 C₃ species grown at ambient and elevated CO₂. A regression model was applied to each species as well as to species pooled within different community and vegetation types. Cluster analysis of the regression coefficients indicated that species measured at ambient CO₂ did not separate into distinct groups matching community or vegetation type. Instead, most community and vegetation types shared the same general parameter space for regression coefficients. Growth at elevated CO₂ increased photosynthetic nitrogen use efficiency for pines and deciduous trees. When species were pooled by vegetation type, the A-N relationship for deciduous trees expressed on a leaf-mass basis was not altered by elevated CO₂, while the intercept increased for pines. When regression coefficients were averaged to give mean responses for different vegetation types, elevated CO₂ increased the intercept and the slope for deciduous trees but increased only the intercept for pines. There were no statistical differences between the pines and deciduous trees in the effect of CO₂. Generalizations about the effect of elevated CO₂ on the A-N relationship, and about differences between pines and deciduous trees, will be enhanced as more data become available.

  17. Using LiDAR to Estimate Total Aboveground Biomass of Redwood Stands in the Jackson Demonstration State Forest, Mendocino, California

    NASA Astrophysics Data System (ADS)

    Rao, M.; Vuong, H.

    2013-12-01

    The overall objective of this study is to develop a method for estimating the total aboveground biomass of redwood stands in Jackson Demonstration State Forest, Mendocino, California, using airborne LiDAR data. Owing to their vertical and horizontal accuracy, LiDAR data are increasingly being used to characterize landscape features, including ground surface elevation and canopy height. These LiDAR-derived metrics, which capture structural signatures with higher precision and accuracy, can help us better understand ecological processes at various spatial scales. Our study is focused on two major species of the forest: redwood (Sequoia sempervirens [D.Don] Endl.) and Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco). Specifically, the objectives included fitting linear regression models of tree diameter at breast height (dbh) against LiDAR-derived height for each species. At 23 random points in the study area, field measurements (dbh and tree coordinates) were collected for more than 500 redwood and Douglas-fir trees on 0.2 ha plots. The USFS FUSION software, along with its LiDAR Data Viewer (LDV), was used to extract a Canopy Height Model (CHM) from which tree heights were derived. Based on the LiDAR-derived height and ground-based dbh, a linear regression model was developed to predict dbh. The predicted dbh was used to estimate biomass at the single-tree level using the formula of Jenkins et al. (2003). The linear regression models explained 65% of the variability in redwood dbh and 80% of that in Douglas-fir dbh.

  18. Dispersion patterns and sampling plans for Diaphorina citri (Hemiptera: Psyllidae) in citrus.

    PubMed

    Sétamou, Mamoudou; Flores, Daniel; French, J Victor; Hall, David G

    2008-08-01

    The abundance and spatial dispersion of Diaphorina citri Kuwayama (Hemiptera: Psyllidae) were studied in 34 grapefruit (Citrus paradisi Macfad.) and six sweet orange [Citrus sinensis (L.) Osbeck] orchards from March to August 2006, when the pest is more abundant in southern Texas. Although flush shoot infestation levels did not vary with host plant species, densities of D. citri eggs, nymphs, and adults were significantly higher on sweet orange than on grapefruit. D. citri immatures were also found in significantly higher numbers in the southeastern quadrant of trees than in other parts of the canopy. The spatial distribution of D. citri nymphs and adults was analyzed using Iwao's patchiness regression and Taylor's power law. Taylor's power law fitted the data better than Iwao's model. Based on both regression models, the field dispersion patterns of D. citri nymphs and adults were aggregated among flush shoots in individual trees, as indicated by regression slopes significantly >1. For the average density of each life stage obtained during our surveys, the minimum number of flush shoots per tree needed to estimate D. citri densities varied from eight for eggs to four for adults. Projections indicated that a sampling plan consisting of 10 trees and eight flush shoots per tree would provide density estimates of the three developmental stages of D. citri acceptable enough for population studies and management decisions. A presence-absence sampling plan with a fixed precision level was developed and can be used to provide a quick estimate of D. citri populations in citrus orchards.
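    Taylor's power law relates the sample variance s² of counts to their mean m as s² = a·m^b, so a and b can be estimated by ordinary least squares on log(s²) = log(a) + b·log(m); a slope b > 1 indicates aggregation. Because the standard error of the mean is s/√n, fixing the relative precision D = SE/m gives a required sample size n = a·m^(b−2)/D². The minimal Python sketch below uses hypothetical (mean, variance) pairs, not the study's data, and does not reproduce its presence-absence plan.

```python
import math


def fit_taylor(means, variances):
    """Fit s^2 = a * m^b by least squares on the log10 scale; return (a, b)."""
    xs = [math.log10(m) for m in means]
    ys = [math.log10(v) for v in variances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = 10 ** (my - b * mx)
    return a, b


def sample_size(a, b, mean, precision):
    """Samples needed so that SE / mean equals `precision` (D).

    From SE = s / sqrt(n) and s^2 = a * m^b: n = a * m^(b - 2) / D^2.
    """
    return a * mean ** (b - 2) / precision ** 2


# Hypothetical mean/variance pairs per flush shoot, generated with a=2, b=1.5.
means = [1.0, 4.0, 9.0, 16.0]
variances = [2.0, 16.0, 54.0, 128.0]
a, b = fit_taylor(means, variances)
print(f"a = {a:.2f}, b = {b:.2f}, aggregated: {b > 1}")
print(f"n for mean 4, D = 0.25: {sample_size(a, b, 4.0, 0.25):.1f}")
```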

  19. CART (Classification and Regression Trees) Program: The Implementation of the CART Program and Its Application to Estimating Attrition Rates.

    DTIC Science & Technology

    1985-12-01

    consists of the node t and all descendants of t in T. (3) Definition 3. Pruning a branch Tt from a tree T consists of deleting from T all...The default is 1.0 so that actually, this keyword did not need to appear in the above file. (5) DELETE. This keyword does not appear in our example, but...when it is used associated with some variable names, it indicates that we want to delete these variables from the regression. If this keyword is

  20. Error analysis of leaf area estimates made from allometric regression models

    NASA Technical Reports Server (NTRS)

    Feiveson, A. H.; Chhikara, R. S.

    1986-01-01

    Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their leaf areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily measured features such as bole diameter and tree height by using a regression model. In the second stage, the non-destructive features are measured for all trees, or for a sample of trees, in the plot and then used as input to the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study of aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
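    The two-stage estimator described above can be sketched in Python as follows: stage 1 fits a regression of leaf area on an easily measured feature using the felled trees, and stage 2 applies it to the trees measured in the plot and expands the sum by the sampling fraction. The straight-line model, the single predictor (bole diameter), the expansion by sampling fraction and all numbers are hypothetical simplifications; the paper's error analysis, which propagates uncertainty from both stages, is not shown.

```python
def fit_line(x, y):
    """Stage 1: ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def plot_leaf_area(intercept, slope, diameters, sampling_fraction=1.0):
    """Stage 2: predict leaf area per measured tree, sum, expand to the plot."""
    predicted_total = sum(intercept + slope * d for d in diameters)
    return predicted_total / sampling_fraction


# Stage 1 (hypothetical felled trees): bole diameter (cm) vs. leaf area (m^2).
intercept, slope = fit_line([10.0, 20.0, 30.0], [15.0, 35.0, 55.0])

# Stage 2 (hypothetical): diameters of a 50% sample of the plot's trees.
total = plot_leaf_area(intercept, slope, [12.0, 18.0, 25.0], sampling_fraction=0.5)
print(f"estimated plot leaf area: {total:.1f} m^2")
```

    In this sketch every stage-2 prediction reuses the same stage-1 coefficients, so any error in those coefficients is shared across all trees in the plot rather than averaging out.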

  1. Nervous systems and scenarios for the invertebrate-to-vertebrate transition

    PubMed Central

    Holland, Nicholas D.

    2016-01-01

    Older evolutionary scenarios for the origin of vertebrates often gave nervous systems top billing in accordance with the notion that a big-brained Homo sapiens crowned a tree of life shaped mainly by progressive evolution. Now, however, tree thinking positions all extant organisms equidistant from the tree's root, and molecular phylogenies indicate that regressive evolution is more common than previously suspected. Even so, contemporary theories of vertebrate origin still focus on the nervous system because of its functional importance, its richness in characters for comparative biology, and its central position in the two currently prominent scenarios for the invertebrate-to-vertebrate transition, which grew out of the markedly neurocentric annelid and enteropneust theories of the nineteenth century. Both these scenarios compare phyla with diverse overall body plans. This diversity, exacerbated by the scarcity of relevant fossil data, makes it challenging to establish plausible homologies between component parts (e.g. nervous system regions). In addition, our current understanding of the relation between genotype and phenotype is too preliminary to permit us to convert gene network data into structural features in any simple way. These issues are discussed here with special reference to the evolution of nervous systems during proposed transitions from invertebrates to vertebrates. PMID:26598728

  2. A novel dendrochronological approach reveals drivers of carbon sequestration in tree species of riparian forests across spatiotemporal scales.

    PubMed

    Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne

    2017-01-01

    Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8 m) on sites where the groundwater table fluctuated little (≤0.9 m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7 m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. 
Results may support species-specific and locally adapted forest management plans to increase carbon dioxide sequestration from the atmosphere in trees. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. There is no temperature dependence of net biochemical fractionation of hydrogen and oxygen isotopes in tree-ring cellulose.

    PubMed

    Roden, J S; Ehleringer, J R

    2000-01-01

    The isotopic composition of tree-ring cellulose was obtained over a two-year period from small-diameter riparian zone trees along an elevational transect in Big Cottonwood Canyon, Utah, USA, to test for a possible temperature dependence of net biological fractionation during cellulose synthesis. The isotope ratios of stream water varied by only 3.6‰ and 0.2‰ in δD and δ18O, respectively, over an elevation change of 810 m. The similarity in stream water and macroenvironment over the short (13 km) transect produced nearly constant stem and leaf water δD and δ18O values. In addition, what few seasonal variations were observed in the isotopic composition of source water and atmospheric water vapor or in leaf water evaporative enrichment were experienced equally by all sites along the elevational transect. Temperature along the transect spanned a range of ≥5 °C, as calculated using the adiabatic lapse rate. Since the δD and δ18O values of stem and leaf water varied little for these trees over this elevation/temperature transect, any differences in tree-ring cellulose δD and δ18O values should have been associated with temperature effects on net biological fractionation. However, the slopes of the regressions of elevation versus the δD and δ18O values of tree-ring cellulose were not significantly different from zero, indicating little or no temperature dependence of net biological fractionation. Therefore, cross-site climatic reconstruction studies using the isotope ratios of cellulose need not be concerned that temperatures during the growing season have influenced results.

  4. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background: Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, our approach generates a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each gene pair is statistically significant; for this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET detects true gene associations more accurately than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and the others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of target genes by linear regression functions. 
They are often more precise than linear regression models because they can fit different linear regressions to separate regions of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452

  5. More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011

    NASA Astrophysics Data System (ADS)

    Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R.

    2016-01-01

    Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher-than-average rates of poverty during this period.

  6. More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011.

    PubMed

    Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R

    2016-01-01

    Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher-than-average rates of poverty during this period.

  7. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.

    PubMed

    Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.

  8. Beating the Odds: Trees to Success in Different Countries

    ERIC Educational Resources Information Center

    Finch, W. Holmes; Marchant, Gregory J.

    2017-01-01

    A recursive partitioning approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the United States). The objective of the study was to determine demographic and educational variables that differentiated between low-SES students who were…

  9. Geospatial relationships of tree species damage caused by Hurricane Katrina in south Mississippi

    Treesearch

    Mark W. Garrigues; Zhaofei Fan; David L. Evans; Scott D. Roberts; William H. Cooke III

    2012-01-01

    Hurricane Katrina generated substantial impacts on the forests and biological resources of the affected area in Mississippi. This study seeks to use classification tree analysis (CTA) to determine which variables are significant in predicting hurricane damage (shear or windthrow) in the Southeast Mississippi Institute for Forest Inventory District. Logistic regressions...

  10. Using Classification Trees to Predict Alumni Giving for Higher Education

    ERIC Educational Resources Information Center

    Weerts, David J.; Ronca, Justin M.

    2009-01-01

    As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…

  11. Updated generalized biomass equations for North American tree species

    Treesearch

    David C. Chojnacky; Linda S. Heath; Jennifer C. Jenkins

    2014-01-01

    Historically, tree biomass at large scales has been estimated by applying dimensional analysis techniques and field measurements such as diameter at breast height (dbh) in allometric regression equations. Equations often have been developed using differing methods and applied only to certain species or isolated areas. We previously had compiled and combined (in meta-...

  12. The microcomputer scientific software series 5: the BIOMASS user's guide.

    Treesearch

    George E. Host; Stephen C. Westin; William G. Cole; Kurt S. Pregitzer

    1989-01-01

    BIOMASS is an interactive microcomputer program that uses allometric regression equations to calculate aboveground biomass of common tree species of the Lake States. The equations are species-specific and most use both diameter and height as independent variables. The program accommodates fixed area and variable radius sample designs and produces both individual tree...

  13. Northern Arkansas Spring Precipitation Reconstructed from Tree Rings, 1023-1992 A.D.

    Treesearch

    Malcolm K. Cleaveland

    2001-01-01

    Three baldcypress (Taxodium distichum (L.) Rich.) tree-ring chronologies in northeastern Arkansas and southeastern Missouri respond strongly to April-June (spring) rainfall in northern Arkansas. I used regression to reconstruct an average of spring rainfall in the three climatic divisions of northern Arkansas since 1023 A.D. The reconstruction was...

  14. Weather Impact on Airport Arrival Meter Fix Throughput

    NASA Technical Reports Server (NTRS)

    Wang, Yao

    2017-01-01

    Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. To meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix before entering the airport terminal airspace, air traffic controllers regulate air traffic. Severe weather may create an airport arrival bottleneck if one or more of the airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently put in holding patterns until they can be rerouted. A model that predicts the weather-impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at Newark Liberty International Airport (EWR). Several scenarios of weather-impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for April-September 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather-impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than the multiple linear regression method.

  15. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

    PubMed

    Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

    2017-02-01

    At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using classification and regression tree (CART) analysis. A total of 1047 patients with suspected ACHBLF from two separate medical centres were screened in the study; the two centres served as the derivation cohort and the validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF (total bilirubin, age, serum sodium and INR) and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed by multivariate logistic regression analysis with four independent factors: age, total bilirubin, serum sodium and prothrombin activity. The performance of the CART model (0.896), similar to that of the logistic regression model (0.914, P = .382), exceeded that of the MELD score (0.667, P < .001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.

  16. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    NASA Astrophysics Data System (ADS)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The study area and timeframe of this manuscript are Minnesota, USA, during the years 2009-2013. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
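    As a rough illustration of harmonic regression, the sketch below fits a constant plus K cosine/sine pairs to an irregularly sampled pixel time series by ordinary least squares and returns the coefficients that could then serve as predictor variables. The annual period of 365.25 days, the choice of two harmonics, the 16-day sampling and the function name are assumptions for illustration; the study's actual configuration (which tasseled cap components and how many harmonics) is not given here.

```python
import numpy as np

def harmonic_coefficients(doy, values, n_harmonics=2, period=365.25):
    """Least-squares fit of y(t) = a0 + sum_k [a_k cos(2*pi*k*t/P) + b_k sin(2*pi*k*t/P)].

    Returns [a0, a1, b1, a2, b2, ...]; needs at least 1 + 2*n_harmonics
    observations for a unique solution.
    """
    t = np.asarray(doy, dtype=float)
    y = np.asarray(values, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        cols.extend([np.cos(w), np.sin(w)])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical clear-sky observations every 16 days over one year.
doy = np.arange(0, 365, 16)
phase = 2.0 * np.pi * doy / 365.25
signal = 0.3 + 0.2 * np.cos(phase) + 0.1 * np.sin(phase)
print(np.round(harmonic_coefficients(doy, signal), 3))
```

    In this sketch the design matrix is built only at the dates actually supplied, so cloud-masked dates are simply omitted from the fit rather than interpolated, unlike a fixed monthly composite.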

  17. Tree-, stand- and site-specific controls on landscape-scale patterns of transpiration

    NASA Astrophysics Data System (ADS)

    Hassler, Sibylle Kathrin; Weiler, Markus; Blume, Theresa

    2018-01-01

    Transpiration is a key process in the hydrological cycle, and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions as well as for improving the parameterisation and evaluation of hydrological and soil-vegetation-atmosphere transfer models. For individual trees, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status control sap flow amounts of individual trees. Within forest stands, properties such as species composition, basal area or stand density additionally affect sap flow, for example via competition mechanisms. Finally, sap flow patterns might also be influenced by landscape-scale characteristics such as geology and soils, slope position or aspect because they affect water and energy availability; however, little is known about the dynamic interplay of these controls. We studied the relative importance of various tree-, stand- and site-specific characteristics with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites across a 290 km² catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we modelled the daily sap velocity and derived sap flow patterns of these 61 trees, and we determined the importance of the different controls. Results indicate that a combination of mainly tree- and site-specific factors controls sap velocity patterns in the landscape, namely tree species, tree diameter, geology and aspect. For sap flow we included only the stand- and site-specific predictors in the models to ensure variable independence. Of those, geology and aspect were most important. Compared to these predictors, spatial variability of atmospheric demand and soil moisture explains only a small fraction of the variability in the daily datasets. 
However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, are correlated to the temporal dynamics of potential evaporation. We conclude that transpiration estimates on the landscape scale would benefit from not only consideration of hydro-meteorological drivers, but also tree, stand and site characteristics in order to improve the spatial and temporal representation of transpiration for hydrological and soil-vegetation-atmosphere transfer models.

  18. A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees.

    PubMed

    Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa

    2017-01-01

    Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, Ki, of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the highest coefficient of determination is selected as our prediction model. Predicted Ki values from the proposed model show a strong coefficient of determination, R² = 0.996, to experimental Ki values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low Ki values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds, which may reduce the time and expenses for new drug development. New compounds with high predicted Ki should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low Ki. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. The limits to tree height.

    PubMed

    Koch, George W; Sillett, Stephen C; Jennings, Gregory M; Davis, Stephen D

    2004-04-22

    Trees grow tall where resources are abundant, stresses are minor, and competition for light places a premium on height growth. The height to which trees can grow and the biophysical determinants of maximum height are poorly understood. Some models predict heights of up to 120 m in the absence of mechanical damage, but there are historical accounts of taller trees. Current hypotheses of height limitation focus on increasing water transport constraints in taller trees and the resulting reductions in leaf photosynthesis. We studied redwoods (Sequoia sempervirens), including the tallest known tree on Earth (112.7 m), in wet temperate forests of northern California. Our regression analyses of height gradients in leaf functional characteristics estimate a maximum tree height of 122-130 m barring mechanical damage, similar to the tallest recorded trees of the past. As trees grow taller, increasing leaf water stress due to gravity and path length resistance may ultimately limit leaf expansion and photosynthesis for further height growth, even with ample soil moisture.

  20. Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping

    EPA Science Inventory

    This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...

  1. "Mad or bad?": burden on caregivers of patients with personality disorders.

    PubMed

    Bauer, Rita; Döring, Antje; Schmidt, Tanja; Spießl, Hermann

    2012-12-01

    The burden on caregivers of patients with personality disorders is often greatly underestimated or completely disregarded. Possibilities for caregiver support have rarely been assessed. Thirty interviews were conducted with caregivers of such patients to assess illness-related burden. Responses were analyzed with a mixed method of qualitative and quantitative analysis in a sequential design. Patient and caregiver data, including sociodemographic and disease-related variables, were evaluated with regression analysis and regression trees. Caregiver statements (n = 404) were summarized into 44 global statements. The most frequent global statements were worries about the burden on other family members (70.0%), poor cooperation with clinical centers and other institutions (60.0%), financial burden (56.7%), worry about the patient's future (53.3%), and dissatisfaction with the patient's treatment and rehabilitation (53.3%). Linear regression and regression tree analysis identified predictors for more burdened caregivers. Caregivers of patients with personality disorders experience a variety of burdens, some disorder specific. Yet these caregivers often receive little attention or support.

  2. Automatic localization of bifurcations and vessel crossings in digital fundus photographs using location regression

    NASA Astrophysics Data System (ADS)

    Niemeijer, Meindert; Dumitrescu, Alina V.; van Ginneken, Bram; Abrámoff, Michael D.

    2011-03-01

    Parameters extracted from the vasculature on the retina are correlated with various conditions such as diabetic retinopathy and cardiovascular diseases such as stroke. Segmentation of the vasculature on the retina has been a topic that has received much attention in the literature over the past decade. Analysis of the segmentation result, however, has received only limited attention, with most works describing methods to accurately measure the width of the vessels. Analyzing the connectedness of the vascular network is an important step towards the characterization of the complete vascular tree. The retinal vascular tree, from an image interpretation point of view, originates at the optic disc and spreads out over the retina. The tree bifurcates and the vessels also cross each other. The points where this happens form the key to determining the connectedness of the complete tree. We present a supervised method to detect the bifurcations and crossing points of the vasculature of the retina. The method uses features extracted from the vasculature as well as the image in a location regression approach to find those locations of the segmented vascular tree where a bifurcation or crossing occurs (hereafter, points of interest, POI). We evaluate the method on the publicly available DRIVE database, in which an ophthalmologist has marked the POI.

  3. Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.

    PubMed

    Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung

    2015-01-01

    To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.

  4. Relationships between individual-tree mortality and water-balance variables indicate positive trends in water stress-induced tree mortality across North America.

    PubMed

    Hember, Robbie A; Kurz, Werner A; Coops, Nicholas C

    2017-04-01

    Accounting for water stress-induced tree mortality in forest productivity models remains a challenge due to uncertainty in the stress tolerance of tree populations. In this study, logistic regression models were developed to assess species-specific relationships between the probability of mortality (P_m) and drought, drawing on 8.1 million observations of change in vital status (m) of individual trees across North America. Drought was defined by standardized (relative) values of soil water content (W_s,z) and reference evapotranspiration (ET_r,z) at each field plot. The models additionally tested for interactions between the water-balance variables, aridity class of the site (AC), and estimated tree height (h). Considering drought improved model performance in 95% (80%) of the 64 tested species during calibration (cross-validation). On average, sensitivity to relative drought increased with site AC (i.e. aridity). Interaction between water-balance variables and estimated tree height indicated that drought sensitivity commonly decreased during early height development and increased during late height development, which may reflect expansion of the root system and decreasing whole-plant, leaf-specific hydraulic conductance, respectively. Across North America, predictions suggested that changes in the water balance caused mortality to increase from 1.1% yr⁻¹ in 1951 to 2.0% yr⁻¹ in 2014 (a net change of 0.9 ± 0.3% yr⁻¹). Interannual variation in mortality also increased, driven by increasingly severe droughts in 1988, 1998, 2006, 2007 and 2012. With strong confidence, this study indicates that water stress is a common cause of tree mortality. With weak-to-moderate confidence, this study strengthens previous claims attributing positive trends in mortality to increasing levels of water stress. This 'learn-as-we-go' approach, defined by sampling rare drought events as they continue to intensify, will help to constrain the hydraulic limits of dominant tree species and the viability of boreal and temperate forest biomes under continued climate change. © 2016 John Wiley & Sons Ltd.

  5. The use of single-date MODIS imagery for estimating large-scale urban impervious surface fraction with spectral mixture analysis and machine learning techniques

    NASA Astrophysics Data System (ADS)

    Deng, Chengbin; Wu, Changshan

    2013-12-01

    Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
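
A minimal Python sketch of the two SMA steps described above, assuming a linear mixing model in which each pixel spectrum is the abundance-weighted sum of endmember spectra. The least squares solution (LSS) step estimates the endmember spectra E from sample pixels with known abundances A and reflectances R as E = (AᵀA)⁻¹AᵀR; unconstrained SMA then unmixes a new pixel r as f = (EEᵀ)⁻¹Er. All names and the toy two-endmember, three-band spectra are hypothetical, and the authors' implementation may differ.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x (Gauss-Jordan elimination with partial pivoting)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # normalize pivot row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def endmembers_from_abundances(abund: Matrix, refl: Matrix) -> Matrix:
    """LSS step: E = (A^T A)^-1 A^T R, one row of E per endmember."""
    at = transpose(abund)
    return solve(matmul(at, abund), matmul(at, refl))


def unconstrained_sma(endmembers: Matrix, pixel: List[float]) -> List[float]:
    """Unconstrained SMA: f = (E E^T)^-1 E r (no sum-to-one or positivity)."""
    e = endmembers
    rhs = matmul(e, [[v] for v in pixel])
    return [row[0] for row in solve(matmul(e, transpose(e)), rhs)]
```

Fully constrained SMA would additionally force the abundances to be non-negative and to sum to one, which this sketch omits.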

  6. Shea (Vitellaria paradoxa Gaertn C. F.) fruit yield assessment and management by farm households in the Atacora district of Benin

    PubMed Central

    Villamor, Grace B.; Nyarko, Benjamin Kofi; Wala, Kperkouma; Akpagana, Koffi

    2018-01-01

    Vitellaria paradoxa (Gaertn C. F.), or shea tree, remains one of the most valuable trees for farmers in the Atacora district of northern Benin, where rural communities depend on shea products for both food and income. To optimize productivity and management of shea agroforestry systems, or "parklands," accurate and up-to-date data are needed. For this purpose, we monitored 120 fruiting shea trees for two years under three land-use scenarios and different soil groups in Atacora, coupled with a farm household survey to elicit information on decision making and management practices. To examine the local pattern of shea tree productivity and relationships between morphological factors and yields, we used a randomized branch sampling method and applied a regression analysis to build a shea yield model based on dendrometric, soil and land-use variables. We also compared potential shea yields based on farm household socio-economic characteristics and management practices derived from the survey data. Soil and land-use variables were the most important determinants of shea fruit yield. In terms of land use, shea trees growing on farmland plots exhibited the highest yields (i.e., fruit quantity and mass), while trees growing on Lixisols performed better than those of the other soil group. Contrary to our expectations, dendrometric parameters had weak relationships with fruit yield regardless of land use and soil group. Fruit yield varied between years in both soil groups and all land-use types. In addition to this inter-annual variability, there was a high degree of variability in production among individual shea trees. Furthermore, household socioeconomic characteristics such as road accessibility, landholding size, and gross annual income influenced shea fruit yield. The use of fallow areas is an important land management practice in the study area that influences both conservation and shea yield. PMID:29346406

  7. Shea (Vitellaria paradoxa Gaertn C. F.) fruit yield assessment and management by farm households in the Atacora district of Benin.

    PubMed

    Aleza, Koutchoukalo; Villamor, Grace B; Nyarko, Benjamin Kofi; Wala, Kperkouma; Akpagana, Koffi

    2018-01-01

    Vitellaria paradoxa (Gaertn C. F.), or shea tree, remains one of the most valuable trees for farmers in the Atacora district of northern Benin, where rural communities depend on shea products for both food and income. To optimize productivity and management of shea agroforestry systems, or "parklands," accurate and up-to-date data are needed. For this purpose, we monitored 120 fruiting shea trees for two years under three land-use scenarios and different soil groups in Atacora, coupled with a farm household survey to elicit information on decision making and management practices. To examine the local pattern of shea tree productivity and relationships between morphological factors and yields, we used a randomized branch sampling method and applied a regression analysis to build a shea yield model based on dendrometric, soil and land-use variables. We also compared potential shea yields based on farm household socio-economic characteristics and management practices derived from the survey data. Soil and land-use variables were the most important determinants of shea fruit yield. In terms of land use, shea trees growing on farmland plots exhibited the highest yields (i.e., fruit quantity and mass), while trees growing on Lixisols performed better than those of the other soil group. Contrary to our expectations, dendrometric parameters had weak relationships with fruit yield regardless of land use and soil group. Fruit yield varied between years in both soil groups and all land-use types. In addition to this inter-annual variability, there was a high degree of variability in production among individual shea trees. Furthermore, household socioeconomic characteristics such as road accessibility, landholding size, and gross annual income influenced shea fruit yield. The use of fallow areas is an important land management practice in the study area that influences both conservation and shea yield.

  8. Combined self-learning based single-image super-resolution and dual-tree complex wavelet transform denoising for medical images

    NASA Astrophysics Data System (ADS)

    Yang, Guang; Ye, Xujiong; Slabaugh, Greg; Keegan, Jennifer; Mohiaddin, Raad; Firmin, David

    2016-03-01

    In this paper, we propose a novel self-learning based single-image super-resolution (SR) method, which is coupled with dual-tree complex wavelet transform (DTCWT) based denoising to better recover high-resolution (HR) medical images. Unlike previous methods, this self-learning based SR approach enables us to reconstruct HR medical images from a single low-resolution (LR) image without extra training on HR image datasets in advance. The relationships between the given image and its scaled down versions are modeled using support vector regression with sparse coding and dictionary learning, without explicitly assuming reoccurrence or self-similarity across image scales. In addition, we perform DTCWT based denoising to initialize the HR images at each scale instead of simple bicubic interpolation. We evaluate our method on a variety of medical images. Both quantitative and qualitative results show that the proposed approach outperforms bicubic interpolation and state-of-the-art single-image SR methods while effectively removing noise.

  9. Estimating forest crown area removed by selection cutting: a linked regression-GIS approach based on stump diameters

    USGS Publications Warehouse

    Anderson, S.C.; Kupfer, J.A.; Wilson, R.R.; Cooper, R.J.

    2000-01-01

    The purpose of this research was to develop a model that could be used to provide a spatial representation of the effects of uneven-aged silvicultural treatments on forest crown area. We began by developing species-specific linear regression equations relating tree DBH to crown area for eight bottomland tree species at White River National Wildlife Refuge, Arkansas, USA. The relationships were highly significant for all species, with coefficients of determination (r²) ranging from 0.37 for Ulmus crassifolia to nearly 0.80 for Quercus nuttallii and Taxodium distichum. We next located and measured the diameters of more than 4000 stumps from a single tree-group selection timber harvest. Stump locations were recorded with respect to an established grid point system and entered into a Geographic Information System (ARC/INFO). The area occupied by the crown of each logged individual was then estimated by using the stump dimensions (adjusted to DBHs) and the regression equations relating tree DBH to crown area. Our model projected that the selection cuts removed roughly 300 m² of basal area from the logged sites, resulting in the loss of approximately 55,000 m² of crown area. The model developed in this research represents a tool that can be used in conjunction with remote sensing applications to assist in forest inventory and management, as well as to estimate the impacts of selective timber harvest on wildlife.
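
The crown-area bookkeeping described above can be sketched as follows: fit crown area = a + b·DBH by ordinary least squares, convert each stump diameter to DBH, then sum predicted crown areas (and basal areas, π·(DBH/200)² m² for DBH in cm) over all stumps. This is a minimal Python sketch assuming a single linear equation. The study fitted one equation per species, and the stump-to-DBH conversion here is a hypothetical placeholder passed in as a function.

```python
import math
from typing import Callable, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def basal_area_m2(dbh_cm: float) -> float:
    """Basal area of one stem in m^2 from DBH in cm: pi * (DBH / 200)^2."""
    return math.pi * (dbh_cm / 200.0) ** 2


def summarize_harvest(
    stump_diams_cm: Sequence[float],
    stump_to_dbh: Callable[[float], float],  # hypothetical stump-to-DBH adjustment
    crown_coef: Tuple[float, float],         # (a, b) from fit_line on DBH vs crown area
) -> Tuple[float, float]:
    """Return (total basal area removed, total predicted crown area removed)."""
    a, b = crown_coef
    dbhs = [stump_to_dbh(d) for d in stump_diams_cm]
    basal = sum(basal_area_m2(d) for d in dbhs)
    crown = sum(a + b * d for d in dbhs)
    return basal, crown
```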

  10. A cross-sectional study for predicting tail biting risk in pig farms using classification and regression tree analysis.

    PubMed

    Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A

    2017-10-01

    Tail biting in pigs has been a recognized behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce Classification and Regression Tree (CRT) methodologies to develop a tool for the prevention of acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs was involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for CRT analysis in managing the risk factors behind tail biting on a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting obtained with the Regression Tree. CRT analysis identified five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness in feed supply) as critical predictors of acute tail biting lesions, whose importance differed among farm subgroups. The model might have reliable and practical applications for the support and implementation of tail biting prevention interventions, especially for subgroups of pigs at higher risk, helping farmers and veterinarians to assess the risk on their own farm and to manage its predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Modeling Fire Severity in Black Spruce Stands in the Alaskan Boreal Forest Using Spectral and Non-Spectral Geospatial Data

    NASA Technical Reports Server (NTRS)

    Barrett, K.; Kasischke, E. S.; McGuire, A. D.; Turetsky, M. R.; Kane, E. S.

    2010-01-01

    Biomass burning in the Alaskan interior is already a major disturbance and source of carbon emissions, and is likely to increase in response to the warming and drying predicted for the future climate. In addition to quantifying changes to the spatial and temporal patterns of burned areas, observing variations in severity is key to studying the impact of changes in the fire regime on carbon cycling, energy budgets, and post-fire succession. Remote sensing indices of fire severity have not consistently been well correlated with in situ observations of important severity characteristics in Alaskan black spruce stands, including depth of burning of the surface organic layer. The incorporation of ancillary data such as in situ observations and GIS layers with spectral data from Landsat TM/ETM+ greatly improved efforts to map the reduction of the organic layer in burned black spruce stands. Using a regression tree approach, the R² of the organic layer depth reduction models was 0.60 and 0.55 (p < 0.01) for relative and absolute depth reduction, respectively. All of the independent variables used by the regression tree to estimate burn depth can be obtained independently of field observations. Implementation of a gradient boosting algorithm improved the R² to 0.80 and 0.79 (p < 0.01) for absolute and relative organic layer depth reduction, respectively. Independent variables used in the regression tree model of burn depth included topographic position, remote sensing indices related to soil and vegetation characteristics, timing of the fire event, and meteorological data. Post-fire organic layer depth characteristics were determined for a large (> 200,000 ha) fire to identify areas that are potentially vulnerable to a shift in post-fire succession. This application showed that 12% of this fire event experienced fire severe enough to support a change in post-fire succession. We conclude that non-parametric models and ancillary data are useful in the modeling of the surface organic layer fire depth. Because quantitative differences in post-fire surface characteristics do not directly influence spectral properties, these modeling techniques provide better information than the use of remote sensing data alone.
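
Gradient boosting, as mentioned above, builds an additive model by repeatedly fitting small regression trees to the residuals of the current fit. A minimal Python sketch, assuming squared-error loss, depth-1 trees (stumps) and a fixed learning rate; the settings the authors used are not given in the abstract, and all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, value if x <= threshold, value if x > threshold)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Depth-1 regression tree minimizing squared error on the residuals r."""
    mean_r = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean_r, mean_r)  # fallback: no useful split
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, r) if row[j] <= t]
            right = [v for row, v in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lm, rm), sse
    return best


def stump_value(s: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = s
    return lv if row[j] <= t else rv


def boost(X, y, n_rounds: int = 100, lr: float = 0.1):
    """Least-squares gradient boosting: each stump is fit to the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # residuals are the negative gradient of the loss 1/2 * (y - F)^2
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [pi + lr * stump_value(s, row) for pi, row in zip(pred, X)]
    return f0, lr, stumps


def predict(model, row) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(stump_value(s, row) for s in stumps)


def r_squared(y, yhat) -> float:
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```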

  12. Modeling fire severity in black spruce stands in the Alaskan boreal forest using spectral and non-spectral geospatial data

    USGS Publications Warehouse

    Barrett, Kirsten M.; Kasischke, E.S.; McGuire, A.D.; Turetsky, M.R.; Kane, E.S.

    2010-01-01

    Biomass burning in the Alaskan interior is already a major disturbance and source of carbon emissions, and is likely to increase in response to the warming and drying predicted for the future climate. In addition to quantifying changes to the spatial and temporal patterns of burned areas, observing variations in severity is key to studying the impact of changes in the fire regime on carbon cycling, energy budgets, and post-fire succession. Remote sensing indices of fire severity have not consistently been well correlated with in situ observations of important severity characteristics in Alaskan black spruce stands, including depth of burning of the surface organic layer. The incorporation of ancillary data such as in situ observations and GIS layers with spectral data from Landsat TM/ETM+ greatly improved efforts to map the reduction of the organic layer in burned black spruce stands. Using a regression tree approach, the R² of the organic layer depth reduction models was 0.60 and 0.55 (p < 0.01) for relative and absolute depth reduction, respectively. All of the independent variables used by the regression tree to estimate burn depth can be obtained independently of field observations. Implementation of a gradient boosting algorithm improved the R² to 0.80 and 0.79 (p < 0.01) for absolute and relative organic layer depth reduction, respectively. Independent variables used in the regression tree model of burn depth included topographic position, remote sensing indices related to soil and vegetation characteristics, timing of the fire event, and meteorological data. Post-fire organic layer depth characteristics were determined for a large (> 200,000 ha) fire to identify areas that are potentially vulnerable to a shift in post-fire succession. This application showed that 12% of this fire event experienced fire severe enough to support a change in post-fire succession. We conclude that non-parametric models and ancillary data are useful in the modeling of the surface organic layer fire depth. Because quantitative differences in post-fire surface characteristics do not directly influence spectral properties, these modeling techniques provide better information than the use of remote sensing data alone.

  13. Annual Tree Growth Predictions From Periodic Measurements

    Treesearch

    Quang V. Cao

    2004-01-01

    Data from annual measurements of a loblolly pine (Pinus taeda L.) plantation were available for this study. Regression techniques were employed to model annual changes of individual trees in terms of diameters, heights, and survival probabilities. Subsets of the data that include measurements every 2, 3, 4, 5, and 6 years were used to fit the same...

  14. Understory response following varying levels of overstory removal in mixed conifer stands

    Treesearch

    Fabian C.C. Uzoh; Leroy K. Dolph; John R. Anstead

    1997-01-01

    Diameter growth rates of understory trees were measured for periods both before and after overstory removal on six study areas in northern California. All the species responded with increased diameter growth after adjusting to their new environments. Linear regression equations that predict post-treatment diameter growth increment of the residual trees are presented...

  15. Delayed conifer tree mortality following fire in California

    Treesearch

    Sharon M. Hood; Sheri L. Smith; Daniel R. Cluck

    2007-01-01

    Fire injury was characterized and survival monitored for 5,246 trees from five wildfires in California that occurred between 1999 and 2002. Logistic regression models for predicting the probability of mortality were developed for incense-cedar, Jeffrey pine, ponderosa pine, red fir and white fir. Two-year post-fire preliminary models were developed for incense-cedar,...

  16. Estimating leaf area and leaf biomass of open-grown deciduous urban trees

    Treesearch

    David J. Nowak

    1996-01-01

    Logarithmic regression equations were developed to predict leaf area and leaf biomass for open-grown deciduous urban trees based on stem diameter and crown parameters. Equations based on crown parameters produced more reliable estimates. The equations can be used to help quantify forest structure and functions, particularly in urbanizing and urban/suburban areas.
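
A common way to fit logarithmic regression equations like those described above is ordinary least squares on the log scale, ln(Y) = a + b·ln(X), followed by back-transformation. This minimal Python sketch assumes a single predictor (e.g., stem diameter) and applies the widely used correction factor exp(s²/2) for log-transformation bias; whether the original equations used this exact form or correction is not stated in the abstract.

```python
import math
from typing import Sequence, Tuple


def fit_log_log(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit ln(y) = a + b*ln(x) by OLS; return (a, b, bias correction factor)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = my - b * mx
    # residual variance on the log scale, with n - 2 degrees of freedom
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(lx, ly)) / (n - 2)
    return a, b, math.exp(s2 / 2.0)


def predict(a: float, b: float, cf: float, x: float) -> float:
    """Back-transform to the original scale: cf * exp(a) * x**b."""
    return cf * math.exp(a) * x ** b
```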

  17. Post-fire tree establishment patterns at the alpine treeline ecotone: Mount Rainier National Park, Washington, USA

    Treesearch

    Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth

    2009-01-01

    We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.

  18. Biomass of Yellow-Poplar in Natural Stands in Western North Carolina

    Treesearch

    Alexander Clark; James G. Schroeder

    1977-01-01

    Aboveground biomass was determined for yellow-poplar (Liriodendron tulipifera L.) trees 6 to 28 inches d.b.h. growing in natural, uneven-aged mountain cove stands in western North Carolina. Specific gravity, moisture content, and green weight per cubic foot are presented for the total tree and its components. Tables developed from regression equations show weight and...

  19. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis

    PubMed Central

    Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288

  20. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests are the random sampling of observations and the restricted set of input variables considered at each split. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
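
The recursive binary partitioning described above can be sketched in a few lines of Python. This sketch chooses splits CART-style, by minimizing the within-node sum of squared errors, and stops at a maximum depth, a minimum node size, or when no split reduces the error. Conditional inference trees instead select the split variable with permutation-test p-values, which is not shown. Function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Node = Dict[str, object]


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the node mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces SSE, or None."""
    best, best_score = None, sse(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(X, y, depth: int = 0, max_depth: int = 3, min_samples: int = 2) -> Node:
    """Recursively partition (X, y) with binary splits until a stopping rule fires."""
    split = best_split(X, y) if depth < max_depth and len(y) >= min_samples else None
    if split is None:
        return {"leaf": sum(y) / len(y)}
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    kw = dict(depth=depth + 1, max_depth=max_depth, min_samples=min_samples)
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], **kw),
        "right": grow([X[i] for i in right], [y[i] for i in right], **kw),
    }


def predict(node: Node, row: Sequence[float]) -> float:
    """Follow the splits from the root down to a leaf and return its mean."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]
```

A random forest would grow many such trees on bootstrap samples, considering only a random subset of variables at each split, and average their predictions.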

  1. Estimating extent of mortality associated with the Douglas-fir beetle in the Central and Northern Rockies

    Treesearch

    Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini

    1999-01-01

    Data collected from Douglas-fir stands infested by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah were used to develop models to estimate the amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...

  2. Biomass expansion factor and root-to-shoot ratio for Pinus in Brazil.

    PubMed

    Sanquetta, Carlos R; Corte, Ana Pd; da Silva, Fernando

    2011-09-24

    The Biomass Expansion Factor (BEF) and the Root-to-Shoot Ratio (R) are variables used to quantify carbon stock in forests. In most studies they are treated as constants or as species- or area-specific values. This study aimed to show the dependence of BEF and R on tree size and age and proposed equations to improve estimates of forest biomass and carbon stock. Data from 70 sample Pinus spp. trees of different diameter classes and ages, grown in southern Brazil, were used to demonstrate the correlation between BEF and R and forest inventory data such as DBH, tree height and age. Total dry biomass, carbon stock and CO2 equivalent were simulated using the IPCC default values of BEF and R, the corresponding averages calculated from the data used in this study, and the values estimated by the regression equations. The mean values of BEF and R calculated in this study were 1.47 and 0.17, respectively. BEF and R were inversely related to the tree measurement variables, following a negative exponential pattern. Simulations indicated that the use of fixed values of BEF and R, whether IPCC defaults or current average data, may lead to unreliable estimates in carbon stock inventories and CDM projects. It was concluded that accounting for the variations in BEF and R, and using regression equations to relate them to DBH, tree height and age, is fundamental to obtaining reliable estimates of forest tree biomass, carbon sink and CO2 equivalent.
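
The way BEF and R enter a carbon estimate can be sketched as: aboveground biomass = stem biomass × BEF, total biomass = aboveground × (1 + R), carbon = total biomass × carbon fraction, and CO2 equivalent = carbon × 44/12. This minimal Python sketch uses the mean BEF (1.47) and R (0.17) reported above in its test. The carbon fraction of 0.5 is an assumed default, not a value from the study, and the study's size- and age-dependent regression equations are not reproduced.

```python
from typing import Tuple

CO2_PER_C = 44.0 / 12.0  # ratio of molar masses, CO2 : C


def expansion_factors(aboveground_kg: float, stem_kg: float, root_kg: float) -> Tuple[float, float]:
    """Per-tree BEF (aboveground / stem) and R (root / aboveground)."""
    return aboveground_kg / stem_kg, root_kg / aboveground_kg


def total_biomass(stem_kg: float, bef: float, r: float) -> float:
    """Expand stem biomass to aboveground biomass, then add belowground via R."""
    aboveground = stem_kg * bef
    return aboveground * (1.0 + r)


def carbon_and_co2(biomass_kg: float, carbon_fraction: float = 0.5) -> Tuple[float, float]:
    """Carbon stock and CO2 equivalent; 0.5 is an assumed carbon fraction."""
    carbon = biomass_kg * carbon_fraction
    return carbon, carbon * CO2_PER_C
```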

  3. Diagnostic and Prognostic Value of Long-Axis Strain and Myocardial Contraction Fraction Using Standard Cardiovascular MR Imaging in Patients with Nonischemic Dilated Cardiomyopathies.

    PubMed

    Arenja, Nisha; Riffel, Johannes H; Fritz, Thomas; André, Florian; Aus dem Siepen, Fabian; Mueller-Hennessen, Matthias; Giannitsis, Evangelos; Katus, Hugo A; Friedrich, Matthias G; Buss, Sebastian J

    2017-06-01

    Purpose: To assess the utility of established functional markers versus two additional functional markers derived from standard cardiovascular magnetic resonance (MR) images for their incremental diagnostic and prognostic information in patients with nonischemic dilated cardiomyopathy (NIDCM). Materials and Methods: Approval was obtained from the local ethics committee. MR images from 453 patients with NIDCM and 150 healthy control subjects, acquired between 2005 and 2013, were included and analyzed retrospectively. Myocardial contraction fraction (MCF) was calculated by dividing left ventricular (LV) stroke volume by LV myocardial volume, and long-axis strain (LAS) was calculated from the distances between the epicardial border of the LV apex and the midpoint of a line connecting the origins of the mitral valve leaflets at end systole and end diastole. Receiver operating characteristic curve, Kaplan-Meier method, Cox regression, and classification and regression tree (CART) analyses were performed to assess diagnostic and prognostic performance. Results: LAS (area under the receiver operating characteristic curve [AUC] = 0.93, P < .001) and MCF (AUC = 0.92, P < .001) can be used to discriminate patients with NIDCM from age- and sex-matched control subjects. A total of 97 patients reached the combined end point during a median follow-up of 4.8 years. In multivariate Cox regression analysis, only LV ejection fraction (EF) and LAS were independent predictors of the combined end point (hazard ratio = 2.8 and 1.9, respectively; P < .001 for both). In a risk stratification approach with classification and regression tree analysis, combined LV EF and LAS cutoff values were used to stratify patients into three risk groups (log-rank test, P < .001). Conclusion: Cardiovascular MR-derived MCF and LAS serve as reliable diagnostic and prognostic markers in patients with NIDCM. LAS, as a marker of longitudinal contractile function, is an independent predictor of outcome and offers incremental information beyond LV EF and the presence of myocardial fibrosis. © RSNA, 2017. Online supplemental material is available for this article.
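
MCF as defined above is LV stroke volume divided by LV myocardial volume. A minimal Python sketch, assuming stroke volume = end-diastolic minus end-systolic volume, myocardial volume = LV mass / 1.05 g/mL (an assumed myocardial density), and LAS expressed as the percentage change in the apex-to-mitral-plane distance from end diastole to end systole. The exact conventions used in the study may differ, and the function names are hypothetical.

```python
MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # assumed density to convert LV mass (g) to volume (mL)


def myocardial_contraction_fraction(edv_ml: float, esv_ml: float, lv_mass_g: float) -> float:
    """MCF = LV stroke volume / LV myocardial volume (dimensionless)."""
    stroke_volume = edv_ml - esv_ml
    myocardial_volume = lv_mass_g / MYOCARDIAL_DENSITY_G_PER_ML
    return stroke_volume / myocardial_volume


def long_axis_strain(length_ed_mm: float, length_es_mm: float) -> float:
    """LAS in percent: relative change of the apex-to-mitral-plane distance.

    Negative values indicate long-axis shortening during systole.
    """
    return (length_es_mm - length_ed_mm) / length_ed_mm * 100.0
```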

  4. Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing.

    PubMed

    Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A

    2015-05-09

    Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies have made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared: mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30% and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than GRR, indicating that the genetic architecture for these traits is complex. Multi-site GS prediction accuracies were high and better than those of single sites, while cross-site prediction (using a model from one site to predict another) produced the lowest accuracies, reflecting type-b genetic correlations, and was deemed unreliable. The incorporation of genomic information in quantitative genetic analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principal component scores used as representatives of multi-trait GS prediction models produced surprising results, in which negatively correlated traits could be concurrently selected for using PCA2 and PCA3. The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation, was proven to be effective. Prediction accuracies obtained for all traits strongly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
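
RR-BLUP treats every marker effect as drawn from one common normal distribution, which makes it equivalent to ridge regression with shrinkage parameter λ = σ²e/σ²β. A minimal Python sketch, assuming genotypes coded 0/1/2 and λ supplied directly (in practice the variance components are usually estimated, e.g. by REML). It also shows mean imputation (MI), the simplest of the imputation methods compared. Names are hypothetical, and GRR, which allows marker-specific shrinkage, is not shown.

```python
from typing import List, Optional, Sequence


def mean_impute(X: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """MI imputation: replace missing calls (None) with the marker's mean."""
    means = []
    for col in zip(*X):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else float(v) for j, v in enumerate(row)] for row in X]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * z for x, z in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rr_blup(X, y, lam: float):
    """Ridge marker effects: (Xc'Xc + lam*I) beta = Xc'(y - mean(y)), X column-centered."""
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    lhs = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(xc[i][j] * (y[i] - mu) for i in range(n)) for j in range(p)]
    return mu, xbar, _solve(lhs, rhs)


def gebv(model, row) -> float:
    """Genomic estimated breeding value (plus mean) for one genotype row."""
    mu, xbar, beta = model
    return mu + sum(b * (x - m) for b, x, m in zip(beta, row, xbar))
```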

  5. Fire frequency in the Interior Columbia River Basin: Building regional models from fire history data

    USGS Publications Warehouse

    McKenzie, D.; Peterson, D.L.; Agee, James K.

    2000-01-01

    Fire frequency affects vegetation composition and successional pathways; thus it is essential to understand fire regimes in order to manage natural resources at broad spatial scales. Fire history data are lacking for many regions for which fire management decisions are being made, so models are needed to estimate past fire frequency where local data are not yet available. We developed multiple regression models and tree-based (classification and regression tree, or CART) models to predict fire return intervals across the interior Columbia River basin at 1-km resolution, using georeferenced fire history, potential vegetation, cover type, and precipitation databases. The models combined semiqualitative methods and rigorous statistics. The fire history data are of uneven quality; some estimates are based on only one tree, and many are not cross-dated. Therefore, we weighted the models based on data quality and performed a sensitivity analysis of the effects on the models of estimation errors that are due to lack of cross-dating. The regression models predict fire return intervals from 1 to 375 yr for forested areas, whereas the tree-based models predict a range of 8 to 150 yr. Both types of models predict latitudinal and elevational gradients of increasing fire return intervals. Examination of regional-scale output suggests that, although the tree-based models explain more of the variation in the original data, the regression models are less likely to produce extrapolation errors. Thus, the models serve complementary purposes in elucidating the relationships among fire frequency, the predictor variables, and spatial scale. The models can provide local managers with quantitative information and provide data to initialize coarse-scale fire-effects models, although predictions for individual sites should be treated with caution because of the varying quality and uneven spatial coverage of the fire history database. 
The models also demonstrate the integration of qualitative and quantitative methods when the data required for fully quantitative models are unavailable. They can be tested by comparing new, independent fire history reconstructions against their predictions and can be continually updated as better fire history data become available.
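    To illustrate the tree-based (CART) side of this comparison, the sketch below shows the core step of growing a regression tree: choosing the single threshold on one predictor that minimizes the summed squared error of the two child nodes, whose predictions are the child means. It is a minimal, hypothetical sketch in plain Python, not the authors' model; the function names and toy data are assumptions, and real CART implementations add recursion, stopping rules, pruning, and (as in this study) case weights.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split x <= threshold that
    minimizes the summed SSE of the two child nodes.

    Returns (None, inf) if no valid split exists (all x values equal).
    """
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        # A threshold must separate two distinct predictor values
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            # Place the threshold midway between neighbouring x values
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total)
    return best
```

For example, `best_split([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20])` returns `(6.5, 0.0)`: the two groups separate perfectly at the midpoint between 3 and 10.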

  6. Uni- and multi-variable modelling of flood losses: experiences gained from the Secchia river inundation event.

    NASA Astrophysics Data System (ADS)

    Carisi, Francesca; Domeneghetti, Alessio; Kreibich, Heidi; Schröter, Kai; Castellarin, Attilio

    2017-04-01

    Flood risk is a function of flood hazard and vulnerability; therefore, its accurate assessment depends on a reliable quantification of both factors. The scientific literature proposes a number of objective and reliable methods for assessing flood hazard, yet it highlights a limited understanding of the fundamental damage processes. Loss modelling is associated with large uncertainty, which is due, among other factors, to a lack of standard procedures; for instance, flood losses are often estimated with damage models derived in completely different contexts (e.g. different countries or geographical regions) without checking their applicability, or by considering only one explanatory variable (typically water depth). We consider the Secchia river flood event of January 2014, when a sudden levee breach caused the inundation of nearly 200 km² in Northern Italy. In the aftermath of this event, local authorities collected flood loss data, together with additional information on affected private households and industrial activities (e.g. building surface area and economic value, number of company employees, and others). Based on these data, we implemented and compared a quadratic-regression damage function, with water depth as the only explanatory variable, and a multi-variable model that combines multiple regression trees (bagging decision trees) and considers several explanatory variables. Our results show the importance of data collection, revealing that (1) a simple quadratic regression damage function based on empirical data from the study area can be significantly more accurate than literature damage models derived for a different context and (2) multi-variable modelling may outperform the uni-variable approach, yet it is more difficult to develop and apply because it demands much more detailed data.
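    The uni-variable benchmark in this study is a quadratic damage function of water depth. As a hedged sketch (plain Python, no external libraries, hypothetical function name), such a curve can be fitted by ordinary least squares by solving the 3×3 normal equations directly. The abstract does not describe the study's actual fitting procedure, any constraints on the curve, or its loss units, and the bagged-tree model is not reproduced here.

```python
def fit_quadratic(depths, losses):
    """Least-squares fit of loss = a + b*d + c*d**2; returns (a, b, c)."""
    # Power sums S_k = sum(d**k), k = 0..4, fill the normal matrix X^T X
    s = [sum(d ** k for d in depths) for k in range(5)]
    # Right-hand side X^T y: sum(y * d**k), k = 0..2
    t = [sum(y * d ** k for d, y in zip(depths, losses)) for k in range(3)]
    # Augmented matrix [X^T X | X^T y]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Gauss-Jordan: eliminate this column from every other row
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    # The coefficient block is now diagonal, so each coefficient is a ratio
    return tuple(m[i][3] / m[i][i] for i in range(3))


def predict_loss(coeffs, depth):
    """Evaluate the fitted quadratic damage function at one water depth."""
    a, b, c = coeffs
    return a + b * depth + c * depth ** 2
```

Solving the normal equations is adequate for three coefficients and well-scaled depths; a QR-based solver would be the more robust choice for larger or ill-conditioned problems.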

  7. National scale biomass estimators for United States tree species

    Treesearch

    Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey

    2003-01-01

    Estimates of national-scale forest carbon (C) stocks and fluxes are typically based on allometric regression equations developed using dimensional analysis techniques. However, the literature is inconsistent and incomplete with respect to large-scale forest C estimation. We compiled all available diameter-based allometric regression equations for estimating total...

  8. A hydraulic-photosynthetic model based on extended HLH and its application to Coast redwood (Sequoia sempervirens).

    PubMed

    Du, Ning; Fan, Jintu; Chen, Shuo; Liu, Yang

    2008-07-21

    Although recent investigations [Ryan, M.G., Yoder, B.J., 1997. Hydraulic limits to tree height and tree growth. Bioscience 47, 235-242; Koch, G.W., Sillett, S.C., Jennings, G.M., Davis, S.D., 2004. The limits to tree height. Nature 428, 851-854; Niklas, K.J., Spatz, H., 2004. Growth and hydraulic (not mechanical) constraints govern the scaling of tree height and mass. Proc. Natl Acad. Sci. 101, 15661-15663; Ryan, M.G., Phillips, N., Bond, B.J., 2006. Hydraulic limitation hypothesis revisited. Plant Cell Environ. 29, 367-381; Niklas, K.J., 2007. Maximum plant height and the biophysical factors that limit it. Tree Physiol. 27, 433-440; Burgess, S.S.O., Dawson, T.E., 2007. Predicting the limits to tree height using statistical regressions of leaf traits. New Phytol. 174, 626-636] suggested that the hydraulic limitation hypothesis (HLH) is the most plausible theory to explain the biophysical limits to maximum tree height and the decline in tree growth rate with age, the analysis has been largely qualitative or based on statistical regression. Here we present an integrated biophysical model that combines three elements: the principle that trees develop physiological compensations (e.g. the decline in leaf water potential and the tapering of conduits with height [West, G.B., Brown, J.H., Enquist, B.J., 1999. A general model for the structure and allometry of plant vascular systems. Nature 400, 664-667]) to resist the increasing water stress with height, the classical HLH, and the biochemical limitations on photosynthesis [von Caemmerer, S., 2000. Biochemical Models of Leaf Photosynthesis. CSIRO Publishing, Australia]. The model has been applied to the tallest trees in the world, Coast redwood (Sequoia sempervirens). Xylem water potential, leaf carbon isotope composition, and leaf mass to area ratio at different heights derived from the model agree well with the experimental measurements of Koch et al. [2004. The limits to tree height. Nature 428, 851-854]. 
The model also accounts well for the universal trend of declining growth rate with age.

  9. Estimating tree species diversity in the savannah using NDVI and woody canopy cover

    NASA Astrophysics Data System (ADS)

    Madonsela, Sabelo; Cho, Moses Azong; Ramoelo, Abel; Mutanga, Onisimo; Naidoo, Laven

    2018-04-01

    Remote sensing applications in biodiversity research often rely on establishing relationships between spectral information from the image and tree species diversity measured in the field. Most studies have used the normalized difference vegetation index (NDVI) to estimate tree species diversity on the basis that it is sensitive to primary productivity, which defines spatial variation in plant diversity. The NDVI signal is influenced by photosynthetically active vegetation which, in the savannah, includes woody canopy foliage and grasses. The question is whether the relationship between NDVI and tree species diversity in the savannah depends on the percentage of woody cover. This study explored the relationship between woody canopy cover (WCC) and tree species diversity in the savannah woodland of southern Africa and also investigated whether there is a significant interaction between seasonal NDVI and WCC in the factorial model when estimating tree species diversity. To fulfil our aim, we followed a stratified random sampling approach and surveyed tree species in 68 plots of 90 m × 90 m across the study area. Within each plot, all trees with a diameter at breast height of >10 cm were sampled, and the Shannon index (a common measure of species diversity that considers both species richness and abundance) was used to quantify tree species diversity. We then extracted WCC for each plot from an existing fractional woody cover product derived from Synthetic Aperture Radar (SAR) data. A factorial regression model was used to determine the interaction effect between NDVI and WCC when estimating tree species diversity. 
Results from regression analysis showed that (i) WCC has a highly significant relationship with tree species diversity (r² = 0.21; p < 0.01), and (ii) the interaction between NDVI and WCC is not significant; however, the factorial model significantly reduced the prediction error (RMSE = 0.47, p < 0.05) compared with the NDVI-only (RMSE = 0.49) or WCC-only (RMSE = 0.49) model during the senescence period. The result supports our assertion that combining NDVI with WCC will be optimal for biodiversity estimation during the senescence period.
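    The Shannon index used here is commonly defined as H' = -Σ p_i ln p_i, where p_i is the proportion of sampled individuals belonging to species i. A minimal sketch, assuming the natural logarithm (some studies use base 2) and a plot represented as a list of species labels, one per sampled tree; the function name and sample data are illustrative only.

```python
import math
from collections import Counter


def shannon_index(species_per_tree):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the species in one plot."""
    counts = Counter(species_per_tree)
    n = sum(counts.values())
    # p_i is the share of sampled trees belonging to species i
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

Four species with one tree each give ln 4 ≈ 1.386, while a single-species plot gives 0; H' rises with both species richness and evenness of abundance.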

  10. Leveraging Past and Current Measurements to Probabilistically Nowcast Low Visibility Procedures at an Airport

    NASA Astrophysics Data System (ADS)

    Mayr, G. J.; Kneringer, P.; Dietz, S. J.; Zeileis, A.

    2016-12-01

    Low visibility or a low cloud ceiling reduces the capacity of airports by requiring special low visibility procedures (LVP) for incoming and departing aircraft. Probabilistic forecasts of when such procedures will become necessary help to mitigate delays and economic losses. We compare the performance of probabilistic nowcasts from two statistical methods: ordered logistic regression, and trees and random forests. These models harness historic and current meteorological measurements in the vicinity of the airport and LVP states, and incorporate diurnal and seasonal climatological information via generalized additive models (GAM). The methods are applied at Vienna International Airport (Austria). Their performance is benchmarked against climatology, persistence and human forecasters.

  11. Crown area equations for 13 species of trees and shrubs in northern California and southwestern Oregon

    Treesearch

    Fabian C.C. Uzoh; Martin W. Ritchie

    1996-01-01

    The equations presented predict crown area for 13 species of trees and shrubs that may be found growing in competition with commercial conifers during early stages of stand development. The equations express crown area as a function of basal area and height. Parameters were estimated for each species individually using weighted nonlinear least-squares regression.

  12. Biomass equations for major tree species of the Northeast

    Treesearch

    Louise M. Tritton; James W. Hornbeck

    1982-01-01

    Regression equations are used in both forestry and ecosystem studies to estimate tree biomass from field measurements of dbh (diameter at breast height) or a combination of dbh and height. Literature on biomass is reviewed, and 178 sets of published equations for 25 species common to the Northeastern United States are listed. On the basis of these equations, estimates of...
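    Diameter-based biomass equations of the kind reviewed here are frequently fitted on a log-log scale, ln(B) = a + b ln(D). The following is a hedged illustration of that common form, not one of the published equations: it fits a and b by ordinary least squares on log-transformed data, and the function names, units, and toy data are hypothetical. Published equations often also apply a correction factor when back-transforming from the log scale, which is omitted here.

```python
import math


def fit_allometric(dbh, biomass):
    """Fit ln(biomass) = a + b*ln(dbh) by ordinary least squares; returns (a, b)."""
    xs = [math.log(d) for d in dbh]
    ys = [math.log(m) for m in biomass]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance / variance of the log-transformed predictor
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict_biomass(a, b, dbh):
    """Back-transform to the original scale (no bias correction applied)."""
    return math.exp(a + b * math.log(dbh))
```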

  13. Stand basal-area and tree-diameter growth in red spruce-fir forests in Maine, 1960-80

    Treesearch

    S.J. Zarnoch; D.A. Gansner; D.S. Powell; T.A. Birch

    1990-01-01

    Stand basal-area change and individual surviving red spruce d.b.h. growth from 1960 to 1980 were analyzed for red spruce-fir stands in Maine. Regression modeling was used to relate these measures of growth to stand and tree conditions and to compare growth throughout the period. Results indicate a decline in growth.

  14. Examination of the Arborsonic Decay Detector for Detecting Bacterial Wetwood in Red Oaks

    Treesearch

    Zicai Xu; Theodor D. Leininger; James G. Williams; Frank H. Tainter

    2000-01-01

    The Arborsonic Decay Detector (ADD; Fujikura Europe Limited, Wiltshire, England) was used to measure the time it took an ultrasound wave to cross 280 diameters in red oak trees with varying degrees of bacterial wetwood or heartwood decay. Linear regressions derived from the ADD readings of trees in Mississippi and South Carolina with wetwood and heartwood decay...

  15. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution

    NASA Astrophysics Data System (ADS)

    Kisi, Ozgur; Parmar, Kulwinder Singh

    2016-03-01

    This study investigates the accuracy of least squares support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) models in modeling river water pollution. Various combinations of water quality parameters, namely free ammonia (AMM), total Kjeldahl nitrogen (TKN), water temperature (WT), total coliform (TC), fecal coliform (FC) and potential of hydrogen (pH), monitored at Nizamuddin on the Yamuna River in Delhi, India, were used as inputs to the models. Results indicated that the LSSVM and MARS models had almost the same accuracy and performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). Using the MARS model decreased the average root mean square error (RMSE) by 1.47% and 19.1% relative to the LSSVM and M5Tree models, respectively. Adding TC as an input did not increase the models' accuracy in modeling COD, while adding FC and pH inputs generally decreased it. The overall results indicated that the MARS and LSSVM models could be successfully used to estimate monthly river water pollution levels using AMM, TKN and WT as inputs.
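    Both the error metric and the reported improvements can be made concrete. Below, a hedged plain-Python sketch computes RMSE and a relative percentage decrease. Reading the abstract's 1.47% and 19.1% figures as (RMSE_other - RMSE_MARS) / RMSE_other × 100 is an assumption, since the abstract does not state the formula; the function names and numbers are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def percent_decrease(baseline, improved):
    """Relative reduction of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline
```

For instance, a hypothetical baseline RMSE of 10.0 reduced to 8.09 corresponds to a decrease of about 19.1%.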

  16. Nervous systems and scenarios for the invertebrate-to-vertebrate transition.

    PubMed

    Holland, Nicholas D

    2016-01-05

    Older evolutionary scenarios for the origin of vertebrates often gave nervous systems top billing in accordance with the notion that a big-brained Homo sapiens crowned a tree of life shaped mainly by progressive evolution. Now, however, tree thinking positions all extant organisms equidistant from the tree's root, and molecular phylogenies indicate that regressive evolution is more common than previously suspected. Even so, contemporary theories of vertebrate origin still focus on the nervous system because of its functional importance, its richness in characters for comparative biology, and its central position in the two currently prominent scenarios for the invertebrate-to-vertebrate transition, which grew out of the markedly neurocentric annelid and enteropneust theories of the nineteenth century. Both these scenarios compare phyla with diverse overall body plans. This diversity, exacerbated by the scarcity of relevant fossil data, makes it challenging to establish plausible homologies between component parts (e.g. nervous system regions). In addition, our current understanding of the relation between genotype and phenotype is too preliminary to permit us to convert gene network data into structural features in any simple way. These issues are discussed here with special reference to the evolution of nervous systems during proposed transitions from invertebrates to vertebrates. © 2015 The Author(s).

  17. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy.

    PubMed

    Letunic, Ivica; Bork, Peer

    2011-07-01

    Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. The current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user-defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. A batch access interface is available for programmatic access or for including interactive trees in other web services.

  18. Topography and crop management are key factors for the development of American leaf spot epidemics on coffee in Costa Rica.

    PubMed

    Avelino, Jacques; Cabut, Sandrine; Barboza, Bernardo; Barquero, Miguel; Alfaro, Ronny; Esquivel, César; Durand, Jean-François; Cilas, Christian

    2007-12-01

    We monitored the development of American leaf spot of coffee, a disease caused by the gemmiferous fungus Mycena citricolor, in 57 plots in Costa Rica for 1 or 2 years in order to gain a clearer understanding of conditions conducive to the disease and improve its control. During the investigation, characteristics of the coffee trees, crop management, and the environment were recorded. For the analyses, we used partial least-squares regression via spline functions (PLSS), which is a nonlinear extension of partial least-squares regression (PLS). The fungus developed well in areas located between approximately 1,100 and 1,550 m above sea level. Slopes were conducive to its development, but eastern-facing slopes were less affected than the others, probably because they were more exposed to sunlight, especially in the rainy season. The distance between planting rows, the shade percentage, coffee tree height, the type of shade, and the pruning system explained disease intensity through their effects on coffee tree shading and, possibly, on the humidity conditions in the plot. Forest trees and fruit trees intercropped with coffee provided particularly propitious conditions. Apparently, fertilization was unfavorable for the disease, probably due to dilution phenomena associated with faster coffee tree growth. Finally, series of wet spells interspersed with dry spells, which were frequent in the middle of the rainy season, were critical for the disease, probably because they affected the production and release of gemmae and their viability. These results could be used to draw up a map of epidemic risks that takes topographical factors into account. To reduce those risks and improve chemical control, our results suggest that farmers should space planting rows farther apart, maintain light shading in the plantation, and prune their coffee trees.

  19. Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

    NASA Astrophysics Data System (ADS)

    Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

    2018-03-01

    Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. ex Mirb. within a 1.3 km² mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m⁻²), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. The best predictor subset was selected after comparing three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skill Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assembling regression models and integrating the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage for improving the species classifications needed in present-day forest planning and management.
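    The True Skill Statistic reported here is commonly computed from a binary confusion matrix as sensitivity + specificity - 1; it ranges from -1 to 1, with 0 indicating no skill beyond chance. A minimal sketch, assuming one pine species is treated as the positive class; the function name and counts are hypothetical.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = sensitivity + specificity - 1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of positives classified correctly
    specificity = tn / (tn + fp)  # share of negatives classified correctly
    return sensitivity + specificity - 1
```

With 40 of 50 positives and 45 of 50 negatives classified correctly, sensitivity is 0.8, specificity is 0.9, and TSS is 0.7.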

  20. Chilling and heat requirements for flowering in temperate fruit trees

    NASA Astrophysics Data System (ADS)

    Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike

    2014-08-01

    Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing's cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.

  1. Chilling and heat requirements for flowering in temperate fruit trees.

    PubMed

    Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike

    2014-08-01

    Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing’s cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.

  2. Predictors of adherence with self-care guidelines among persons with type 2 diabetes: results from a logistic regression tree analysis.

    PubMed

    Yamashita, Takashi; Kart, Cary S; Noe, Douglas A

    2012-12-01

    Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.

  3. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    NASA Astrophysics Data System (ADS)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach combining Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model combines the RSS method, which is known as an efficient ensemble technique, with CART, a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for model development. Performance of the RSSCART model was evaluated using the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The RSSCART model performed best (AUC = 0.841) compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
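    The AUC values used to rank these models summarize the ROC curve in one number. One standard way to compute it, shown here as a hedged plain-Python sketch rather than the authors' procedure, is the rank (Mann-Whitney) formulation: the probability that a randomly chosen landslide cell receives a higher predicted score than a randomly chosen non-landslide cell, with ties counted as one half. The function name and scores are hypothetical, and the pairwise loop is chosen for clarity; a sort-based method would be used on large rasters.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, positive scores [0.9, 0.3] against negative scores [0.5, 0.1] win 3 of 4 pairs, giving an AUC of 0.75.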

  4. Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research

    ERIC Educational Resources Information Center

    He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne

    2018-01-01

    In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…

  5. Tree allometry and improved estimation of carbon stocks and balance in tropical forests.

    PubMed

    Chave, J; Andalo, C; Brown, S; Cairns, M A; Chambers, J Q; Eamus, D; Fölster, H; Fromard, F; Higuchi, N; Kira, T; Lescure, J-P; Nelson, B W; Ogawa, H; Puig, H; Riéra, B; Yamakura, T

    2005-08-01

    Tropical forests hold large stores of carbon, yet uncertainty remains regarding their quantitative contribution to the global carbon cycle. One approach to quantifying carbon biomass stores consists of inferring changes from long-term forest inventory plots. Regression models are used to convert inventory data into an estimate of aboveground biomass (AGB). We provide a critical reassessment of the quality and the robustness of these models across tropical forest types, using a large dataset of 2,410 trees ≥ 5 cm in diameter, directly harvested in 27 study sites across the tropics. Proportional relationships between aboveground biomass and the product of wood density, trunk cross-sectional area, and total height are constructed. We also develop a regression model involving wood density and stem diameter only. Our models were tested for secondary and old-growth forests, for dry, moist and wet forests, for lowland and montane forests, and for mangrove forests. The most important predictors of AGB of a tree were, in decreasing order of importance, its trunk diameter, wood specific gravity, total height, and forest type (dry, moist, or wet). Overestimates prevailed, giving a bias of 0.5-6.5% when errors were averaged across all stands. Our regression models can be used reliably to predict aboveground tree biomass across a broad range of tropical forests. Because they are based on an unprecedented dataset, these models should improve the quality of tropical biomass estimates, and bring consensus about the contribution of the tropical forest biome and tropical deforestation to the global carbon cycle.

  6. Decision tree analysis to stratify risk of de novo non-melanoma skin cancer following liver transplantation.

    PubMed

    Tanaka, Tomohiro; Voigt, Michael D

    2018-03-01

    Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI < 40 who did not receive sirolimus, as high risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r² = 0.971, p < 0.0001). Cumulative incidence of NMSC in low, moderate and high risk groups at 5/10/20 year was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p < 0.0001). The decision tree model accurately stratifies the risk of developing NMSC in the long-term after LT.

  7. Downscaling soil moisture over East Asia through multi-sensor data fusion and optimization of regression trees

    NASA Astrophysics Data System (ADS)

    Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung

    2017-04-01

    Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural as well as hydrological processes because it strongly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is therefore important. Soil moisture has generally been provided through in situ measurements at stations. Although in situ field measurements provide accurate soil moisture with high temporal resolution, they are costly and do not provide the spatial distribution of soil moisture over large areas. Microwave satellite-based approaches (e.g., the Advanced Microwave Scanning Radiometer 2 (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) and numerical models such as the Global Land Data Assimilation System (GLDAS) and the Modern-Era Retrospective Analysis for Research and Applications (MERRA) provide spatiotemporally continuous soil moisture products at global scale. However, since those global soil moisture products have coarse spatial resolution (25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limited spatial resolution of these products. In this study, GLDAS soil moisture data were downscaled to 1 km spatial resolution over East Asia from 2013 to 2015 through the integration of AMSR2 and ASCAT soil moisture data, the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data (land surface temperature, normalized difference vegetation index, and land cover), using modified regression trees. The modified regression trees were implemented using Cubist, a commercial machine-learning software tool. 
An optimization based on pruning the rules derived from the modified regression trees was conducted. Root mean square error (RMSE) and correlation coefficients (r) were used to optimize the rules, and finally 59 rules were produced. The results show a high validation r (0.79) and a low validation RMSE (0.0556 m³/m³). The 1 km downscaled soil moisture was evaluated against ground soil moisture data at 14 stations, and both datasets showed similar temporal patterns (average r = 0.51 and average RMSE = 0.041). The spatial distribution of the 1 km downscaled soil moisture corresponded well with GLDAS soil moisture, capturing both extremely dry and wet regions. The correlation between GLDAS and the 1 km downscaled soil moisture during the growing season was positive (mean r = 0.35) in most regions.

  8. Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods.

    PubMed

    Goloboff, Pablo A

    2014-10-01

    Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the "selected" and "informative" addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation ("sparse" supermatrices), which are likely to present a tree landscape with extensive plateaus. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Diameter-growth model across shortleaf pine range using regression tree analysis

    Treesearch

    Daniel Yaussy; Louis Iverson; Anantha Prasad

    1999-01-01

    Diameter growth of a tree in most gap-phase models is limited by light, nutrients, moisture, and temperature. Growing-season temperature is represented by growing degree days (gdd), which is the sum of the average daily temperatures above a baseline temperature. Gap-phase models determine the north-south range of a species by the gdd limits at the north and south...
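    The growing degree day definition above can be sketched as follows, assuming one common reading: each day contributes the amount by which its mean temperature exceeds the base temperature, and days at or below the base contribute zero. The 10 °C base and the temperature series are hypothetical; gap-phase models may use other bases or formulations.

```python
def growing_degree_days(daily_mean_temps, base_temp):
    """Accumulate each day's mean temperature in excess of base_temp.

    Days at or below the base contribute zero rather than a negative amount.
    """
    return sum(max(0.0, t - base_temp) for t in daily_mean_temps)


# Hypothetical week of daily mean temperatures (deg C) with an assumed 10 deg C base.
week = [8.0, 11.5, 14.0, 9.5, 16.0, 18.5, 12.0]
print(growing_degree_days(week, base_temp=10.0))  # → 22.0
```

    Summed over a growing season, this quantity is what a gap-phase model would compare against a species' gdd limits.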

  10. Applicability of predictive models of drought-induced tree mortality between the midwest and northeast United States

    Treesearch

    Eric J. Gustafson

    2014-01-01

    Regression models developed in the upper Midwest (United States) to predict drought-induced tree mortality from measures of drought (Palmer Drought Severity Index) were tested in the northeastern United States and found inadequate. The most likely cause of this result is that long drought events were rare in the Northeast during the period when inventory data were...

  11. Dating tree mortality using log decay in the White Mountains of New Hampshire

    Treesearch

    Andrew J. Fast; Mark J. Ducey; Jeffrey H. Gove; William B. Leak

    2008-01-01

    Coarse woody material (CWM) is an important component of forest ecosystems. To meet specific CWM management objectives, it is important to understand rates of decay. We present results from a silvicultural trial at the Bartlett Experimental Forest, in which time of death is known for a large sample of trees. Either a simple table or regression equations that use...

  12. Development of post-fire crown damage mortality thresholds in ponderosa pine

    Treesearch

    James F. Fowler; Carolyn Hull Sieg; Joel McMillin; Kurt K. Allen; Jose F. Negron; Linda L. Wadleigh; John A. Anhold; Ken E. Gibson

    2010-01-01

    Previous research has shown that crown scorch volume and crown consumption volume are the major predictors of post-fire mortality in ponderosa pine. In this study, we use piecewise logistic regression models of crown scorch data from 6633 trees in five wildfires from the Intermountain West to locate a mortality threshold at 88% scorch by volume for trees with no crown...

  13. Equations relating compacted and uncompacted live crown ratio for common tree species in the South

    Treesearch

    KaDonna C. Randolph

    2010-01-01

    Species-specific equations to predict uncompacted crown ratio (UNCR) from compacted live crown ratio (CCR), tree length, and stem diameter were developed for 24 species and 12 genera in the southern United States. Using data from the US Forest Service Forest Inventory and Analysis program, nonlinear regression was used to model UNCR with a logistic function. Model...

  14. Patterns of tree species diversity and composition in old-field successional forests in central Illinois

    Treesearch

    Scott M. Bretthauer; George Z. Gertner; Gary L. Rolfe; Jeffery O. Dawson

    2003-01-01

    Tree species diversity increases and dominance decreases with proximity to forest border in two 60-year-old successional forest stands developed on abandoned agricultural land in Piatt County, Illinois. A regression equation allowed us to quantify an increase in diversity with closeness to forest border for one of the forest stands. Shingle oak is the most dominant...

  15. Regeneration of Douglas-fir cutblocks on the Six Rivers National Forest in northwestern California

    Treesearch

    R. O. Strothmann

    1979-01-01

    A survey of 61 cutblocks planted since 1964 evaluated stocking of conifers (trees 1 foot tall or taller) on 2-milacre quadrats. Overall stocking percentage averaged 42.2 and ranged from 15 to 81. Overall number of trees per acre averaged 396. In the regression model, based on 36 cutblocks, better stocking was associated with high site class, northerly aspect,...

  16. Protection of individual ash trees from emerald ash borer (Coleoptera: Buprestidae) with basal soil applications of imidacloprid.

    PubMed

    Smitley, D R; Rebek, E J; Royalty, R N; Davis, T W; Newhouse, K F

    2010-02-01

    We conducted field trials at five different locations over a period of 6 yr to investigate the efficacy of imidacloprid applied each spring as a basal soil drench for protection against emerald ash borer, Agrilus planipennis Fairmaire (Coleoptera: Buprestidae). Canopy thinning and emerald ash borer larval density were used to evaluate efficacy for 3-4 yr at each location while treatments continued. Test sites included small urban trees (5-15 cm diameter at breast height [dbh]), medium to large (15-65 cm dbh) trees at golf courses, and medium to large street trees. Annual basal drenches with imidacloprid gave complete protection of small ash trees for three years. At three sites where the size of trees ranged from 23 to 37 cm dbh, we successfully protected all ash trees beginning the test with <60% canopy thinning. Regression analysis of data from two sites reveals that tree size explains 46% of the variation in efficacy of imidacloprid drenches. The smallest trees (<30 cm dbh) remained in excellent condition for 3 yr, whereas most of the largest trees (>38 cm dbh) declined to a weakened state and undesirable appearance. The five-fold increase in trunk and branch surface area of ash trees as the tree dbh doubles may account for reduced efficacy on larger trees, and suggests a need to increase treatment rates for larger trees.
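    The scaling in the final sentence can be made explicit under an assumed power law relating trunk-and-branch surface area A to dbh D (the abstract does not state a functional form). A five-fold increase per doubling of dbh then implies an exponent of about 2.3:

```latex
A \propto D^{b}
\quad\Longrightarrow\quad
\frac{A(2D)}{A(D)} = 2^{b} = 5
\quad\Longrightarrow\quad
b = \log_2 5 \approx 2.32
```

    Under the same assumption, a dose scaled linearly with dbh would deliver only 2/5 as much imidacloprid per unit of surface area when dbh doubles; this is an illustration of the reasoning, not a result reported by the study.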

  17. Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Yadav, B.; Hatfield, K.

    2017-12-01

    We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using surrogates derived from catchment-scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elastic net, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, so that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. BFI was calculated from baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources, including a digital elevation model, soil, land use, and climate data, and other publicly available ancillary and geospatial data. Eighty percent of the catchments were used to train the ML algorithms, and the remaining 20% were used as an independent test set to measure the generalization performance of the fitted models. k-fold cross-validation with exhaustive grid search was used to tune the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction based on only six catchment attributes. These key predictive variables, selected after careful evaluation of the bias-variance tradeoff, are average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms, exceeding an accuracy score (r-squared) of 0.7 on test data, are support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify extremely randomized trees as the best-performing algorithm for BFI prediction in ungauged basins.
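    The evaluation protocol described above (an 80/20 split, with hyperparameters tuned by k-fold cross-validation and exhaustive grid search on the training portion only) can be sketched in plain Python. To stay short and dependency-free, the sketch swaps in a one-variable ridge regression with a single hypothetical penalty parameter `lam` as the model being tuned; it does not implement extremely randomized trees or any of the study's models, and the data are synthetic.

```python
import itertools
import random


def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every index lands in exactly one validation fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope without intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the slope-only model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search_cv(xs, ys, grid, k=5):
    """Exhaustive grid search: score every parameter combination by mean k-fold validation MSE."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for tr, va in kfold_indices(len(xs), k):
            w = fit_ridge_1d([xs[i] for i in tr], [ys[i] for i in tr], params["lam"])
            fold_scores.append(mse(w, [xs[i] for i in va], [ys[i] for i in va]))
        score = sum(fold_scores) / k
        if best is None or score < best[0]:
            best = (score, params)
    return best


# Synthetic data (not from the study): y = 2x + Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

# Hold out 20% as an independent test set; tune hyperparameters on the other 80% only.
order = list(range(len(xs)))
rng.shuffle(order)
cut = int(0.8 * len(xs))
train_idx, test_idx = order[:cut], order[cut:]
train_x, train_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
test_x, test_y = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

cv_mse, best_params = grid_search_cv(train_x, train_y, {"lam": [0.0, 1.0, 10.0, 100.0]})
w = fit_ridge_1d(train_x, train_y, best_params["lam"])
print("best params:", best_params, "test MSE:", round(mse(w, test_x, test_y), 3))
```

    The key design point is that the 20% test set never enters the grid search, so the final test score estimates generalization rather than tuning fit.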

  18. Current and Potential Tree Locations in Tree Line Ecotone of Changbai Mountains, Northeast China: The Controlling Effects of Topography

    PubMed Central

    Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei

    2014-01-01

    The tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations vary among the four sides of the mountains, especially on the northern and western sides, and these variations have not been fully explained. Previous studies attributed them to variations in temperature. In this study, however, we hypothesized that topographic controls were responsible for the variations in tree locations in the tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS and WorldView-1 images to identify tree locations and developed a logistic regression model using topographic variables to identify the dominant controls of tree locations. The results showed that aspect, wetness, and slope were the dominant controls of tree locations on the western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on the northern side. The uppermost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on the western side. The model predictions showed that habitats above the current tree line on both sides were available for trees. Tree recruitment below the current tree line may take advantage of the available habitats at higher elevations based on the current tree locations. Our research confirmed the controlling effects of topography on tree locations in the tree line ecotone of the Changbai Mountains and suggests that it is essential to assess tree responses to topography in tree line ecotone research. PMID:25170918

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Chun-Chieh; Department of Medical Imaging and Radiological Science, Chang Gung University School of Medicine, Taoyuan, Taiwan; Lai, Chyong-Huey

    Purpose: To study the prognostic value of human papillomavirus (HPV) genotypes in cervical cancer patients undergoing radiotherapy. Patients and Methods: A total of 1,010 patients with cervical cancer treated with radiotherapy between 1993 and 2000 were eligible for this study. HPV genotypes were determined by a genechip that detects 38 types of HPV. Patient characteristics and treatment outcomes were analyzed using the Cox proportional hazards regression model and the classification and regression tree (CART) method. Results: A total of 25 HPV genotypes were detected in 992 specimens (98.2%). The leading 8 types were HPV16, 58, 18, 33, 52, 39, 31, and 45. These types belong to two high-risk HPV species: alpha-7 (HPV18, 39, 45) and alpha-9 (HPV16, 31, 33, 52, 58). Three HPV-based risk groups, which were independent of established prognostic factors such as International Federation of Gynecology and Obstetrics stage, age, pathologic features, squamous cell carcinoma antigen, and lymph node metastasis, were associated with survival outcomes. The high-risk group consisted of patients without HPV infection or those infected with the alpha-7 species only. Patients co-infected with the alpha-7 and alpha-9 species belonged to the medium-risk group, and the others were included in the low-risk group. Conclusion: The results of the present study confirm the prognostic value of HPV genotypes in cervical cancer treated with radiotherapy. The different effects of the alpha-7 and alpha-9 species on radiation response deserve further exploration.

  1. Using multiobjective tradeoff sets and Multivariate Regression Trees to identify critical and robust decisions for long term water utility planning

    NASA Astrophysics Data System (ADS)

    Smith, R.; Kasprzyk, J. R.; Balaji, R.

    2017-12-01

    In light of deeply uncertain factors like future climate change and population shifts, responsible resource management will require new types of information and strategies. For water utilities, this entails potential expansion and efficient management of water supply infrastructure systems for changes in overall supply; changes in frequency and severity of climate extremes such as droughts and floods; and variable demands, all while accounting for conflicting long and short term performance objectives. Multiobjective Evolutionary Algorithms (MOEAs) are emerging decision support tools that have been used by researchers and, more recently, water utilities to efficiently generate and evaluate thousands of planning portfolios. The tradeoffs between conflicting objectives are explored in an automated way to produce (often large) suites of portfolios that strike different balances of performance. Once generated, the sets of optimized portfolios are used to support relatively subjective assertions of priorities and human reasoning, leading to adoption of a plan. These large tradeoff sets contain information about complex relationships between decisions and between groups of decisions and performance that, until now, has not been quantitatively described. We present a novel use of Multivariate Regression Trees (MRTs) to analyze tradeoff sets to reveal these relationships and critical decisions. Additionally, when MRTs are applied to tradeoff sets developed for different realizations of an uncertain future, they can identify decisions that are robust across a wide range of conditions and produce fundamental insights about the system being optimized.

  2. Concurrent validation of a neurocognitive assessment protocol for clients with mental illness in job matching as shop sales in supported employment.

    PubMed

    Ng, S S W; Lak, D C C; Lee, S C K; Ng, P P K

    2015-03-01

    Occupational therapists play a major role in the assessment and referral of clients with severe mental illness for supported employment. Nonetheless, there is scarce literature about the content and predictive validity of the process. In addition, the criteria for successful job matching have not been analysed, and job supervisors have relied on experience rather than objective standards in recruitment. This study aimed to explore the profile of successful clients working in 'shop sales' in a supportive environment using a neurocognitive assessment protocol, and to validate the protocol against the 'internal standards' of the job supervisors. This was a concurrent validation study of criterion-related scales for a single job type. The subjective ratings from the supervisors were concurrently validated against the results of neurocognitive assessment of intellectual function and work-related cognitive behaviour. A regression model was established for clients who succeeded and failed in employment using supervisors' ratings and a cutoff value of 10.5 for the Performance Fitness Rating Scale (R² = 0.918, F[41] = 3.794, p = 0.003). A classification and regression tree was also constructed to identify the profile of cases, with an overall accuracy of 0.861 (relative error, 0.26). Using both inferential statistics and data mining techniques allows the decision tree of neurocognitive assessments to be applied more readily by therapists in vocational rehabilitation, and thus directly improves the efficiency and efficacy of the process.

  3. Hierarchical additive modeling of nonlinear association with spatial correlations--an application to relate alcohol outlet density and neighborhood assault rates.

    PubMed

    Yu, Qingzhao; Li, Bin; Scribner, Richard Allen

    2009-06-30

    Previous studies have suggested a link between alcohol outlets and assaults. In this paper, we explore the effects of alcohol availability on assaults at the census tract level over time. In addition, we use a natural experiment to check whether a sudden loss of alcohol outlets is associated with a greater decrease in assault violence. Several features of the data raise statistical challenges: (1) the association between covariates (for example, the alcohol outlet density of each census tract) and the assault rates may be complex and therefore cannot be described by a linear model without transforming the covariates, (2) the covariates may be highly correlated with each other, (3) a number of observations have missing inputs, and (4) there is spatial association in assault rates at the census tract level. We propose a hierarchical additive model in which the nonlinear correlations and complex interaction effects are modeled using multiple additive regression trees, and the residual spatial association in the assault rates that cannot be explained by the model is smoothed using a conditional autoregressive (CAR) method. We develop a two-stage algorithm that connects the nonparametric trees with CAR to identify important covariates associated with the assault rates, while taking into account the spatial association of assault rates in adjacent census tracts. The proposed method is applied to the Los Angeles assault data (1990-1999). To assess the efficiency of the method, the results are compared with those obtained from a hierarchical linear model. Copyright (c) 2009 John Wiley & Sons, Ltd.
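    "Multiple additive regression trees" refers to gradient boosting, in which small trees are added one at a time, each fitted to the residuals of the current ensemble. A minimal sketch for squared-error loss with one-split trees (stumps) on a single predictor follows; the learning rate, number of trees, and step-function data are hypothetical, and the sketch omits the paper's CAR spatial smoothing and two-stage algorithm.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to residuals by least squares."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical x values
        right_sum = total - left_sum
        # Minimising the split's squared error is equivalent to maximising this quantity.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    if best is None:  # all x values identical: fall back to the mean residual
        mean = total / n
        return lambda x: mean
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=100, learning_rate=0.1):
    """Least-squares gradient boosting: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    # The fitted model is additive: a constant plus a shrunken sum of trees.
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


# Synthetic step-function data (hypothetical): y jumps from 0 to 3 at x = 2.5.
xs = [i / 10 for i in range(50)]
ys = [0.0 if x < 2.5 else 3.0 for x in xs]
model = boost(xs, ys)
print(round(model(1.0), 2), round(model(4.0), 2))
```

    The learning rate shrinks each tree's contribution, so many small corrections accumulate instead of one large one; real implementations also grow deeper trees and handle many predictors.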

  4. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    PubMed Central

    Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.

    2008-01-01

    Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
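    At each node, CART searches every predictor and candidate threshold for the split that makes the child nodes most homogeneous. The Python sketch below scores splits by weighted Gini impurity, one common CART criterion for classification. The two features standing in for GAGA and AGGA frequencies, their values, and the class labels are all invented for illustration and are not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Exhaustively search features and midpoint thresholds for the lowest weighted Gini impurity.

    Returns (impurity, feature_index, threshold); feature_index is None if no split beats the parent.
    """
    best = (gini(labels), None, None)
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[0]:
                best = (score, f, threshold)
    return best


# Invented example: two features standing in for GAGA and AGGA frequencies
# (per 1,000 tetramers) and made-up temperature classes; not data from the study.
rows = [[3.1, 2.0], [3.4, 2.2], [5.9, 4.1], [6.3, 3.9], [2.9, 2.5], [6.0, 4.4]]
labels = ["mesophile", "mesophile", "thermophile", "thermophile", "mesophile", "thermophile"]
print(best_split(rows, labels))
```

    A full CART tree applies this search recursively to each child node and then prunes the result; the sketch shows only the single-node step.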

  5. Identification of sexually abused female adolescents at risk for suicidal ideations: a classification and regression tree analysis.

    PubMed

    Brabant, Marie-Eve; Hébert, Martine; Chagnon, François

    2013-01-01

    This study explored the clinical profiles of 77 adolescent female survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideation. Analyses revealed that 64% of participants experienced suicidal ideation. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated the profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideation among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideation in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness, and posttraumatic stress symptoms are also related to suicidal ideation in survivors and could therefore guide interventions.

  6. Evaluation and prediction of shrub cover in coastal Oregon forests (USA)

    Treesearch

    Becky K. Kerns; Janet L. Ohmann

    2004-01-01

    We used data from regional forest inventories and research programs, coupled with mapped climatic and topographic information, to explore relationships and develop multiple linear regression (MLR) and regression tree models for total and deciduous shrub cover in the Oregon coastal province. Results from both types of models indicate that forest structure variables were...

  7. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  8. Weighted linear regression using D²H and D² as the independent variables

    Treesearch

    Hans T. Schreuder; Michael S. Williams

    1998-01-01

    Several error structures for weighted regression equations used for predicting volume were examined for two large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared (D²H = diameter squared times height or D...

  9. An Intelligent Decision Support System for Workforce Forecast

    DTIC Science & Technology

    2011-01-01

    ARIMA ) model to forecast the demand for construction skills in Hong Kong. This model was based...Decision Trees ARIMA Rule Based Forecasting Segmentation Forecasting Regression Analysis Simulation Modeling Input-Output Models LP and NLP Markovian...data • When results are needed as a set of easily interpretable rules 4.1.4 ARIMA Auto-regressive, integrated, moving-average ( ARIMA ) models

  10. Tree biomass in the Swiss landscape: nationwide modelling for improved accounting for forest and non-forest trees.

    PubMed

    Price, B; Gomez, A; Mathys, L; Gardi, O; Schellenberger, A; Ginzler, C; Thürig, E

    2017-03-01

    Trees outside forest (TOF) can perform a variety of social, economic and ecological functions, including carbon sequestration. However, detailed quantification of tree biomass is usually limited to forest areas. Taking advantage of structural information available from stereo aerial imagery and airborne laser scanning (ALS), this research models tree biomass using national forest inventory data and linear least-squares regression, and applies the model both inside and outside forest to create a nationwide model for tree biomass (above ground and below ground). Validation of the tree biomass model against TOF data within settlement areas shows relatively low model performance (R² of 0.44) but still a considerable improvement on current biomass estimates used for greenhouse gas inventory and carbon accounting. We demonstrate an efficient and easily implementable approach to modelling tree biomass across a large heterogeneous nationwide area. The model offers significant opportunity for improved estimates on land use combination categories (CC) where tree biomass has either not been included or only roughly estimated until now. The ALS biomass model also offers the advantage of providing greater spatial resolution and greater within-CC spatial variability compared to the current nationwide estimates.

  11. Tree mortality following prescribed fire and a storm surge event in Slash Pine (Pinus elliottii var. densa) forests in the Florida Keys, USA

    USGS Publications Warehouse

    Sah, Jay P.; Ross, Michael S.; Snyder, James R.; Ogurcak, Danielle E.

    2010-01-01

    In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.

  12. Simple street tree sampling

    Treesearch

    David J. Nowak; Jeffrey T. Walton; James Baldwin; Jerry Bond

    2015-01-01

    Information on street trees is critical for management of this important resource. Sampling of street tree populations provides an efficient means to obtain street tree population information. Long-term repeat measures of street tree samples supply additional information on street tree changes and can be used to report damages from catastrophic events. Analyses of...

  13. M5 model tree based predictive modeling of road accidents on non-urban sections of highways in India.

    PubMed

    Singh, Gyanendra; Sachdeva, S N; Pal, Mahesh

    2016-11-01

    This work examines the application of the M5 model tree and conventionally used fixed/random effect negative binomial (FENB/RENB) regression models for accident prediction on non-urban sections of highway in Haryana (India). Road accident data for periods of 2-6 years on different sections of 8 National and State Highways in Haryana were collected from police records. Data on road geometry, traffic, and road environment variables were collected through field studies. A total of 222 data points were gathered by dividing the highways into sections with uniform geometric characteristics. Two modeling approaches, FENB/RENB regression and the M5 model tree, were used to predict accident frequencies from fifteen input parameters. Results suggest that both models perform comparably well in terms of correlation coefficient and root mean square error values. The M5 model tree provides simple linear equations that are easy to interpret and provide better insight, indicating that this approach can effectively be used as an alternative to the RENB approach if the sole purpose is to predict motor vehicle crashes. Sensitivity analysis using the M5 model tree also suggests that its results reflect the physical conditions. Both models clearly indicate that, to improve safety on Indian highways, minor accesses to the highways need to be properly designed and controlled, service roads need to be made functional, and speed dispersion needs to be reduced. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Presence of indicator plant species as a predictor of wetland vegetation integrity

    USGS Publications Warehouse

    Stapanian, Martin A.; Adams, Jean V.; Gara, Brian

    2013-01-01

    We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict the Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted the OVIBI score from the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands from the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate of this tree was 13%. Again, with the classification tree model, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality and requiring more intensive field assessments. Alternatively, it can be used for advance determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence or absence of such “indicator” species before issuing permits.

  15. Automatic energy expenditure measurement for health science.

    PubMed

    Catal, Cagatay; Akbulut, Akhan

    2018-04-01

    Accurately predicting human energy expenditure is crucial in sports activities and health-science applications for investigating the impact of an activity. However, measuring actual energy expenditure is not a trivial task and involves complex steps. The objective of this work is to improve the performance of existing energy-expenditure estimation models by applying machine learning algorithms to data from several different sensors, and to provide this estimation service on a cloud-based platform. In this study, we used input data such as breathing rate and heart rate from three sensors. Inputs are received from a web form and sent to a web service that applies a regression model on the Azure cloud platform. During the experiments, we assessed several machine learning models based on regression methods. Our experimental results showed that our novel model, which applies Boosted Decision Tree Regression in conjunction with the median aggregation technique, provides the best result compared with the five other regression algorithms. This cloud-based energy expenditure system, which uses a web service, showed that cloud computing technology is a great opportunity for developing estimation systems, and that the new model applying Boosted Decision Tree Regression with median aggregation provides remarkable results. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

    PubMed

    Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which to check model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
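    The equal-weight ensemble forecast and the TSS metric mentioned in this abstract can be illustrated with a short sketch. It assumes each model returns a per-site presence probability, averages those probabilities across models, converts the mean to presence/absence with a hypothetical 0.5 cut-off (the abstract does not state the threshold used), and computes TSS as sensitivity + specificity - 1. The function names and example data are illustrative only, not the authors' code.

```python
from typing import List, Sequence


def ensemble_mean(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight average of per-site presence probabilities from several models."""
    n_models = len(prob_sets)
    return [sum(site_probs) / n_models for site_probs in zip(*prob_sets)]


def to_presence(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarise probabilities; the 0.5 cut-off is a hypothetical default."""
    return [1 if p >= threshold else 0 for p in probs]


def true_skill_statistic(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """TSS = sensitivity + specificity - 1 for presence (1) / absence (0) data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


if __name__ == "__main__":
    # Hypothetical probabilities from four models at four sites.
    model_probs = [
        [0.9, 0.2, 0.7, 0.1],
        [0.8, 0.4, 0.5, 0.3],
        [0.7, 0.1, 0.9, 0.2],
        [0.6, 0.3, 0.3, 0.2],
    ]
    predicted = to_presence(ensemble_mean(model_probs))
    observed = [1, 0, 0, 0]
    print(predicted, round(true_skill_statistic(observed, predicted), 3))  # prints [1, 0, 1, 0] 0.667
```

    TSS ranges from -1 to +1, where 0 means no better than random. Unlike raw accuracy, it is not inflated when absences greatly outnumber presences, which is one reason it is often paired with AUC in species distribution modelling.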

  17. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

    PubMed Central

    Mi, Chunrong; Huettmann, Falk; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which to check model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation. PMID:28097060

  18. Differentiation between Cystic Pituitary Adenomas and Rathke Cleft Cysts: A Diagnostic Model Using MRI.

    PubMed

    Park, M; Lee, S-K; Choi, J; Kim, S-H; Kim, S H; Shin, N-Y; Kim, J; Ahn, S S

    2015-10-01

    Cystic pituitary adenomas may mimic Rathke cleft cysts when there is no solid enhancing component found on MR imaging, and preoperative differentiation may enable a more appropriate selection of treatment strategies. We investigated the diagnostic potential of MR imaging features to differentiate cystic pituitary adenomas from Rathke cleft cysts and to develop a diagnostic model. This retrospective study included 54 patients with a cystic pituitary adenoma (40 women; mean age, 37.7 years) and 28 with a Rathke cleft cyst (18 women; mean age, 31.5 years) who underwent MR imaging followed by surgery. The following imaging features were assessed: the presence or absence of a fluid-fluid level, a hypointense rim on T2-weighted images, septation, an off-midline location, the presence or absence of an intracystic nodule, size change, and signal change. On the basis of the results of logistic regression analysis, a diagnostic tree model was developed to differentiate between cystic pituitary adenomas and Rathke cleft cysts. External validation was performed for an additional 16 patients with a cystic pituitary adenoma and 8 patients with a Rathke cleft cyst. The presence of a fluid-fluid level, a hypointense rim on T2-weighted images, septation, and an off-midline location were more common with pituitary adenomas, whereas the presence of an intracystic nodule was more common with Rathke cleft cysts. Multiple logistic regression analysis showed that cystic pituitary adenomas and Rathke cleft cysts can be distinguished on the basis of the presence of a fluid-fluid level, septation, an off-midline location, and the presence of an intracystic nodule (P = .006, .032, .001, and .023, respectively). Among 24 patients in the external validation population, 22 were classified correctly on the basis of the diagnostic tree model used in this study. A systematic approach using this diagnostic tree model can be helpful in distinguishing cystic pituitary adenomas from Rathke cleft cysts. © 2015 by American Journal of Neuroradiology.

  19. Quantifying the variability of snowpack properties and processes in a small-forested catchment representative of the boreal zone

    NASA Astrophysics Data System (ADS)

    Parajuli, A.; Nadeau, D.; Anctil, F.; Parent, A. C.; Bouchard, B.; Jutras, S.

    2017-12-01

    In snow-fed catchments, it is crucial to monitor and model snow water equivalent (SWE), particularly to simulate meltwater runoff. However, the distribution of SWE can be highly heterogeneous, particularly within forested environments, mainly because of the large variability in snow depths. Although the boreal forest is the dominant land cover in Canada and in a few other northern countries, very few studies have quantified the spatiotemporal variability of snow depths and snowpack dynamics within this biome. The objective of this paper is to fill this research gap through detailed monitoring of snowpack dynamics at nine locations within a 3.57 km² experimental forested catchment in southern Quebec, Canada (47°N, 71°W). The catchment receives 6 m of snow annually on average and is predominantly covered with balsam fir stands, with some spruce and white birch. In this study, we used a network of nine so-called "snow profiling stations", providing automated snow depth and snowpack temperature profile measurements, as well as three contrasting sites (juvenile, sapling and open areas) where sublimation rates were directly measured with flux towers. In addition, a total of 1401 manual snow samples, supported by measurements in 20 snow pits, were collected throughout the winter of 2017. This paper presents some preliminary analyses of this unique dataset. Simple empirical relations relating SWE to easy-to-determine proxies, such as snow depth and snow temperature, are tested. Then, binary regression trees and multiple regression analysis are used to model SWE using topographic characteristics (slope, aspect, elevation), forest features (tree height, tree diameter, forest density and gap fraction) and meteorological forcing (solar radiation, wind speed, snowpack temperature profile, air temperature, humidity). An analysis of sublimation rates comparing open areas, saplings and juvenile forest is also presented.

  20. Predicting species' range limits from functional traits for the tree flora of North America.

    PubMed

    Stahl, Ulrike; Reu, Björn; Wirth, Christian

    2014-09-23

    Using functional traits to explain species' range limits is a promising approach in functional biogeography. It replaces the idiosyncrasy of species-specific climate ranges with a generic trait-based predictive framework. In addition, it has the potential to shed light on specific filter mechanisms creating large-scale vegetation patterns. However, its application to a continental flora, spanning large climate gradients, has been hampered by a lack of trait data. Here, we explore whether five key plant functional traits (seed mass, wood density, specific leaf area (SLA), maximum height, and tree longevity), which are indicative of life history, mechanical, and physiological adaptations, explain the climate ranges of 250 North American tree species distributed from the boreal zone to the subtropics. Although the relationship between traits and the median climate across a species range is weak, quantile regressions revealed strong effects on range limits. Wood density and seed mass were strongly related to the lower but not the upper temperature range limits of species. Maximum height affects species range limits in both dry and humid climates, whereas SLA and longevity do not show clear relationships. These results allow the definition and delineation of climatic "no-go areas" for North American tree species based on key traits. As some of these key traits serve as important parameters in recent vegetation models, implementing trait-based climatic constraints has the potential to predict both range shifts and ecosystem consequences on a more functional basis. Moreover, our results provide a benchmark for evaluating future trait-based vegetation models.

  1. Use of GLM approach to assess the responses of tropical trees to urban air pollution in relation to leaf functional traits and tree characteristics.

    PubMed

    Mukherjee, Arideep; Agrawal, Madhoolika

    2018-05-15

    Responses of urban vegetation to air pollution stress in relation to their tolerance and sensitivity have been extensively studied; however, studies of air pollution responses based on different leaf functional traits and tree characteristics are limited. In this paper, we have tried to assess the combined and individual effects of the major air pollutants PM10 (particulate matter ≤ 10 µm), TSP (total suspended particulate matter), SO2 (sulphur dioxide), NO2 (nitrogen dioxide) and O3 (ozone) on thirteen tropical tree species in relation to fifteen leaf functional traits and different tree characteristics. Stepwise linear regression, a general linear modelling approach, was used to quantify the pollution response of trees to air pollutants. The study was performed over six successive seasons spanning two years in three distinct urban areas (traffic, industrial and residential) of Varanasi city in India. At all study sites, concentrations of air pollutants, specifically PM (particulate matter) and NO2, were above the specified standards. Distinct variations with pollution load were recorded in all fifteen leaf functional traits. Caesalpinia sappan was identified as the most tolerant species, followed by Psidium guajava, Dalbergia sissoo and Albizia lebbeck. Stepwise regression analysis identified the maximum response to air pollutants in Eucalyptus citriodora and P. guajava, explaining 59% and 58% of the overall variability in leaf functional traits, respectively. Among leaf functional traits, the maximum effect of air pollutants was observed on non-enzymatic antioxidants, followed by photosynthetic pigments and leaf water status. Among the pollutants, PM was identified as the major stress factor, followed by O3, explaining 47% and 33% of the variability in leaf functional traits, respectively. Tolerance and pollution response were regulated by different tree characteristics such as height, canopy size, leaf form, texture and nature of the tree. Outcomes of this study will help in urban forest development through the selection of pollutant-tolerant tree species and leaf traits suitable as an air pollution mitigation measure. Copyright © 2018 Elsevier Inc. All rights reserved.

  2. Contaminant Gradients in Trees: Directional Tree Coring Reveals Boundaries of Soil and Soil-Gas Contamination with Potential Applications in Vapor Intrusion Assessment.

    PubMed

    Wilson, Jordan L; Samaranayake, V A; Limmer, Matthew A; Schumacher, John G; Burken, Joel G

    2017-12-19

    Contaminated sites pose ecological and human-health risks through exposure to contaminated soil and groundwater. Whereas we can readily locate, monitor, and track contaminants in groundwater, it is harder to perform these tasks in the vadose zone. In this study, tree-core samples were collected at a Superfund site to determine if the sample-collection location around a particular tree could reveal the subsurface location, or direction, of soil and soil-gas contaminant plumes. Contaminant-centroid vectors were calculated from tree-core data to reveal contaminant distributions in directional tree samples at a higher resolution, and vectors were correlated with soil-gas characterization collected using conventional methods. Results clearly demonstrated that directional tree coring around tree trunks can indicate gradients in soil and soil-gas contaminant plumes, and the strength of the correlations was directly proportional to the magnitude of tree-core concentration gradients (Spearman's coefficients of -0.61 and -0.55 in soil and tree-core gradients, respectively). Linear regression indicates that agreement between the concentration-centroid vectors is significantly affected by in planta and soil concentration gradients and by the proximity of soil concentration centroids to trees. Given the existing link between soil-gas and vapor intrusion, this study also indicates that directional tree coring might be applicable in vapor intrusion assessment.
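    Spearman's coefficient, as reported in this abstract, is the Pearson correlation computed on ranks rather than raw values, so it measures monotonic rather than strictly linear association. A minimal sketch, assuming tied values receive the average of their ranks; the function names and any data passed to them are hypothetical and unrelated to the study's measurements.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

    For example, spearman([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 6.0, 1.0]) gives about -0.95: the two sequences move in opposite directions, and the tie between the two 6.0 values keeps the coefficient from reaching -1.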

  3. Contaminant gradients in trees: Directional tree coring reveals boundaries of soil and soil-gas contamination with potential applications in vapor intrusion assessment

    USGS Publications Warehouse

    Wilson, Jordan L.; Samaranayake, V.A.; Limmer, Matthew A.; Schumacher, John G.; Burken, Joel G.

    2017-01-01

    Contaminated sites pose ecological and human-health risks through exposure to contaminated soil and groundwater. Whereas we can readily locate, monitor, and track contaminants in groundwater, it is harder to perform these tasks in the vadose zone. In this study, tree-core samples were collected at a Superfund site to determine if the sample-collection location around a particular tree could reveal the subsurface location, or direction, of soil and soil-gas contaminant plumes. Contaminant-centroid vectors were calculated from tree-core data to reveal contaminant distributions in directional tree samples at a higher resolution, and vectors were correlated with soil-gas characterization collected using conventional methods. Results clearly demonstrated that directional tree coring around tree trunks can indicate gradients in soil and soil-gas contaminant plumes, and the strength of the correlations was directly proportional to the magnitude of tree-core concentration gradients (Spearman’s coefficients of -0.61 and -0.55 in soil and tree-core gradients, respectively). Linear regression indicates that agreement between the concentration-centroid vectors is significantly affected by in-planta and soil concentration gradients and by the proximity of soil concentration centroids to trees. Given the existing link between soil-gas and vapor intrusion, this study also indicates that directional tree coring might be applicable in vapor intrusion assessment.

  4. Aeolian dust nutrient contributions increase with substrate age in semi-arid ecosystems

    NASA Astrophysics Data System (ADS)

    Coble, A. A.; Hart, S. C.; Ketterer, M. E.; Newman, G. S.

    2013-12-01

    Rock-derived nutrients supplied by mineral weathering become depleted over time, and without an additional nutrient source the ecosystem may eventually regress or reach a terminal steady state. Previous studies have demonstrated that aeolian dust acts as a parent material of soils and an important nutrient source for plants in arid regions, but the relative importance of these exogenous nutrients to the function of dry ecosystems during soil development is uncertain. Here, using strontium isotopes as a tracer and a well-constrained, three-million-year substrate age gradient, we show that aeolian-derived nutrients become increasingly important to plant-available soil pools and tree (Pinus edulis) growth during the latter stages of soil development in a semi-arid climate. Furthermore, the depth of nutrient uptake increased on older substrates, suggesting that trees in arid regions acquire nutrients from greater depths as ecosystem development progresses, presumably in response to nutrient depletion in the more weathered surface soils. Our results contribute to the unification of biogeochemical theory by demonstrating the similarity in the roles of atmospheric nutrient inputs during ecosystem development across contrasting climates.

  5. Above ground biomass and tree species richness estimation with airborne lidar in tropical Ghana forests

    NASA Astrophysics Data System (ADS)

    Vaglio Laurin, Gaia; Puletti, Nicola; Chen, Qi; Corona, Piermaria; Papale, Dario; Valentini, Riccardo

    2016-10-01

    Estimates of forest aboveground biomass are fundamental for carbon monitoring and accounting; delivering information at very high spatial resolution is especially valuable for local management, conservation and selective logging purposes. Tropical areas host large biomass and biodiversity resources that are often threatened by unsustainable anthropogenic pressures, so frequent monitoring of forest resources is needed there. Lidar is a powerful tool to estimate aboveground biomass at fine resolution; however, its application in tropical forests has been limited, with high variability in the accuracy of results. Lidar pulses scan the forest vertical profile and can provide structural information that is also linked to biodiversity. In the last decade the remote sensing of biodiversity has received great attention, but few studies have focused on the use of lidar for assessing tree species richness in tropical forests. This research aims to estimate aboveground biomass and tree species richness using discrete-return airborne lidar in Ghana forests. We tested an advanced statistical technique, Multivariate Adaptive Regression Splines (MARS), which does not require assumptions about the data distribution or the relationships between variables, making it suitable for studying ecological variables. We compared the MARS regression results with those obtained by multilinear regression and found that both algorithms were effective, but MARS provided higher accuracy for both biomass (R² = 0.72) and species richness (R² = 0.64). We also noted a strong correlation between biodiversity and biomass field values. Although the forest areas under analysis are limited in extent and represent peculiar ecosystems, the preliminary indications of our study suggest that instruments such as lidar, which are specifically useful for pinpointing forest structure, can also be exploited to support tree species richness assessment.

  6. Red-shouldered hawk nesting habitat preference in south Texas

    USGS Publications Warehouse

    Strobel, Bradley N.; Boal, Clint W.

    2010-01-01

    We examined nesting habitat preference by red-shouldered hawks (Buteo lineatus) using conditional logistic regression on characteristics measured at 27 occupied nest sites and 68 unused sites in 2005–2009 in south Texas. We measured vegetation characteristics of individual trees (nest trees and unused trees) and corresponding 0.04-ha plots. We evaluated the importance of tree and plot characteristics to nesting habitat selection by comparing a priori tree-specific and plot-specific models using Akaike's information criterion. Models with only plot variables carried 14% more weight than models with only center tree variables. The model-averaged odds ratios indicated that red-shouldered hawks selected to nest in taller trees and in areas with higher average diameter at breast height than were randomly available within the forest stand. Relative to randomly selected areas, each 1-m increase in nest tree height and 1-cm increase in the plot average diameter at breast height increased the probability of selection by 85% and 10%, respectively. Our results indicate that red-shouldered hawks select nesting habitat based on vegetation characteristics of individual trees as well as the 0.04-ha area surrounding the tree. They also suggest that forest management practices resulting in tall forest stands with large average diameter at breast height would benefit red-shouldered hawks in south Texas.

  7. Trees of Yap: a field guide

    Treesearch

    Marjorie V. Cushing Falanruw

    2015-01-01

    Descriptions, drawings, and photographs are presented for trees found on the Yap Islands in the Federated States of Micronesia. Included are all recorded native trees and most introduced trees as well as new records of native and introduced trees. Additional information is provided on tree distribution, status, vernacular names in Micronesia, and English names when...

  8. Static terrestrial laser scanning of juvenile understory trees for field phenotyping

    NASA Astrophysics Data System (ADS)

    Wang, Huanhuan; Lin, Yi

    2014-11-01

    This study attempted to apply static terrestrial laser scanning (TLS), a cutting-edge 3D remote sensing technique, to the parametric 3D reconstruction of juvenile understory trees. The test data were collected with a Leica HDS6100 TLS system in a single scan. The geometric structures of juvenile understory trees are extracted by model fitting. Cones are used to model trunks and branches. Principal component analysis (PCA) is adopted to calculate their major axes. Coordinate transformation and orthogonal projection are used to estimate the parameters of the cones. AutoCAD is then used to simulate the morphological characteristics of the understory trees and to add secondary branches and leaves at random. Comparing the estimated values with the reference values yields a regression equation and shows that the proposed parameter-extraction algorithm is credible. The results broadly verify the applicability of TLS for field phenotyping of juvenile understory trees.
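    The PCA step described in this abstract, finding the major axis of a trunk or branch segment, amounts to taking the eigenvector of the point cloud's covariance matrix with the largest eigenvalue. A minimal NumPy sketch, assuming the points of one segment have already been isolated as an (n, 3) array; it covers only the axis estimate, not the cone fitting, coordinate transformation or projection steps, and the synthetic stem is hypothetical.

```python
from typing import Tuple

import numpy as np


def major_axis(points) -> Tuple[np.ndarray, np.ndarray]:
    """Return (centroid, unit direction) of the first principal component of an (n, 3) point set."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov(pts - centroid, rowvar=False)  # 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix: real eigenvalues, ascending
    direction = eigvecs[:, np.argmax(eigvals)]  # axis of largest variance (sign is arbitrary)
    return centroid, direction / np.linalg.norm(direction)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stem segment: points along z with small lateral scatter.
    z = np.linspace(0.0, 2.0, 200)
    stem = np.column_stack([0.01 * rng.standard_normal((200, 2)), z])
    centre, axis = major_axis(stem)
    print(np.round(np.abs(axis), 3))
```

    Because an eigenvector is only defined up to sign, a real pipeline would typically orient the axis consistently, for example so that it points upward from the stem base.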

  9. Spectral analysis of white ash response to emerald ash borer infestations

    NASA Astrophysics Data System (ADS)

    Calandra, Laura

    The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf level and the tree-crown level. First, a field-based study at the leaf level used the range of spectral bands from the WorldView-2 sensor to determine whether there was a significant difference between EAB-infested and healthy white ash (Fraxinus americana) leaves. Binary logistic regression models were developed using individual wavelengths and combinations of wavelengths; the most successful model included the 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel-based and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.

  10. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with the C4.5 decision tree (PSO+C4.5) classifier by applying the Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it was applied to one microarray dataset and five different medical datasets obtained from the UCI Machine Learning Repository. Moreover, the results of the PSO + Boosted C5.0 implementation are compared with eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest Neighbor). Repeated five-fold cross-validation was used to evaluate the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods.
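    Repeated five-fold cross-validation, as used in this study, reshuffles the data several times and, in each repeat, holds out every fold once for testing. A minimal sketch of the index bookkeeping only; the fold count of five comes from the abstract, while the number of repeats, the seed and the use of plain (non-stratified) folds are hypothetical choices, since the abstract does not state them.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, n_splits: int = 5, n_repeats: int = 3,
                   seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices and partitions them into n_splits
    folds whose sizes differ by at most one, so every index is tested exactly once per repeat.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if f < extra else 0) for f in range(n_splits)]
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        start = 0
        for size in fold_sizes:
            test = indices[start:start + size]
            train = indices[:start] + indices[start + size:]
            yield repeat, train, test
            start += size
```

    A classifier would be fitted on each train split and scored on the matching test split, and the reported performance is the mean (often with the spread) over all n_splits x n_repeats scores. Averaging over several reshuffles reduces the dependence of the estimate on one particular partition. scikit-learn offers the same scheme as `sklearn.model_selection.RepeatedKFold`, and stratified variants that preserve class proportions in each fold are common for medical classification data.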

  11. A quantitative analysis to objectively appraise drought indicators and model drought impacts

    NASA Astrophysics Data System (ADS)

    Bachmair, S.; Svensson, C.; Hannaford, J.; Barker, L. J.; Stahl, K.

    2016-07-01

    Drought monitoring and early warning are important measures for enhancing resilience to drought. While there are numerous operational systems using different drought indicators, there is no consensus on which indicator best represents drought impact occurrence for any given sector. Furthermore, thresholds are widely applied to these indicators but, to date, little empirical evidence exists as to which indicator thresholds trigger impacts on society, the economy, and ecosystems. The main obstacle to evaluating commonly used drought indicators is a lack of information on drought impacts. Our aim was therefore to exploit text-based data from the European Drought Impact report Inventory (EDII) to identify indicators that are meaningful for region-, sector-, and season-specific impact occurrence, and to empirically determine indicator thresholds. In addition, we tested the predictability of impact occurrence based on the best-performing indicators. To achieve these aims we applied a correlation analysis and an ensemble regression tree approach, using Germany and the UK (the most data-rich countries in the EDII) as test beds. As candidate indicators we chose two meteorological indicators (Standardized Precipitation Index, SPI, and Standardized Precipitation Evapotranspiration Index, SPEI) and two hydrological indicators (streamflow and groundwater level percentiles). The analysis revealed that the accumulation periods of SPI and SPEI best linked to impact occurrence are longer for the UK than for Germany, but there is variability within each country, among impact categories and, to some degree, among seasons. The median of the regression tree splitting values, which we regard as estimates of thresholds of impact occurrence, was around -1 for SPI and SPEI in the UK; distinct differences between northern/northeastern and southern/central regions were found for Germany. Predictions with the ensemble regression tree approach yielded reasonable results for regions with good impact data coverage. The predictions also provided insights into the EDII, in particular highlighting drought events where missing impact reports may reflect a lack of recording rather than a true absence of impacts. Overall, the presented quantitative framework proved to be a useful tool for evaluating drought indicators and modeling impact occurrence. In summary, this study demonstrates the information gain for drought monitoring and early warning through impact data collection and analysis. It highlights the important role that quantitative analysis with impact data can play in providing "ground truth" for drought indicators, alongside more traditional stakeholder-led approaches.
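    The idea of reading a regression tree's splitting value as an impact threshold can be sketched with a simplified stand-in for the authors' ensemble. Here a one-split regression tree (a stump) is fitted to bootstrap resamples of a single indicator against binary impact occurrence, and the median split value is reported. The bagged-stump design, the SPI-like values, the number of trees and the seed are all hypothetical; the abstract does not describe the ensemble in this detail.

```python
import random
import statistics
from typing import List, Optional, Sequence


def best_split(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Threshold on x minimising the summed squared error of a one-split regression tree."""
    if len(set(y)) < 2:
        return None  # no informative split when the response is constant
    pairs = sorted(zip(x, y))
    best_sse, best_t = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical indicator values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((v - mean_l) ** 2 for v in left) + sum((v - mean_r) ** 2 for v in right)
        if sse < best_sse:
            best_sse = sse
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
    return best_t


def median_threshold(x: Sequence[float], y: Sequence[float],
                     n_trees: int = 200, seed: int = 0) -> float:
    """Median split value over stumps fitted to bootstrap resamples (a bagged-stump ensemble)."""
    rng = random.Random(seed)
    n = len(x)
    thresholds: List[float] = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        t = best_split([x[i] for i in idx], [y[i] for i in idx])
        if t is not None:
            thresholds.append(t)
    return statistics.median(thresholds)


if __name__ == "__main__":
    # Hypothetical SPI values and impact occurrence (1 = impact reported).
    spi = [-2.5, -2.0, -1.6, -1.3, -1.1, -0.9, -0.5, 0.0, 0.5, 1.0]
    impact = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(best_split(spi, impact), median_threshold(spi, impact))
```

    Taking the median across many resampled trees makes the threshold estimate less sensitive to any single unusual split than the split of one tree fitted to all the data.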

  12. Exposure and effects of perfluoroalkyl substances in tree ...

    EPA Pesticide Factsheets

    The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor) as a sentinel species. These eight sites covered a range of possible exposure pathways and ecological settings. Concentrations in various swallow tissues were quantified, as were reproductive success endpoints. The sample egg method was used, wherein an egg sample is collected and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique that has not previously been used in this context. There was a negative association between concentrations of PFASs in eggs and hatching success; this is the second field study in which a negative association was found. The concentration at which effects became evident (150-200 ng/g wet wt.) was far below effect levels found in laboratory feeding trials or egg injection studies on other avian species. This discrepancy likely arose because behavioral effects and other extrinsic factors are not accounted for in laboratory studies; further, field studies involve a mixture of PFASs rather than the single contaminant used in laboratory studies, and tree swallows may be unusually sensitive to PFASs. Additional field effect studies on other avian species

  13. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    ERIC Educational Resources Information Center

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  14. Coping with Multicollinearity: An Example on Application of Principal Components Regression in Dendroecology

    Treesearch

    B. Desta Fekedulegn; J.J. Colbert; R.R. Hicks, Jr.; Michael E. Schuckers

    2002-01-01

    The theory and application of principal components regression, a method for coping with multicollinearity among independent variables in analyzing ecological data, is exhibited in detail. A concrete example of the complex procedures that must be carried out in developing a diagnostic growth-climate model is provided. We use tree radial increment data taken from breast...
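    As a rough illustration of the principal components regression idea (not the authors' dendroecological procedure), the sketch below standardizes the predictors, keeps the k leading eigenvectors of their correlation matrix, regresses the response on the resulting component scores, and maps the coefficients back to the original predictor scale. Power iteration with deflation is used for the eigenvectors only to stay within the standard library; all function names are hypothetical.

```python
import random
import statistics


def standardize(X):
    """Center and scale each column of X (a list of rows) to unit sample variance."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    Z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]
    return Z, means, sds


def top_eigenvectors(R, k, iters=500, seed=0):
    """Leading k unit eigenvectors of a symmetric PSD matrix R via power iteration."""
    p = len(R)
    R = [row[:] for row in R]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * R[i][j] * v[j] for i in range(p) for j in range(p))
        vectors.append(v)
        # Deflate so the next pass converges to the next-largest component.
        R = [[R[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vectors


def pcr_fit(X, y, k):
    """Principal components regression keeping k components.

    Returns (intercept, coefficients) on the original predictor scale.
    """
    Z, means, sds = standardize(X)
    n, p = len(Z), len(Z[0])
    # Correlation matrix of the predictors.
    R = [[sum(Z[r][i] * Z[r][j] for r in range(n)) / (n - 1) for j in range(p)]
         for i in range(p)]
    y_mean = statistics.fmean(y)
    yc = [v - y_mean for v in y]
    beta_std = [0.0] * p
    for v in top_eigenvectors(R, k):
        scores = [sum(z[j] * v[j] for j in range(p)) for z in Z]
        # Component scores are mutually orthogonal, so each is regressed separately.
        gamma = sum(t * u for t, u in zip(scores, yc)) / sum(t * t for t in scores)
        for j in range(p):
            beta_std[j] += gamma * v[j]
    coef = [b / s for b, s in zip(beta_std, sds)]
    intercept = y_mean - sum(c * m for c, m in zip(coef, means))
    return intercept, coef
```

    With two positively correlated predictors and k = 1, the standardized coefficients are forced to be equal, because the first eigenvector of a 2 × 2 correlation matrix is (1, 1)/√2; the discarded second component is the direction in which collinearity inflates the variance of ordinary least squares estimates.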

  15. Regression methods for spatially correlated data: an example using beetle attacks in a seed orchard

    Treesearch

    Haiganoush Preisler; Nancy G. Rappaport; David L. Wood

    1997-01-01

    We present a statistical procedure for studying the simultaneous effects of observed covariates and unmeasured spatial variables on responses of interest. The procedure uses regression-type analyses that can be carried out with existing statistical software packages. An example using the rate of twig beetle attacks on Douglas-fir trees in a seed orchard illustrates the...

  16. Per capita community-level effects of an invasive grass, Microstegium vimineum, on vegetation in mesic forests in northern Mississippi (USA)

    Treesearch

    J. Stephen Brewer

    2010-01-01

    Quantifying per capita impacts of invasive species on resident communities requires integrating regression analyses with experiments under natural conditions. Using multivariate and univariate approaches, I regressed the abundance of 105 resident species of groundcover plants and tree seedlings against the abundance and height of an invasive grass, Microstegium...

  17. Predicting surface fuel models and fuel metrics using lidar and CIR imagery in a dense mixed conifer forest

    Treesearch

    Marek K. Jakubowski; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly

    2013-01-01

    We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m

  18. Can tree species diversity be assessed with Landsat data in a temperate forest?

    PubMed

    Arekhi, Maliheh; Yılmaz, Osman Yalçın; Yılmaz, Hatice; Akyüz, Yaşar Feyza

    2017-10-28

    The diversity of forest trees, as an indicator of ecosystem health, can be assessed from the spectral characteristics of plant communities in remote sensing data. The objectives of this study were to investigate alpha and beta tree diversity using Landsat data for six dates in the Gönen dam watershed of Turkey. We used richness and the Shannon and Simpson diversity indices to calculate tree alpha diversity. We also represented the relationship between beta diversity and remotely sensed data using species composition similarity and spectral distance similarity of sampling plots via quantile regression. A total of 99 sampling units, each 20 m × 20 m, were selected using a geographically stratified random sampling method. Within each plot, the tree species were identified, and all trees with a diameter at breast height (dbh) larger than 7 cm were measured. Presence/absence and abundance data (tree species number and tree species basal area) were used to determine the relationship between richness and the Shannon and Simpson diversity indices, computed from ground field data, and spectral variables derived (2 × 2 pixels and 3 × 3 pixels) from Landsat 8 OLI data. The Shannon-Wiener index had the highest correlation. For all six dates, NDVI (normalized difference vegetation index) was the spectral variable most strongly correlated with the Shannon index and the tree diversity variables. The ratio of green to red (VI) was the spectral variable least correlated with the tree diversity variables and the Shannon basal area. In both beta diversity curves, the slope of the OLS regression was low, while in the upper quantile it was approximately twice that of the lower quantiles. The Jaccard index is close to one, with little difference between the two beta diversity approaches. This result reflects the increasing similarity between sampling plots that are located close to each other. The intercept differences between the two investigated beta diversity approaches were strongly related to the development stage of a number of sampling plots in the tree species basal area method. For beta diversity, the tree basal area method represented the similarity of nearby regions better than the tree species number method. In conclusion, NDVI is helpful for estimating the alpha diversity of trees over large areas when the vegetation is at the peak of the growing season. Beta diversity could be obtained from the spectral heterogeneity of Landsat data. Future tree diversity studies using remote sensing data should select data sets acquired when vegetation is at the peak of the growing season. Forest tree diversity could also be investigated using higher-resolution remote sensing data such as ESA Sentinel-2 data, which have been freely available since June 2015.
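    The indices named in this abstract have standard textbook definitions; the sketch below computes them from plot data for reference. It assumes Shannon's H' with natural logarithms, the Gini-Simpson form 1 - Σp² of the Simpson index (the abstract does not say which Simpson variant was used), presence/absence species sets for the Jaccard index, and NDVI = (NIR - Red)/(NIR + Red) from reflectances. Abundances may be stem counts or basal areas; the plot data are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon-Wiener H' = -sum(p_i * ln p_i); abundances may be counts or basal areas."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def simpson_index(abundances):
    """Gini-Simpson 1 - sum(p_i^2): chance that two draws (with replacement) differ in species."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


def jaccard_index(species_a, species_b):
    """Species shared by two plots divided by species present in either plot."""
    a, b = set(species_a), set(species_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


# Hypothetical plot: stems per species.
plot = {"species A": 12, "species B": 6, "species C": 2}
print(shannon_index(plot.values()), simpson_index(plot.values()))
```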

  19. Using Classification and Regression Trees (CART) and random forests to analyze attrition: Results from two simulations.

    PubMed

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J

    2015-12-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved.

  20. Using Classification and Regression Trees (CART) and Random Forests to Analyze Attrition: Results From Two Simulations

    PubMed Central

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.

    2016-01-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
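    The weighting idea in these two records can be sketched as follows: fit a tree that predicts whether each case was retained (1) or lost to attrition (0) from baseline covariates, take the retention rate in each leaf as the estimated response propensity, and weight each retained case by its inverse. For brevity the sketch grows a single Gini-minimizing split (a stump) in place of the full, pruned CART used in the study; the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity 2p(1-p) of a list of 0/1 retention indicators."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, retained):
    """Feature index and threshold whose split minimizes size-weighted Gini impurity.

    Returns (None, None) if no split improves on the unsplit node.
    """
    best_score, best_f, best_thr = gini(retained), None, None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, retained) if row[f] <= thr]
            right = [r for row, r in zip(X, retained) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(X)
            if score < best_score:
                best_score, best_f, best_thr = score, f, thr
    return best_f, best_thr


def inverse_retention_weights(X, retained):
    """Weight 1 / P(retained | leaf) for retained cases; 0 for dropouts."""
    f, thr = best_split(X, retained)
    leaf_of = [True if f is None else row[f] <= thr for row in X]
    rate = {}
    for leaf in set(leaf_of):
        members = [r for lf, r in zip(leaf_of, retained) if lf == leaf]
        rate[leaf] = sum(members) / len(members)
    # Retained cases always sit in a leaf with a positive retention rate.
    return [1.0 / rate[lf] if r else 0.0 for lf, r in zip(leaf_of, retained)]


# Hypothetical baseline covariate (age) and retention indicator.
ages = [[20], [25], [30], [35], [60], [65], [70], [75]]
retained = [1, 1, 1, 1, 1, 0, 1, 0]
print(inverse_retention_weights(ages, retained))
```

    The weighted complete-case analysis then up-weights retained participants who resemble those who dropped out, which is how the tree's flexible selection model enters the final estimate.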

  1. Integrated Approach To Design And Analysis Of Systems

    NASA Technical Reports Server (NTRS)

    Patterson-Hine, F. A.; Iverson, David L.

    1993-01-01

    An object-oriented fault-tree representation unifies evaluation of reliability and diagnosis of faults. The programming and fault tree are described more fully in "Object-Oriented Algorithm For Evaluation Of Fault Trees" (ARC-12731). The augmented fault-tree object contains more information than the fault-tree object used in quantitative analysis of reliability; this additional information is needed to diagnose faults in the system represented by the fault tree.
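    The object-oriented idea can be illustrated with a small sketch in which each node of a fault tree is an object that can both compute its failure probability (reliability evaluation) and report whether it has failed given a set of observed component failures (diagnosis). It assumes independent basic events and only AND/OR gates; the class names and probabilities are hypothetical and are not taken from ARC-12731.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class BasicEvent:
    name: str
    p: float  # probability that this component fails

    def probability(self):
        return self.p

    def occurs(self, failed):
        return self.name in failed


@dataclass
class AndGate:
    name: str
    inputs: list = field(default_factory=list)

    def probability(self):
        # Fails only if every input fails (inputs assumed independent).
        return prod(c.probability() for c in self.inputs)

    def occurs(self, failed):
        return all(c.occurs(failed) for c in self.inputs)


@dataclass
class OrGate:
    name: str
    inputs: list = field(default_factory=list)

    def probability(self):
        # Fails if at least one input fails (inputs assumed independent).
        return 1.0 - prod(1.0 - c.probability() for c in self.inputs)

    def occurs(self, failed):
        return any(c.occurs(failed) for c in self.inputs)


# Hypothetical system: it fails if both pump and valve fail, or if power fails.
pump, valve, power = BasicEvent("pump", 0.1), BasicEvent("valve", 0.2), BasicEvent("power", 0.05)
top = OrGate("system failure", [AndGate("cooling loss", [pump, valve]), power])
print(top.probability())     # approximately 0.069
print(top.occurs({"pump"}))  # False: the valve still works
```

    This bottom-up product rule is exact only when the basic events are independent and no basic event appears under more than one gate; repeated events require cut-set or similar methods.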

  2. Modelling the ecological consequences of whole tree harvest for bioenergy production

    NASA Astrophysics Data System (ADS)

    Skår, Silje; Lange, Holger; Sogn, Trine

    2013-04-01

    There is an increasing demand worldwide for energy from biomass as a substitute for fossil fuels, and the Norwegian government plans to double the production of bioenergy to 9% of national energy production, or 28 TWh per year, by 2020. A large part of this increase may come from forests, which have great potential for biomass supply because forest growth has increasingly exceeded harvest in recent decades. One feasible option is to utilize forest residues (needles, twigs and branches) in addition to stems, known as Whole Tree Harvest (WTH). In Conventional Timber Harvesting (CH), by contrast, the residues are traditionally left in the forest. However, the residues contain a large share of the trees' nutrients, indicating that WTH may alter the supply of nutrients and organic matter to the soil and the forest ecosystem. This may lead to reduced tree growth. Other implications may include nutrient imbalance, loss of carbon from the soil, and changes in species composition and diversity. This study aims to identify key factors and appropriate strategies for ecologically sustainable WTH in Norway spruce (Picea abies) and Scots pine (Pinus sylvestris) forest stands in Norway. We focus on identifying key factors driving soil organic matter, nutrients, biomass, biodiversity, etc. Simulations of the effect of the two harvesting methods on the carbon and nitrogen budget will also be conducted. Data from field trials and long-term manipulation experiments are used to obtain a first overview of key variables. The relationships between the variables are hitherto unknown, and it is by no means obvious that they can be assumed to be linear; thus, an ordinary multiple linear regression approach is expected to be insufficient. Here we apply two advanced and highly flexible modelling frameworks that have hardly been used so far in the context of tree growth, nutrient balances and biomass removal: Generalized Additive Models (GAMs) and Random Forests. Results obtained for GAMs so far show that WTH and CH differ in two respects: both the significance of drivers and the shape of the response functions differ. GAMs turn out to be a flexible and powerful alternative to multivariate linear regression, and the restriction to linear relationships seems to be unjustified in the present case. We use Random Forests as a highly efficient classifier that gives reliable estimates of the importance of each driver variable in determining diameter growth under the two harvesting treatments. Based on the final results of these two modelling approaches, the study contributes to finding appropriate strategies and suitable regions in Norway where WTH may be performed sustainably.

  3. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered among the most promising approaches in medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, which have proven better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensemble methods, namely Adaboost and Bagging, with independently selected base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Tree (LMT). The empirical study shows that performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve performance on the selected medical data sets.
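    Bagging, one of the two ensemble schemes compared here, can be sketched in a few lines: train each base learner on a bootstrap resample of the training set and combine their predictions by majority vote. To stay self-contained, the sketch uses one-split decision stumps as base learners instead of the Weka classifiers used in the study, and it does not show AdaBoost; all names and data are hypothetical.

```python
import random
from collections import Counter


def _majority(labels, default):
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y):
    """One-split tree: the (feature, threshold) with the fewest training errors."""
    overall = _majority(y, None)
    best_err, best = len(y) + 1, None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            lab_l, lab_r = _majority(left, overall), _majority(right, overall)
            err = sum(yi != (lab_l if row[f] <= thr else lab_r) for row, yi in zip(X, y))
            if err < best_err:
                best_err, best = err, (f, thr, lab_l, lab_r)
    return best


def predict_stump(stump, row):
    f, thr, lab_l, lab_r = stump
    return lab_l if row[f] <= thr else lab_r


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Fit one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Majority vote of the ensemble."""
    return Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]
```

    Because each stump sees a slightly different resample, their thresholds differ, and the vote smooths out the variance of any single tree; AdaBoost instead reweights the training cases sequentially toward those misclassified by earlier learners.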

  4. Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

    PubMed

    Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A

    2018-04-29

    Assessing heterogeneous treatment effects is of growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. For experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.

  5. Additive or non-additive effect of mixing oak in pine stands on soil properties depends on the tree species in Mediterranean forests.

    PubMed

    Brunel, Caroline; Gros, Raphael; Ziarelli, Fabio; Farnet Da Silva, Anne Marie

    2017-07-15

    This study investigated how oak abundance in pine stands (expressed as relative Oak Basal Area, OBA%) may modulate soil microbial functioning. Forests were composed of sclerophyllous species, i.e., Quercus ilex mixed with Pinus halepensis Miller, or of Q. pubescens mixed with P. sylvestris. We used a series of plots with OBA% ranging from 0 to 100% in the two types of stand (n=60), and both OLF and A-horizon compartments were analysed. Relations between OBA% and either soil chemical characteristics (C and N contents, quality of organic matter via solid-state NMR, pH, CaCO3) or microbial characteristics (enzyme activities, basal respiration, biomass and catabolic diversity via BIOLOG) were described. An increase in OBA% led to a decrease in the recalcitrant fraction of organic matter (OM) in OLF and promoted microbial growth. Catabolic profiles of microbial communities from the A-horizon were significantly modulated by OBA% and the alkyl C to carboxyl C ratio (characteristic of cutin from Q. ilex tissues) in Q. ilex and P. halepensis stands, and by OBA% and pH in Q. pubescens and P. sylvestris stands. In the A-horizon under Q. ilex and P. halepensis stands, linear relationships were found between catabolic diversity, microbial biomass and OBA%, suggesting an additive effect. Conversely, in the A-horizon of Q. pubescens and P. sylvestris stands, the relationship between OBA% and either cellulase activities, polysaccharides or ammonium contents suggested a non-additive effect of Q. pubescens and P. sylvestris, enhancing mineralization of the labile OM fraction in plots with an OBA% ranging from 40% to 60%. Mixing oak with pine thus favored microbial dynamics in both types of stands, though the imprint of OBA% varied with tree species; consequently, sustainable soil functioning depends strongly on the composition of mixed stands. Our study indeed revealed that, when evaluating the benefits of mixed forest stands for soil microbial functioning and OM turnover, the identity of tree species has to be considered. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Decision Tree Approach for Soil Liquefaction Assessment

    PubMed Central

    Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

    2013-01-01

    In the current study, the performance of several decision tree (DT) techniques is evaluated for post-earthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used. Three decision tree techniques are applied in two different ways, from statistical and engineering points of view, to develop decision rules. The DT results are compared with those of a logistic regression (LR) model. The results indicate that the DTs not only successfully predict liquefaction but can also outperform the LR model. The best DT models are interpreted and evaluated from an engineering point of view. PMID:24489498

  7. Decision tree approach for soil liquefaction assessment.

    PubMed

    Gandomi, Amir H; Fridline, Mark M; Roke, David A

    2013-01-01

    In the current study, the performance of several decision tree (DT) techniques is evaluated for post-earthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used. Three decision tree techniques are applied in two different ways, from statistical and engineering points of view, to develop decision rules. The DT results are compared with those of a logistic regression (LR) model. The results indicate that the DTs not only successfully predict liquefaction but can also outperform the LR model. The best DT models are interpreted and evaluated from an engineering point of view.

  8. Tree Mortality following Prescribed Fire and a Storm Surge Event in Slash Pine (Pinus elliottii var. densa) Forests in the Florida Keys, USA

    DOE PAGES

    Sah, Jay P.; Ross, Michael S.; Snyder, James R.; ...

    2010-01-01

    In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.

  9. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    PubMed

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  10. Birth-death models and coalescent point processes: the shape and probability of reconstructed phylogenies.

    PubMed

    Lambert, Amaury; Stadler, Tanja

    2013-12-01

    Forward-in-time models of diversification (i.e., speciation and extinction) produce phylogenetic trees that grow "vertically" as time goes by. Pruning the extinct lineages out of such trees leads to natural models for reconstructed trees (i.e., phylogenies of extant species). Alternatively, reconstructed trees can be modelled by coalescent point processes (CPPs), where trees grow "horizontally" by the sequential addition of vertical edges. Each new edge starts at some random speciation time and ends at the present time; speciation times are drawn from the same distribution independently. CPPs lead to extremely fast computation of tree likelihoods and simulation of reconstructed trees. Their topology always follows the uniform distribution on ranked tree shapes (URT). We characterize which forward-in-time models lead to URT reconstructed trees and among these, which lead to CPP reconstructed trees. We show that for any "asymmetric" diversification model in which speciation rates only depend on time and extinction rates only depend on time and on a non-heritable trait (e.g., age), the reconstructed tree is CPP, even if extant species are incompletely sampled. If rates additionally depend on the number of species, the reconstructed tree is (only) URT (but not CPP). We characterize the common distribution of speciation times in the CPP description, and discuss incomplete species sampling as well as three special model cases in detail: (1) the extinction rate does not depend on a trait; (2) rates do not depend on time; (3) mass extinctions may happen additionally at certain points in the past. Copyright © 2013 Elsevier Inc. All rights reserved.
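    The coalescent point process construction described above is simple enough to sketch directly. Tips are laid out left to right, and the i-th node depth gives the time before present at which tip i+1 branches off from the part of the tree to its left; the most recent common ancestor of two tips then sits at the largest node depth between them, which is why simulation and pairwise computations are fast. The node-depth distribution is model-specific (the paper derives it from the forward-in-time model); the uniform sampler below is only a placeholder, and the function names are hypothetical.

```python
import random


def simulate_cpp_depths(n_tips, draw_depth, seed=None):
    """Draw the n_tips - 1 i.i.d. node depths of a CPP tree.

    depths[i] is the time before present at which tip i + 1 attaches
    to the subtree formed by tips 0..i.
    """
    rng = random.Random(seed)
    return [draw_depth(rng) for _ in range(n_tips - 1)]


def mrca_depth(depths, i, j):
    """Depth of the most recent common ancestor of tips i and j (i != j)."""
    if i > j:
        i, j = j, i
    return max(depths[i:j])


def pairwise_mrca_depths(depths):
    """All pairwise MRCA depths, keyed by (i, j) with i < j."""
    n = len(depths) + 1
    return {(i, j): mrca_depth(depths, i, j) for i in range(n) for j in range(i + 1, n)}


# Placeholder node-depth law: uniform on (0, 5); a real CPP uses a model-derived law.
depths = simulate_cpp_depths(6, lambda rng: rng.uniform(0.0, 5.0), seed=42)
print(depths, mrca_depth(depths, 0, 5))
```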

  11. Tree-ring growth of Scots pine, Common beech and Pedunculate oak under future climate in northeastern Germany

    NASA Astrophysics Data System (ADS)

    Jurasinski, Gerald; Scharnweber, Tobias; Schröder, Christian; Lennartz, Bernd; Bauwe, Andreas

    2017-04-01

    Tree growth depends, among other factors, largely on the prevailing climatic conditions. Therefore, changes in tree growth patterns are to be expected under climate change. Here, we analyze the tree-ring growth response of three major European tree species to projected future climate across a climatic (mostly precipitation) gradient in northeastern Germany. We used monthly data for temperature, precipitation, and the standardized precipitation evapotranspiration index (SPEI) over multiple time scales (1, 3, 6, 12, and 24 months) to construct models of tree-ring growth for Scots pine (Pinus sylvestris L.) at three pure stands, and for Common beech (Fagus sylvatica L.) and Pedunculate oak (Quercus robur L.) at three mature mixed stands. The regression models were derived using a two-step approach: partial least squares regression (PLSR) to extract potentially explanatory variables, followed by ordinary least squares regression (OLSR) to reduce the models to the smallest number of variables while retaining high explanatory power. The stability of the models was tested with a comprehensive calibration-verification scheme. All models were successfully verified, with R² values ranging from 0.21 for the western pine stand to 0.62 for the beech stand in the east. For growth prediction, we used climate data forecast until 2100 by the regional climate model WETTREG2010, based on the A1B emission scenario of the Intergovernmental Panel on Climate Change (IPCC). For beech and oak, growth rates will likely decrease until the end of the 21st century. For pine, modeled growth trends vary, ranging from a slight growth increase to a weak decrease in growth rates depending on the position along the climatic gradient. The climatic gradient across the study area will possibly affect the future growth of oak, with larger growth reductions towards the drier east. For beech, site-specific adaptations seem to override the influence of the climatic gradient. We conclude that in northeastern Germany Scots pine has great potential to remain resilient to projected climate change without major impairment, whereas Common beech and Pedunculate oak will likely show reduced growth under the expected warmer and drier climate conditions. The results call for an adaptation of forest management to mitigate the negative effects of climate change on beech and oak in the region.

  12. Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer--a classification tree approach.

    PubMed

    Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven

    2006-04-20

    A critical choice facing breast cancer patients is which surgical treatment--mastectomy or breast conserving surgery (BCS)--is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of "propensity" is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data were extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, with subjects whose tumors were smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter, factors such as patient age, nodal status, and tumor histology became relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.

  13. External heart deformities in passerine birds exposed to environmental mixtures of polychlorinated biphenyls during development.

    PubMed

    DeWitt, Jamie C; Millsap, Deborah S; Yeager, Ronnie L; Heise, Steve S; Sparks, Daniel W; Henshel, Diane S

    2006-02-01

    Necropsy-observable cardiac deformities were evaluated from 283 nestling passerines collected from one reference site and five polychlorinated biphenyl (PCB)-contaminated sites around Bloomington and Bedford, Indiana, USA. Hearts were weighed and assessed on relative scales in three dimensions (height, length, and width) and for externally visible deformities. Heart weights normalized to body weight (heart somatic index) were decreased significantly at the more contaminated sites in both house wren (Troglodytes aedon) and tree swallow (Tachycineta bicolor). Heart somatic indices significantly correlated with log PCB concentrations in Carolina chickadee (Parus carolinensis) and tree swallow and with log 2,3,7,8-tetrachlorodibenzo-p-dioxin toxic equivalent values in tree swallow alone. Ventricular length was increased significantly in eastern bluebirds (Sialia sialis) and decreased significantly in Carolina chickadee and tree swallow from contaminated sites versus the reference site. Heart length regressed significantly against the log PCB concentrations (Carolina chickadee and tree swallow) or the square of the PCB concentrations (red-winged blackbird [Agelaius phoeniceus]) in a sibling bird. The deformities that were observed most at the contaminated sites included abnormal tips (pointed, rounded, or flattened), center rolls, macro- and microsurface roughness, ventricular indentations on the ventral or dorsal surface, lateral ventricular notches, visibly thin ventricular walls, and changes in overall heart shape. A pooled heart deformity index regressed significantly against the logged contaminant concentrations for all species except red-winged blackbird. These results indicate that developmental changes in heart morphometrics and shape abnormalities are quantifiable and may be sensitive and useful indicators of PCB-related developmental impacts across many avian species.

  14. Data mining of tree-based models to analyze freeway accident frequency.

    PubMed

    Chang, Li-Yen; Chen, Wen-Chieh

    2005-01-01

    Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own assumptions and a pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between the target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that average daily traffic volume and precipitation were the key determinants of freeway accident frequencies. By comparing the prediction performance of the CART and negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.
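    The step that lets a regression tree pick out predictors such as average daily traffic without a pre-specified functional form is the split search: for a numeric predictor, every candidate threshold is tried, the one that minimizes the summed squared error of the response in the two child nodes is kept, and each child then predicts its mean. The sketch below shows a single such split for accident counts; the variable names and numbers are hypothetical, and a full CART would recurse on the children and prune.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Threshold on x minimizing the children's total SSE of y.

    Returns (threshold, total_sse, left_mean, right_mean); threshold is None
    when no split improves on the unsplit node.
    """
    best = (None, sse(y), sum(y) / len(y), sum(y) / len(y))
    for thr in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (thr, total, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical segments: average daily traffic (thousand vehicles) and accident counts.
adt = [10, 20, 30, 40, 50, 60]
accidents = [1, 2, 1, 8, 9, 8]
print(best_split(adt, accidents))
```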

  15. Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

    PubMed

    Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

    2010-03-19

    This letter presents a computer aided diagnosis (CAD) technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on a partial least squares (PLS) regression model and a random forest (RF) predictor. The curse of dimensionality is addressed by downscaling the SPECT images and extracting score features using PLS, which reduces the large dimensionality of the input data. An RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers, with its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit as the number of trees in the forest increases. Thus, the generalization error is reduced when using PLS, and it depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective than PCA for extracting discriminative information from the data, yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Furthermore, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  16. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Membrane protein types have traditionally been classified with biophysical methods. Given the large and growing number of uncharacterized protein sequences in databases, these traditional methods are very time consuming, expensive and error-prone. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Decision tree classifiers often handle imbalanced and large datasets well. Because the datasets used here are imbalanced, the performance of various decision tree classifiers, such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another finding is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, whereas the other classifiers, such as DT, Adaboost, Rotation forest and Random forest, are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Predicting Occurrence of Spine Surgery Complications Using "Big Data" Modeling of an Administrative Claims Database.

    PubMed

    Ratliff, John K; Balise, Ray; Veeravagu, Anand; Cole, Tyler S; Cheng, Ivan; Olshen, Richard A; Tian, Lu

    2016-05-18

    Postoperative metrics are increasingly important in determining standards of quality for physicians and hospitals. Although complications following spinal surgery have been described, procedural and patient variables have yet to be incorporated into a predictive model of adverse-event occurrence. We sought to develop a predictive model of complication occurrence after spine surgery. We used longitudinal prospective data from a national claims database and developed a predictive model incorporating complication type and frequency of occurrence following spine surgery procedures. We structured our model to assess the impact of features such as preoperative diagnosis, patient comorbidities, location in the spine, anterior versus posterior approach, whether fusion had been performed, whether instrumentation had been used, number of levels, and use of bone morphogenetic protein (BMP). We assessed a variety of adverse events. Prediction models were built using logistic regression with additive main effects and logistic regression with main effects as well as all two- and three-factor interactions. Least absolute shrinkage and selection operator (LASSO) regularization was used to select features. Competing approaches included boosted additive trees and the classification and regression trees (CART) algorithm. Final prediction performance was evaluated by estimating the area under the receiver operating characteristic curve (AUC) when the predictions were applied to independent validation data, and was compared with the Charlson comorbidity score. The model was developed from 279,135 records of patients with a minimum follow-up of 30 days. Preliminary assessment showed an adverse-event rate of 13.95%, well within norms reported in the literature. We used the first 80% of the records for training (to predict adverse events) and the remaining 20% for validation. There was remarkable similarity among methods, with an AUC of 0.70 for predicting the occurrence of adverse events. The AUC using the Charlson comorbidity score was 0.61. The described model was more accurate than Charlson scoring (p < 0.01). We present a modeling effort based on administrative claims data that predicts the occurrence of complications after spine surgery. We believe that the development of a predictive modeling tool illustrating the risk of complication occurrence after spine surgery will aid in patient counseling and improve the accuracy of risk modeling strategies. Copyright © 2016 by The Journal of Bone and Joint Surgery, Incorporated.
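    The AUC values quoted above (0.70 versus 0.61) have a direct probabilistic reading: the AUC equals the probability that a randomly chosen patient with an adverse event receives a higher predicted risk than a randomly chosen patient without one, with ties counted as one half. A minimal sketch of that pairwise (Mann-Whitney) definition on hypothetical scores; it is quadratic in the sample size, so for hundreds of thousands of records a rank-based formula or a library routine would be used instead.

```python
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting tied pairs as 1/2. labels must be 1 (event) or 0 (no event)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks and observed outcomes.
print(auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))  # 5 of 6 pairs correct, about 0.833
```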

  18. Demographic predictors of peanut, tree nut, fish, shellfish, and sesame allergy in Canada.

    PubMed

    Ben-Shoshan, M; Harrington, D W; Soller, L; Fragapane, J; Joseph, L; Pierre, Y St; Godefroy, S B; Elliott, S J; Clarke, A E

    2012-01-01

    Background. Studies suggest that the rising prevalence of food allergy during recent decades may have stabilized. Although genetics undoubtedly contribute to the emergence of food allergy, it is likely that other factors play a crucial role in mediating such short-term changes. Objective. To identify potential demographic predictors of food allergies. Methods. We performed a cross-Canada, random telephone survey. Criteria for food allergy were self-report of convincing symptoms and/or physician diagnosis of allergy. Multivariate logistic regressions were used to assess potential determinants. Results. Of 10,596 households surveyed in 2008/2009, 3666 responded, representing 9667 individuals. Peanut, tree nut, and sesame allergy were more common in children (odds ratio (OR) 2.24 (95% CI, 1.40, 3.59), 1.73 (95% CI, 1.11, 2.68), and 5.63 (95% CI, 1.39, 22.87), resp.) while fish and shellfish allergy were less common in children (OR 0.17 (95% CI, 0.04, 0.72) and 0.29 (95% CI, 0.14, 0.61)). Tree nut and shellfish allergy were less common in males (OR 0.55 (95% CI, 0.36, 0.83) and 0.63 (95% CI, 0.43, 0.91)). Shellfish allergy was more common in urban settings (OR 1.55 (95% CI, 1.04, 2.31)). There was a trend for most food allergies to be more prevalent in the more educated (tree nut OR 1.90 (95% CI, 1.18, 3.04)) and less prevalent in immigrants (shellfish OR 0.49 (95% CI, 0.26, 0.95)), but wide CIs preclude definitive conclusions for most foods. Conclusions. Our results reveal that in addition to age and sex, place of residence, socioeconomic status, and birth place may influence the development of food allergy.

  19. The Impact of Afforestation on Soil Organic Carbon Sequestration on the Qinghai Plateau, China

    PubMed Central

    Shi, Sheng-wei; Han, Peng-fei; Zhang, Ping; Ding, Fan; Ma, Cheng-lin

    2015-01-01

    Afforestation, the conversion of non-forested land into forest, is widespread in China. However, the dynamics of soil organic carbon (SOC) after afforestation are not well understood, especially in plateau climate zones. For a total of 48 shrub- and/or tree-dominated afforestation sites on the Qinghai Plateau, Northwestern China, post-afforestation changes in SOC, total nitrogen (TN), the carbon-to-nitrogen ratio (C/N) and soil bulk density (BD) were investigated to a soil depth of 60 cm using the paired-plots method. SOC and TN accumulated at rates of 138.2 g C m-2 yr-1 and 4.6 g N m-2 yr-1, respectively, in shrub-dominated afforestation sites and at rates of 113.3 g C m-2 yr-1 and 6.7 g N m-2 yr-1, respectively, in tree-dominated afforestation sites. Soil BD was slightly reduced in all layers in the shrub-dominated afforestation plots, and significantly reduced in soil layers from 0–40 cm in the tree-dominated afforestation plots. The C/N ratio was higher in afforested sites relative to the reference sites. SOC accumulation was closely related to TN accumulation following afforestation, and the inclusion of N-fixing species in tree-dominated afforestation sites additionally increased the soil accumulation capacity for SOC (p < 0.05). Multiple regression models including the age of an afforestation plot and the total number of plant species explained 75% of the variation in relative SOC content change at a depth of 0–20 cm in tree-dominated afforestation sites. We conclude that afforestation on the Qinghai Plateau is associated with a high capacity for SOC and TN sequestration. This study improves our understanding of the mechanisms underlying SOC and TN accumulation in a plateau climate, and provides evidence on the C sequestration potentials associated with forestry projects in China. PMID:25706724

  20. Evaluating the ecosystem water use efficiency and gross primary productivity in boreal forest based on tree ring data

    NASA Astrophysics Data System (ADS)

    Liu, S.; Zhuang, Q.

    2016-12-01

    Climate change affects plant physiological and biogeochemical processes, and therefore ecosystem water use efficiency (WUE). A comprehensive understanding of WUE would therefore help us understand the adaptability of ecosystems to variable climate conditions. Compared with mechanistic model simulations, eddy flux measurements and manipulative experiments, tree ring data have great potential for addressing forest responses to climatic change. Here, we collected tree ring carbon isotope data from 12 boreal forest sites to develop a multiple linear regression model, and extrapolated the model to the whole boreal region to obtain the spatial and temporal variation of WUE from 1948 to 2010. Two algorithms were also used to estimate inter-annual gross primary productivity (GPP) based on our derived WUE. Our results demonstrated that most boreal regions showed a significant increasing WUE trend during the period, except parts of Alaska. The spatially averaged annual mean WUE was predicted to increase by 13%, from 2.3±0.4 g C kg-1 H2O in 1948 to 2.6±0.7 g C kg-1 H2O in 2012, which was much higher than the estimates of other land surface models. GPP predicted by the WUE definition algorithm was comparable with site observations, whereas GPP estimated by the revised light use efficiency algorithm was higher than both site observations and land surface models. In addition, the increasing GPP trends from the two algorithms were similar to land surface model simulations. This is the first study to evaluate regional WUE and GPP in forest ecosystems based on tree ring data; future work should consider other variables (elevation, nitrogen deposition) that influence tree ring isotopic signals, and the dual-isotope approach may help improve prediction of inter-annual WUE variation.

  1. The impact of afforestation on soil organic carbon sequestration on the Qinghai Plateau, China.

    PubMed

    Shi, Sheng-wei; Han, Peng-fei; Zhang, Ping; Ding, Fan; Ma, Cheng-lin

    2015-01-01

    Afforestation, the conversion of non-forested land into forest, is widespread in China. However, the dynamics of soil organic carbon (SOC) after afforestation are not well understood, especially in plateau climate zones. For a total of 48 shrub- and/or tree-dominated afforestation sites on the Qinghai Plateau, Northwestern China, post-afforestation changes in SOC, total nitrogen (TN), the carbon-to-nitrogen ratio (C/N) and soil bulk density (BD) were investigated to a soil depth of 60 cm using the paired-plots method. SOC and TN accumulated at rates of 138.2 g C m(-2) yr(-1) and 4.6 g N m(-2) yr(-1), respectively, in shrub-dominated afforestation sites and at rates of 113.3 g C m(-2) yr(-1) and 6.7 g N m(-2) yr(-1), respectively, in tree-dominated afforestation sites. Soil BD was slightly reduced in all layers in the shrub-dominated afforestation plots, and significantly reduced in soil layers from 0-40 cm in the tree-dominated afforestation plots. The C/N ratio was higher in afforested sites relative to the reference sites. SOC accumulation was closely related to TN accumulation following afforestation, and the inclusion of N-fixing species in tree-dominated afforestation sites additionally increased the soil accumulation capacity for SOC (p < 0.05). Multiple regression models including the age of an afforestation plot and the total number of plant species explained 75% of the variation in relative SOC content change at a depth of 0-20 cm in tree-dominated afforestation sites. We conclude that afforestation on the Qinghai Plateau is associated with a high capacity for SOC and TN sequestration. This study improves our understanding of the mechanisms underlying SOC and TN accumulation in a plateau climate, and provides evidence on the C sequestration potentials associated with forestry projects in China.

  2. Validating automatic semantic annotation of anatomy in DICOM CT images

    NASA Astrophysics Data System (ADS)

    Pathak, Sayan D.; Criminisi, Antonio; Shotton, Jamie; White, Steve; Robertson, Duncan; Sparks, Bobbi; Munasinghe, Indeera; Siddiqui, Khan

    2011-03-01

    In the current health-care environment, the time available for physicians to browse patients' scans is shrinking because of the rapid increase in the sheer number of images. This is further aggravated by mounting pressure to become more productive in the face of decreasing reimbursement. Hence, there is an urgent need to deliver technology that enables faster, effortless navigation through sub-volume image visualizations. Annotating image regions with semantic labels such as those derived from the RADLEX ontology can vastly enhance image navigation and sub-volume visualization. This paper uses random regression forests for efficient, automatic detection and localization of anatomical structures within DICOM 3D CT scans. A regression forest is a collection of decision trees that are trained to map voxels directly to organ location and size in a single pass. This paper focuses on comparing automated labeling with expert-annotated ground-truth results on a database of 50 highly variable CT scans. Initial investigations show that regression-forest-derived localization errors are smaller and more robust than those achieved by state-of-the-art global registration approaches. Owing to the simplicity of the algorithm's context-rich visual features, typical runtimes are less than 10 seconds for a 512³-voxel DICOM CT series on a single-threaded, single-core machine running multiple trees, with each tree taking less than a second. Furthermore, qualitative evaluation demonstrates that using the detected organs' locations as an index into the image volume improves the efficiency of the navigational workflow in all the CT studies.

  3. Calcium addition at the Hubbard Brook Experimental Forest increases sugar storage, antioxidant activity and cold tolerance in native red spruce (Picea rubens).

    PubMed

    Halman, Joshua M; Schaberg, Paul G; Hawley, Gary J; Eagar, Christopher

    2008-06-01

    In fall (November 2005) and winter (February 2006), we collected current-year foliage of native red spruce (Picea rubens Sarg.) growing in a reference watershed and in a watershed treated in 1999 with wollastonite (CaSiO(3), a slow-release calcium source) to simulate preindustrial soil calcium concentrations (Ca-addition watershed) at the Hubbard Brook Experimental Forest (Thornton, NH). We analyzed nutrition, soluble sugar concentrations, ascorbate peroxidase (APX) activity and cold tolerance, to evaluate the basis of recent (2003) differences between watersheds in red spruce foliar winter injury. Foliar Ca and total sugar concentrations were significantly higher in trees in the Ca-addition watershed than in trees in the reference watershed during both fall (P=0.037 and 0.035, respectively) and winter (P=0.055 and 0.036, respectively). The Ca-addition treatment significantly increased foliar fructose and glucose concentrations in November (P=0.013 and 0.007, respectively) and foliar sucrose concentrations in winter (P=0.040). Foliar APX activity was similar in trees in both watersheds during fall (P=0.28), but higher in trees in the Ca-addition watershed during winter (P=0.063). Cold tolerance of foliage was significantly greater in trees in the Ca-addition watershed than in trees in the reference watershed (P<0.001). Our results suggest that low foliar sugar concentrations and APX activity, and reduced cold tolerance in trees in the reference watershed contributed to their high vulnerability to winter injury in 2003. Because the reference watershed reflects forest conditions in the region, the consequences of impaired physiological function caused by soil Ca depletion may have widespread implications for forest health.

  4. Modeling non-linear growth responses to temperature and hydrology in wetland trees

    NASA Astrophysics Data System (ADS)

    Keim, R.; Allen, S. T.

    2016-12-01

    Growth responses of wetland trees to flooding and climate variations are difficult to model because they depend on multiple, apparently interacting factors, yet they are a critical link in the hydrological control of wetland carbon budgets. To understand more generally how tree growth responds to hydrological forcing, we modeled non-linear responses of tree ring growth to flooding and climate at sub-annual time steps, using Vaganov-Shashkin response functions. We calibrated the model to six baldcypress tree-ring chronologies from two hydrologically distinct sites in southern Louisiana, and tested several hypotheses of plasticity in wetland tree responses to interacting environmental variables. The model outperformed traditional multiple linear regression. More importantly, optimized response parameters were generally similar among sites with varying hydrological conditions, suggesting that the functions generalize. Model forms that included interacting responses to multiple forcing factors were more effective than single response functions, indicating that the principle of a single limiting factor does not hold in wetlands and that both climatic and hydrological variables must be considered when predicting responses to hydrological or climate change.

  5. Mapping tree and impervious cover using Ikonos imagery: links with water quality and stream health

    NASA Astrophysics Data System (ADS)

    Wright, R.; Goetz, S. J.; Smith, A.; Zinecker, E.

    2002-12-01

    Precision-georeferenced Ikonos satellite imagery was used to map tree cover and impervious surface area in Montgomery County, Maryland. The derived maps were used to assess tree cover in riparian stream buffers and to predict, with multivariate logistic regression, stream health ratings across 246 small watersheds averaging 472 km2 in size. Stream health was assessed by state and county experts using a combination of physical measurements (e.g., dissolved oxygen) and biological indicators (e.g., benthic macroinvertebrates). We found it possible to create highly accurate (90+ per cent) maps of tree and impervious cover using decision tree classifiers, provided extensive field data were available for algorithm training. Impervious surface area was found to be the primary predictor of stream health, followed by tree cover in riparian buffers and total tree cover within entire watersheds. A number of issues associated with mapping using Ikonos imagery were encountered, including differences in phenological and atmospheric conditions, shadowing within canopies and between scene elements, and limited spectral discrimination of cover types. We report on both the capabilities and limitations of Ikonos imagery for these applications, and on considerations for extending these analyses to other areas.

  6. Leaf drop affects herbivory in oaks.

    PubMed

    Pearse, Ian S; Karban, Richard

    2013-11-01

    Leaf phenology is important to herbivores, but the timing and extent of leaf drop have not played an important role in our understanding of herbivore interactions with deciduous plants. Using phylogenetic generalized least squares regression, we compared the phenology of leaves of 55 oak species in a common garden with the abundance of leaf miners on those trees. Mine abundance was highest on trees with an intermediate leaf retention index, i.e. trees that lost most, but not all, of their leaves for 2-3 months. The leaves of more evergreen species were more heavily sclerotized, and sclerotized leaves accumulated fewer mines in the summer. Leaves of more deciduous species also accumulated fewer mines in the summer, which is consistent with the idea that trees reduce overwintering herbivores by shedding leaves. Trees with later leaf set and slower leaf maturation accumulated fewer herbivores. We propose that both leaf drop and early leaf phenology strongly affect herbivore abundance and select for differences in plant defense. Leaf drop may allow trees to dispose of their herbivores so that the herbivores must recolonize in spring, but trees with the longest leaf retention also have the greatest direct defenses against herbivores.

  7. Can incentives make a difference? Assessing the effects of policy tools for encouraging tree-planting on private lands.

    PubMed

    Ruseva, Tatyana B; Evans, Tom P; Fischer, Burnell C

    2015-05-15

    This study uses a mail survey of private landowners in the Midwest United States to understand the characteristics of owners who have planted trees or intend to plant trees in the future. The analysis examines what policy tools encourage owners to plant trees, and how policy tools operate across different ownership attributes to promote tree-planting on private lands. Logistic regression results suggest that cost-subsidizing policy tools, such as low-cost and free seedlings, significantly increase the odds of actual and planned reforestation when landowners consider them important for increasing forest cover. Individuals most likely to plant trees, when low-cost seedlings are available and important, are fairly recent (<5 years), college-educated owners who own small parcels (<4 ha) and use the land for recreation. Motivations to reforest were also shaped by owners' planning horizons, connection to the land, previous tree-planting experience, and peer influence. The study has relevance for the design of policy approaches that can encourage private forestation through provision of economic incentives and capacity to private landowners. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Tree diversity and species identity effects on soil fungi, protists and animals are context dependent

    PubMed Central

    Tedersoo, Leho; Bahram, Mohammad; Cajthaml, Tomáš; Põlme, Sergei; Hiiesalu, Indrek; Anslan, Sten; Harend, Helery; Buegger, Franz; Pritsch, Karin; Koricheva, Julia; Abarenkov, Kessy

    2016-01-01

    Plant species richness and the presence of certain influential species (sampling effect) drive the stability and functionality of ecosystems as well as primary production and biomass of consumers. However, little is known about these floristic effects on richness and community composition of soil biota in forest habitats owing to methodological constraints. We developed a DNA metabarcoding approach to identify the major eukaryote groups directly from soil with roughly species-level resolution. Using this method, we examined the effects of tree diversity and individual tree species on soil microbial biomass and taxonomic richness of soil biota in two experimental study systems in Finland and Estonia, accounting for edaphic variables and spatial autocorrelation. Our analyses revealed that the effects of tree diversity and individual species on soil biota are largely context dependent. Multiple regression and structural equation modelling suggested that biomass, soil pH, nutrients and tree species directly affect the richness of different taxonomic groups. The community composition of most soil organisms was strongly correlated, owing to similar responses to environmental predictors rather than to causal relationships. On a local scale, soil resources and tree species have a stronger effect on the diversity of soil biota than tree species richness per se. PMID:26172210

  9. A survival tree method for the analysis of discrete event times in clinical and epidemiological studies.

    PubMed

    Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard

    2016-02-28

    Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
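    The equivalence the abstract relies on is usually exploited by expanding each subject into one binary row per discrete period at risk (often called "person-period" format): the outcome is 1 only in the period where the event occurs. Any binary-outcome model or tree fitted to these rows, with the period as a candidate predictor alongside covariates, then estimates discrete hazards. A minimal sketch of the expansion and of the unadjusted per-period hazard, assuming that censored subjects survive their final observed period; the authors' splitting criterion is not reproduced, and the data are hypothetical.

```python
from typing import Dict, List, Tuple

def person_period(times: List[int], events: List[int]) -> List[Tuple[int, int, int]]:
    """Expand discrete survival data into binary (subject, period, y) rows.

    Subject i is at risk in periods 1..times[i]; y = 1 only in the last
    period and only if the event was observed (events[i] == 1).
    Censored subjects (events[i] == 0) are assumed to survive their final period.
    """
    rows = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        for t in range(1, t_i + 1):
            y = 1 if (t == t_i and d_i == 1) else 0
            rows.append((i, t, y))
    return rows

def hazard_by_period(rows: List[Tuple[int, int, int]]) -> Dict[int, float]:
    """Discrete hazard per period = events / subjects at risk, i.e. the mean of y."""
    at_risk: Dict[int, int] = {}
    n_events: Dict[int, int] = {}
    for _, t, y in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        n_events[t] = n_events.get(t, 0) + y
    return {t: n_events[t] / at_risk[t] for t in sorted(at_risk)}

# Hypothetical example: observation times (e.g. in quarters) and event indicators.
rows = person_period(times=[1, 2, 3, 3], events=[1, 0, 1, 0])
print(hazard_by_period(rows))  # {1: 0.25, 2: 0.0, 3: 0.5}
```

    The per-period event rate computed here is exactly what a binary-outcome tree would estimate in a leaf that contains all rows of a single period; covariate splits refine it into subgroups with different hazards.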

  10. Tree diversity and species identity effects on soil fungi, protists and animals are context dependent.

    PubMed

    Tedersoo, Leho; Bahram, Mohammad; Cajthaml, Tomáš; Põlme, Sergei; Hiiesalu, Indrek; Anslan, Sten; Harend, Helery; Buegger, Franz; Pritsch, Karin; Koricheva, Julia; Abarenkov, Kessy

    2016-02-01

    Plant species richness and the presence of certain influential species (sampling effect) drive the stability and functionality of ecosystems as well as primary production and biomass of consumers. However, little is known about these floristic effects on richness and community composition of soil biota in forest habitats owing to methodological constraints. We developed a DNA metabarcoding approach to identify the major eukaryote groups directly from soil with roughly species-level resolution. Using this method, we examined the effects of tree diversity and individual tree species on soil microbial biomass and taxonomic richness of soil biota in two experimental study systems in Finland and Estonia, accounting for edaphic variables and spatial autocorrelation. Our analyses revealed that the effects of tree diversity and individual species on soil biota are largely context dependent. Multiple regression and structural equation modelling suggested that biomass, soil pH, nutrients and tree species directly affect the richness of different taxonomic groups. The community composition of most soil organisms was strongly correlated, owing to similar responses to environmental predictors rather than to causal relationships. On a local scale, soil resources and tree species have a stronger effect on the diversity of soil biota than tree species richness per se.

  11. 7 CFR 319.77-4 - Conditions for the importation of regulated articles.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Host Material from Canada § 319.77-4 Conditions for the importation of regulated articles. (a) Trees and shrubs. 1 (1) Trees without roots (e.g., Christmas trees), trees with roots, and shrubs with roots... restriction under this subpart if they: 1 Trees and shrubs from Canada may be subject to additional...

  12. Los Angeles 1-Million tree canopy cover assessment

    Treesearch

    Gregory E. McPherson; James R. Simpson; Qingfu Xiao; Wu Chunxia

    2008-01-01

    The Million Trees LA initiative intends to chart a course for sustainable growth through planting and stewardship of trees. The purpose of this study was to measure Los Angeles's existing tree canopy cover (TCC), determine if space exists for 1 million additional trees, and estimate future benefits from the planting. High resolution QuickBird remote sensing data,...

  13. Converting international ¼ inch tree volume to Doyle

    Treesearch

    Aaron Holley; John R. Brooks; Stuart A. Moss

    2014-01-01

    An equation for converting Mesavage and Girard's International ¼ inch tree volumes to the Doyle log rule is presented as a function of tree diameter. Trees with fewer than four logs exhibited volume prediction errors within a range of ±10 board feet. In addition, volume prediction error as a percent of actual Doyle tree volume...

  14. 7 CFR 319.77-4 - Conditions for the importation of regulated articles.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... Host Material from Canada § 319.77-4 Conditions for the importation of regulated articles. (a) Trees and shrubs. 1 (1) Trees without roots (e.g., Christmas trees), trees with roots, and shrubs with roots... restriction under this subpart if they: 1 Trees and shrubs from Canada may be subject to additional...

  15. 7 CFR 319.77-4 - Conditions for the importation of regulated articles.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Host Material from Canada § 319.77-4 Conditions for the importation of regulated articles. (a) Trees and shrubs. 1 (1) Trees without roots (e.g., Christmas trees), trees with roots, and shrubs with roots... restriction under this subpart if they: 1 Trees and shrubs from Canada may be subject to additional...

  16. Mathematical models application for mapping soils spatial distribution on the example of the farm from the North of Udmurt Republic of Russia

    NASA Astrophysics Data System (ADS)

    Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.

    2018-01-01

    A comparative analysis of geospatial soil modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machine algorithms was conducted. Visual interpretation of the digital maps obtained and their comparison with the existing map, together with quantitative assessment of the overall accuracy of detecting individual soil groups and of the models' kappa, showed that multiple logistic regression, support vector machine, and random forest models, applied to spatial prediction of the distribution of conditional soil groups, can be reliably used for mapping the study area. The most accurate detection was obtained for lightly and moderately eroded sod-podzolic soils (Phaeozems Albic). In second place by mean overall prediction accuracy were non-eroded and warp sod-podzolic soils, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavily eroded sod-podzolic and gray forest soils (Phaeozems Albic) were detected least accurately by the automatic classification methods.

  17. Sampling the quality of hardwood trees

    Treesearch

    Adrian M. Gilbert

    1959-01-01

    Anyone acquainted with the conversion of hardwood trees into wood products knows that timber has a wide range in quality. Some trees will yield better products than others. So, in addition to rate of growth and size, tree values are affected by the quality of products yielded.

  18. Incorporating additional tree and environmental variables in a lodgepole pine stem profile model

    Treesearch

    John C. Byrne

    1993-01-01

    A new variable-form segmented stem profile model is developed for lodgepole pine (Pinus contorta) trees from the northern Rocky Mountains of the United States. I improved estimates of stem diameter by predicting two of the model coefficients with linear equations using a measure of tree form, defined as a ratio of dbh and total height. Additional improvements were...

  19. National assessment of Tree City USA participation

    EPA Pesticide Factsheets

    Tree City USA is a national program that recognizes municipal commitment to community forestry. In return for meeting program requirements, Tree City USA participants expect social, economic, and/or environmental benefits. Understanding the geographic distribution and socioeconomic characteristics of Tree City USA communities at the national scale can offer insights into the motivations or barriers to program participation, and provide context for community forestry research at finer scales. In this study, researchers assessed patterns in Tree City USA participation for all U.S. communities with more than 2,500 people according to geography, community population size, and socioeconomic characteristics, such as income, education, and race. Nationally, 23.5% of communities studied were Tree City USA participants, and this accounted for 53.9% of the total population in these communities. Tree City USA participation rates varied substantially by U.S. region, but in each region participation rates were higher in larger communities, and long-term participants tended to be larger communities than more recent enrollees. In logistic regression models, owner occupancy rates were significant negative predictors of Tree City USA participation, education and percent white population were positive predictors in many U.S. regions, and inconsistent patterns were observed for income and population age. The findings indicate that communities with smaller populations, lower educat

  20. Multiyear fate of a ¹⁵N tracer in a mixed deciduous forest: retention, redistribution, and differences by mycorrhizal association.

    PubMed

    Goodale, Christine L

    2017-02-01

    The impact of atmospheric nitrogen deposition on forest ecosystems depends in large part on its fate. Past tracer studies show that litter and soils dominate the short-term fate of added ¹⁵N, yet few have examined its longer term dynamics or differences among forest types. This study examined the fate of a ¹⁵N-NO₃⁻ tracer over 5-6 years in a mixed deciduous stand that was evenly composed of trees with ectomycorrhizal and arbuscular mycorrhizal associations. The tracer was expected to slowly mineralize from its main initial fate in litter and surface soil, with some ¹⁵N moving to trees, some to deeper soil, and some net losses. Recovery of added ¹⁵N in trees and litterfall totaled 11.3% both 1 and 5-6 years after the tracer addition, as ¹⁵N redistributed from fine and especially coarse roots into cumulative litterfall and small accumulations in woody tissues. Estimates of potential carbon sequestration from tree ¹⁵N recovery amounted to 12-14 kg C per kg of N deposition. Tree ¹⁵N acquisition occurred within the first year after the tracer addition, with no subsequent additional net transfer of ¹⁵N from detrital to plant pools. In both years, ectomycorrhizal trees gained 50% more of the tracer than did trees with arbuscular mycorrhizae. Much of the ¹⁵N recovered in wood occurred in tree rings formed prior to the ¹⁵N addition, demonstrating the mobility of N in wood. Tracer recovery rapidly decreased over time in surface litter material and accumulated in both shallow and deep soil, perhaps through mixing by earthworms. Overall, results showed redistribution of tracer ¹⁵N through trees and surface soils without any losses, as whole-ecosystem recovery remained constant between 1 and 5-6 years at 70% of the ¹⁵N addition. These results demonstrate the persistent ecosystem retention of N deposition even as it redistributes, without additional plant uptake over this timescale. © 2016 John Wiley & Sons Ltd.

  1. New flux based dose-response relationships for ozone for European forest tree species.

    PubMed

    Büker, P; Feng, Z; Uddling, J; Briolat, A; Alonso, R; Braun, S; Elvira, S; Gerosa, G; Karlsson, P E; Le Thiec, D; Marzuoli, R; Mills, G; Oksanen, E; Wieser, G; Wilkinson, M; Emberson, L D

    2015-11-01

    To derive O3 dose-response relationships (DRR) for five European forest tree species and for broadleaf deciduous and needleleaf tree plant functional types (PFTs), phytotoxic O3 doses (PODy) were related to biomass reductions. PODy was calculated using a stomatal flux model with a range of cut-off thresholds (y) indicative of varying detoxification capacities. Linear regression analysis showed that DRR for PFT and individual tree species differed in their robustness. A simplified parameterisation of the flux model was tested and showed that for most non-Mediterranean tree species, this simplified model led to similarly robust DRR as compared to a species- and climate region-specific parameterisation. Experimentally induced soil water stress was not found to substantially reduce PODy, mainly due to the short duration of soil water stress periods. This study validates the stomatal O3 flux concept and represents a step forward in predicting O3 damage to forests in a spatially and temporally varying climate. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.

  2. Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body, Democratic Republic of Congo

    NASA Astrophysics Data System (ADS)

    Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik

    2016-03-01

    A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.

  3. Comparisons between physics-based, engineering, and statistical learning models for outdoor sound propagation.

    PubMed

    Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T

    2016-05-01

    Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations and, more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicolson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set, sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
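    The skill scores quoted above compare each model's error with that of a reference prediction (here, propagation in a homogeneous atmosphere over rigid ground). The abstract does not give the formula, so the sketch below assumes the common MSE-based form SS = 1 - MSE(model) / MSE(reference), expressed in percent; the function name and the numbers are hypothetical. Under this assumed form, 100% is a perfect match, 0% is no better than the reference, and negative values (such as the -7.1% reported for Harmonoise) are worse than the reference.

```python
def mse_skill_score(observed, predicted, reference):
    """Skill score in percent of `predicted` relative to a `reference` model.

    Assumes the common MSE-based form SS = 1 - MSE(predicted) / MSE(reference):
    100% is a perfect match, 0% is no better than the reference, and
    negative values are worse than the reference.
    """
    if not observed or not (len(observed) == len(predicted) == len(reference)):
        raise ValueError("inputs must be non-empty and of equal length")

    def mse(estimates):
        # Mean squared error of a set of estimates against the observations.
        return sum((e - o) ** 2 for e, o in zip(estimates, observed)) / len(observed)

    mse_ref = mse(reference)
    if mse_ref == 0:
        raise ValueError("reference already matches the observations; skill is undefined")
    return 100.0 * (1.0 - mse(predicted) / mse_ref)


# Hypothetical transmission-loss values in dB (illustrative, not from the study).
observed = [1.0, 2.0, 3.0]    # stands in for the CNPE benchmark
reference = [0.0, 0.0, 0.0]   # stands in for homogeneous atmosphere, rigid ground
print(mse_skill_score(observed, observed, reference))  # → 100.0
print(mse_skill_score(observed, [1.0, 2.0, 4.0], reference))
```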

  4. Classification of sodium MRI data of cartilage using machine learning.

    PubMed

    Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

    2015-11-01

    To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. The mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid-suppressed data, were the best predictors, with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers included discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.

  5. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

    PubMed

    Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver

    2008-07-01

    DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree

  6. Bayesian propensity scores for high-dimensional causal inference: A comparison of drug-eluting to bare-metal coronary stents.

    PubMed

    Spertus, Jacob V; Normand, Sharon-Lise T

    2018-04-23

    High-dimensional data provide many potential confounders that may bolster the plausibility of the ignorability assumption in causal inference problems. Propensity score methods are powerful causal inference tools, which are popular in health care research and are particularly useful for high-dimensional data. Recent interest has surrounded a Bayesian treatment of propensity scores in order to flexibly model the treatment assignment mechanism and summarize posterior quantities while incorporating variance from the treatment model. We discuss methods for Bayesian propensity score analysis of binary treatments, focusing on modern methods for high-dimensional Bayesian regression and the propagation of uncertainty. We introduce a novel and simple estimator for the average treatment effect that capitalizes on conjugacy of the beta and binomial distributions. Through simulations, we show the utility of horseshoe priors and Bayesian additive regression trees paired with our new estimator, while demonstrating the importance of including variance from the treatment regression model. An application to cardiac stent data with almost 500 confounders and 9000 patients illustrates approaches and facilitates comparison with existing alternatives. As measured by a falsifiability endpoint, we improved confounder adjustment compared with past observational research of the same problem. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Using Baidu Search Index to Predict Dengue Outbreak in China

    NASA Astrophysics Data System (ADS)

    Liu, Kangkang; Wang, Tao; Yang, Zhicong; Huang, Xiaodong; Milinovich, Gabriel J.; Lu, Yi; Jing, Qinlong; Xia, Yao; Zhao, Zhengyang; Yang, Yang; Tong, Shilu; Hu, Wenbiao; Lu, Jiahai

    2016-12-01

    This study identified possible thresholds for predicting dengue fever (DF) outbreaks using the Baidu Search Index (BSI). Time-series classification and regression tree models based on BSI were used to develop a predictive model for DF outbreaks in Guangzhou and Zhongshan, China. In the regression tree models, the mean autochthonous DF incidence rate increased approximately 30-fold in Guangzhou when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 382. When the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 91.8, there was an approximately 9-fold increase in the mean autochthonous DF incidence rate in Zhongshan. In the classification tree models, when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 99.3, there was an 89.28% chance of a DF outbreak in Guangzhou, while in Zhongshan, when the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 68.1, the chance of a DF outbreak rose to 100%. The study indicated that low-cost internet-based surveillance systems can be a valuable complement to traditional DF surveillance in China.
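    The thresholds reported above (for example, a BSI cut point of 382 separating low- and high-incidence weeks) are the kind of split a regression tree selects. The abstract does not state which tree algorithm was used; the sketch below assumes the CART least-squares criterion, scanning every cut point between sorted predictor values and keeping the one that minimizes the summed squared error of the two child nodes. The function names and data are hypothetical.

```python
def _sse(values):
    # Sum of squared deviations from the node mean (least-squares impurity).
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse, left_mean, right_mean) for the single
    split x <= threshold that minimizes within-node squared error,
    or None if x has no two distinct values."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = _sse(left) + _sse(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical weekly search-index values and incidence rates (not study data).
bsi = [10, 20, 30, 400, 500, 600]
incidence = [1.0, 1.0, 1.0, 30.0, 30.0, 30.0]
thr, sse, low_mean, high_mean = best_split(bsi, incidence)
print(thr, high_mean / low_mean)  # → 215.0 30.0
```

The ratio of the two node means (30.0 in this toy example) plays the role of the fold increase reported in the abstract; a full tree repeats the same search recursively within each child node.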

  8. Atlas of United States Trees, Volume 2: Alaska Trees and Common Shrubs.

    ERIC Educational Resources Information Center

    Viereck, Leslie A.; Little, Elbert L., Jr.

    This volume is the second in a series of atlases describing the natural distribution or range of native tree species in the United States. The 82 species maps include 32 of trees in Alaska, 6 of shrubs rarely reaching tree size, and 44 more of common shrubs. More than 20 additional maps summarize environmental factors and furnish general…

  9. Mastication and prescribed fire influences on tree mortality and predicted fire behavior in ponderosa pine

    Treesearch

    Alicia L. Reiner; Nicole M. Vaillant; Scott N. Dailey

    2012-01-01

    The purpose of this study was to provide land managers with information on potential wildfire behavior and tree mortality associated with mastication and masticated/fire treatments in a plantation. Additionally, the effect of pulling fuels away from tree boles before applying fire treatment was studied in relation to tree mortality. Fuel characteristics and tree...

  10. A method to study response of large trees to different amounts of available soil water

    Treesearch

    D.H. Marx; Shi-Jean S. Sung; J.S. Cunningham; M.D. Thompson; L.M. White

    1995-01-01

    A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25% less thrufall, normal thrufall, or 25% additional thrufall. Undercanopy construction in these plots moderately...

  11. A Method to Study Response of Large Trees to Different Amounts of Available Soil Water

    Treesearch

    Donald H. Marx; Shi-jean S. Sung; James S. Cunningham; Michael D. Thompson; Linda M. White

    1995-01-01

    A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25 percent less thrufall, normal thrufall, or 25 percent additional thrufall. Undercanopy construction in these plots...

  12. A soil map of a large watershed in China: applying digital soil mapping in a data sparse region

    NASA Astrophysics Data System (ADS)

    Barthold, F.; Blank, B.; Wiesmeier, M.; Breuer, L.; Frede, H.-G.

    2009-04-01

    Prediction of soil classes in data sparse regions is a major research challenge. With the advent of machine learning, the possibilities to spatially predict soil classes have increased tremendously and opened up new approaches to soil mapping. Digital soil mapping is a research field that has been established during the last decades and has been widely accepted. We now need to develop tools to reduce the uncertainty in soil predictions. This is especially challenging in data sparse regions. One approach to do this is to implement soil taxonomic distance as a classification error criterion in classification and regression trees (CART), as suggested by Minasny et al. (Geoderma 142 (2007) 285-293). This approach assumes that the classification error should be larger between soils that are more dissimilar, i.e. differ in a larger number of soil properties, and smaller between more similar soils. Our study area is the Xilin River Basin, which is located in central Inner Mongolia in China. It is characterized by semi-arid climate conditions and is representative of the naturally occurring steppe ecosystem. The study area comprises 3600 km². We applied a random, stratified sampling design after McKenzie and Ryan (Geoderma 89 (1999) 67-94) with land use and topography as stratifying variables. We defined 10 sampling classes, and from each class 14 replicates were randomly drawn and sampled. The dataset was split into 100 soil profiles for training and 40 soil profiles for validation. We then applied classification and regression trees (CART) to quantify the relationships between soil classes and environmental covariates. The classification tree explained 75.5% of the variance, with land use and geology as the most important predictor variables. Among the 8 soil classes that we predicted, the Kastanozems cover most of the area. They are predominantly found in steppe areas.
However, even some of the soils at sand dune sites, which were thought to show little soil formation, can be classified as Kastanozems. Besides the Kastanozems, Regosols are most common at the sand dune sites as well as at sites defined as bare soil, which are characterized by little or no vegetation. Gleysols are mostly found at sites in the vicinity of the Xilin River, which are connected to the groundwater. They can also be found in small valleys or depressions where subsurface waters from neighboring areas collect. The richest soils are found in mountain meadow areas. Pedogenetic conditions here are most favorable and lead to the formation of Chernozems with deep humic Ah horizons. Other soil types that occur in the study area are Arenosols, Calcisols, Cambisols and Phaeozems. In addition, soil taxonomic distance is implemented in the decision tree procedure as a measure of classification error. The results of incorporating taxonomic distance as a loss function in the decision tree will be compared with the standard application of the decision tree.

  13. An evaluation of supervised classifiers for indirectly detecting salt-affected areas at irrigation scheme level

    NASA Astrophysics Data System (ADS)

    Muller, Sybrand Jacobus; van Niekerk, Adriaan

    2016-07-01

    Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electrical conductivity measurements was investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R² < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tended to overestimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.

  14. Tree-ring-based estimates of long-term seasonal precipitation in the Souris River Region of Saskatchewan, North Dakota and Manitoba

    USGS Publications Warehouse

    Ryberg, Karen R.; Vecchia, Aldo V.; Akyüz, F. Adnan; Lin, Wei

    2016-01-01

    Historically unprecedented flooding occurred in the Souris River Basin of Saskatchewan, North Dakota and Manitoba in 2011, during a longer term period of wet conditions in the basin. In order to develop a model of future flows, there is a need to evaluate effects of past multidecadal climate variability and/or possible climate change on precipitation. In this study, tree-ring chronologies and historical precipitation data in a four-degree buffer around the Souris River Basin were analyzed to develop regression models that can be used for predicting long-term variations of precipitation. To focus on longer term variability, 12-year moving average precipitation was modeled in five subregions (determined through cluster analysis of measures of precipitation) of the study area over three seasons (November–February, March–June and July–October). The models used multiresolution decomposition (an additive decomposition based on powers of two using a discrete wavelet transform) of tree-ring chronologies from Canada and the US and seasonal 12-year moving average precipitation based on Adjusted and Homogenized Canadian Climate Data and US Historical Climatology Network data. Results show that precipitation varies on long-term (multidecadal) time scales of 16, 32 and 64 years. Past extended pluvial and drought events, which can vary greatly with season and subregion, were highlighted by the models. Results suggest that the recent wet period may be a part of natural variability on a very long time scale.
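    The multiresolution decomposition mentioned above is additive: a series is split into detail components at successive powers-of-two scales plus a smooth remainder, and these parts sum back exactly to the original series. The sketch below assumes the Haar wavelet, for which each smooth is simply the mean over non-overlapping blocks of length 2, 4, 8, and so on; the abstract does not name the wavelet filter the study used, and the function name and input values are hypothetical.

```python
def haar_mra(x, levels):
    """Additive Haar multiresolution decomposition (assumed Haar filter).

    Returns (details, smooth) such that x == smooth + sum(details) elementwise.
    details[j - 1] captures fluctuations between dyadic block lengths
    2**(j - 1) and 2**j; smooth holds the block means at length 2**levels.
    """
    n = len(x)
    if levels < 1 or n == 0 or n % (2 ** levels) != 0:
        raise ValueError("len(x) must be a positive multiple of 2**levels")
    smooths = [[float(v) for v in x]]
    for j in range(1, levels + 1):
        block = 2 ** j
        s = []
        for start in range(0, n, block):
            # Haar smooth at level j = mean of each non-overlapping dyadic block.
            m = sum(x[start:start + block]) / block
            s.extend([m] * block)
        smooths.append(s)
    # Detail at level j = difference between successive smooths.
    details = [[a - b for a, b in zip(smooths[j - 1], smooths[j])]
               for j in range(1, levels + 1)]
    return details, smooths[-1]


x = [1, 3, 2, 6, 5, 5, 0, 4]  # hypothetical series
details, smooth = haar_mra(x, 3)
print(details[0])  # → [-1.0, 1.0, -2.0, 2.0, 0.0, 0.0, -2.0, 2.0]
print(smooth)      # → [3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25]
```

Because each detail is a difference of consecutive smooths, the sum telescopes, which is what makes the decomposition additive regardless of the data.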

  15. The relationship between trees and human health: evidence from the spread of the emerald ash borer.

    PubMed

    Donovan, Geoffrey H; Butry, David T; Michael, Yvonne L; Prestemon, Jeffrey P; Liebhold, Andrew M; Gatziolis, Demetrios; Mao, Megan Y

    2013-02-01

    Several recent studies have identified a relationship between the natural environment and improved health outcomes. However, for practical reasons, most have been observational, cross-sectional studies. A natural experiment, which provides stronger evidence of causality, was used to test whether a major change to the natural environment, the loss of 100 million trees to the emerald ash borer (an invasive forest pest), has influenced mortality related to cardiovascular and lower-respiratory diseases. Two fixed-effects regression models were used to estimate the relationship between emerald ash borer presence and county-level mortality from 1990 to 2007 in 15 U.S. states, while controlling for a wide range of demographic covariates. Data were collected from 1990 to 2007, and the analyses were conducted in 2011 and 2012. There was an increase in mortality related to cardiovascular and lower-respiratory-tract illness in counties infested with the emerald ash borer. The magnitude of this effect was greater as infestation progressed and in counties with above-average median household income. Across the 15 states in the study area, the borer was associated with an additional 6113 deaths related to illness of the lower respiratory system, and 15,080 cardiovascular-related deaths. Results suggest that loss of trees to the emerald ash borer increased mortality related to cardiovascular and lower-respiratory-tract illness. This finding adds to the growing evidence that the natural environment provides major public health benefits. Published by Elsevier Inc.

  16. Synchrony of forest responses to climate from the aspect of tree mortality in South Korea

    NASA Astrophysics Data System (ADS)

    Kim, M.; Lee, W. K.; Piao, D.; Choi, G. M.; Gang, H. U.

    2016-12-01

    Mortality is a key process in forest-stand dynamics. However, tree mortality is not well understood, particularly in relation to climatic factors. The objectives of this study were to: (i) determine the patterns of maximum stem number (MSN) per ha over dominant tree height from 5-year remeasurements of the permanent sample plots for temperate forests [Red pine (Pinus densiflora), Japanese larch (Larix kaempferi), Korean pine (Pinus koraiensis), Chinese cork oak (Quercus variabilis), and Mongolian oak (Quercus mongolica)] using Sterba's theory and Korean National Forest Inventory (NFI) data, (ii) develop a stand-level mortality (self-thinning) model using the MSN curve, and (iii) assess the impact of temperature on tree mortality using semi-variogram and linear regression models. The MSN curve represents the upper range of observed stem numbers per ha. The mortality model and validation statistic reveal significant differences between the observed data and the model predictions (R² = 0.55-0.81), and no obvious dependencies or patterns that indicate systematic trends between the residuals and the independent variable. However, spatial autocorrelation was detected in the residuals of coniferous species (Red pine, Japanese larch and Korean pine), but not of oak species (Chinese cork oak and Mongolian oak). Based on linear regression of the residuals, we found that the mortality of coniferous forests tended to increase when the annual mean temperature increased. Conversely, oak mortality decreased, though not significantly, with increasing temperature. These findings indicate that enhanced tree mortality due to rising temperatures in response to climate change is possible, especially in coniferous forests, and are expected to support policy decisions and forest management practices.

  17. VEGETATION COVER ANALYSIS OF HAZARDOUS WASTE SITES IN UTAH AND ARIZONA USING HYPERSPECTRAL REMOTE SENSING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Serrato, M.; Jungho, I.; Jensen, J.

    2012-01-17

    Remote sensing technology can provide a cost-effective tool for monitoring hazardous waste sites. This study investigated the usability of HyMap airborne hyperspectral remote sensing data (126 bands at 2.3 × 2.3 m spatial resolution) to characterize the vegetation at U.S. Department of Energy uranium processing sites near Monticello, Utah and Monument Valley, Arizona. Grass and shrub species were mixed on an engineered disposal cell cover at the Monticello site, while shrub species were dominant in the phytoremediation plantings at the Monument Valley site. The specific objectives of this study were to: (1) estimate the leaf area index (LAI) of the vegetation using three different methods (i.e., vegetation indices, red-edge positioning (REP), and machine learning regression trees), and (2) map the vegetation cover using machine learning decision trees based on either the scaled reflectance data or mixture tuned matched filtering (MTMF)-derived metrics and vegetation indices. Regression trees resulted in the best calibration performance of LAI estimation (R² > 0.80). The use of REPs failed to accurately predict LAI (R² < 0.2). The use of the MTMF-derived metrics (matched filter scores and infeasibility) and a range of vegetation indices in decision trees improved the vegetation mapping when compared to the decision tree classification using just the scaled reflectance. Results suggest that hyperspectral imagery is useful for characterizing biophysical characteristics (LAI) and vegetation cover on capped hazardous waste sites. However, it is believed that the vegetation mapping would benefit from the use of higher spatial resolution hyperspectral data due to the small size of many of the vegetation patches (< 1 m) found on the sites.

  18. Compatible Models of Carbon Content of Individual Trees on a Cunninghamia lanceolata Plantation in Fujian Province, China

    PubMed Central

    Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu

    2016-01-01

    We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation in Fujian province, southeast China. In general, compatibility requires that the sum of components equal the whole tree, meaning that the sum of percentages calculated from component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage, roots and the whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D²H, DH and D&H (where D&H means two separate variables in a bivariate model). Power, exponential and polynomial functions were tested, and a new general function model was proposed by this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performances were evaluated using mean residuals, residual variance, mean square error and the coefficient of determination. The results indicated that models with two dimensional variables (DH, D²H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model separately, but there were deviations in the estimates due to existing incompatibilities, while NAP and NSUR could ensure compatible predictions. In addition, we found that the new general model had better accuracy than the others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and be considered for other vegetation types as well. PMID:26982054
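    The compatibility requirement above (component estimates summing to the whole-tree estimate) can be illustrated with a simple proportional rescaling: fit each component and the whole tree separately, as in SOF, then scale every component by the ratio of the whole-tree estimate to the sum of the components. This is only a sketch of the general idea behind adjustment in proportion; the abstract does not give the exact NAP formulation, NSUR is not shown, and the function names, power-model coefficients and tree dimensions below are invented for illustration.

```python
def power_model(d, h, a, b, c):
    # Hypothetical bivariate power model W = a * D**b * H**c (the D&H form).
    return a * d ** b * h ** c


def adjust_in_proportion(components, total):
    """Rescale component estimates so they sum exactly to `total`,
    keeping each component's share of the unadjusted sum."""
    s = sum(components.values())
    if s <= 0:
        raise ValueError("component estimates must sum to a positive value")
    return {name: total * value / s for name, value in components.items()}


# Made-up coefficients (a, b, c) for illustration only; not fitted values.
coefs = {
    "bole": (0.020, 1.9, 0.9),
    "branch": (0.010, 2.0, 0.3),
    "foliage": (0.015, 1.8, 0.2),
    "root": (0.012, 2.0, 0.4),
}
D, H = 20.0, 15.0  # hypothetical diameter at breast height (cm) and height (m)
raw = {name: power_model(D, H, *p) for name, p in coefs.items()}
total = power_model(D, H, 0.060, 2.0, 0.6)  # separately fitted whole-tree model
adjusted = adjust_in_proportion(raw, total)
print(sum(raw.values()), total, sum(adjusted.values()))
```

Each component keeps its share of the unadjusted sum, so the adjusted bole, branch, foliage and root estimates add up to the whole-tree estimate by construction.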

  19. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia is inconclusive in large cities of China, especially in comparisons between native residents and the floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, maternal educational level, with higher overall classification accuracy and larger area under the receiver operating characteristic curve. Conclusions: Infant anemic status in metropolitan areas is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis demonstrated better performance in hierarchical analysis of a population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328

  20. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China.

    PubMed

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-05-20

    In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia in large Chinese cities is inconclusive, especially for comparisons between native residents and the floating population. This population-based cross-sectional study was conducted to determine the anemic status of infants, as well as the risk factors, in a representative downtown area of Beijing. Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis, both useful methods for building predictive models, were used to explore risk factors for infant anemia. A total of 1091 infants aged 6-12 months, together with their parents/caregivers living in Heping Avenue Subdistrict of Beijing, were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60%, ranging from 3.47% to 40.00% across subgroups with different characteristics. The CHAID decision tree model revealed multilevel interactions among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, maternal educational level, and achieved higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Infant anemic status in a metropolis is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis performed better in the hierarchical analysis of a highly heterogeneous population. Risk factors identified by this study might be meaningful for the early detection and prompt treatment of infant anemia in large cities.
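    The split-selection step at the heart of CHAID can be illustrated for binary predictors. The sketch below is a simplified illustration, not the authors' procedure: it computes a Pearson chi-square statistic for each predictor's 2x2 table against anemia status and picks the predictor with the smallest p-value. Full CHAID also merges categories of multi-level predictors and applies Bonferroni-adjusted p-values, which are omitted here; the predictor names and counts in `tables` are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def best_split(tables):
    """Name of the predictor whose table with the outcome has the smallest p-value."""
    return min(tables, key=lambda name: p_value_df1(chi_square_2x2(tables[name])))


# Hypothetical counts. Rows: exposed / not exposed; columns: anemic / not anemic.
tables = {
    "maternal_anemia": [[30, 70], [40, 360]],
    "no_exclusive_breastfeeding": [[20, 180], [50, 250]],
}
print(best_split(tables))  # maternal_anemia
```

    In a tree, this choice would be repeated within each resulting subgroup until no split remains significant.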

  1. Tree species diversity and its relationship to stand parameters and geomorphology features in the eastern Black Sea region forests of Turkey.

    PubMed

    Ozcelik, Ramazan; Gul, Altay Ugur; Merganic, Jan; Merganicova, Katarina

    2008-05-01

    We studied the effects of stand parameters (crown closure, basal area, stand volume, age, mean stand diameter, number of trees, and heterogeneity index) and geomorphology features (elevation, aspect and slope) on tree species diversity in untreated natural mixed forest stands in the eastern Black Sea region of Turkey. Tree species diversity and basal area heterogeneity in forest ecosystems are quantified using the Shannon-Weaver and Simpson indices. The relationships between tree species diversity, basal area heterogeneity, stand parameters and geomorphology features are examined using regression analysis. Our work revealed that the relationship between tree species diversity and stand parameters is loose, with correlation coefficients between 0.02 and 0.70. The correlation of basal area heterogeneity with stand parameters ranged between 0.004 and 0.77 (R²). According to our results, stands with higher tree species diversity are characterised by a higher mean stand diameter, number of diameter classes and basal area, and a lower homogeneity index value. Considering the effect of geomorphology features on tree species diversity or basal area heterogeneity, we found that all investigated relationships are loose, with R ≤ 0.24. A significant correlation was detected only between tree species diversity and aspect. Future work is required to verify the detected trends in the behaviour of tree species diversity if it is to be estimated from the usual forest stand parameters and topography characteristics.
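    The two diversity indices can be computed from species counts as in the sketch below. Shannon-Weaver is H' = -Σ p_i ln p_i, where p_i is the proportion of species i. Simpson's index has several forms in use; this sketch assumes the 1 - Σ p_i² form, in which higher values mean greater diversity, since the abstract does not state which form was used. The counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon-Weaver index H' = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity in the 1 - sum(p_i^2) form; 0 for a single-species stand."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


even_stand = [10, 10, 10, 10]  # hypothetical: four species, equally abundant
print(round(shannon_index(even_stand), 4))  # 1.3863 (= ln 4)
print(simpson_index(even_stand))            # 0.75
```

    Both indices reach their maximum for a given number of species when all species are equally abundant, which is why they are used as measures of heterogeneity.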

  2. Characteristics of the tree-drawing test in chronic schizophrenia.

    PubMed

    Kaneda, Ayako; Yasui-Furukori, Norio; Saito, Manabu; Sugawara, Norio; Nakagami, Taku; Furukori, Hanako; Kaneko, Sunao

    2010-04-01

    A tree-drawing test acts both as a projective psychological examination and as a supplementary psychodiagnostic tool. There is little information relating the characteristics of schizophrenia to the tree-drawing test. The present study compared the structural and morphological differences in the results of the tree-drawing test between schizophrenic patients and healthy individuals, as well as between schizophrenic patients who responded well to treatment and those who responded poorly. The subjects included 202 chronic schizophrenic patients and 113 healthy individuals. The schizophrenic patients were categorized as 'good responders' or 'poor responders' based on their response to medical treatments. The tree-drawing test was performed on all subjects. The tree drawn by each subject was analyzed structurally and morphologically. There were significant differences between the trunk and branches drawn by schizophrenic patients and those drawn by healthy controls. There were no significant differences between the good responders and the poor responders in any aspect of the tree drawings. Multiple regression models showed that the ratio of the tree area to the total area of the drawing paper, the width of the trunk, the trunk base opening, and the size of the branch ends were significantly associated with schizophrenia. The present study suggests that the trees drawn by schizophrenic patients are significantly different from those drawn by healthy individuals, but among schizophrenic patients, it is difficult to distinguish between good responders and poor responders using the tree-drawing test.

  3. Relationship between leaf functional traits and productivity in Aquilaria crassna (Thymelaeaceae) plantations: a tool to aid in the early selection of high-yielding trees.

    PubMed

    López-Sampson, Arlene; Cernusak, Lucas A; Page, Tony

    2017-05-01

    Physiological traits are frequently used as indicators of tree productivity. Aquilaria species growing in a research planting were studied to investigate relationships between leaf-productivity traits and tree growth. Twenty-eight trees were selected to measure the isotopic composition of carbon (δ¹³C) and nitrogen (δ¹⁵N) and to monitor six leaf attributes. Trees were sampled randomly within each of four diameter classes (diameter measured at 150 mm above ground level), ensuring that the variability in growth of the whole population was represented. A model-averaging technique based on Akaike's information criterion was used to identify whether leaf traits could assist in diameter prediction. Regression analysis was performed to test for relationships between carbon isotope values and diameter and leaf traits. Approximately one new leaf per week was produced by a shoot. The rate of leaf expansion was estimated as 1.45 mm day⁻¹. The δ¹³C values in leaves of Aquilaria species ranged from -25.5‰ to -31‰, with an average of -28.4‰ (±1.5‰ SD). A moderate negative correlation (R² = 0.357) between diameter and δ¹³C in leaf dry matter indicated that individuals with high intercellular CO₂ concentrations (low δ¹³C) and associated low water-use efficiency sustained rapid growth. Analysis of the 95% confidence set of best-ranked regression models indicated that the predictors that could best explain growth in Aquilaria species were δ¹³C, δ¹⁵N, petiole length, number of new leaves produced per week and specific leaf area. The model constructed with these variables explained 55% (R² = 0.55) of the variability in stem diameter. This demonstrates that leaf traits can assist in the early selection of high-productivity trees in Aquilaria species. © The Author 2017. Published by Oxford University Press.
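    Model averaging based on Akaike's information criterion typically converts each candidate model's AIC into an Akaike weight, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2) with Δ_i = AIC_i - min AIC, and uses those weights both to average predictions and to define a 95% confidence set of best-ranked models. The sketch assumes the AIC values have already been computed for each candidate model; the AIC values and predictions in the example are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: exp(-delta_i / 2), normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def model_averaged_prediction(aic_values, predictions):
    """Average the candidate models' predictions, weighted by Akaike weight."""
    weights = akaike_weights(aic_values)
    return sum(w * p for w, p in zip(weights, predictions))


def confidence_set(aic_values, level=0.95):
    """Indices of best-ranked models whose cumulative weight first reaches `level`."""
    weights = akaike_weights(aic_values)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= level:
            break
    return chosen


aics = [100.0, 102.0, 110.0]  # hypothetical AIC values of three candidate models
print(confidence_set(aics))   # [0, 1]
```

    Working with differences from the minimum AIC, rather than raw AIC values, avoids overflow in the exponentials without changing the weights.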

  4. Combining Passive Microwave and Optical Data to Estimate Snow Water Equivalent in Afghanistan's Hindu Kush

    NASA Astrophysics Data System (ADS)

    Dozier, J.; Bair, N.; Calfa, A. A.; Skalka, C.; Tolle, K.; Bongard, J.

    2015-12-01

    The task is to produce spatiotemporally distributed estimates of snow water equivalent (SWE) in snow-dominated mountain environments, including those that lack on-the-ground measurements, such as the Hindu Kush range in Afghanistan. During the snow season, we can use two measurements: (1) passive microwave estimates of SWE, which generally underestimate SWE in the mountains; and (2) fractional snow-covered area from MODIS. Once the snow has melted, we can reconstruct the accumulated SWE back to the last significant snowfall by calculating the energy used in melt. The reconstructed SWE values provide a training set for predictions from the passive microwave SWE and snow-covered area. We examine several machine learning methods (regression-boosted decision trees, bagged trees, neural networks, and genetic programming) to estimate SWE. All methods work reasonably well, with R² values greater than 0.8. Predictors built with multiple years of data reduce the bias that usually appears if we predict one year from just one other year's training set. Genetic programming tends to produce results that additionally provide physical insight. Adding precipitation estimates from the Global Precipitation Measurement (GPM) mission is in progress.
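    Regression-boosted decision trees build an additive model from many small trees, each fit to the residuals of the current ensemble. The sketch below is a generic least-squares boosting illustration with depth-1 trees on a single predictor; the study's actual predictors (passive microwave SWE and fractional snow-covered area), tree sizes and settings are not reproduced, and the learning rate, number of rounds and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Two-leaf regression tree on one predictor, chosen to minimize squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # predictor is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda xi: mean
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Least-squares boosting: start from the mean, then add shrunken stumps fit to residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Hypothetical training pairs: predictor value -> reconstructed SWE
x = list(range(8))
y = [0.0] * 4 + [10.0] * 4
model = boost(x, y)
print(model(0), model(7))  # close to 0 and close to 10
```

    The learning rate shrinks each tree's contribution, so many small corrections add up to the final fit instead of one tree memorizing the residuals.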

  5. [Satellite remote sensing retrieval of canopy nitrogen nutritional status of apple trees at blossom stage].

    PubMed

    Wang, Ling; Zhao, Geng-Xing; Zhu, Xi-Cun; Wang, Rui-Yan; Chang, Chun-Yan

    2013-10-01

    Taking Qixia City of Shandong, China as the study area, the canopy reflectance of apple trees at the blossom stage was retrieved from Landsat-5 TM and ALOS AVNIR-2 images. Combined with the measured reflectance of sample trees, nitrogen-sensitive spectral indices were constructed and selected. Using the sensitive spectral indices as independent variables, nitrogen retrieval models were established, and the model with the best accuracy was used for spatial retrieval. The correlations between the spectral indices and the nitrogen nutritional status were in the order canopy > leaf > flower. The sensitive indices were mainly composed of green, red, and near-infrared bands. The accuracy of the retrieval models was in the order support vector regression > multi-variable stepwise regression > one-variable regression. The retrieval results based on the different images were similar and showed that the leaf nitrogen content was mainly of grades 3-4 (27-33 g·kg⁻¹), and the canopy nitrogen nutrient indices were mainly of grades 2-4 (TM: 38-47 g·kg⁻¹; ALOS: 32-41 g·kg⁻¹). The spatial distributions of the retrieved nitrogen nutritional status based on the different images also showed a similar trend, i.e., the nitrogen nutritional status was higher in the north and south than in the middle of the study area, and the areas with high grades of leaf nitrogen and canopy nitrogen were mainly located in Sujiadian Town and Songshan subdistrict in the northwest, Zangjiazhuang Town and Tingkou Town in the northeast, and Shewopo Town in the south, which is consistent with the distribution of the key apple-producing towns in Qixia City. This study provides a feasible method for acquiring the nitrogen nutritional status of apple trees at a macroscopic scale, and also provides a reference for other similar remote sensing retrievals.

  6. Identification of chilling and heat requirements of cherry trees--a statistical approach.

    PubMed

    Luedeling, Eike; Kunz, Achim; Blanke, Michael M

    2013-09-01

    Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative to determine climatic requirements statistically. Bloom dates of cherry cv. 'Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days, were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. 'Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package ('chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. 'Payne') at Davis, California.
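    Two of the preprocessing steps mentioned above can be sketched in Python: the 11-day running means of daily mean temperature that were correlated with bloom dates, and the simplest of the three chill models compared, the Chilling Hours Model, which counts hours between 0 and 7.2 °C. This is a minimal sketch under stated assumptions (a centered window, inclusive thresholds, hypothetical temperature data); it is not the chillR implementation, and the PLS regression step itself is not shown.

```python
def running_mean(values, window=11):
    """Centered running mean; returns one value per day that has a full window.

    Assumes an odd window so that each mean is centered on a single day.
    """
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def chilling_hours(hourly_temps_c):
    """Chilling Hours Model: count hours between 0 and 7.2 degC (inclusive in this sketch)."""
    return sum(1 for t in hourly_temps_c if 0.0 <= t <= 7.2)


daily_mean_temps = [float(d % 7) for d in range(30)]  # hypothetical daily means, degC
print(len(running_mean(daily_mean_temps)))             # 20
print(chilling_hours([-1.0, 0.0, 3.5, 7.2, 8.0]))      # 3
```

    Smoothing daily temperatures before regression reduces day-to-day noise, so that the PLS coefficients reflect sustained warm or cool periods rather than single-day fluctuations.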

  7. Spatial variability of biotic and abiotic tree establishment constraints across a treeline ecotone in the Alaska range.

    PubMed

    Stueve, Kirk M; Isaacs, Rachel E; Tyrrell, Lucy E; Densmore, Roseann V

    2011-02-01

    Throughout interior Alaska (U.S.A.), a gradual warming trend in mean monthly temperatures occurred over the last few decades (approximately 2-4 °C). The accompanying increases in woody vegetation at many alpine treeline (hereafter treeline) locations provided an opportunity to examine how biotic and abiotic local site conditions interact to control tree establishment patterns during warming. We devised a landscape ecological approach to investigate these relationships at an undisturbed treeline in the Alaska Range. We identified treeline changes between 1953 (aerial photography) and 2005 (satellite imagery) in a geographic information system (GIS) and linked them with corresponding local site conditions derived from digital terrain data, ancillary climate data, and distance to 1953 trees. Logistic regressions enabled us to rank the importance of local site conditions in controlling tree establishment. We discovered a spatial transition in the importance of tree establishment controls. The biotic variable (proximity to 1953 trees) was the most important tree establishment predictor below the upper tree limit, providing evidence of response lags with the abiotic setting and suggesting that tree establishment is rarely in equilibrium with the physical environment or responding directly to warming. Elevation and winter sun exposure were important predictors of tree establishment at the upper tree limit, but proximity to trees persisted as an important tertiary predictor, indicating that tree establishment may achieve equilibrium with the physical environment. However, even here, influences from the biotic variable may obscure unequivocal correlations with the abiotic setting (including temperature). Future treeline expansion will likely be patchy and challenging to predict without considering the spatial variability of influences from biotic and abiotic local site conditions.

  8. Spatial variability of biotic and abiotic tree establishment constraints across a treeline ecotone in the Alaska Range

    USGS Publications Warehouse

    Stueve, K.M.; Isaacs, R.E.; Tyrrell, L.E.; Densmore, R.V.

    2011-01-01

    Throughout interior Alaska (USA), a gradual warming trend in mean monthly temperatures occurred over the last few decades (approximately 2-4 °C). The accompanying increases in woody vegetation at many alpine treeline (hereafter treeline) locations provided an opportunity to examine how biotic and abiotic local site conditions interact to control tree establishment patterns during warming. We devised a landscape ecological approach to investigate these relationships at an undisturbed treeline in the Alaska Range. We identified treeline changes between 1953 (aerial photography) and 2005 (satellite imagery) in a geographic information system (GIS) and linked them with corresponding local site conditions derived from digital terrain data, ancillary climate data, and distance to 1953 trees. Logistic regressions enabled us to rank the importance of local site conditions in controlling tree establishment. We discovered a spatial transition in the importance of tree establishment controls. The biotic variable (proximity to 1953 trees) was the most important tree establishment predictor below the upper tree limit, providing evidence of response lags with the abiotic setting and suggesting that tree establishment is rarely in equilibrium with the physical environment or responding directly to warming. Elevation and winter sun exposure were important predictors of tree establishment at the upper tree limit, but proximity to trees persisted as an important tertiary predictor, indicating that tree establishment may achieve equilibrium with the physical environment. However, even here, influences from the biotic variable may obscure unequivocal correlations with the abiotic setting (including temperature). Future treeline expansion will likely be patchy and challenging to predict without considering the spatial variability of influences from biotic and abiotic local site conditions. © 2011 by the Ecological Society of America.

  9. Response of giant sequoia canopy foliage to elevated concentrations of atmospheric ozone.

    PubMed

    Grulke, N E; Miller, P R; Scioli, D

    1996-06-01

    We examined the physiological response of foliage in the upper third of the canopy of 125-year-old giant sequoia (Sequoiadendron giganteum Buchholz.) trees to a 61-day exposure to 0.25x, 1x, 2x or 3x ambient ozone concentration. Four branch exposure chambers, one per ozone treatment, were installed on 1-m long secondary branches of each tree at a height of 34 m. No visible symptoms of foliar ozone damage were apparent throughout the 61-day exposure period and none of the ozone treatments affected branch growth. Despite the similarity in ozone concentrations in the branch chambers within a treatment, the trees exhibited different physiological responses to increasing ozone uptake. Differences in diurnal and seasonal patterns of g(s) among the trees led to a 2-fold greater ozone uptake in tree No. 2 compared with trees Nos. 1 and 3. Tree No. 3 had significantly higher CER and g(s) at 0.25x ambient ozone than trees Nos. 1 and 2, and g(s) and CER of tree No. 3 declined with increasing ozone uptake. The y-intercept of the regression for dark respiration versus ozone uptake was significantly lower for tree No. 2 than for trees Nos. 1 and 3. In the 0.25x and 1x ozone treatments, the chlorophyll concentration of current-year foliage of trees Nos. 1 and 2 was significantly higher than that of current-year foliage of tree No. 3. Chlorophyll concentration of current-year foliage on tree No. 1 did not decline with increasing ozone uptake. In all trees, total needle water potential decreased with increasing ozone uptake, but turgor was constant. Although tree No. 2 had the greatest ozone uptake, g(s) was highest and foliar chlorophyll concentration was lowest in tree No. 3 in the 0.25x and 1x ambient atmospheric ozone treatments.

  10. Using Smartphone Sensors for Improving Energy Expenditure Estimation

    PubMed Central

    Zhu, Jindan; Das, Aveek K.; Zeng, Yunze; Mohapatra, Prasant; Han, Jay J.

    2015-01-01

    Energy expenditure (EE) estimation is an important factor in tracking personal activity and preventing chronic diseases, such as obesity and diabetes. Accurate and real-time EE estimation utilizing small wearable sensors is a difficult task, primarily because most existing schemes work offline or use heuristics. In this paper, we focus on accurate EE estimation for tracking ambulatory activities (walking, standing, climbing upstairs, or downstairs) of a typical smartphone user. We used built-in smartphone sensors (accelerometer and barometer), sampled at low frequency, to accurately estimate EE. Using a barometer sensor, in addition to an accelerometer sensor, greatly increases the accuracy of EE estimation. Using bagged regression trees, a machine learning technique, we developed a generic regression model for EE estimation that yields up to 96% correlation with actual EE. We compare our results against the state-of-the-art calorimetry equations and consumer electronics devices (Fitbit and Nike+ FuelBand). The newly developed EE estimation algorithm demonstrated superior accuracy compared with currently available methods. The results were calibrated against COSMED K4b2 calorimeter readings. PMID:27170901

  11. Using Smartphone Sensors for Improving Energy Expenditure Estimation.

    PubMed

    Pande, Amit; Zhu, Jindan; Das, Aveek K; Zeng, Yunze; Mohapatra, Prasant; Han, Jay J

    2015-01-01

    Energy expenditure (EE) estimation is an important factor in tracking personal activity and preventing chronic diseases, such as obesity and diabetes. Accurate and real-time EE estimation utilizing small wearable sensors is a difficult task, primarily because most existing schemes work offline or use heuristics. In this paper, we focus on accurate EE estimation for tracking ambulatory activities (walking, standing, climbing upstairs, or downstairs) of a typical smartphone user. We used built-in smartphone sensors (accelerometer and barometer), sampled at low frequency, to accurately estimate EE. Using a barometer sensor, in addition to an accelerometer sensor, greatly increases the accuracy of EE estimation. Using bagged regression trees, a machine learning technique, we developed a generic regression model for EE estimation that yields up to 96% correlation with actual EE. We compare our results against the state-of-the-art calorimetry equations and consumer electronics devices (Fitbit and Nike+ FuelBand). The newly developed EE estimation algorithm demonstrated superior accuracy compared with currently available methods. The results were calibrated against COSMED K4b2 calorimeter readings.
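    Bagged regression trees average the predictions of trees fit to bootstrap resamples of the training data. The sketch below illustrates the idea with depth-1 trees (stumps) to stay short; the feature layout (an accelerometer-derived feature and a barometer-derived feature), the data, and all function names are hypothetical, and the authors' tree depth and features are not given here.

```python
import random


def fit_stump(rows, targets):
    """Depth-1 regression tree: the (feature, threshold) split with the lowest squared error."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no feature varies in this resample: predict the mean
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def bagged_stumps(rows, targets, n_models=25, seed=0):
    """Fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


# Hypothetical data: (accelerometer feature, barometer feature) -> EE
rows = [(a, 0.0) for a in range(10)]
targets = [1.0] * 5 + [5.0] * 5
model = bagged_stumps(rows, targets)
print(model((0, 0.0)), model((9, 0.0)))  # close to 1.0 and 5.0
```

    Averaging over resamples smooths the prediction near the split point, where individual trees disagree, which is the variance reduction that motivates bagging.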

  12. [Age structure and dynamics of Quercus wutaishanica population in Lingkong Mountain of Shanxi Province, China].

    PubMed

    Zhang, Jie; Shangguan, Tie-Liang; Duan, Yi-Hao; Guo, Wei; Liu, Wei-Hua; Guo, Dong-Gang

    2014-11-01

    Using plant survivorship theory, the age structure of the Quercus wutaishanica population in Lingkong Mountain and the relationship between tree height and diameter at breast height (DBH) were analyzed, a static life table was compiled, and the survival curve was plotted. The shuttle-shaped age structure of the Q. wutaishanica population suggested its temporal stability. A linear regression significantly fitted the positive correlation between tree height and DBH. The maximal life expectancy was observed among trees beyond the age of highest mortality and coincided with the lowest point of mortality density, suggesting the strong vitality of the seedlings and young trees that survived natural selection and intraspecific competition. The stability of the Q. wutaishanica population was characterized by a Deevey type II survival curve. The dynamic pattern was characterized by recession in the early phase, growth in the intermediate phase, and stability in the later phase.
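    A static life table of the kind compiled here can be sketched from counts of living trees per age class. The sketch uses the standard columns: survivorship l_x standardized to a cohort of 1000, deaths d_x = l_x - l_(x+1), mortality rate q_x = d_x / l_x, and life expectancy e_x = T_x / l_x, with L_x = (l_x + l_(x+1)) / 2 and T_x the sum of L_i over classes i ≥ x. It assumes counts that do not increase with age (field data often need smoothing first), and the counts are hypothetical.

```python
def static_life_table(counts, cohort=1000.0):
    """Life-table columns from counts per age class (youngest class first)."""
    lx = [cohort * c / counts[0] for c in counts]           # standardized survivorship
    next_lx = lx[1:] + [0.0]                                # no survivors past the last class
    dx = [a - b for a, b in zip(lx, next_lx)]               # deaths per class
    qx = [d / l if l > 0 else 0.0 for d, l in zip(dx, lx)]  # mortality rate
    Lx = [(a + b) / 2.0 for a, b in zip(lx, next_lx)]       # mean alive within the class
    Tx = [sum(Lx[i:]) for i in range(len(Lx))]              # total class-units still to live
    ex = [T / l if l > 0 else 0.0 for T, l in zip(Tx, lx)]  # life expectancy, in age classes
    return {"lx": lx, "dx": dx, "qx": qx, "ex": ex}


table = static_life_table([100, 50, 25])  # hypothetical counts in three age classes
print(table["qx"])  # [0.5, 0.5, 1.0]
print(table["ex"])  # [1.25, 1.0, 0.5]
```

    Plotting log(l_x) against age class gives the survival curve whose shape is classified into Deevey types.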

  13. Remeasuring tree heights on permanent plots using rectangular coordinates and one angle per tree

    Treesearch

    Robert L. Neal

    1973-01-01

    Heights of permanent sample trees with tops visible from any point can be measured from that point with any clinometer, measuring one vertical angle per tree. Two horizontal angles and one additional vertical angle per observation point are necessary to orient the point to the plot. Permanently recorded coordinates and elevations of tree locations are used with the...
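    The height calculation can be sketched as follows, assuming the observation point has already been oriented to the plot so that its coordinates and elevation are known, and that the clinometer reading is a vertical angle in degrees to the tree top. The horizontal distance comes from the recorded rectangular coordinates, so only one angle per tree is needed. The function name, the instrument-height parameter and the numbers are hypothetical; the original procedure is truncated above and may differ in detail.

```python
import math


def tree_height(point_xy, point_elev, instrument_height, tree_xy, tree_base_elev, angle_deg):
    """Height of a tree top above its recorded base elevation from one vertical angle."""
    # Horizontal distance from the observation point to the tree, from plot coordinates.
    d = math.hypot(tree_xy[0] - point_xy[0], tree_xy[1] - point_xy[1])
    # Elevation of the tree top: eye level plus rise along the line of sight.
    top_elev = point_elev + instrument_height + d * math.tan(math.radians(angle_deg))
    return top_elev - tree_base_elev


# Hypothetical values: tree 50 m away (30, 40), base 1 m below the point, 45 degree angle.
print(round(tree_height((0.0, 0.0), 100.0, 1.5, (30.0, 40.0), 98.5, 45.0), 2))  # 53.0
```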

  14. Million trees Los Angeles canopy cover and benefit assessment

    Treesearch

    E.G. McPherson; J.R. Simpson; Q. Xiao; C. Wu

    2011-01-01

    The Million Trees LA initiative intends to improve Los Angeles’s environment through planting and stewardship of 1 million trees. The purpose of this study was to measure Los Angeles’s existing tree canopy cover (TCC), determine if space exists for 1 million additional trees, and estimate future benefits from the planting. High-resolution QuickBird remote sensing data...

  15. Missing Rings in Pinus halepensis – The Missing Link to Relate the Tree-Ring Record to Extreme Climatic Events

    PubMed Central

    Novak, Klemen; de Luis, Martin; Saz, Miguel A.; Longares, Luis A.; Serrano-Notivoli, Roberto; Raventós, Josep; Čufar, Katarina; Gričar, Jožica; Di Filippo, Alfredo; Piovesan, Gianluca; Rathgeber, Cyrille B. K.; Papadopoulos, Andreas; Smith, Kevin T.

    2016-01-01

    Climate predictions for the Mediterranean Basin include increased temperatures, decreased precipitation, and increased frequency of extreme climatic events (ECE). These conditions are associated with decreased tree growth and increased vulnerability to pests and diseases. The anatomy of tree rings responds to these environmental conditions. Quantitatively, the width of a tree ring is largely determined by the rate and duration of cell division by the vascular cambium. In the Mediterranean climate, this division may occur throughout almost the entire year. Alternatively, cell division may cease during relatively cool and dry winters, only to resume in the same calendar year with milder temperatures and increased availability of water. Under particularly adverse conditions, no xylem may be produced in parts of the stem, resulting in a missing ring (MR). A dendrochronological network of Pinus halepensis was used to determine the relationship of MR to ECE. The network consisted of 113 sites, 1,509 trees, 2,593 cores, and 225,428 tree rings throughout the distribution range of the species. A total of 4,150 MR were identified. Binomial logistic regression analysis determined that MR frequency increased with increasing cambial age. Spatial analysis indicated that the geographic areas of south-eastern Spain and northern Algeria contained the greatest frequency of MR. Dendroclimatic regression analysis indicated a non-linear relationship of MR to total monthly precipitation and mean temperature. MR are strongly associated with the combination of monthly mean temperature from the previous October to the current February and total precipitation from the previous September to the current May. MR are likely to occur when total precipitation is lower than 50 mm and temperatures are higher than 5°C. This conclusion is global and can be applied to every site across the distribution area.
Rather than simply being a complication for dendrochronology, MR formation is a fundamental response of trees to adverse environmental conditions. The demonstrated relationship of MR formation to ECE across this dendrochronological network in the Mediterranean basin shows the potential of MR analysis to reconstruct the history of past climatic extremes and to predict future forest dynamics in a changing climate. PMID:27303421
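    A binomial logistic regression of missing-ring occurrence on cambial age can be sketched as follows, fitting P(MR) = 1 / (1 + exp(-(b0 + b1·age))) by Newton-Raphson. This assumes one binary outcome per ring and a single predictor; the ages and outcomes are hypothetical, and the published analysis may have used a different fitting routine or additional terms. A positive b1 corresponds to the reported increase of MR frequency with cambial age.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson iterations."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w              # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^-1 * g, with I inverted explicitly (2 x 2).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical rings: cambial age (years) and whether the ring was missing (1) or not (0).
ages = [10, 20, 30, 40, 50, 60, 70, 80]
missing = [0, 0, 1, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(ages, missing)
print(b1 > 0)  # True
```

    The sketch assumes the outcomes are not perfectly separated by age; with perfect separation the maximum-likelihood estimate does not exist and the iterations diverge.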

  16. [Soil and forest structure in the Colombian Amazon].

    PubMed

    Calle-Rendón, Bayron R; Moreno, Flavio; Cárdenas López, Dairon

    2011-09-01

    Structural differences among forests could result from environmental variations at different scales. Because soils are an important component of plants' environment, it is possible that edaphic and structural variables are associated and that, consequently, spatial autocorrelation occurs. This paper aims to answer two questions: (1) are structural and edaphic variables associated at the local scale in a terra firme forest of Colombian Amazonia? and (2) are these variables regionalized at the scale of work? To answer these questions we analyzed the data of a 6 ha plot established in a terra firme forest of the Amacayacu National Park. Structural variables included basal area and density of large trees (diameter ≥ 10 cm) (Gdos and Ndos), basal area and density of understory individuals (diameter < 10 cm) (Gsot and Nsot) and number of species of large trees (sp). Edaphic variables included pH, organic matter, P, Mg, Ca, K, Al, sand, silt and clay. Structural and edaphic variables were reduced through a principal component analysis (PCA); then, the association between edaphic and structural components from PCA was evaluated by multiple regressions. The existence of regionalization of these variables was studied through isotropic variograms, and autocorrelated variables were spatially mapped. PCA found two significant components for structure, corresponding to the structure of large trees (G, Gdos, Ndos and sp) and of small trees (N, Nsot and Gsot), which explained 43.9% and 36.2% of total variance, respectively. Four components were identified for edaphic variables, which together explained 81.9% of total variance and basically represent drainage and soil fertility.
    Regression analyses were significant (p < 0.05) and showed that the structure of both large and small trees is associated with greater sand contents and low soil fertility, though they explained a low proportion of total variability (R² was 4.9% and 16.5% for the structure of large trees and small trees, respectively). Variables with spatial autocorrelation were the structure of small trees, Al, silt, and sand. Among them, Nsot and sand content showed similar patterns of spatial distribution inside the plot.
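    Regionalization was assessed with isotropic variograms. An empirical isotropic semivariogram groups point pairs by separation distance only (ignoring direction) and computes γ(h) = Σ(z_i - z_j)² / (2N(h)) within each distance class, where N(h) is the number of pairs in that class. The sketch below assumes planar coordinates and equal-width distance bins; the bin settings and data are hypothetical, and fitting a variogram model (for example a spherical or exponential model) is not shown.

```python
import math


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Isotropic semivariance per distance bin; None where a bin has no pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])  # separation distance, direction ignored
            k = int(h // bin_width)              # bin k covers [k * width, (k + 1) * width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical sand contents at three points along a transect (coordinates in metres).
print(empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [0.0, 1.0, 2.0], 1.5, 2))  # [0.5, 2.0]
```

    Semivariance that rises with distance, as in this example, indicates that nearby samples are more alike than distant ones, i.e. spatial autocorrelation.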

  17. Missing Rings in Pinus halepensis - The Missing Link to Relate the Tree-Ring Record to Extreme Climatic Events.

    PubMed

    Novak, Klemen; de Luis, Martin; Saz, Miguel A; Longares, Luis A; Serrano-Notivoli, Roberto; Raventós, Josep; Čufar, Katarina; Gričar, Jožica; Di Filippo, Alfredo; Piovesan, Gianluca; Rathgeber, Cyrille B K; Papadopoulos, Andreas; Smith, Kevin T

    2016-01-01

    Climate predictions for the Mediterranean Basin include increased temperatures, decreased precipitation, and increased frequency of extreme climatic events (ECE). These conditions are associated with decreased tree growth and increased vulnerability to pests and diseases. The anatomy of tree rings responds to these environmental conditions. Quantitatively, the width of a tree ring is largely determined by the rate and duration of cell division by the vascular cambium. In the Mediterranean climate, this division may occur throughout almost the entire year. Alternatively, cell division may cease during relatively cool and dry winters, only to resume in the same calendar year with milder temperatures and increased availability of water. Under particularly adverse conditions, no xylem may be produced in parts of the stem, resulting in a missing ring (MR). A dendrochronological network of Pinus halepensis was used to determine the relationship of MR to ECE. The network consisted of 113 sites, 1,509 trees, 2,593 cores, and 225,428 tree rings throughout the distribution range of the species. A total of 4,150 MR were identified. Binomial logistic regression analysis determined that MR frequency increased with increasing cambial age. Spatial analysis indicated that the geographic areas of south-eastern Spain and northern Algeria contained the greatest frequency of MR. Dendroclimatic regression analysis indicated a non-linear relationship of MR to total monthly precipitation and mean temperature. MR are strongly associated with the combination of monthly mean temperature from the previous October to the current February and total precipitation from the previous September to the current May. MR are likely to occur when total precipitation is lower than 50 mm and temperatures are higher than 5°C. This conclusion is global and can be applied to every site across the distribution area.
Rather than simply being a complication for dendrochronology, MR formation is a fundamental response of trees to adverse environmental conditions. The demonstrated relationship of MR formation to ECE across this dendrochronological network in the Mediterranean basin shows the potential of MR analysis to reconstruct the history of past climatic extremes and to predict future forest dynamics in a changing climate.

  18. Estimating Dbh from Stump Diameter for 15 Southern Species

    Treesearch

    Carl V. Bylin

    1982-01-01

    Regression equations for predicting dbh from stump diameter inside and outside bark are presented for 15 southern species. Equations were certified on independent test subsets using the F-distribution statistic at a significance level of 0.05.

  19. Classification and regression trees

    Treesearch

    G. G. Moisen

    2008-01-01

    Frequently, ecologists are interested in exploring ecological relationships, describing patterns and processes, or making spatial or temporal predictions. These purposes often can be addressed by modeling the relationship between some outcome or response and a set of features or explanatory variables.

  20. Analysis of the Importance of Oxides and Clays in Cd, Cr, Cu, Ni, Pb and Zn Adsorption and Retention with Regression Trees

    PubMed Central

    González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma

    2017-01-01

    This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. To do so, regression models were built using decision trees, and the importance of soil components was assessed. The variables used were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, the proportions of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849

  1. A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities.

    PubMed

    Mbogning, Cyprien; Perdry, Hervé; Toussile, Wilson; Broët, Philippe

    2014-01-01

    Dissecting the genomic spectrum of clinical disease entities is a challenging task. Recursive partitioning (classification tree) methods provide powerful tools for exploring the complex interplay among genomic factors with respect to a main factor, which can reveal hidden genomic patterns. To take confounding variables into account, the partially linear tree-based regression (PLTR) model was recently published. It combines regression models and tree-based methodology. However, it is computationally burdensome and not well suited to situations in which a large number of explanatory variables is expected. We developed a novel procedure that represents an alternative to the original PLTR procedure, and considered different selection criteria. A simulation study with different scenarios was performed to compare the performance of the proposed procedure with that of the original PLTR strategy. The proposed procedure with a Bayesian Information Criterion (BIC) performed well in detecting the hidden structure compared with the original procedure. The novel procedure was used to analyze patterns of copy-number alterations in lung adenocarcinomas with respect to Kirsten Rat Sarcoma Viral Oncogene Homolog gene (KRAS) mutation status, while controlling for a cohort effect. Results highlight two subgroups of pure or nearly pure wild-type KRAS tumors with particular copy-number alteration patterns. The proposed procedure with a BIC criterion represents a powerful and practical alternative to the original procedure. Our procedure performs well in a general framework and is simple to implement.

  2. Using decision trees to understand structure in missing data

    PubMed Central

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-01-01

    Objectives: Demonstrate the application of decision trees, namely classification and regression trees (CARTs) and their cousins, boosted regression trees (BRTs), to understand structure in missing data. Setting: Data taken from employees at 3 different industrial sites in Australia. Participants: 7915 observations were included. Materials and methods: The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions: Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509

  3. Establishing Decision Trees for Predicting Successful Postpyloric Nasoenteric Tube Placement in Critically Ill Patients.

    PubMed

    Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo

    2016-08-31

    Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
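    The models above are ranked by the area under the ROC curve (AUC). As a minimal illustrative sketch, not the authors' code, the AUC of any scoring model can be computed as the Mann-Whitney probability that a randomly chosen positive case (here, a successful placement) receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy scores are hypothetical.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive case ranked above negative case
        elif p == n:
            wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed outcomes (1 = success).
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

    The pairwise loop costs O(n_pos × n_neg), which is acceptable for a cohort of a few hundred patients; a rank-based formula gives the same value in O(n log n).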

  4. Establishing Decision Trees for Predicting Successful Postpyloric Nasoenteric Tube Placement in Critically Ill Patients.

    PubMed

    Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo

    2018-01-01

    Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.

  5. Tests of a habitat suitability model for black-capped chickadees

    USGS Publications Warehouse

    Schroeder, Richard L.

    1990-01-01

    The black-capped chickadee (Parus atricapillus) Habitat Suitability Index (HSI) model provides a quantitative rating of the capability of a habitat to support breeding, based on measures related to food and nest site availability. The model assumption that tree canopy volume can be predicted from measures of tree height and canopy closure was tested using data from foliage volume studies conducted in the riparian cottonwood habitat along the South Platte River in Colorado. Least absolute deviations (LAD) regression showed that canopy cover and overstory tree height yielded volume predictions significantly lower than volume estimated by more direct methods. Revisions to these model relations resulted in improved predictions of foliage volume. The relation between the HSI and estimates of black-capped chickadee population densities was examined using LAD regression for both the original model and the model with the foliage volume revisions. Residuals from these models were compared to residuals from both a zero-slope model and an ideal model. The fit model for the original HSI differed significantly from the ideal model, whereas the fit model for the revised HSI did not differ significantly from the ideal model. However, neither the fit model for the original HSI nor the fit model for the revised HSI differed significantly from a zero-slope model. Although further testing of the revised model is needed, its use is recommended for more realistic estimates of tree canopy volume and habitat suitability.

  6. Category of trees in representation theory of quantum algebras

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moskaliuk, N. M.; Moskaliuk, S. S., E-mail: mss@bitp.kiev.ua

    2013-10-15

    New applications of categorical methods are connected with new additional structures on categories. One such structure in the representation theory of quantum algebras, the category of Kuznetsov-Smorodinsky-Vilenkin-Smirnov (KSVS) trees, is constructed; its objects are finite rooted KSVS trees, and its morphisms are generated by the transition from one KSVS tree to another.

  7. Recovery efficiency of whole-tree harvesting

    Treesearch

    Bryce J. Stokes; William F. Watson

    1988-01-01

    The recovery of total tree biomass and most components of a stand is a practical economic and management alternative to tree-length harvesting. First, the increased utilization of woody biomass provides additional revenues from the site. Second, the removal and utilization of the stems and crowns reduces site preparation costs and makes tree planting easier. Third,...

  8. Site Preparation For Intensively Cultured Hybrid Poplar Plantations

    Treesearch

    Edward Hansen; Daniel Netzer; W.J. Rietveld

    1984-01-01

    Five site preparation treatments consisting of combinations of tillage, contact herbicide (glyphosate), and pre-emergent herbicide (linuron) were tested for their effects on tree survival and growth. Treatments had little effect on tree survival, but effects on second-year tree height were significant and additive -- i.e., tree height increased as the number of types...

  9. On Tree-Based Phylogenetic Networks.

    PubMed

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems recently posed by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

  10. Fault-Tree Compiler

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Boerschlein, David P.

    1993-01-01

    The Fault-Tree Compiler (FTC) program is a software tool used to calculate the probability of the top event in a fault tree. Gates of five different types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. The high-level input language is easy to understand and use. In addition, the program supports a hierarchical fault-tree definition feature, which simplifies the tree-description process and reduces execution time. A set of programs was created to form the basis for a reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and the FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C, and FORTRAN 77. Other versions are available upon request.
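    The FTC's internal algorithm is not described here, so the following is only a hypothetical sketch of how a top-event probability can be evaluated for the five gate types listed. It assumes that all basic events are independent and that each basic event appears only once in the tree (repeated events would need a different treatment). The nested-tuple tree format and the names `evaluate` and `at_least_m_of` are illustrative assumptions, and EXCLUSIVE OR is taken here as a two-input gate.

```python
from math import prod


def at_least_m_of(m, probs):
    """P(at least m of the independent events occur), by dynamic programming
    over the number of events that have occurred so far."""
    dist = [1.0]  # dist[k] = P(exactly k of the events processed so far occur)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this event does not occur
            new[k + 1] += q * p    # this event occurs
        dist = new
    return sum(dist[m:])


def evaluate(node):
    """Probability of the event at `node`; a bare number is a basic-event probability.
    Assumes independent, non-repeated basic events."""
    if isinstance(node, (int, float)):
        return float(node)
    gate, *args = node
    if gate == "AND":
        return prod(evaluate(c) for c in args)
    if gate == "OR":
        return 1 - prod(1 - evaluate(c) for c in args)
    if gate == "XOR":  # two-input exclusive OR: exactly one input occurs
        a, b = (evaluate(c) for c in args)
        return a + b - 2 * a * b
    if gate == "INVERT":
        (child,) = args
        return 1 - evaluate(child)
    if gate == "MOFN":  # ("MOFN", m, child1, child2, ...)
        m, *children = args
        return at_least_m_of(m, [evaluate(c) for c in children])
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: top = (A AND B) OR (at least 2 of C, D, E).
top = ("OR", ("AND", 0.1, 0.2), ("MOFN", 2, 0.5, 0.5, 0.5))
print(round(evaluate(top), 4))  # 0.51
```

    In this example the AND branch contributes 0.02, the 2-of-3 gate contributes 0.5, and the OR gate combines them as 1 - 0.98 × 0.5.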

  11. Use of generalized regression tree models to characterize vegetation favoring Anopheles albimanus breeding.

    PubMed

    Hernandez, J E; Epstein, L D; Rodriguez, M H; Rodriguez, A D; Rejmankova, E; Roberts, D R

    1997-03-01

    We propose the use of generalized tree models (GTMs) to analyze data from entomological field studies. Generalized tree models can be used to characterize environments with different mosquito breeding capacity. A GTM simultaneously analyzes a set of predictor variables (e.g., vegetation coverage) in relation to a response variable (e.g., counts of Anopheles albimanus larvae), and how it varies with respect to a set of criterion variables (e.g., presence of predators). The algorithm produces a treelike graphical display with its root at the top and 2 branches stemming down from each node. At each node, conditions on the value of predictors partition the observations into subgroups (environments) in which the relation between response and criterion variables is most homogeneous.

  12. Can SLE classification rules be effectively applied to diagnose unclear SLE cases?

    PubMed Central

    Mesa, Annia; Fernandez, Mitch; Wu, Wensong; Narasimhan, Giri; Greidinger, Eric L.; Mills, DeEtta K.

    2016-01-01

    Summary. Objective: Develop novel classification criteria to distinguish between unclear SLE and MCTD cases. Methods: A total of 205 variables from 111 SLE and 55 MCTD patients were evaluated to uncover unique molecular and clinical markers for each disease. Binomial logistic regressions (BLR) were performed on currently used SLE and MCTD classification criteria sets to obtain six reduced models with the power to discriminate between unclear SLE and MCTD patients, which were confirmed by receiver operating characteristic (ROC) curves. Decision trees were employed to delineate novel classification rules to discriminate between unclear SLE and MCTD patients. Results: SLE and MCTD patients exhibited contrasting molecular markers and clinical manifestations. Furthermore, the reduced models showed that SLE patients exhibit a predominance of skin rashes and renal disease, while MCTD cases show a predominance of myositis and muscle weakness. Additionally, decision tree analyses revealed a novel classification rule tailored to differentiate unclear SLE and MCTD patients (Lu-vs-M) with an overall accuracy of 88%. Conclusions: Validation of our novel proposed classification rule (Lu-vs-M) includes novel contrasting characteristics (calcinosis, elevated CPK, and anti-IgM reactivity for U1-70K, U1A and U1C) between SLE and MCTD patients and showed a 33% improvement in distinguishing these disorders when compared with currently used classification criteria sets. Pending additional validation, our novel classification rule is a promising method to distinguish between patients with unclear SLE and MCTD diagnoses. PMID:27353506

  13. 78 FR 24665 - Gypsy Moth Generally Infested Areas; Additions in Wisconsin

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-26

    ... forest, shade, and commercial trees such as nursery stock and Christmas trees. The gypsy moth regulations... tree growers, and 2 nurseries. We expect that most if not all of these businesses are small according...

  14. Seedling growth responses to phosphorus reflect adult distribution patterns of tropical trees.

    PubMed

    Zalamea, Paul-Camilo; Turner, Benjamin L; Winter, Klaus; Jones, F Andrew; Sarmiento, Carolina; Dalling, James W

    2016-10-01

    Soils influence tropical forest composition at regional scales. In Panama, data on tree communities and underlying soils indicate that species frequently show distributional associations to soil phosphorus. To understand how these associations arise, we combined a pot experiment to measure seedling responses of 15 pioneer species to phosphorus addition with an analysis of the phylogenetic structure of phosphorus associations of the entire tree community. Growth responses of pioneers to phosphorus addition revealed a clear tradeoff: species from high-phosphorus sites grew fastest in the phosphorus-addition treatment, while species from low-phosphorus sites grew fastest in the low-phosphorus treatment. Traits associated with growth performance remain unclear: biomass allocation, phosphatase activity and phosphorus-use efficiency did not correlate with phosphorus associations; however, phosphatase activity was most strongly down-regulated in response to phosphorus addition in species from high-phosphorus sites. Phylogenetic analysis indicated that pioneers occur more frequently in clades where phosphorus associations are overdispersed as compared with the overall tree community, suggesting that selection on phosphorus acquisition and use may be strongest for pioneer species with high phosphorus demand. Our results show that phosphorus-dependent growth rates provide an additional explanation for the regional distribution of tree species in Panama, and possibly elsewhere. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  15. Fast Screening Technology for Drug Emergency Management: Predicting Suspicious SNPs for ADR with Information Theory-based Models.

    PubMed

    Liang, Zhaohui; Liu, Jun; Huang, Jimmy X; Zeng, Xing

    2018-01-01

    The genetic polymorphism of cytochrome P450 (CYP450) is considered one of the main causes of adverse drug reactions (ADRs). To explore the latent correlations between ADRs and potentially corresponding single-nucleotide polymorphisms (SNPs) in CYP450, three algorithms based on information theory are used as the main method to predict the possible relation. The study uses a retrospective case-control design to explore the potential relation of ADRs to specific genomic locations and SNPs. Genomic data collected from 53 healthy volunteers are used for the analysis, and another group of genomic data collected from 30 healthy volunteers excluded from the study is used as the control group. The SNPs at five loci, CYP2D6*2, *10, *14 and CYP1A2*1C, *1F, are detected with the Applied Biosystems 3130xl. The raw data are processed with ChromasPro to detect the specific alleles at these loci in each sample. The secondary data are reorganized and processed in R, combined with ADR cases from clinical reports. Three information-theory-based algorithms are implemented for the screening task: JMI, CMIM, and mRMR. If a SNP is selected by more than two algorithms, we conclude with confidence that it is related to the corresponding ADR. The selection results are compared with a control decision tree + LASSO regression model. In the study group where ADRs occur, 10 SNPs are considered relevant to the occurrence of a specific ADR by the combined information theory model. In comparison, only 5 SNPs are considered relevant to a specific ADR by the decision tree + LASSO regression model. In addition, the new method detects more relevant SNP-ADR pairs that are affected by both SNP and dosage. This implies that the new information-theory-based model is effective at discovering correlations between ADRs and CYP450 SNPs and is helpful in predicting potentially vulnerable genotypes for some ADRs. 
The newly proposed information-theory-based model shows superior performance in detecting the relation between SNPs and ADRs compared with the decision tree + LASSO regression model. The new model is more sensitive in detecting ADRs than the old method, while the old method is more reliable. Therefore, the choice of algorithm should depend on pragmatic needs. Copyright© Bentham Science Publishers.
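    The abstract names three information-theoretic selectors (JMI, CMIM and mRMR) without giving their formulas. As a hedged sketch of one of them, the following implements greedy mRMR in its common difference form: at each step it picks the SNP whose mutual information with the ADR label, minus its mean mutual information with the SNPs already chosen, is largest. Genotypes are assumed to be coded as discrete values (e.g. 0/1/2), mutual information is estimated from empirical frequencies, and all function and feature names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_score(name, features, target, selected):
    """Relevance to the target minus mean redundancy with already-selected features."""
    relevance = mutual_information(features[name], target)
    if not selected:
        return relevance
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance - redundancy


def mrmr(features, target, k):
    """Greedily select up to k feature names from the dict `features`."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: mrmr_score(f, features, target, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: ADR label (1 = ADR occurred) and two genotype-coded SNPs.
adr = [0, 0, 0, 0, 1, 1, 1, 1]
snps = {"snp_noise": [0, 1, 0, 1, 0, 1, 0, 1], "snp_signal": [0, 0, 0, 0, 2, 2, 2, 2]}
print(mrmr(snps, adr, k=1))  # ['snp_signal']
```

    Running several such selectors and counting how many of them pick each SNP would give the kind of voting rule described in the abstract.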

  16. The use of regression tree analysis for predicting the functional outcome following traumatic spinal cord injury.

    PubMed

    Facchinello, Yann; Beauséjour, Marie; Richard-Denis, Andreane; Thompson, Cynthia; Mac-Thiong, Jean-Marc

    2017-10-25

    Predicting the long-term functional outcome following traumatic spinal cord injury is needed to adapt medical strategies and to plan optimized rehabilitation. This study investigates the use of regression trees to develop predictive models based on acute clinical and demographic predictors. This prospective study was performed on 172 patients hospitalized following traumatic spinal cord injury. Functional outcome was quantified using the Spinal Cord Independence Measure collected within the first year post injury. Age, delay prior to surgery and Injury Severity Score were considered continuous predictors, while energy of injury, trauma mechanisms, neurological level of injury, injury severity, occurrence of early spasticity, urinary tract infection, pressure ulcer and pneumonia were coded as categorical inputs. A simplified model was built using only injury severity, neurological level, energy and age as predictors and was compared with a more complex model considering all 11 predictors mentioned above. The models built using 4 and 11 predictors were found to explain 51.4% and 62.3%, respectively, of the variance of the Spinal Cord Independence Measure total score after validation. The severity of the neurological deficit at admission was found to be the most important predictor. Other important predictors were the Injury Severity Score, age, neurological level and delay prior to surgery. Regression trees offer promising performance for predicting the functional outcome after a traumatic spinal cord injury. This approach could help determine the number and type of predictors leading to a prediction model of the functional outcome that can be used clinically in the future.
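    Regression trees of this kind are typically grown by recursive binary splitting: at each node, the predictor and threshold are chosen that most reduce the within-node sum of squared errors (SSE). The sketch below is an illustration of that CART-style split search on a single continuous predictor, not the study's implementation; the function names and toy values (e.g. age versus an outcome score) are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split x <= threshold that minimises the
    summed SSE of the two child nodes; threshold is None if no split helps."""
    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))  # start from the unsplit node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
            best = (threshold, total)
    return best


# Hypothetical toy data: age at injury and an outcome score.
age = [20, 25, 30, 60, 65, 70]
score = [90, 88, 92, 50, 55, 45]
print(best_split(age, score))  # (45.0, 58.0)
```

    A full tree would repeat this search over every predictor, recurse into each child node, and stop or prune according to criteria such as a minimum node size; the share of variance explained is then 1 - SSE_model / SSE_total on the validation data.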

  17. A regional classification scheme for estimating reference water quality in streams using land-use-adjusted spatial regression-tree analysis

    USGS Publications Warehouse

    Robertson, Dale M.; Saad, D.A.; Heisey, D.M.

    2006-01-01

    Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
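    The residualization step can be illustrated with a minimal sketch. Here each water-quality or environmental variable is regressed on a single land-use covariate by ordinary least squares and replaced by its residuals. This single-covariate linear adjustment and the names `residualize`, `agriculture_pct` and `phosphorus` are assumptions for illustration; the authors' adjustment may use more covariates or a different model.

```python
def residualize(values, land_use):
    """Residuals of `values` after an ordinary least-squares fit on one
    land-use covariate: r_i = v_i - (a + b * u_i)."""
    n = len(values)
    if n != len(land_use) or n < 2:
        raise ValueError("need two equal-length sequences with at least two sites")
    mean_u = sum(land_use) / n
    mean_v = sum(values) / n
    sxx = sum((u - mean_u) ** 2 for u in land_use)
    if sxx == 0:
        raise ValueError("land-use covariate has no variation")
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(land_use, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_u
    # What remains after removing the linear land-use effect at each site.
    return [v - (intercept + slope * u) for u, v in zip(land_use, values)]


# Hypothetical sites: percent agricultural land and a water-quality value.
agriculture_pct = [0, 1, 2, 3]
phosphorus = [1, 3, 2, 4]
print([round(r, 2) for r in residualize(phosphorus, agriculture_pct)])  # [-0.3, 0.9, -0.9, 0.3]
```

    The residuals, one per site and variable, would then serve as the response and predictors for the regression-tree step.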

  18. Determining Biophysical Controls on Forest Structure using Hyperspatial Satellite Imagery and Ecological Gradient Modeling

    NASA Astrophysics Data System (ADS)

    Dobrowski, S. Z.; Greenberg, J. A.; Schladow, G.

    2006-12-01

    There is evidence from the Sierra Nevada that sub-alpine and alpine environments are currently experiencing landscape-mediated changes in growth and recruitment due to recent climate change. Understanding the biophysical controls of forest structure, growth, and recruitment in these environments is critical for interpreting and predicting the direction and magnitude of biotic responses to climate shift. We examined the abiotic controls of forest biomass within a 305 km² region of the Carson Range on the eastern shore of Lake Tahoe, CA, USA, using estimates of forest structure and biophysical drivers developed continuously over the landscape. The study area ranged from 1900 m to 3400 m a.s.l. and encompassed montane, sub-alpine, and alpine environments. From hyperspatial optical imagery (IKONOS), we derived per-tree positions and crown sizes using a template matching approach applied to a pre-classified image of sunlit and shadowed vegetation pixels. From this remote-sensing-derived stem map, we calculated plot-level estimates of stem density, tree cover and average crown size. Additionally, we developed high-resolution (30 m) estimates of climate variables within the study area using meteorological station data, topographic data, and a combination of empirical and mechanistic modeling approaches. From these climate surfaces, digital elevation data, and soil survey data, we derived estimates of direct and indirect biophysical drivers including heat loading, reference evapotranspiration, water deficit, solar radiation, topographic convergence, soil depth, and soil water holding capacity. Using these data sets, we conducted a regression tree analysis with stem density, tree cover, and average tree size as responses and biophysical drivers as predictors. Trees were fit to a randomly sampled half of the dataset (168,000 samples) and pruned using cost-complexity pruning based on 10-fold cross-validation. 
Predictions from pruned trees were then assessed against the hold-out data. Preliminary results from this analysis suggest that: 1) the relative importance and dependencies of biophysical drivers on forest structure are contingent upon the position of these forests along gradients of a limiting resource, 2) stem density shows a stronger dependence on water availability than tree size does, and 3) the predictive power of abiotic variables is limited, with our best models accounting for only 36-40 percent of the variance in the response. These results suggest that the response of forest structure to climate change may be highly idiosyncratic and difficult to predict using abiotic drivers alone.

  19. Trees in the city: valuing street trees in Portland, Oregon

    Treesearch

    G.H. Donovan; D.T. Butry

    2010-01-01

    We use a hedonic price model to simultaneously estimate the effects of street trees on the sales price and the time-on-market (TOM) of houses in Portland, Oregon. On average, street trees add $8,870 to sales price and reduce TOM by 1.7 days. In addition, we found that the benefits of street trees spill over to neighboring houses. Because the provision and maintenance...

  20. Assessing urban forest effects and values, Philadelphia's urban forest

    Treesearch

    David J. Nowak; Robert E. Hoehn III; Daniel E. Crane; Jack C. Stevens; Jeffrey T. Walton

    2007-01-01

    An analysis of trees in Philadelphia reveals that this city has about 2.1 million trees with canopies that cover 15.7 percent of the area. The most common tree species are black cherry, crabapple, and tree of heaven. The urban forest currently stores about 530,000 tons of carbon valued at $9.8 million. In addition, these trees remove about 16,100 tons of carbon per...

  1. Spatial patterns of a tropical tree species growing under a eucalyptus plantation in South-East Brazil.

    PubMed

    Higuchi, P; Silva, A C; Louzada, J N C; Machado, E L M

    2010-05-01

    The objectives of this study were to evaluate the influence of the propagule source and the role of tree size class on the spatial pattern of Xylopia brasiliensis Spreng. individuals growing under the canopy of an experimental eucalyptus plantation. To this end, all individuals of Xylopia brasiliensis with diameter at soil height (dsh) > 1 cm were mapped in the understory of a 3.16 ha Eucalyptus spp. and Corymbia spp. plantation located in the municipality of Lavras, SE Brazil. The largest nearby mature tree of X. brasiliensis was considered the propagule source. Linear regressions were used to assess the influence of distance from the propagule source on population parameters (density, basal area and height). The spatial pattern of trees was assessed with Ripley's K function. The overall pattern showed that distance from the propagule source strongly influenced the spatial distribution of trees, mainly the small ones: the closer to the propagule source, the higher the tree density and the lower the mean tree height. The population showed different spatial distribution patterns depending on the spatial scale and diameter class considered. While small trees tended to be aggregated up to around 80 m, the largest individuals were randomly distributed in the area. A plausible explanation for the observed patterns might be limited seed rain and intra-population competition.
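    Ripley's K function summarizes how many other points lie within distance r of a typical point, scaled by the point intensity. The following is a minimal sketch of one common naive estimator, without the edge corrections that published analyses usually apply; the function name `ripley_k` and the toy stem coordinates are hypothetical. Under complete spatial randomness K(r) ≈ πr², so values well above πr² suggest aggregation at scale r.

```python
from math import dist, pi


def ripley_k(points, r, area):
    """Naive Ripley's K estimate without edge correction:
    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j, with d_ij <= r."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and dist(p, q) <= r
    )
    return area * close_pairs / (n * (n - 1))


# Hypothetical stem map (metres) in a 100 m x 100 m plot.
stems = [(10, 10), (11, 10), (10, 12), (80, 75)]
r = 3.0
k = ripley_k(stems, r, area=100 * 100)
print(k > pi * r ** 2)  # True: the first three stems are clustered
```

    In practice K(r) is evaluated over a range of distances and compared with simulation envelopes under complete spatial randomness, often after the transformation L(r) = sqrt(K(r)/π) - r.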

  2. Topographic influences on vegetation mosaics and tree diversity in the Chihuahuan Desert Borderlands.

    PubMed

    Poulos, Helen M; Camp, Ann E

    2010-04-01

    The abundance and distribution of species reflect how the niche requirements of species and the dynamics of populations interact with spatial and temporal variation in the environment. This study investigated the influence of geographical variation in environmental site conditions on tree dominance and diversity patterns in three topographically dissected mountain ranges in west Texas, USA, and northern Mexico. We measured tree abundance and basal area using a systematic sampling design across the forested areas of three mountain ranges and related these data to a suite of environmental parameters derived from field and digital elevation model data. We employed cluster analysis, classification and regression trees (CART), and rarefaction to identify (1) the dominant forest cover types across the three study sites and (2) environmental influences on tree distribution and diversity patterns. Elevation, topographic position, and incident solar radiation were the major influences on tree dominance and diversity. Mesic valley bottoms hosted high-diversity vegetation types, while hotter and drier mid-slopes and ridgetops supported lower tree diversity. Valley bottoms and other topographic positions shared few species, indicating high species turnover at the landscape scale. Mountain ranges with high topographic complexity also had higher species richness, suggesting that geographical variability in environmental conditions was a major influence on tree diversity. This study stressed the importance of landscape- and regional-scale topographic variability as a key factor controlling vegetation pattern and diversity in southwestern North America.
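    A CART regression tree grows by repeatedly choosing the split that most reduces the within-node sum of squared errors (SSE). The sketch below is a rough illustration, not the authors' code. It searches one numeric predictor for the single best threshold. A full tree would apply the same search recursively to each child node and then prune. The function names are hypothetical, and the example values are toy numbers rather than data from the study.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split x <= t that minimizes
    SSE(left) + SSE(right). Returns (None, sse(y)) if no split improves on the root."""
    pairs = sorted(zip(x, y))
    best_t, best_sse = None, sse(list(y))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied x values cannot be separated
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_sse = total
            best_t = (pairs[k - 1][0] + pairs[k][0]) / 2  # midpoint threshold
    return best_t, best_sse


# Toy example: elevation (m) vs. a diversity score
elev = [1000, 1200, 1400, 1600, 1800, 2000]
score = [12, 11, 12, 4, 5, 4]
print(best_split(elev, score))
```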

  3. Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush–steppe ecosystem

    USGS Publications Warehouse

    Wylie, Bruce K.; Johnson, Douglas A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, Tagir G.; Reed, Bradley C.; Tieszen, Larry L.; Worstell, Bruce B.

    2003-01-01

    The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). Remote sensing and modeling have the potential to predict these components and map them spatially and temporally. This has obvious utility for quantifying carbon sink and source relationships and for identifying improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rn were measured with a Bowen ratio–energy balance (BREB) technique in a sagebrush (Artemisia spp.)–steppe ecosystem in northeast Idaho, USA, during 1996–1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996–1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday (R² = 0.79, n = 66, P < 0.0001). Including evapotranspiration in the predictive equation improved predictions of Fday (R² = 0.82, n = 66, P < 0.0001). Cross-validation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R² = 0.75–0.77, n = 66), with the regression tree model being slightly more robust in cross-validation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush–steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday.
These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models.
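    The overfitting finding above rests on cross-validation: each model is refit with part of the data held out and then scored on the held-out part. The sketch below is minimal and makes several assumptions. It uses k-fold cross-validated RMSE, one predictor, a least-squares line, and a single-split regression tree. The study's actual CV scheme and predictors are not given in the abstract, and `fit_line`, `fit_stump` and `cv_rmse` are hypothetical names.

```python
import random


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_stump(x, y):
    """One-split regression tree: predict the mean of y on each side of the SSE-minimizing threshold."""
    pairs = sorted(zip(x, y))
    best = None  # (sse, threshold, left_mean, right_mean)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied x values cannot be separated
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        s = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or s < best[0]:
            best = (s, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    if best is None:  # no valid split: fall back to the overall mean
        m = sum(y) / len(y)
        return lambda v: m
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def cv_rmse(fit, x, y, k=5, seed=0):
    """k-fold cross-validated root mean squared error of a fitting function."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err, n = 0.0, 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            sq_err += (model(x[i]) - y[i]) ** 2
            n += 1
    return (sq_err / n) ** 0.5
```

    The comparison is made on held-out data rather than training fit. A tree can match training data closely and still predict new periods poorly.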

  4. Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush-steppe ecosystem

    USGS Publications Warehouse

    Wylie, B.K.; Johnson, D.A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, T.G.; Reed, B.C.; Tieszen, L.L.; Worstell, B.B.

    2003-01-01

    The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). Remote sensing and modeling have the potential to predict these components and map them spatially and temporally. This has obvious utility for quantifying carbon sink and source relationships and for identifying improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rn were measured with a Bowen ratio-energy balance (BREB) technique in a sagebrush (Artemisia spp.)-steppe ecosystem in northeast Idaho, USA, during 1996-1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996-1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday (R² = 0.79, n = 66, P < 0.0001). Including evapotranspiration in the predictive equation improved predictions of Fday (R² = 0.82, n = 66, P < 0.0001). Cross-validation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R² = 0.75-0.77, n = 66), with the regression tree model being slightly more robust in cross-validation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush-steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday.
These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models. © 2003 Elsevier Science Inc. All rights reserved.

  5. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier for credit scoring problems. The data are first balanced and diversified by clustering and bagging algorithms. We then apply a Lasso-logistic regression learning ensemble to evaluate credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision trees, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.

  6. [Analysis of the characteristics of the older adults with depression using data mining decision tree analysis].

    PubMed

    Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi

    2013-02-01

    The purpose of this study was to develop a model predicting the characteristics of older adults with depression using the decision tree method. Data on 14,970 elderly people from the large 2008 Korean Elderly Survey dataset were analyzed. The target variable was depression, and the 53 input variables covered general characteristics, family and social relationships, economic status, health status, health behavior, functional status, leisure and social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique, using SPSS for Windows 19.0 and Clementine 12.0. The decision trees yielded five rules defining the characteristics of older adults with depression. Among the data mining models, Classification & Regression Tree (C&RT) gave the best prediction, with an accuracy of 80.81%. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The rules identified by the decision tree model in this study should serve as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
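    For a categorical target such as depression status, CART-style classification trees commonly choose splits by Gini impurity, 1 - Σ p_k², where p_k is the share of class k in a node. The abstract does not state which criterion SPSS/Clementine's C&RT used in this study. The following is therefore a generic sketch with hypothetical names and toy inputs.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(x, labels):
    """Find threshold t on numeric x minimizing the size-weighted Gini impurity
    of the child nodes {x <= t} and {x > t}. Returns (t, weighted_impurity)."""
    pairs = sorted(zip(x, labels))
    n = len(pairs)
    best_t, best_imp = None, gini(list(labels))
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied x values cannot be separated
        left = [c for _, c in pairs[:k]]
        right = [c for _, c in pairs[k:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = (pairs[k - 1][0] + pairs[k][0]) / 2, imp
    return best_t, best_imp


# Toy example: a satisfaction score vs. depression (1) / no depression (0)
score = [1, 2, 2, 3, 4, 5]
depressed = [1, 1, 1, 0, 0, 0]
print(best_gini_split(score, depressed))
```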

  7. Predictors of condom use and refusal among the population of Free State province in South Africa

    PubMed Central

    2012-01-01

    Background This study investigated the extent and predictors of condom use and condom refusal in the Free State province in South Africa. Methods Through a household survey conducted in the Free State province of South Africa, 5,837 adults were interviewed. Univariate and multivariate survey logistic regressions and classification trees (CT) were used to analyse two response variables, ‘ever used condom’ and ‘ever refused condom’. Results Eighty-three per cent of the respondents had ever used condoms, of whom 38% always used them; 61% used them during the last sexual intercourse and 9% had ever refused to use them. The univariate logistic regression models and CT analysis indicated that a strong predictor of condom use was its perceived need. In the CT analysis, this variable was followed in importance by ‘knowledge of correct use of condom’, condom availability, young age, being single and higher education. ‘Perceived need’ for condoms did not remain significant in the multivariate analysis after controlling for other variables. The strongest predictor of condom refusal, as shown by the CT, was shame associated with condoms, followed by the presence of sexual risk behaviour, knowing one’s HIV status, older age and lacking knowledge of condoms (i.e., ability to prevent sexually transmitted diseases and pregnancy, availability, correct and consistent use and existence of female condoms). In the multivariate logistic regression, age was not significant for condom refusal, while affordability and perceived need were additional significant variables. Conclusions Using complementary modelling techniques such as CT in addition to logistic regressions contributes to a better understanding of condom use and refusal. Further improvement in correct and consistent use of condoms will require targeted interventions.
In addition to existing social marketing campaigns, tailored approaches should focus on establishing the perceived need for condom use and improving skills for correct use. They should also incorporate interventions to reduce the shame associated with condoms and individual counselling of those likely to refuse condoms. PMID:22639964

  8. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their ability to estimate the density of snags in three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD). In QPORD, the number of trees per hectare was estimated with a quasi-Poisson model in the first stage, and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using logistic regression and RF imputation. Because conditions are more homogeneous on private forest lands, snag density by decay class was predicted with higher accuracy on private than on public lands, while the presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags there. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class.
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for the unequal sizes of the presence and absence classes is more straightforward for logistic regression than for RF imputation. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.

  9. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.

    PubMed

    Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. 
Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability.
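    Boosted regression trees (BRT), used here for species richness, abundance and biomass, build an additive model by fitting many small trees in sequence. Each new tree is fitted to the residuals of the current fit, and its contribution is shrunk by a learning rate. The sketch below makes simplifying assumptions: one predictor, squared-error loss, single-split trees (stumps) and no random subsampling. These choices and the names `boost`, `fit_stump` and `predict_stump` are illustrative only. They are not the settings the authors used.

```python
def fit_stump(x, r):
    """Least-squares one-split tree on residuals r; returns (threshold, left_value, right_value)."""
    pairs = sorted(zip(x, r))
    total_mean = sum(r) / len(r)
    best = (sum((v - total_mean) ** 2 for v in r), None, total_mean, total_mean)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied x values cannot be separated
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        s = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if s < best[0]:
            best = (s, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    return best[1:]


def predict_stump(stump, v):
    t, ml, mr = stump
    if t is None:  # no split found: constant prediction
        return ml
    return ml if v <= t else mr


def boost(x, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals,
    and its prediction is added after shrinkage by learning_rate."""
    f0 = sum(y) / len(y)  # start from the mean response
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + learning_rate * predict_stump(s, xi) for pi, xi in zip(pred, x)]

    def model(v):
        return f0 + learning_rate * sum(predict_stump(s, v) for s in stumps)

    return model
```

    A small learning rate needs more trees but usually generalizes better. This is why BRT studies typically tune the learning rate and the number of trees together.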

  10. Determination of biologically significant hydrologic condition metrics in urbanizing watersheds: an empirical analysis over a range of environmental settings

    USGS Publications Warehouse

    Steuer, Jeffrey J.; Stensvold, Krista A.; Gregory, Mark B.

    2010-01-01

    We investigated the relations among 83 hydrologic condition metrics (HCMs) and changes in algal, invertebrate, and fish communities in five metropolitan areas across the continental United States. We used a statistical approach that employed Spearman correlation and regression tree analysis to identify five HCMs that are strongly associated with observed biological variation along a gradient of urbanization. The HCMs related to average flow magnitude, high-flow magnitude, high-flow event frequency, high-flow duration, and rate of change of stream cross-sectional area were most consistently associated with changes in aquatic communities. Although our investigation used an urban gradient design with short hydrologic periods of record (≤1 year) of hourly cross-sectional area time series, these five HCMs were consistent with previous investigations using long-term daily-flow records. The ecological sampling day often was included in the hydrologic period. Regression tree models explained up to 73, 92, and 79% of variance for specific algal, invertebrate, and fish community metrics, respectively. National models generally were not as statistically significant as models for individual metropolitan areas. High-flow event frequency, a hydrologic metric found to be transferable across stream type and useful for classifying habitat by previous research, was found to be the most ecologically relevant HCM; transformation by precipitation increased national-scale applicability. We also investigated the relation between measures of stream flashiness and land-cover indicators of urbanization and found that land-cover characteristic and pattern variables, such as road density, percent wetland, and proximity of developed land, were strongly related to HCMs at both a metropolitan and national scale and, therefore, may be effective land-use management options in addition to wholesale impervious-area reduction.

  11. Mining Health App Data to Find More and Less Successful Weight Loss Subgroups

    PubMed Central

    2016-01-01

    Background More than half of all smartphone app downloads involve weight, diet, and exercise. If successful, these lifestyle apps may have far-reaching effects for disease prevention and health cost-savings, but few researchers have analyzed data from these apps. Objective The purposes of this study were to analyze data from a commercial health app (Lose It!) in order to identify successful weight loss subgroups via exploratory analyses and to verify the stability of the results. Methods Cross-sectional, de-identified data from Lose It! were analyzed. This dataset (n=12,427,196) was randomly split into 24 subsamples, and this study used 3 subsamples (combined n=972,687). Classification and regression tree methods were used to explore groupings of weight loss with one subsample, with descriptive analyses to examine other group characteristics. Data mining validation methods were conducted with 2 additional subsamples. Results In subsample 1, 14.96% of users lost 5% or more of their starting body weight. Classification and regression tree analysis identified 3 distinct subgroups: “the occasional users” had the lowest proportion (4.87%) of individuals who successfully lost weight; “the basic users” had 37.61% weight loss success; and “the power users” achieved the highest percentage of weight loss success at 72.70%. Behavioral factors delineated the subgroups, though app-related behavioral characteristics further distinguished them. Results were replicated in further analyses with separate subsamples. Conclusions This study demonstrates that distinct subgroups can be identified in “messy” commercial app data and that the identified subgroups can be replicated in independent samples. Behavioral factors and use of custom app features characterized the subgroups. Targeting and tailoring information to particular subgroups could enhance weight loss success. Future studies should replicate data mining analyses to increase methodological rigor. PMID:27301853

  12. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes

    PubMed Central

    Yates, Katherine L.; Mellin, Camille; Caley, M. Julian; Radford, Ben T.; Meeuwig, Jessica J.

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. 
Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability. PMID:27333202

  13. Our Air: Unfit for Trees.

    ERIC Educational Resources Information Center

    Dochinger, Leon S.

    To help urban, suburban, and rural tree owners know about air pollution's effects on trees and their tolerance and intolerance to pollutants, the USDA Forest Service has prepared this booklet. It answers the following questions about atmospheric pollution: Where does it come from? What can it do to trees? and What can we do about it? In addition,…

  14. Calcium and aluminum impacts on sugar maple physiology in a northern hardwood forest.

    PubMed

    Halman, Joshua M; Schaberg, Paul G; Hawley, Gary J; Pardo, Linda H; Fahey, Timothy J

    2013-11-01

    Forests of northeastern North America have been exposed to anthropogenic acidic inputs for decades, resulting in altered cation relations and disruptions to associated physiological processes in multiple tree species, including sugar maple (Acer saccharum Marsh.). In the current study, the impacts of calcium (Ca) and aluminum (Al) additions on mature sugar maple physiology were evaluated at the Hubbard Brook Experimental Forest (Thornton, NH, USA) to assess remediation (Ca addition) or exacerbation (Al addition) of current acidified conditions. Fine root cation concentrations and membrane integrity, carbon (C) allocation, foliar cation concentrations and antioxidant activity, foliar response to a spring freezing event and reproductive ability (flowering, seed quantity, filled seed and seed germination) were evaluated for dominant sugar maple trees in a replicated plot study. Root damage and foliar antioxidant activity were highest in Al-treated trees, while growth-associated C, foliar re-flush following a spring frost and reproductive ability were highest in Ca-treated trees. In general, we found that trees on Ca-treated plots preferentially used C resources for growth and reproductive processes, whereas Al-treated trees devoted C to defense-based processes. Similarities between Al-treated and control trees were observed for foliar cation concentrations, C partitioning and seed production, suggesting that sugar maples growing in native forests may be more stressed than previously perceived. Our experiment suggests that disruption of the balance of Ca and Al in sugar maples by acid deposition continues to be an important driver of tree health.

  15. Association between split selection instability and predictive error in survival trees.

    PubMed

    Radespiel-Tröger, M; Gefeller, O; Rabenstein, T; Hothorn, T

    2006-01-01

    To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study. We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned by using split-complexity, and final trees are selected by means of cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation, respectively, in the root node. We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error. The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform equally well compared to pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.

  16. Summer and winter habitat suitability of Marco Polo argali in southeastern Tajikistan: A modeling approach.

    PubMed

    Salas, Eric Ariel L; Valdez, Raul; Michel, Stefan

    2017-11-01

    We modeled summer and winter habitat suitability of Marco Polo argali in the Pamir Mountains in southeastern Tajikistan using these statistical algorithms: Generalized Linear Model, Random Forest, Boosted Regression Tree, Maxent, and Multivariate Adaptive Regression Splines. Using sheep occurrence data collected from 2009 to 2015 and a set of selected habitat predictors, we produced summer and winter habitat suitability maps and determined the important habitat suitability predictors for both seasons. Our results demonstrated that argali selected proximity to riparian areas and greenness as the two most relevant variables for summer, and the degree of slope (gentler slopes between 0° and 20°) and the Landsat temperature band for winter. Terrain roughness was also among the most important variables in the summer and winter models. Aspect was significant only for winter habitat, with argali preferring south-facing mountain slopes. We evaluated several measures of model performance, including the Area Under the Curve (AUC) and the True Skill Statistic (TSS). Among the five algorithms, AUC was highest for Boosted Regression Tree in both the summer (AUC = 0.94) and winter model runs (AUC = 0.94). In contrast, Random Forest underperformed in both model runs.
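    Both evaluation metrics named above can be computed from presence/absence labels and model scores. AUC is the probability that a randomly chosen presence scores higher than a randomly chosen absence. TSS is sensitivity + specificity - 1 at a chosen threshold. The sketch below assumes presences are coded 1 and absences 0 and that both classes occur. The abstract does not report a TSS threshold, so the threshold is left to the caller. The function names are illustrative.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random presence outscores a random absence
    (ties count 1/2), i.e. the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def tss(y_true, y_score, threshold):
    """True Skill Statistic = sensitivity + specificity - 1, predicting presence when score >= threshold."""
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Toy example (not study data)
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]
print(auc(pos, neg), tss(y_true, y_score, 0.5))
```

    AUC does not depend on a threshold, while TSS does. This is one reason the two metrics are often reported together.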

  17. Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

    PubMed

    Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha

    2016-01-01

    Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, genes important for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided by a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. The study also identified a sample size of n = 25 as the cutoff point for RT bagging to outperform a single RT.
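    RT bagging fits a regression tree to each of many bootstrap resamples and averages their predictions. Its main effect is to reduce variance relative to a single tree. The sketch below is minimal and assumes one numeric predictor and single-split trees for brevity. The study used full regression trees on screened genes. `fit_stump` and `bagged_stumps` are hypothetical names.

```python
import random


def fit_stump(x, y):
    """Fit a one-split regression tree; returns a prediction function."""
    pairs = sorted(zip(x, y))
    best = None  # (sse, threshold, left_mean, right_mean)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied x values cannot be separated
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        s = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or s < best[0]:
            best = (s, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    if best is None:  # all x values tied: predict the overall mean
        m = sum(y) / len(y)
        return lambda v: m
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def bagged_stumps(x, y, n_bags=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap resamples of (x, y)."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda v: sum(m(v) for m in models) / len(models)
```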

  18. Egg distribution and sampling of Diaprepes abbreviatus (Coleoptera: Curculionidae) on silver buttonwood

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pena, J.E.; Mannion, C.; Amalin, D.

    2007-03-15

    Taylor's power law and Iwao's patchiness regression were used to analyze the spatial distribution of eggs of the Diaprepes root weevil, Diaprepes abbreviatus (L.), on silver buttonwood trees, Conocarpus erectus, during 1997 and 1998. Taylor's power law and Iwao's patchiness regression provided similar descriptions of the variance-mean relationship for egg distribution within trees. Sample size requirements were determined. Information presented in this paper should help to improve accuracy and efficiency in sampling of the weevil eggs in the future. (author) [Spanish abstract, translated] Taylor's law and Iwao's regression were used to analyze the distribution of eggs of the Diaprepes weevil, Diaprepes abbreviatus (L.), on silver buttonwood trees, Conocarpus erectus. The studies were carried out during 1997 and 1998. Both Taylor's law and Iwao's regression gave similar results for the relationship between the variance and the mean of weevil egg distribution on the trees. Sample size requirements were determined. In the future, the information presented in this article may help improve the efficiency of sampling this weevil's eggs. (author)
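    Taylor's power law models the sample variance as s² = a·m^b and is fitted by regressing log s² on log m. Iwao's patchiness regression regresses Lloyd's mean crowding, m* = m + (s²/m - 1), on the mean, giving m* = α + β·m. The sketch below fits both by ordinary least squares. It assumes each sampling unit contributes a list of egg counts with positive mean and variance. The abstract does not describe how counts were grouped, so the sampling units are an assumption, and the function names are illustrative.

```python
import math
from statistics import mean, variance


def ols(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def taylor_power_law(samples):
    """Fit log10(s^2) = log10(a) + b * log10(m) across sampling units.
    Each element of samples is a list of counts from one sampling unit.
    Returns (a, b); b > 1 is usually read as aggregation."""
    ms = [mean(s) for s in samples]
    vs = [variance(s) for s in samples]  # sample variance (n - 1 denominator)
    log_a, b = ols([math.log10(m) for m in ms], [math.log10(v) for v in vs])
    return 10 ** log_a, b


def iwao_regression(samples):
    """Regress Lloyd's mean crowding m* = m + (s^2 / m - 1) on the mean m.
    Returns (alpha, beta); beta > 1 is usually read as aggregation."""
    ms = [mean(s) for s in samples]
    crowding = [m + (variance(s) / m - 1) for s, m in zip(samples, ms)]
    return ols(ms, crowding)
```

    Sample size formulas for a target precision are then usually derived from the fitted a and b, or from α and β.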

  19. Threshold responses of Blackside Dace (Chrosomus cumberlandensis) and Kentucky Arrow Darter (Etheostoma spilotum) to stream conductivity

    USGS Publications Warehouse

    Hitt, Nathaniel P.; Floyd, Michael; Compton, Michael; McDonald, Kenneth

    2016-01-01

    Chrosomus cumberlandensis (Blackside Dace [BSD]) and Etheostoma spilotum (Kentucky Arrow Darter [KAD]) are fish species of conservation concern due to their fragmented distributions, their low population sizes, and threats from anthropogenic stressors in the southeastern United States. We evaluated the relationship between fish abundance and stream conductivity, an index of environmental quality and a potential physiological stressor. We modeled occurrence and abundance of KAD in the upper Kentucky River basin (208 samples) and BSD in the upper Cumberland River basin (294 samples) for sites sampled between 2003 and 2013. Segmented regression indicated a conductivity change-point for BSD abundance at 343 μS/cm (95% CI: 123–563 μS/cm) and for KAD abundance at 261 μS/cm (95% CI: 151–370 μS/cm). In both cases, abundances were negligible above estimated conductivity change-points. Post-hoc randomizations accounted for variance in estimated change points due to unequal sample sizes across the conductivity gradients. Boosted regression-tree analysis indicated stronger effects of conductivity than other natural and anthropogenic factors known to influence stream fishes. Boosted regression trees further indicated threshold responses of BSD and KAD occurrence to conductivity gradients in support of segmented regression results. We suggest that the observed conductivity relationship may indicate energetic limitations for insectivorous fishes due to changes in benthic macroinvertebrate community composition.
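
    A change-point estimate like the conductivity thresholds above can be sketched as a grid search over candidate breakpoints, assuming a continuous two-segment (hinge) model y = b0 + b1*x + b2*max(x - c, 0) fitted by ordinary least squares. The study's exact segmented-regression specification and its interval method are not given here, and the function name and data are hypothetical.

```python
import numpy as np


def fit_change_point(x, y, candidates):
    """Return (c, coef, sse) for the breakpoint c with the smallest residual SSE.

    Model: y = b0 + b1*x + b2*max(x - c, 0); the slope changes from b1 to b1 + b2 at c.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in candidates:
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if best is None or sse < best[2]:
            best = (c, coef, sse)
    return best


# Synthetic abundance: declines with conductivity, then stays flat near 1 above 300 uS/cm.
cond = np.linspace(50.0, 600.0, 111)
abund = 46.0 - 0.15 * cond + 0.15 * np.maximum(cond - 300.0, 0.0)
c_hat, coef, sse = fit_change_point(cond, abund, candidates=range(100, 600, 5))
```

    One common way to attach a confidence interval to c_hat is to resample sites and refit; that is an option, not necessarily the authors' procedure.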

  20. The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?

    PubMed

    Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping

    2013-01-01

    Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. This claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, but few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
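
    A minimal sketch of the LR approach, assuming paired stem diameters D and coarse root biomasses B: fit ln B = ln a + b*ln D by ordinary least squares, back-transform, and sum predictions over a census. LR on the log scale implies multiplicative (lognormal) error, whereas NLR fits B = a*D^b directly with additive error on the original scale (for example with scipy.optimize.curve_fit). Function names and data are illustrative.

```python
import numpy as np


def fit_loglog_allometry(diameter, biomass):
    """LR on log-transformed data: ln(B) = ln(a) + b*ln(D) + e, e ~ N(0, sigma^2).

    Returns (a, b, sigma) with sigma the residual standard error on the log scale.
    """
    lx = np.log(np.asarray(diameter, dtype=float))
    ly = np.log(np.asarray(biomass, dtype=float))
    b, ln_a = np.polyfit(lx, ly, 1)
    resid = ly - (ln_a + b * lx)
    sigma = float(np.sqrt(np.sum(resid ** 2) / (len(lx) - 2)))
    return float(np.exp(ln_a)), float(b), sigma


def stand_biomass(a, b, census_diameters):
    """Sum back-transformed predictions a * D^b over all stems in a census.

    This is the median prediction under lognormal error; a factor such as
    exp(sigma^2 / 2) is often applied to estimate the mean instead.
    """
    d = np.asarray(census_diameters, dtype=float)
    return float(np.sum(a * d ** b))


# Synthetic stems that follow B = 0.05 * D^2.4 exactly.
D = np.array([5.0, 10.0, 20.0, 40.0])
a, b, sigma = fit_loglog_allometry(D, 0.05 * D ** 2.4)
total = stand_biomass(a, b, [8.0, 12.0, 30.0])
```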

  1. Female married illiteracy as the most important continual determinant of total fertility rate among districts of Empowered Action Group States of India: Evidence from Annual Health Survey 2011-12.

    PubMed

    Kumar, Rajesh; Dogra, Vishal; Rani, Khushbu; Sahu, Kanti

    2017-01-01

    District-level determinants of total fertility rate in Empowered Action Group states of India can inform ongoing population stabilization programs in India. The present study assesses the role of district-level determinants in predicting total fertility rate among districts of the Empowered Action Group states of India. Data from the Annual Health Survey (2011-12) were analysed using STATA and R software packages. Multiple linear regression models were built and evaluated using the Akaike Information Criterion. For further understanding, recursive partitioning was used to prepare a regression tree. Female married illiteracy was positively associated with total fertility rate and explained more than half (53%) of the variance. In the multiple linear regression model, married illiteracy, infant mortality rate, antenatal care registration, household size, median age at live birth and sex ratio explained 70% of the total variance in total fertility rate. In the regression tree, female married illiteracy was the root node, and a split at 42% identified districts with TFR <= 2.7. The next left-side branch was again married illiteracy, with a split at 23% identifying districts with TFR <= 2.1. We conclude that female married illiteracy is one of the most important determinants explaining total fertility rate among the districts of the Empowered Action Group states. A focus on female literacy is required to stabilize population growth in the long run.
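
    The root-node split described above comes from recursive partitioning: at each node the algorithm tries every threshold on every predictor, keeps the one that most reduces the within-node sum of squares (the ANOVA criterion used by CART-style regression trees such as R's rpart), and then repeats on each child node. A minimal single-predictor sketch of one such search; the district values below are hypothetical, not taken from the survey.

```python
import numpy as np


def best_split(x, y):
    """Find the threshold t for 'x <= t' that minimizes total within-node SSE of y.

    Returns (t, sse); t is the midpoint between adjacent distinct x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates tied x values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_t, best_sse = (xs[i - 1] + xs[i]) / 2.0, float(sse)
    return best_t, best_sse


# Hypothetical districts: female married illiteracy (%) and TFR.
illiteracy = [10, 20, 30, 40, 50, 60]
tfr = [2.0, 2.1, 2.0, 3.5, 3.6, 3.4]
threshold, sse = best_split(illiteracy, tfr)
```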

  2. Assessing urban forest effects and values, New York City's urban forest

    Treesearch

    David J. Nowak; Robert E. Hoehn III; Daniel E. Crane; Jack C. Stevens; Jeffrey T. Walton

    2007-01-01

    An analysis of trees in New York City reveals that this city has about 5.2 million trees with canopies that cover 20.9 percent of the area. The most common tree species are tree of heaven, black cherry, and sweetgum. The urban forest currently stores about 1.35 million tons of carbon valued at $24.9 million. In addition, these trees remove about 42,300 tons of carbon...

  3. Identification of immune correlates of protection in Shigella infection by application of machine learning.

    PubMed

    Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

    2017-10-01

    Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of the subjects who did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
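
    The bootstrap confidence interval for a predicted probability of protection can be sketched as follows, assuming a single immune marker x and a binary protection outcome y: resample subjects with replacement, refit a logistic regression by Newton-Raphson each time, and take percentiles of the predicted probability at a chosen marker level. This is a generic percentile bootstrap, not necessarily the authors' exact scheme. The function names are hypothetical, and the plain Newton fit assumes no resample is perfectly separated.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson (IRLS) maximum-likelihood fit of P(y=1) = 1 / (1 + exp(-X @ beta))."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def bootstrap_probability_ci(x, y, x_new, n_boot=1000, level=0.95, seed=0):
    """Return (p_hat, lower, upper) for P(protected | marker = x_new)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    x0 = np.array([1.0, x_new])
    p_hat = 1.0 / (1.0 + np.exp(-x0 @ fit_logistic(X, y)))
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample subjects with replacement
        boot[k] = 1.0 / (1.0 + np.exp(-x0 @ fit_logistic(X[idx], y[idx])))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [tail, 1.0 - tail])
    return float(p_hat), float(lower), float(upper)


# Synthetic marker values and outcomes.
rng = np.random.default_rng(1)
marker = rng.normal(size=200)
protected = (rng.random(200) < 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * marker)))).astype(float)
p_hat, lower, upper = bootstrap_probability_ci(marker, protected, x_new=0.0, n_boot=200)
```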

  4. Estimating Leaf Water Potential of Giant Sequoia Trees from Airborne Hyperspectral Imagery

    NASA Astrophysics Data System (ADS)

    Francis, E. J.; Asner, G. P.

    2015-12-01

    Recent drought-induced forest dieback events have motivated research on the mechanisms of tree survival and mortality during drought. Leaf water potential, a measure of the force exerted by the evaporation of water from the leaf surface, is an indicator of plant water stress and can help predict tree mortality in response to drought. Scientists have traditionally measured water potentials on a tree-by-tree basis, but have not been able to produce maps of tree water potential at the scale of a whole forest, leaving forest managers unaware of forest drought stress patterns and their ecosystem-level consequences. Imaging spectroscopy, a technique for remote measurement of chemical properties, has been used to successfully estimate leaf water potentials in wheat and maize crops and pinyon-pine and juniper trees, but these estimates have never been scaled to the canopy level. We used hyperspectral reflectance data collected by the Carnegie Airborne Observatory (CAO) to map leaf water potentials of giant sequoia trees (Sequoiadendron giganteum) in an 800-hectare grove in Sequoia National Park. During the current severe drought in California, we measured predawn and midday leaf water potentials of 48 giant sequoia trees, using the pressure bomb method on treetop foliage samples collected with tree-climbing techniques. The CAO collected hyperspectral reflectance data at 1-meter resolution from the same grove within 1-2 weeks of the tree-level measurements. A partial least squares regression was used to correlate reflectance data extracted from the 48 focal trees with their water potentials, producing a model that predicts water potential of giant sequoia trees. 
Results show that giant sequoia trees can be mapped in the imagery with a classification accuracy of 0.94, and we predicted the water potential of the mapped trees to assess 1) similarities and differences between a leaf water potential map and a canopy water content map produced from airborne hyperspectral data, 2) spatial variability in leaf water potentials, and 3) relationships between water potential and tree leaf area, topography, and surrounding tree density. These results will help forest managers plan prescribed burns to maintain the health of giant sequoia trees during drought.
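
    Partial least squares regression suits this setting because reflectance spectra have many strongly correlated bands and only 48 trees were measured. A minimal PLS1 sketch using the NIPALS algorithm, assuming a trees-by-bands reflectance matrix X and a vector of measured water potentials y. The number of components would normally be chosen by cross-validation, and the function names are illustrative rather than the authors' implementation.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS. Returns (coef, x_mean, y_mean) so that
    y_hat = (X_new - x_mean) @ coef + y_mean."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centred predictors and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)     # weight vector: direction of max covariance with y
        t = E @ w                  # latent scores
        tt = t @ t
        p = E.T @ t / tt           # X loadings
        qk = (f @ t) / tt          # y loading
        E = E - np.outer(t, p)     # deflate X
        f = f - qk * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original band space
    return coef, x_mean, y_mean


def pls1_predict(X_new, coef, x_mean, y_mean):
    return (np.asarray(X_new, dtype=float) - x_mean) @ coef + y_mean


# Synthetic check: with as many components as bands, PLS reproduces ordinary least squares.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 1.5
coef, x_mean, y_mean = pls1_fit(X, y, n_components=4)
```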

  5. Tree growth and competition in an old-growth Picea abies forest of boreal Sweden: influence of tree spatial patterning

    USGS Publications Warehouse

    Fraver, Shawn; D'Amato, Anthony W.; Bradford, John B.; Jonsson, Bengt Gunnar; Jönsson, Mari; Esseen, Per-Anders

    2013-01-01

    Question: What factors best characterize tree competitive environments in this structurally diverse old-growth forest, and do these factors vary spatially within and among stands? Location: Old-growth Picea abies forest of boreal Sweden. Methods: Using long-term, mapped permanent plot data augmented with dendrochronological analyses, we evaluated the effect of neighbourhood competition on focal tree growth by means of standard competition indices, each modified to include various metrics of tree size, neighbour mortality weighting (for neighbours that died during the inventory period), and within-neighbourhood tree clustering. Candidate models were evaluated using mixed-model linear regression analyses, with mean basal area increment as the response variable. We then analysed stand-level spatial patterns of competition indices and growth rates (via kriging) to determine if the relationship between these patterns could further elucidate factors influencing tree growth. Results: Inter-tree competition clearly affected growth rates, with crown volume being the size metric most strongly influencing the neighbourhood competitive environment. Including neighbour tree mortality weightings in models only slightly improved descriptions of competitive interactions. Although the within-neighbourhood clustering index did not improve model predictions, competition intensity was influenced by the underlying stand-level tree spatial arrangement: stand-level clustering locally intensified competition and reduced tree growth, whereas in the absence of such clustering, inter-tree competition played a lesser role in constraining tree growth. Conclusions: Our findings demonstrate that competition continues to influence forest processes and structures in an old-growth system that has not experienced major disturbances for at least two centuries. 
The finding that the underlying tree spatial pattern influenced the competitive environment suggests caution in interpreting traditional tree competition studies, in which tree spatial patterning is typically not taken into account. Our findings highlight the importance of forest structure – particularly the spatial arrangement of trees – in regulating inter-tree competition and growth in structurally diverse forests, and they provide insight into the causes and consequences of heterogeneity in this old-growth system.
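
    The abstract does not name its competition indices. As one common example, a Hegyi-type distance-dependent index sums, over neighbours j within a search radius of focal tree i, the size ratio size_j / size_i divided by the inter-tree distance. A minimal sketch, assuming stem coordinates and one size value per tree (diameter, or crown volume, which the study found most informative). The optional per-neighbour weights stand in for a mortality weighting and are a hypothetical scheme, not the one used in the study.

```python
import numpy as np


def hegyi_index(xy, size, radius, weights=None):
    """Distance-dependent competition index for every tree.

    CI_i = sum over neighbours j (0 < d_ij <= radius) of w_j * (size_j / size_i) / d_ij.
    """
    xy = np.asarray(xy, dtype=float)
    size = np.asarray(size, dtype=float)
    w = np.ones(len(size)) if weights is None else np.asarray(weights, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise stem distances
    is_neighbour = (dist > 0) & (dist <= radius)
    ratio = size[None, :] / size[:, None]     # size_j / size_i, focal tree i in rows
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(is_neighbour, w[None, :] * ratio / dist, 0.0)
    return terms.sum(axis=1)


# Four stems (coordinates in m, sizes as DBH in cm); the last one has no neighbours within 10 m.
xy = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [20.0, 20.0]]
ci = hegyi_index(xy, size=[20.0, 10.0, 40.0, 30.0], radius=10.0)
```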

  6. BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)

    EPA Science Inventory

    We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...

  7. Trainee Characteristics and Perceptions of HIV/AIDS Training Quality.

    ERIC Educational Resources Information Center

    Panter, A. T.; Huba, G. J.; Melchior, Lisa A.; Anderson, Donna; Driscoll, Mary; German, Victor F.; Henderson, Harold; Henderson, Ron; Lalonde, Bernadette; Uldall, Karnina K.; Zalumas, Jacqueline

    2000-01-01

    Reports findings from 7 HIV/AIDS education and training projects involving more than 600 training sessions. Trainee characteristics were related to their assessments of training quality, using a regression decision-tree analytic approach. Discusses implications for curriculum development. (SLD)

  8. Practical application of cure mixture model for long-term censored survivor data from a withdrawal clinical trial of patients with major depressive disorder.

    PubMed

    Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko

    2010-04-23

    Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, such common methods may be inappropriate when the data include long-term censored relapse-free times, because the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data that include such long-term censored relapse-free times, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for time-to-event data to identify groups of patients with differing prognoses (cure survival CART). Although such methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fit of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. 
We concluded that Cox cure regression reveals who may be cured and how the treatment and other factors affect the cure incidence and the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models to reach meaningful interpretations.
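
    In a mixture cure model the population survival is S_pop(t) = (1 - pi) + pi * S_u(t), where pi is the probability of being uncured (modelled logistically) and S_u is the latency survival of uncured patients. A minimal sketch with a Weibull latency (shape = 1 gives the exponential case), the two parametric latencies the tree-growing statistic above assumes, plus the right-censored log-likelihood that such a likelihood-ratio statistic would compare. The Cox cure regression itself uses a semi-parametric latency instead, and the function names are illustrative.

```python
import numpy as np


def uncured_probability(z, gamma0, gamma1):
    """Logistic incidence model: P(uncured | z) = 1 / (1 + exp(-(gamma0 + gamma1 * z)))."""
    return 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * np.asarray(z, dtype=float))))


def weibull_survival(t, scale, shape):
    return np.exp(-(np.asarray(t, dtype=float) / scale) ** shape)


def weibull_density(t, scale, shape):
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1.0) * weibull_survival(t, scale, shape)


def population_survival(t, pi_uncured, scale, shape):
    """S_pop(t) = (1 - pi) + pi * S_u(t); it plateaus at the cured fraction 1 - pi."""
    return (1.0 - pi_uncured) + pi_uncured * weibull_survival(t, scale, shape)


def cure_log_likelihood(t, event, pi_uncured, scale, shape):
    """Relapses contribute log(pi * f_u(t)); censored times contribute log(S_pop(t))."""
    event = np.asarray(event, dtype=bool)
    ll_event = np.log(pi_uncured) + np.log(weibull_density(t, scale, shape))
    ll_censored = np.log(population_survival(t, pi_uncured, scale, shape))
    return float(np.sum(np.where(event, ll_event, ll_censored)))
```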

  9. Fitting and Calibrating a Multilevel Mixed-Effects Stem Taper Model for Maritime Pine in NW Spain

    PubMed Central

    Arias-Rodil, Manuel; Castedo-Dorado, Fernando; Cámara-Obregón, Asunción; Diéguez-Aranda, Ulises

    2015-01-01

    Stem taper data are usually hierarchical (several measurements per tree, and several trees per plot), making application of a multilevel mixed-effects modelling approach essential. However, correlation between trees in the same plot/stand has often been ignored in previous studies. Fitting and calibration of a variable-exponent stem taper function were conducted using data from 420 trees felled in even-aged maritime pine (Pinus pinaster Ait.) stands in NW Spain. In the fitting step, the tree level explained much more variability than the plot level, and therefore calibration at plot level was omitted. Several stem heights were evaluated for measurement of the additional diameter needed for calibration at tree level. Calibration with an additional diameter measured at between 40 and 60% of total tree height showed the greatest improvement in volume and diameter predictions. If additional diameter measurement is not available, the fixed-effects model fitted by the ordinary least squares technique should be used. Finally, we also evaluated how the expansion of parameters with random effects affects the stem taper prediction, as we consider this a key question when applying the mixed-effects modelling approach to taper equations. The results showed that correlation between random effects should be taken into account when assessing the influence of random effects in stem taper prediction. PMID:26630156
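
    Calibration at tree level typically means predicting a tree's random effects from one or more extra diameter measurements with the empirical best linear unbiased predictor b_hat = D Z' (Z D Z' + R)^-1 r. Here r holds the residuals of the extra measurements from the fixed-effects prediction, Z is the design matrix (for a nonlinear taper function, the derivatives with respect to the random effects), D is the random-effects covariance, and R is the residual covariance. A minimal sketch under those assumptions; a full D with off-diagonal terms is what carries the correlation between random effects that the study found important. The function name is illustrative.

```python
import numpy as np


def calibrate_random_effects(residuals, Z, D, R):
    """EBLUP of tree-level random effects: b_hat = D Z' (Z D Z' + R)^-1 r.

    residuals: (m,) observed minus fixed-effects-predicted diameters at the extra heights
    Z:         (m, q) derivatives of the taper function w.r.t. the q random effects
    D:         (q, q) random-effects covariance (off-diagonals = their covariance)
    R:         (m, m) residual covariance for those measurements
    """
    r = np.asarray(residuals, dtype=float)
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    D = np.asarray(D, dtype=float)
    R = np.asarray(R, dtype=float)
    V = Z @ D @ Z.T + R  # marginal covariance of the extra measurements
    return D @ Z.T @ np.linalg.solve(V, r)


# One extra diameter, one random effect: the residual is shrunk by D / (D + R) = 4 / 5.
b_single = calibrate_random_effects([2.5], [[1.0]], [[4.0]], [[1.0]])
# Two correlated random effects informed by the same single measurement.
b_pair = calibrate_random_effects([1.3], [[1.0, 0.5]], [[4.0, 1.0], [1.0, 2.0]], [[1.0]])
```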

  10. Relationships between nutrient composition of flowers and fruit quality in orange trees grown in calcareous soil.

    PubMed

    Pestana, Maribela; Beja, Pedro; Correia, Pedro José; de Varennes, Amarilis; Faria, Eugénio Araújo

    2005-06-01

    To determine if flower nutrient composition can be used to predict fruit quality, a field experiment was conducted over three seasons (1996-1999) in a commercial orange orchard (Citrus sinensis (L.) Osbeck cv. 'Valencia Late', budded on Troyer citrange rootstock) established on a calcareous soil in southern Portugal. Flowers were collected from 20 trees during full bloom in April and their nutrient composition determined, and fruits were harvested the following March and their quality evaluated. Patterns of covariation in flower nutrient concentrations and in fruit quality variables were evaluated by principal component analysis. Regression models relating fruit quality variables to flower nutrient composition were developed by stepwise selection procedures. The predictive power of the regression models was evaluated with an independent data set. Nutrient composition of flowers at full bloom could be used to predict the fruit quality variables fresh fruit mass and maturation index in the following year. Magnesium, Ca and Zn concentrations measured in flowers were related to fruit fresh mass estimations and N, P, Mg and Fe concentrations were related to fruit maturation index. We also established reference values for the nutrient composition of flowers based on measurements made in trees that produced large (> 76 mm in diameter) fruit.
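
    The stepwise selection step can be sketched as forward selection that adds, one at a time, the flower nutrient whose inclusion most lowers the AIC of an ordinary least-squares fit, stopping when no addition helps. The abstract does not state the entry criterion, so AIC here is an assumption (classical stepwise procedures often use F-to-enter thresholds instead), and the function names and data are hypothetical.

```python
import numpy as np


def ols_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit of y on an intercept plus the columns of X."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def forward_stepwise(X, y, names):
    """Greedy forward selection by AIC; returns selected names in order of entry."""
    selected, remaining = [], list(range(X.shape[1]))
    current = ols_aic(X[:, []], y)  # intercept-only model
    while remaining:
        best_aic, best_j = min((ols_aic(X[:, selected + [j]], y), j) for j in remaining)
        if best_aic >= current:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_aic
    return [names[j] for j in selected]


# Synthetic flower nutrient data where fruit mass depends on Mg and N.
rng = np.random.default_rng(3)
names = ["N", "P", "Mg", "Fe", "Ca"]
X = rng.normal(size=(60, 5))
fruit_mass = 2.0 * X[:, 2] - 1.5 * X[:, 0] + 0.3 * rng.normal(size=60)
chosen = forward_stepwise(X, fruit_mass, names)
```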

  11. Reconstructions of Soil Moisture for the Upper Colorado River Basin Using Tree-Ring Chronologies

    NASA Astrophysics Data System (ADS)

    Tootle, G.; Anderson, S.; Grissino-Mayer, H.

    2012-12-01

    Soil moisture is an important factor in the global hydrologic cycle, but existing reconstructions of historic soil moisture are limited. Tree-ring chronologies (TRCs) were used to reconstruct annual soil moisture in the Upper Colorado River Basin (UCRB). Gridded soil moisture data were spatially regionalized using principal components analysis and k-nearest neighbor techniques. Moisture-sensitive tree-ring chronologies in and adjacent to the UCRB were correlated with regional soil moisture and tested for temporal stability. TRCs that were positively correlated and stable for the calibration period were retained. Stepwise linear regression was applied to identify the best predictor combinations for each soil moisture region. The regressions explained 42-78% of the variability in soil moisture data. We performed reconstructions for individual soil moisture grid cells to enhance understanding of the disparity in reconstructive skill across the regions. Reconstructions that used chronologies based on ponderosa pines (Pinus ponderosa) and pinyon pines (Pinus edulis) explained more of the variance in the datasets. Reconstructed soil moisture was standardized and compared with standardized reconstructed streamflow and snow water equivalent from the same region. Soil moisture reconstructions were highly correlated with streamflow and snow water equivalent reconstructions, indicating that reconstructions of soil moisture in the UCRB using TRCs successfully represent hydrologic trends, including the identification of periods of prolonged drought.
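
    The regionalization step can be sketched with principal components analysis of a years-by-grid-cells soil moisture matrix, assigning each cell to the component on which it loads most heavily. This is only one simple way to turn PCA loadings into regions and is an assumption here; the study also used k-nearest-neighbour techniques, which are not shown, and the function name and data are illustrative.

```python
import numpy as np


def pca_regions(data, n_components):
    """PCA of a (years x cells) matrix via SVD of standardized columns.

    Returns (region label per cell, fraction of variance explained per component).
    """
    data = np.asarray(data, dtype=float)
    Z = (data - data.mean(axis=0)) / data.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    loadings = Vt[:n_components].T                 # (cells x components)
    regions = np.argmax(np.abs(loadings), axis=1)  # strongest-loading component per cell
    return regions, explained[:n_components]


# Synthetic grid: four cells follow one regional signal, two cells follow another.
rng = np.random.default_rng(4)
s1 = rng.normal(size=60)
s1 -= s1.mean()
s2 = rng.normal(size=60)
s2 -= s2.mean()
s2 -= (s2 @ s1) / (s1 @ s1) * s1  # make the two regional signals uncorrelated
cells = np.column_stack([s1] * 4 + [s2] * 2) + 0.1 * rng.normal(size=(60, 6))
regions, explained = pca_regions(cells, n_components=2)
```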

  12. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    PubMed

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a higher predicted prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the ranges of isothermality, minimum temperature of the coldest month, precipitation of the warmest quarter of the year, altitude and average daily land surface temperature, indicated the possible presence of F. hepatica over a very large area. The risk maps produced using these methods can be used to focus the activities of animal and public health programmes, even in areas where F. hepatica has not been evaluated.
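
    The GIS query can be read as an environmental envelope: a cell is flagged as possibly suitable when every variable falls within the range observed where the parasite was recorded. A minimal sketch, assuming tables of variable values at known sites and at candidate cells with columns in the same order (for example isothermality, minimum temperature of the coldest month, precipitation of the warmest quarter, altitude, land surface temperature). The function name and values are hypothetical. Because any cell inside all ranges qualifies, with no weighting, such envelopes tend to flag large areas.

```python
import numpy as np


def envelope_query(known_sites, candidate_cells):
    """Boolean mask: True where every variable lies within [min, max] at known sites."""
    known = np.asarray(known_sites, dtype=float)
    cells = np.asarray(candidate_cells, dtype=float)
    lower, upper = known.min(axis=0), known.max(axis=0)
    return np.all((cells >= lower) & (cells <= upper), axis=1)


# Hypothetical two-variable example: isothermality (%) and minimum temperature (deg C).
known = [[40.0, 5.0], [55.0, 12.0], [48.0, 8.0]]
cells = [[45.0, 10.0], [60.0, 10.0], [50.0, 4.0]]
mask = envelope_query(known, cells)  # only the first cell is inside both ranges
```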

  13. Combinations of Stressors in Midlife: Examining Role and Domain Stressors Using Regression Trees and Random Forests

    PubMed Central

    2013-01-01

    Objectives. Global perceptions of stress (GPS) have major implications for mental and physical health, and stress in midlife may influence adaptation in later life. Thus, it is important to determine the unique and interactive effects of diverse influences of role stress (at work or in personal relationships), loneliness, life events, time pressure, caregiving, finances, discrimination, and neighborhood circumstances on these GPS. Method. Exploratory regression trees and random forests were used to examine complex interactions among myriad events and chronic stressors in middle-aged participants’ (N = 410; mean age = 52.12) GPS. Results. Different role and domain stressors were influential at high and low levels of loneliness. Varied combinations of these stressors resulting in similar levels of perceived stress are also outlined as examples of equifinality. Loneliness emerged as an important predictor across trees. Discussion. Exploring multiple stressors simultaneously provides insights into the diversity of stressor combinations across individuals—even those with similar levels of global perceived stress—and answers theoretical mandates to better understand the influence of stress by sampling from many domain and role stressors. Further, the unique influences of each predictor relative to the others inform theory and applied work. Finally, examples of equifinality and multifinality call for targeted interventions. PMID:23341437

  14. Urban tree crown health assessment system: a tool for communities and citizen foresters

    Treesearch

    Matthew F. Winn; Sang-Mook Lee; Philip A. Araman

    2007-01-01

    Trees are important assets to urban communities. In addition to the aesthetic values that urban trees provide, they also aid in such things as erosion control, pollution removal, and rainfall interception. The urban environment, however, can often produce stresses to these trees. Soil compaction, limited root growth, and groundwater contamination are just a few of the...

  15. Systems Theoretic Process Analysis Applied to an Offshore Supply Vessel Dynamic Positioning System

    DTIC Science & Technology

    2016-06-01

    additional safety issues that were either not identified or inadequately mitigated through the use of Fault Tree Analysis and Failure Modes and... Techniques ... 1.3.1. Fault Tree Analysis ... 3.2. Fault Tree Analysis Comparison

  16. Ground water chlorinated ethenes in tree trunks: Case studies, influence of recharge, and potential degradation mechanism

    USGS Publications Warehouse

    Vroblesky, D.A.; Clinton, B.D.; Vose, J.M.; Casey, C.C.; Harvey, G.J.; Bradley, P.M.

    2004-01-01

    Trichloroethene (TCE) was detected in cores of trees growing above TCE-contaminated ground at three sites: the Carswell Golf Course in Texas, Air Force Plant PJKS in Colorado, and Naval Weapons Station Charleston in South Carolina. This was true even when the depth to water was 7.9 m or when the contaminated aquifer was confined beneath approximately 3 m of clay. Additional ground water contaminants detected in the tree cores were cis-1,2-dichloroethene at two sites and tetrachloroethene at one site. Thus, tree coring can be a rapid and effective means of locating shallow subsurface chlorinated ethenes and possibly identifying zones of active TCE dechlorination. Tree cores collected over time were useful in identifying the onset of ground water contamination. Several factors affecting chlorinated ethene concentrations in tree cores were identified in this investigation. The factors include ground water chlorinated ethene concentrations and depth to ground water contamination. In addition, differing TCE concentrations around the trunk of some trees appear to be related to the roots deriving water from differing areas. Opportunistic uptake of infiltrating rainfall can dilute prerain TCE concentrations in the trunk. TCE concentrations in core headspace may differ among some tree species. In some trees, infestation of bacteria in decaying heartwood may provide a TCE dechlorination mechanism within the trunk.

  17. Microarray-based Resequencing of Multiple Bacillus anthracis Isolates

    DTIC Science & Technology

    2004-12-17

    generated an Unweighted Pair Group Method Arithmetic Mean (UPGMA) tree (see methods [56]; Figure 3). The strains group together in a manner broadly similar...was created using DNADIST, plotted as a UPGMA tree using NEIGHBOR and the tree plotted using DRAWGRAM [56]. The B1 strain A0465 was used as an...distance matrix was created using DNADIST, plotted as a UPGMA tree using NEIGHBOR and the tree plotted using DRAWGRAM [57]. Additional data files The
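
    UPGMA builds a rooted tree by repeatedly merging the two closest clusters and setting the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. A minimal sketch that returns the topology as a Newick string without branch lengths (each merge height would be half the merge distance), assuming a symmetric distance matrix such as one produced by DNADIST. It illustrates the method and is not the PHYLIP NEIGHBOR program used in the report.

```python
def upgma(dist, labels):
    """UPGMA clustering of a symmetric distance matrix; returns a Newick topology."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree string, leaf count)
    # Pairwise distances keyed by (lower id, higher id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Average over all leaf pairs = size-weighted mean of the two cluster distances.
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(merged)
        clusters[next_id] = (f"({tree_a},{tree_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Four hypothetical isolates: A and B are closest, C joins them next, D is the outgroup.
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
newick = upgma(matrix, ["A", "B", "C", "D"])
```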

  18. Developing a dengue forecast model using machine learning: A case study in China.

    PubMed

    Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun

    2017-10-01

    In China, dengue remains an important public health issue, with its geographic range and incidence increasing in recent years. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) in Guangdong during 2011-2014 were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the large 2014 outbreak were accurately forecast by the SVR model selected by a cross-validation technique. Moreover, the SVR model consistently had the smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas of China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. 
The findings can help the government and community respond early to dengue epidemics.
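
    The evaluation measures named above are short computations. A minimal sketch, assuming arrays of observed and predicted weekly case counts: RMSE, R-squared in its common 1 - SSE/SST form (the abstract does not say which R-squared definition was used), and the sample autocorrelation function of the residuals, where values outside roughly +/-1.96/sqrt(n) at nonzero lags suggest leftover structure. Function names are illustrative.

```python
import numpy as np


def rmse(observed, predicted):
    err = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def r_squared(observed, predicted):
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def residual_acf(residuals, max_lag):
    """Sample autocorrelation at lags 0..max_lag (lag 0 is always 1)."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = np.sum(r ** 2)
    return np.array([np.sum(r[k:] * r[:len(r) - k]) / denom for k in range(max_lag + 1)])
```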

  19. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent

    PubMed Central

    Zhu, Sha; Degnan, James H.

    2017-01-01

    Recent work in estimating species relationships from gene trees has included inferring networks assuming that past hybridization has occurred between species. Probabilistic models using the multispecies coalescent can be used in this framework for likelihood-based inference of both network topologies and parameters, including branch lengths and hybridization parameters. A difficulty for such methods is that it is not always clear whether, or to what extent, networks are identifiable, that is, whether there could be two distinct networks that lead to the same distribution of gene trees. For cases in which incomplete lineage sorting occurs in addition to hybridization, we demonstrate a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees. This representation makes it clear that in some cases in which two distinct networks give the same distribution of gene trees when sampling one allele per species, the two networks can be distinguished theoretically when multiple individuals are sampled per species. This result means that network identifiability is not only a function of the trees displayed by the networks but also depends on allele sampling within species. We additionally give an example in which two networks that display exactly the same trees can be distinguished from their gene trees even when there is only one lineage sampled per species. PMID:27780899
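
    For three species with one allele each, the multispecies coalescent has a closed form: if the species tree groups A and B with an internal branch of t coalescent units, the matching gene tree has probability 1 - (2/3)e^(-t) and each of the other two topologies has probability (1/3)e^(-t). The sketch below uses that formula only to show the arithmetic form of a linear combination of species-tree gene tree distributions. The coefficients are hypothetical inputs, not the network-derived coefficients of the paper's representation, and the function names are illustrative.

```python
import numpy as np

TOPOLOGIES = ["((A,B),C)", "((A,C),B)", "((B,C),A)"]


def triple_distribution(matching, t):
    """MSC gene tree topology probabilities for 3 taxa, one allele each.

    matching: index into TOPOLOGIES of the species tree's topology
    t:        internal branch length in coalescent units
    """
    probs = np.full(3, np.exp(-t) / 3.0)
    probs[matching] = 1.0 - 2.0 * np.exp(-t) / 3.0
    return probs


def linear_combination(coefficients, distributions):
    """Sum_i c_i * P_{T_i}(g) for each gene tree topology g."""
    return np.asarray(coefficients, dtype=float) @ np.vstack(distributions)


# Hypothetical coefficients applied to two species-tree distributions.
mix = linear_combination([0.3, 0.7], [triple_distribution(0, 1.0), triple_distribution(1, 0.5)])
```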

  20. Systematization method for distinguishing plastic groups by using NIR spectroscopy.

    PubMed

    Kaihara, Mikio; Satoh, Minami; Satoh, Minoru

    2007-07-01

    A systematic method for classifying polymers from near-infrared (NIR) spectra is not yet available, which is why we have been searching for one. Because raw NIR spectra usually have few obvious peaks, the spectra were pretreated by taking the second derivative to obtain well-modulated spectra. After this pretreatment, we applied classification and regression trees (CART) to discriminate polymer species from their spectra. As a result, we obtained a relatively simple classification tree. Judging from the obtained splitting conditions and the classified polymers, we concluded that knowledge of chemical functional groups, as estimated from the important wavelength regions, is not always applicable to this classification tree. However, we clarified the splitting rules for polymer species from the NIR spectral point of view.
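CART grows a tree by repeatedly choosing the split that most reduces node impurity. A minimal sketch of one such step for a single numeric feature (for example, one pretreated spectral band), assuming Gini impurity; the abstract does not state the impurity measure or software used, and the feature values and polymer labels below are hypothetical:

```python
from collections import Counter

def gini(labels):
    """Gini impurity, 1 - sum_k p_k^2, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Best CART-style binary split 'value <= threshold' on one numeric feature.

    Returns (threshold, weighted child impurity); threshold is None when no
    split lowers the parent node's impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical feature values cannot be separated
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (lo + hi) / 2, score
    return best_threshold, best_score

# Hypothetical feature values (one spectral band) and polymer labels.
print(best_split([1, 2, 3, 7, 8, 9], ["PE", "PE", "PE", "PP", "PP", "PP"]))
```

A full tree repeats this search over every candidate wavelength at each node, keeps the best one, and recurses on the two children before pruning.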

  1. Classification and regression tree (CART) analysis of endometrial carcinoma: Seeing the forest for the trees.

    PubMed

    Barlin, Joyce N; Zhou, Qin; St Clair, Caryn M; Iasonos, Alexia; Soslow, Robert A; Alektiar, Kaled M; Hensley, Martee L; Leitao, Mario M; Barakat, Richard R; Abu-Rustum, Nadeem R

    2013-09-01

    The objectives of the study were to evaluate which clinicopathologic factors influenced overall survival (OS) in endometrial carcinoma and to determine whether the surgical effort to assess para-aortic (PA) lymph nodes (LNs) at initial staging surgery impacts OS. All patients diagnosed with endometrial cancer from 1/1993 to 12/2011 who had LNs excised were included. PALN assessment was defined by the identification of one or more PALNs on final pathology. A multivariate analysis was performed to assess the effect of PALNs on OS. A form of recursive partitioning called classification and regression tree (CART) analysis was implemented. Variables included age, stage, tumor subtype, grade, myometrial invasion, total LNs removed, evaluation of PALNs, and adjuvant chemotherapy. The cohort included 1920 patients, with a median age of 62 years. The median number of LNs removed was 16 (range, 1-99). The removal of PALNs was not associated with OS (P=0.450). In the hierarchical CART analysis, stage I vs. stages II-IV and grades 1-2 vs. grade 3 emerged as predictors of OS. If the tree was allowed to grow, further branching was based on age and myometrial invasion. Total number of LNs removed and assessment of PALNs as defined in this study were not predictive of OS. This innovative CART analysis emphasized the importance of proper stage assignment and a binary grading system in determining OS. Notably, the total number of LNs removed and specific evaluation of PALNs as defined in this study were not important predictors of OS. Copyright © 2013 Elsevier Inc. All rights reserved.

  2. Unearthing the hidden world of roots: Root biomass and architecture differ among species within the same guild

    PubMed Central

    2017-01-01

    The potential benefits of planting trees have generated significant interest with respect to sequestering carbon and restoring other forest-based ecosystem services. Reliable estimates of carbon stocks are pivotal for understanding the global carbon balance and for promoting initiatives to mitigate CO2 emissions through forest management. There are numerous studies employing allometric regression models that convert inventory data into aboveground biomass (AGB) and carbon (C). Yet most allometric regression models do not consider the root system, nor do these equations provide detail on the architecture and shape of different species. The root system is key to understanding the hidden form and function roots play in carbon accumulation, nutrient and plant water uptake, and groundwater infiltration. Work that estimates C in forests, as well as models used to better understand the hydrologic function of trees, needs better characterization of tree roots. We harvested 40 trees of six different species, including their roots down to 2 mm in diameter, and created species-specific and multi-species models to calculate aboveground (AGB), coarse root belowground biomass (BGB), and total biomass (TB). We also explore the relationship between crown structure and root structure. We found that BGB contributes ~27.6% of a tree’s TB, lateral roots extend over 1.25 times the distance of crown extent, root allocation patterns varied among species, and AGB is a strong predictor of TB. These findings highlight the potential importance of including the root system in C estimates and lend important insights into the function roots play in water cycling. PMID:29023553

  3. Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

    PubMed

    Hu, Chen; Steingrimsson, Jon Arni

    2018-01-01

    A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly include traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degree of missingness in the covariates.
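Cost-complexity pruning, mentioned above, trades fit against tree size with the criterion R_alpha(T) = R(T) + alpha * |T|, where R(T) is the tree's risk and |T| its number of terminal nodes. A minimal sketch of the selection step, with hypothetical risks; for survival trees R(T) would be a censoring-aware measure rather than a plain error rate, and alpha is usually chosen by cross-validation:

```python
def select_subtree(candidates, alpha):
    """Return the label of the subtree minimizing R(T) + alpha * |leaves(T)|.

    candidates: (label, risk, n_leaves) tuples, e.g. the nested subtrees from
    weakest-link pruning. Ties are broken in favour of the smaller tree.
    """
    best = min(candidates, key=lambda c: (c[1] + alpha * c[2], c[2]))
    return best[0]

# Hypothetical pruning sequence: bigger trees fit the training data better.
subtrees = [("root only", 0.40, 1), ("3 leaves", 0.25, 3),
            ("6 leaves", 0.20, 6), ("12 leaves", 0.18, 12)]
for alpha in (0.0, 0.01, 0.05, 0.2):
    print(alpha, select_subtree(subtrees, alpha))
```

As alpha grows, the penalty on leaves outweighs the gain in fit and progressively smaller subtrees win, down to the root alone.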

  4. Application of XGBoost algorithm in hourly PM2.5 concentration prediction

    NASA Astrophysics Data System (ADS)

    Pan, Bingyue

    2018-02-01

    In view of the state of prediction techniques for hourly PM2.5 concentration in China, this paper applied the XGBoost (Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. Air quality monitoring data from Tianjin were analyzed using the XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentrations using three measures of forecast accuracy. The XGBoost method is also compared with random forest, multiple linear regression, decision tree regression and support vector regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms the other data mining methods.
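XGBoost is a regularized, heavily optimized form of gradient boosting. To show only the core idea, here is a minimal sketch of plain gradient boosting with squared-error loss and depth-1 trees (stumps) on a single feature. It does not use the xgboost library and omits XGBoost's second-order updates, leaf-weight regularization and subsampling; the function names and data are hypothetical:

```python
def fit_stump(x, r):
    """Least-squares regression stump on one feature: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # constant feature: predict the mean everywhere
        mean = sum(r) / len(r)
        return float("inf"), mean, mean
    return best[1], best[2], best[3]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - f)^2 the negative gradient is the residual y - f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi) for pi, xi in zip(pred, x)]

    def model(xi):
        return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
    return model

# Hypothetical step-shaped data, e.g. a concentration that jumps after hour 4.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
conc = [5, 5, 5, 5, 20, 20, 20, 20]
model = boost(hours, conc)
print(round(model(2), 2), round(model(7), 2))
```

The small learning rate makes each stump correct only part of the remaining error, which is the shrinkage idea XGBoost also exposes as a tuning parameter.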

  5. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
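The comparison above is reported in terms of AUC and F-measure, two metrics that remain informative when defaults are rare. A minimal sketch of both using their standard definitions (not the authors' code), with hypothetical scores and an assumed 0.5 decision threshold for the F-measure:

```python
def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: P(score of a positive > score of a negative).

    labels are 1 (positive, e.g. a bad credit) or 0; tied scores count as half a win.
    This pairwise version is O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f_measure(predicted, labels):
    """F1 score for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores for six applicants, two of whom defaulted (label 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]
print(auc(scores, labels), f_measure(predicted, labels))
```

AUC depends only on how the scores rank positives against negatives, while the F-measure depends on the chosen threshold; on unbalanced data, both are more informative than raw accuracy.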

  6. [A strategy for assessing environmental influence on airway allergy using a regression binary tree-based method].

    PubMed

    Yoshioka, Fumi; Azuma, Emiko; Nakajima, Takae; Hashimoto, Masafumi; Toyoshima, Kyoichiro; Komachi, Yoshio

    2004-08-01

    To clarify the living environment factors that increase the risk of allergic sensitization to house dust mites, we applied a regression binary tree-based method (CART, Classification & Regression Trees) to an epidemiological study on airway allergy. The utility of the tree map in personal sanitary guidance for preventing allergic sensitization was examined with respect to feasibility and validity. A questionnaire was given to 386 healthy adult women, asking them about their individual living environments. Blood samples were also collected to measure Dermatophagoides pteronyssinus (Dp)-specific IgE, with the presence/absence of Dp-sensitization expressed as positive/negative. The questionnaire consisted of nine items on (1) home ventilation by keeping windows open, (2) personal or family smoking habits, (3) use of air conditioners in hot weather, (4) type of flooring (tatami/wooden/carpet) in the living room, (5) visible mold proliferation in the kitchen, (6) type of housing (concrete/wooden), (7) residential area (heavy or light traffic area), (8) heating system (use of unventilated combustion appliances), and (9) frequency of cleaning (every day or less often). There were also questions on past history of airway allergic diseases, such as bronchial asthma and allergic rhinitis. CART and a multivariate logistic regression analysis (MLRA) were performed. The subjects were first classified into two groups, with and without a history of airway allergic diseases (Groups WPH and WOPH). In each group, the involvement of living environment factors in Dp-sensitization was examined using CART and MLRA. In the MLRA, individual living environment factors showed promotional or suppressive effects on Dp-sensitization that differed between the two groups. In the CART results, each group was first split by the factor that had the most significant odds ratio in the MLRA. 
In Group WPH, which had a Dp-sensitization risk of 19.5%, the first split was by the factor of visible mold proliferation in the kitchen, into a factor-present group with a risk of 45.5% and a factor-absent group with a risk of 13.5%. The mold proliferation group was then split by frequency of cleaning: the risk rose to 75% in the group without frequent cleaning, and to 100% when family smoking habits were also reported. Group WOPH (risk: 10.8%) was first split according to whether air conditioners were used in hot weather for more than 6 hours a day, giving risks of 16.7% (more than 6 hours) and 6.9% (6 hours or less). The risk in the group that used air conditioners intensively fell to 8.3% with tatami flooring in the living room and rose to 20.8% with other flooring. The risk in the group without intensive air-conditioner use fell to 4.0% without wooden flooring. CART analysis enables us to express complex relationships between living environment factors and Dp-sensitization simply as a binary regression tree, pointing to preventive strategies that can be flexibly adapted to the individual living environments of the subjects.
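Because each parent node's risk is the size-weighted average of its children's risks, the reported percentages also imply roughly how each group divided at its first split. A small worked check of that arithmetic follows; the function name is illustrative, and the shares it yields are derived estimates, not figures reported by the authors:

```python
def implied_share(parent_risk, risk_a, risk_b):
    """Fraction of a node's subjects that fall into child A.

    A node's risk is the size-weighted average of its two children's risks:
        parent = w * risk_a + (1 - w) * risk_b
    so w = (parent - risk_b) / (risk_a - risk_b).
    """
    return (parent_risk - risk_b) / (risk_a - risk_b)

# Risks (%) as reported in the abstract. Because those percentages are
# rounded, the implied shares are approximate.
wph_mold = implied_share(19.5, 45.5, 13.5)  # Group WPH: visible mold present
woph_ac = implied_share(10.8, 16.7, 6.9)    # Group WOPH: air conditioning > 6 h/day
print(f"WPH with visible mold: {wph_mold:.1%}; WOPH with heavy AC use: {woph_ac:.1%}")
```

With the reported values this gives roughly 19% of Group WPH in the visible-mold branch and roughly 40% of Group WOPH in the heavy air-conditioner branch.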

  7. Relationship between Heavy Metal Concentrations in Soils and Grasses of Roadside Farmland in Nepal

    PubMed Central

    Yan, Xuedong; Zhang, Fan; Zeng, Chen; Zhang, Man; Devkota, Lochan Prasad; Yao, Tandong

    2012-01-01

    Transportation activities can contribute to the accumulation of heavy metals in roadside soil and grass, which could potentially compromise public health and the environment if the roadways cross farmland areas. In particular, heavy metals may enter the food chain as a result of their uptake by roadside edible grasses. This research was conducted to investigate heavy metal (Cu, Zn, Cd, and Pb) concentrations in roadside farmland soils and corresponding grasses around Kathmandu, Nepal. Four factors were considered in the experimental design: sample type, sampling location, roadside distance, and tree protection. A total of 60 grass samples and 60 topsoil samples were collected under dry weather conditions. The Multivariate Analysis of Variance (MANOVA) results indicate that the concentrations of Cu, Zn, and Pb in the soil samples are significantly higher than those in the grass samples; the concentrations of Cu and Pb in the suburban roadside farmland are higher than those in the rural mountainous roadside farmland; and the concentrations of Cu and Zn at the sampling locations with roadside trees are significantly lower than those without tree protection. The analysis of the transfer factor, calculated as the ratio of heavy-metal concentrations in grass to those in the corresponding soil, indicates that the uptake capabilities of heavy metals from soil to grass are in the order Zn > Cu > Pb. Additionally, it was found that as the soils’ heavy-metal concentrations increase, the capability of heavy-metal transfer to the grass decreases, and this relationship can be characterized by an exponential regression model. PMID:23202679
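The transfer factor and the exponential relationship described above can be sketched as follows. The model form TF = a * exp(b * C_soil), with b < 0 for a transfer factor that falls as soil concentration rises, is an assumption (the abstract does not give the equation), as is fitting it by least squares on ln(TF); the concentrations are hypothetical:

```python
import math

def transfer_factor(grass_conc, soil_conc):
    """Transfer factor: metal concentration in grass divided by that in the paired soil."""
    return grass_conc / soil_conc

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y); all y must be > 0.

    Log-linear least squares is one common way to fit this model; it is an
    assumption here, not necessarily the procedure used in the study.
    """
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x, mean_ly = sum(x) / n, sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (ly - mean_ly) for xi, ly in zip(x, log_y))
    b = sxy / sxx
    a = math.exp(mean_ly - b * mean_x)
    return a, b

# Hypothetical paired samples: soil and grass concentrations (mg/kg).
soil = [20.0, 40.0, 80.0, 160.0]
grass = [6.0, 9.0, 12.0, 16.0]
tf = [transfer_factor(g, s) for g, s in zip(grass, soil)]
a, b = fit_exponential(soil, tf)
print(f"TF = {a:.3f} * exp({b:.4f} * soil)")
```

A negative fitted b reproduces the qualitative pattern in the abstract: grass concentrations rise with soil concentrations, but less than proportionally, so the ratio falls.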

  8. Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling.

    PubMed

    Nakas, Christos T; Schütz, Narayan; Werners, Marcus; Leichtle, Alexander B

    2016-01-01

    Electronic Health Record (EHR) data can be a key resource for decision-making support in clinical practice in the "big data" era. The complete database from early 2012 to late 2015 involving hospital admissions to Inselspital Bern, the largest Swiss University Hospital, was used in this study, involving over 100,000 admissions. Age, sex, and initial laboratory test results were the features/variables of interest for each admission, the outcome being inpatient mortality. Computational decision support systems were utilized for the calculation of the risk of inpatient mortality. We assessed the recently proposed Acute Laboratory Risk of Mortality Score (ALaRMS) model, and further built generalized linear models, generalized estimating equations, artificial neural networks, and decision tree systems for the predictive modeling of the risk of inpatient mortality. The Area Under the ROC Curve (AUC) for ALaRMS marginally corresponded to the anticipated accuracy (AUC = 0.858). Penalized logistic regression methodology provided a better result (AUC = 0.872). Decision tree and neural network-based methodology provided even higher predictive performance (up to AUC = 0.912 and 0.906, respectively). Additionally, decision tree-based methods can efficiently handle Electronic Health Record (EHR) data that have a significant amount of missing records (in up to >50% of the studied features) eliminating the need for imputation in order to have complete data. In conclusion, we show that statistical learning methodology can provide superior predictive performance in comparison to existing methods and can also be production ready. Statistical modeling procedures provided unbiased, well-calibrated models that can be efficient decision support tools for predicting inpatient mortality and assigning preventive measures.

  9. Heart rate time series characteristics for early detection of infections in critically ill patients.

    PubMed

    Tambuyzer, T; Guiza, F; Boonen, E; Meersseman, P; Vervenne, H; Hansen, T K; Bjerre, M; Van den Berghe, G; Berckmans, D; Aerts, J M; Meyfroidt, G

    2017-04-01

    It is difficult to distinguish between inflammation and infection; therefore, new strategies are required to allow accurate detection of infection. Here, we hypothesize that we can distinguish infected from non-infected ICU patients based on dynamic features of serum cytokine concentrations and heart rate time series. Serum cytokine profiles and heart rate time series of 39 patients were available for this study. The serum concentrations of ten cytokines were measured in blood sampled every 10 min between 2100 and 0600 hours. Heart rate was recorded every minute. Ten metrics were used to extract features from these time series to obtain an accurate classification of infected patients. The predictive power of the metrics derived from the heart rate time series was investigated using decision tree analysis. Finally, logistic regression methods were used to examine whether classification performance improved with the inclusion of features derived from the cytokine time series. The AUC of a decision tree based on two heart rate features was 0.88. The model had good calibration (Hosmer-Lemeshow p value of 0.09). There was no significant additional value in adding static cytokine levels or cytokine time series information to the generated decision tree model. The results suggest that heart rate is a better marker for infection than information captured by cytokine time series when the exact stage of infection is not known. The predictive value of (expensive) biomarkers should always be weighed against that of routinely monitored data, and such biomarkers have to demonstrate added value.
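Calibration above is summarized by the Hosmer-Lemeshow test. A minimal sketch, assuming the conventional ten risk groups (the abstract does not state how many were used). With ten groups the reference chi-square has 8 degrees of freedom, for which the tail probability has a simple closed form, so no statistics library is needed:

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); this closed form needs an even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value (chi-square with groups - 2 df).

    Observations are sorted by predicted probability and cut into `groups`
    bins of (nearly) equal size; each bin compares observed and expected events.
    """
    data = sorted(zip(probs, outcomes))
    n = len(data)
    stat = 0.0
    for g in range(groups):
        chunk = data[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / n_g
        if 0.0 < mean_p < 1.0:  # bins with mean_p of 0 or 1 would divide by zero
            stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A p value above the conventional 0.05, such as the 0.09 reported above, means the test found no significant evidence of miscalibration; it does not by itself prove that calibration is good, and the test has little power with only 39 patients.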

  10. Modelling spruce bark beetle infestation probability

    Treesearch

    Paulius Zolubas; Jose Negron; A. Steven Munson

    2009-01-01

    Spruce bark beetle (Ips typographus L.) risk model, based on pure Norway spruce (Picea abies Karst.) stand characteristics in experimental and control plots was developed using classification and regression tree statistical technique under endemic pest population density. The most significant variable in spruce bark beetle...

  11. CART DIAGNOSIS OF WATERSHED IMPAIRMENT IN THE MID-ATLANTIC REGION

    EPA Science Inventory

    Many factors (stressors) can lead to increased concentrations of nutrients and sediments, and these factors change across watersheds. Classification and Regression Tree (CART) is a statistical approach that can be used to "diagnose" which factors are important stressors on a pe...

  12. Equations for predicting biomass of six introduced tree species, island of Hawaii

    Treesearch

    Thomas H. Schubert; Robert F. Strand; Thomas G. Cole; Katharine E. McDuffie

    1988-01-01

    Regression equations to predict total and stem-only above-ground dry biomass for six species (Acacia melanoxylon, Albizia falcataria, Eucalyptus globulus, E. grandis, E. robusta, and E. urophylla) were developed by felling and measuring 2- to 6-year-old...

  13. Tree cover at fine and coarse spatial grains interacts with shade tolerance to shape plant species distributions across the Alps

    PubMed Central

    Nieto-Lugilde, Diego; Lenoir, Jonathan; Abdulhak, Sylvain; Aeschimann, David; Dullinger, Stefan; Gégout, Jean-Claude; Guisan, Antoine; Pauli, Harald; Renaud, Julien; Theurillat, Jean-Paul; Thuiller, Wilfried; Van Es, Jérémie; Vittoz, Pascal; Willner, Wolfgang; Wohlgemuth, Thomas; Zimmermann, Niklaus E.; Svenning, Jens-Christian

    2015-01-01

    The role of competition for light among plants has long been recognised at local scales, but its importance for plant species distributions at larger spatial scales has generally been ignored. Tree cover modifies the local abiotic conditions below the canopy, notably by reducing light availability, and thus, also the performance of species that are not adapted to low-light conditions. However, this local effect may propagate to coarser spatial grains, by affecting colonisation probabilities and local extinction risks of herbs and shrubs. To assess the effect of tree cover at both the plot- and landscape-grain sizes (approximately 10-m and 1-km), we fit Generalised Linear Models (GLMs) for the plot-level distributions of 960 species of herbs and shrubs using 6,935 vegetation plots across the European Alps. We ran four models with different combinations of variables (climate, soil and tree cover) at both spatial grains for each species. We used partial regressions to evaluate the independent effects of plot- and landscape-grain tree cover on plot-level plant communities. Finally, the effects on species-specific elevational range limits were assessed by simulating a removal experiment comparing the species distributions under high and low tree cover. Accounting for tree cover improved the model performance, with the probability of the presence of shade-tolerant species increasing with increasing tree cover, whereas shade-intolerant species showed the opposite pattern. The tree cover effect occurred consistently at both the plot and landscape spatial grains, albeit most strongly at the former. Importantly, tree cover at the two grain sizes had partially independent effects on plot-level plant communities. With high tree cover, shade-intolerant species exhibited narrower elevational ranges than with low tree cover whereas shade-tolerant species showed wider elevational ranges at both limits. 
These findings suggest that forecasts of climate-related range shifts for herb and shrub species may be modified by tree cover dynamics. PMID:26290621

  14. In situ assessment of the velocity of carbon transfer by tracing 13C in trunk CO2 efflux after pulse labelling: variations among tree species and seasons.

    PubMed

    Dannoura, Masako; Maillard, Pascale; Fresneau, Chantal; Plain, Caroline; Berveiller, Daniel; Gerant, Dominique; Chipeaux, Christophe; Bosc, Alexandre; Ngao, Jérôme; Damesin, Claire; Loustau, Denis; Epron, Daniel

    2011-04-01

    Phloem is the main pathway for transferring photosynthates belowground. In situ 13C pulse labelling of trees 8-10 m tall was conducted in the field on 10 beech (Fagus sylvatica) trees, six sessile oak (Quercus petraea) trees and 10 maritime pine (Pinus pinaster) trees throughout the growing season. Respired 13CO2 from trunks was tracked at different heights using tunable diode laser absorption spectrometry to determine time lags and the velocity of carbon transfer (V). The isotope composition of phloem extracts was measured on several occasions after labelling and used to estimate the rate constant of phloem sap outflux (kP). Pulse labelling together with high-frequency measurement of the isotope composition of trunk CO2 efflux is a promising tool for studying phloem transport in the field. Seasonal variability in V was predicted in pine and oak by bivariate linear regressions with air temperature and soil water content. V differed among the three species, consistent with known differences in phloem anatomy between broadleaf and coniferous trees. V increased with tree diameter in oak and beech, reflecting a nonlinear increase in volumetric flow with increasing bark cross-sectional area, which suggests changes in allocation pattern with tree diameter in broadleaf species. Discrepancies between V and kP indicate vertical changes in functional phloem properties. © 2011 The Authors. New Phytologist © 2011 New Phytologist Trust.

  15. The relative importance of vertical soil nutrient heterogeneity, and mean and depth-specific soil nutrient availabilities for tree species richness in tropical forests and woodlands.

    PubMed

    Shirima, Deo D; Totland, Ørjan; Moe, Stein R

    2016-11-01

    The relative importance of resource heterogeneity and quantity on plant diversity is an ongoing debate among ecologists, but we have limited knowledge on relationships between tree diversity and heterogeneity in soil nutrient availability in tropical forests. We expected tree species richness to be: (1) positively related to vertical soil nutrient heterogeneity; (2) negatively related to mean soil nutrient availability; and (3) more influenced by nutrient availability in the upper than lower soil horizons. Using a data set from 60, 20 × 40-m plots in a moist forest, and 126 plots in miombo woodlands in Tanzania, we regressed tree species richness against vertical soil nutrient heterogeneity, both depth-specific (0-15, 15-30, and 30-60 cm) and mean soil nutrient availability, and soil physical properties, with elevation and measures of anthropogenic disturbance as co-variables. Overall, vertical soil nutrient heterogeneity was the best predictor of tree species richness in miombo but, contrary to our prediction, the relationships between tree species richness and soil nutrient heterogeneity were negative. In the moist forest, mean soil nutrient availability explained considerable variations in tree species richness, and in line with our expectations, these relationships were mainly negative. Soil nutrient availability in the top soil layer explained more of the variation in tree species richness than that in the middle and lower layers in both vegetation types. Our study shows that vertical soil nutrient heterogeneity and mean availability can influence tree species richness at different magnitudes in intensively utilized tropical vegetation types.

  16. Health of native riparian vegetation and its relation to hydrologic conditions along the Mojave River, southern California

    USGS Publications Warehouse

    Lines, Gregory C.

    1999-01-01

    The health of native riparian vegetation and its relation to hydrologic conditions were studied along the Mojave River mainly during the growing seasons of 1997 and 1998. The study concentrated on cottonwood-willow woodlands (predominantly Populus fremontii and Salix gooddingii) and mesquite bosques (predominantly Prosopis glandulosa). Tree-growth characteristics were measured at 16 cottonwood-willow woodland sites and at 3 mesquite bosque sites. Density of live and dead trees, tree diameter and height, canopy density, live-crown volume, leaf-water potential, leaf-area index, mortality, and reproduction were measured or noted at each site. The sites included healthy and reproducing woodlands and bosques, stressed woodlands and bosques with no reproduction, and woodlands and bosques with high mortality. Tree roots were studied at seven sites to determine the vertical distribution of the root system and their relation to the water table at healthy, stressed, and high-mortality cottonwood-willow woodlands. In the six trenches that were dug for this study in May 1997, no cottonwood roots were observed that reached the water table. The root systems of healthy trees typically ended 1 to 2 feet above the water table. At sites with high mortality, the main root mass was commonly 7 to 8 feet above the water table. Water-table depth was monitored at each of the study sites. In addition, volumetric soil moisture and soil-water potential were monitored at varying depths at three cottonwood-willow woodland study sites and at two mesquite bosque sites. Ground, soil, river, lake, and plant (xylem sap) water were analyzed for concentrations of stable hydrogen and oxygen isotopes to determine the source of water used by the trees. On the basis of the root-distribution, soil- and leaf-water potential, and isotope data, it was concluded that cottonwood, willow, and mesquite trees mainly rely on ground water for their perennial sustained supply of water. 
The trees mainly utilize ground water that has moved upward from the water table into the capillary fringe and into unsaturated soil nearer to land surface. Most precipitation (average is 4 to 6 inches per year) is lost by evaporation and by transpiration of shallow-rooted xeric plants, and very little reaches the root zone of trees along the Mojave River. Water-table depth had no strong correlation to many individual tree-growth characteristics, such as density, diameter, height, and live-crown volume. However, leaf-area index (corrected for stem area) of both healthy and stressed cottonwood-willow woodlands had a highly significant statistical relation to water-table depth, and a curvilinear regression model was defined. As in cottonwood-willow woodlands, leaf-area index of mesquite bosques also decreased with increased water-table depth. However, because of the small number of sites, no significant statistical relation could be defined for mesquite bosques. Because it can be accurately measured repeatedly at the same locations, leaf-area index (corrected for stem area) is recommended as the primary growth characteristic that should be monitored. Future vegetation changes along the Mojave River can be quantified using the sites established for this study. Mortality was as high as 39 percent in healthy cottonwood-willow woodlands, but mortality of 50 to 100 percent was common where water-table depth was greater than about 7 feet or in areas where permanent water-table declines greater than about 5 feet had occurred. At a healthy mesquite bosque where the water-table depth ranged from about 8 to 11 feet, mortality was about 20 percent. Where the water table had been lowered an additional 10 to 25 feet by pumping, mortality of the mesquite was extremely high (80 to 99 percent). 
On the basis of observations of plant reproduction, it was concluded that established cottonwood-willow woodlands probably will reproduce, mainly by root sprouting of mature trees, if the water-t

  17. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    NASA Astrophysics Data System (ADS)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability at a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the data mining methods tested, namely Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case-Based Reasoning (CBR), and Decision Tree (DT), DT showed the best accuracy and consistency. Subsequent tree analyses with the optimal DT model suggest that the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values for the hydrogeological parameters at the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  18. The application of data mining techniques to oral cancer prognosis.

    PubMed

    Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan

    2015-05-01

    This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are relatively easier to interpret than the artificial neural network models. Cluster analysis also revealed that stage 4 patients who also possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and the cell mutation situation (G) is moderate.

  19. Mechanisms behind the estimation of photosynthesis traits from leaf reflectance observations

    NASA Astrophysics Data System (ADS)

    Dechant, Benjamin; Cuntz, Matthias; Doktor, Daniel; Vohland, Michael

    2016-04-01

    Many studies have investigated the reflectance-based estimation of leaf chlorophyll, water and dry matter contents of plants. Only a few studies, however, have focused on photosynthesis traits. The maximum potential uptake of carbon dioxide under given environmental conditions is determined mainly by RuBisCO activity, which limits carboxylation, or by the speed of photosynthetic electron transport. These two main limitations are represented by the maximum carboxylation capacity, Vcmax,25, and the maximum electron transport rate, Jmax,25. These traits have been estimated from leaf reflectance before, but the mechanisms underlying the estimation remain rather speculative. The aim of this study was therefore to reveal the mechanisms behind the reflectance-based estimation of Vcmax,25 and Jmax,25. Leaf reflectance, photosynthetic response curves, nitrogen content per area (Narea) and leaf mass per area (LMA) were measured on 37 deciduous tree species. Vcmax,25 and Jmax,25 were determined from the response curves. Partial Least Squares (PLS) regression models for the two photosynthesis traits Vcmax,25 and Jmax,25, as well as for Narea and LMA, were studied using a cross-validation approach. To reveal the mechanisms behind the reflectance-based estimation, we analysed linear regression models based on Narea and on other leaf traits estimated via PROSPECT inversion, as well as the PLS regression coefficients and model residuals. We found that Vcmax,25 and Jmax,25 can be estimated from leaf reflectance with good to moderate accuracy for a large number of species and different light conditions. The dominant mechanism behind the estimations was the strong relationship between photosynthesis traits and leaf nitrogen content. This was concluded from the very strong relationships between the PLS regression coefficients and the model residuals, as well as from the prediction performance of Narea-based linear regression models compared with PLS regression models. 
While the PLS regression model for Vcmax,25 was fully based on the correlation with Narea, the PLS regression model for Jmax,25 was not entirely based on it. Analyses of the contributions of different parts of the reflectance spectrum revealed that the information contributing to the Jmax,25 PLS regression model in addition to the main source of information, Narea, was mainly located in the visible part of the spectrum (500-900 nm). Estimated chlorophyll content could be excluded as a potential source of this extra information. The PLS regression coefficients of the Jmax,25 model indicated possible contributions from chlorophyll fluorescence and cytochrome f content. In summary, we found that the main mechanism behind the estimation of Vcmax,25 and Jmax,25 from leaf reflectance observations is the correlation with Narea, but that there is additional information related to Jmax,25, mainly in the visible part of the spectrum.
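PLS regression, the core method of this study, compresses many collinear reflectance bands into a few latent components chosen for their covariance with the trait. A minimal sketch of single-response PLS (PLS1) using the NIPALS deflation scheme in pure Python; it is not the authors' implementation, and the toy two-column matrix below is a hypothetical stand-in for reflectance spectra and a trait such as Vcmax,25.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_pls1(X: Matrix, y: Vector, n_components: int) -> Callable[[Vector], float]:
    """Fit PLS1 with NIPALS and return a predict(x_row) function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Work on centred copies so the caller's data are not modified.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    components = []  # (weight w, X loading, y loading q) per component
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # E^T f
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break  # residual y is uncorrelated with residual X: stop early
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in E]  # component scores
        tt = dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Deflate: remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))

    def predict(x_row: Vector) -> float:
        # Apply the same centring and deflation steps to a new sample.
        e = [x_row[j] - x_mean[j] for j in range(p)]
        y_hat = y_mean
        for w, load, q in components:
            t_new = dot(e, w)
            y_hat += q * t_new
            e = [e[j] - t_new * load[j] for j in range(p)]
        return y_hat

    return predict


# Toy data: two "bands" and a trait that is exactly 2*band1 + 3*band2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
predict = fit_pls1(X, y, n_components=2)
print(predict([3.0, 2.0]))
```

With as many components as the rank of the centred X, PLS1 reproduces the ordinary least-squares fit; in practice the number of components is kept smaller and chosen by cross-validation, as in the study.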

  20. Nutrient limitation in soils and trees of a treeline ecotone in Rolwaling Himal, Nepal

    NASA Astrophysics Data System (ADS)

    Drollinger, Simon; Müller, Michael; Schickhoff, Udo; Böhner, Jürgen; Scholten, Thomas

    2015-04-01

    At a global scale, tree growth and thus the position of natural alpine treelines are limited by low temperatures. At landscape and local scales, however, the treeline position depends on multiple interactions of influencing factors and mechanisms. The aim of our research is to understand the local-scale effects of soil properties and nutrient cycling on tree growth limitation, and their interactions with other abiotic and biotic factors, in a near-natural alpine treeline ecotone of Rolwaling Himal, Nepal. In total, 48 plots (20 m × 20 m) were investigated. Three north-facing slopes were divided into four altitudinal zones, characterized by the tree species Rhododendron campanulatum, Abies spectabilis, Betula utilis, Sorbus microphylla and Acer sp. We collected 151 soil horizon samples (Ah, Ae, Bh, Bs), 146 litter layer samples (L), and 146 decomposition layer samples (Of) in 2013, as well as 251 leaves from standing biomass (SB) in 2013 and 2014. All samples were analysed for exchangeable cations or nutrient concentrations of C, N, P, K, Mg, Ca, Mn, Fe and Al. Soil moisture and soil and surface air temperatures were measured by 34 installed sensors. Precipitation and air temperatures were measured by three climate stations. The main pedogenic process is leaching of dissolved organic carbon, aluminium and iron from topsoil to subsoil. Soil types are classified as podzols with generally low nutrient concentrations. Soil acidity is extremely high and the humus quality of mineral soils is poor. Our results indicate multilateral interactions and a great spatial variability of essential nutrients within the treeline ecotone. Both soil nutrient and foliar macronutrient concentrations of nitrogen (N), magnesium (Mg) and potassium (K) decrease significantly with elevation in the treeline ecotone. Foliar phosphorus (P) concentrations also decrease significantly with elevation. 
Based on regression analyses, low soil temperatures and malnutrition most likely affect tree growth at high altitudes. Thus, we assume a strong influence of soil properties and nutrient supply on the position of the alpine treeline at a local scale. In addition, an excess of manganese (Mn) was found in the foliage of woody species above the treeline. With the help of multivariate statistical approaches, potential determining factors of treeline position could be quantified.
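A statement such as "foliar N decreases significantly with elevation" is typically backed by a linear regression of concentration on elevation and a test of the slope. A minimal sketch computing the ordinary least-squares slope and its t statistic (slope divided by its standard error, on n - 2 degrees of freedom); the abstract does not say which regression models the authors used, and the elevations and concentrations below are hypothetical.

```python
import math
from typing import List, Tuple


def ols_slope_t(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Return (intercept, slope, t statistic of the slope) for y = a + b*x.

    t = b / SE(b), with SE(b) = sqrt(SSE / (n - 2) / Sxx). The t value is
    returned as nan for a perfect fit, where SE(b) is zero.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t = slope / se_slope if se_slope > 0 else math.nan
    return intercept, slope, t


# Hypothetical plot data: elevation (m a.s.l.) and foliar N (mg/g).
elevation = [3700.0, 3800.0, 3900.0, 4000.0, 4100.0]
leaf_n = [22.0, 21.0, 19.5, 19.0, 17.5]
intercept, slope, t = ols_slope_t(elevation, leaf_n)
print(f"slope = {slope:.4f} per m, t = {t:.2f} on {len(elevation) - 2} df")
```

With 3 degrees of freedom, the two-sided 5% critical value of Student's t is about 3.18, so a negative slope with |t| above that would count as a significant decline in this toy example.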
